You are viewing gnap_an

pyparsing library,a completely new expierence of text processing

Last winter vacation I wrote sqldap for eyou.com,implementing access to LDAP direcotries in SQL-like statements.It was one project but I required to write two seperated version, each in python or c.

In the C implementation I picked lex/yacc to parse a subset of SQL grammer,and in the python implementation I chosed Pyparing,which helped me finished all features in less than 15 days.Below was the hellow world example from the offical site.

# greeting.py
#
# Demonstration of the parsing module, on the prototypical "Hello, World!" example
#
# Copyright 2003, by Paul McGuire
#
from pyparsing import Word, alphas

# define grammar
greet = Word( alphas ) + "," + Word( alphas ) + "!"

# input string
hello = "Hello, World!"

# parse input string
print hello, "->", greet.parseString( hello )

and here was part of my own work:
#!/usr/bin/env python

# ------------------------------------------------------------------------------
# pysqldap.py
#
# SQL to LDAP parser, parses LDAP like SQL statements.
# $Id: pysqldap.py,v 1.10 2006/02/05 10:16:55 gnap Exp $
# $Date: 2006/02/05 10:16:55 $
# $Log: pysqldap.py,v $
# Revision 1.10  2006/02/05 10:16:55  gnap
# added some BUG fixes.these was done couple of days ago,but not commited.
#
# Revision 1.9  2006/01/13 14:43:46  gnap
# WHERE statement operators was completed,LDAP filter operator `~=' was added
# for fuctionality, although it's not in SQL.
#
# Revision 1.8  2006/01/12 12:09:04  gnap
# select limit implemented, with very few code.
#
# Revision 1.7  2006/01/11 14:21:57  gnap
# readline support was added to the tool. it is much easier to test and use,
# have fun.
#
# Revision 1.6  2006/01/11 10:10:25  gnap
# where statements requred fully inplemented,but maybe not robust yet.well
# tests are needed.now currently the only job to do is LIMIT implementation,
# which should be a piece of cake.
#
# Revision 1.5  2006/01/11 06:39:42  gnap
# Sorting was supported in this commit,so only a little damn job to do on
# the rest of the features required.
#
# Revision 1.4  2006/01/08 06:38:03  gnap
# where statements partially supported,not completed yet!however,it still can do
# some job if you want to use it to surf some ldap sever.
#
# Revision 1.3  2006/01/06 06:29:44  gnap
# This the first excutable version of this project.Currently only the SELECT
# and the FROM statement was supported.future WHERE grammer will be added
# soon.
#
#
# ------------------------------------------------------------------------------

from pyparsing import Literal, CaselessLiteral, Word, delimitedList, Optional, \
    Combine, Group, alphas, nums, alphanums, ParseException, Forward, oneOf, quotedString, \
    ZeroOrMore, restOfLine, Keyword, removeQuotes
# define SQL tokens
selectStmt   = Forward()
selectToken  = Keyword("select", caseless=True)
fromToken    = Keyword("from", caseless=True)
whereToken   = Keyword("where", caseless=True)
intoToken    = Keyword("into", caseless=True)
orderbyToken = Keyword("order by", caseless=True)
limitToken   = Keyword("limit", caseless=True)

ident          = Word( alphas, alphanums + "_$" ).setName("identifier")
columnName     = delimitedList( ident, ".", combine=True )
columnNameList = Group( delimitedList( columnName ) )

#baseIdnet as replacement of tables,this can match LDAP DN syntax sucessfully,but not surely be robust
baseIdent      = Word( alphas, alphanums + "=." ).setName("identifier") 
baseName       = delimitedList( baseIdent, ",", combine=True)
baseNameList   = Group( delimitedList( baseName ) )

#likeExpression fore SQL LIKE expressions
likeExpr       = Word( alphas + alphanums + "_$%" ).setName("identifier")

#whereExpression will be modified later...
whereExpression = Forward()
and_ = Keyword("and", caseless=True)
or_ = Keyword("or", caseless=True)
not_ = Keyword("not", caseless=True)

E      = CaselessLiteral("E")
binop  = oneOf("= >= <= ~= < > <> !=", caseless=True)
likeop = oneOf("like", caseless=True)
lpar   = Literal("(").suppress()
rpar   = Literal(")").suppress()

arithSign = Word("+-",exact=1)
realNum = Combine( Optional(arithSign) + ( Word( nums ) + "." + Optional( Word(nums) )  |
                                                         ( "." + Word(nums) ) ) + 
            Optional( E + Optional(arithSign) + Word(nums) ) )
intNum = Combine( Optional(arithSign) + Word( nums ) + 
            Optional( E + Optional("+") + Word(nums) ) )

columnRval = realNum | intNum | quotedString.setParseAction( removeQuotes ) | columnName # need to add support for alg expressions
whereCondition = Group(
    ( columnName + binop + columnRval ) |
		( columnName + likeop + likeExpr  ) |
    ( lpar + whereExpression + rpar   )
    )
whereExpression << ZeroOrMore( not_ ) + (whereCondition + ZeroOrMore( ( and_ | or_ ) + whereExpression ) )

fileName    = quotedString.setParseAction( removeQuotes )

orderseq    = oneOf("asc desc", caseless=True)
limitoffset = intNum
limitcount  = intNum

fucName     = Word( alphas, alphanums ).setName("identifier")

selectExpr  = Group(
		columnNameList |
		'*'
		)

# define the grammar
selectStmt      << ( selectToken + 
                   ( 'count(*)' | selectExpr ).setResultsName( "columns" ) + 
                   fromToken + 
                   baseNameList.setResultsName( "basedn" ) + 
		   Optional( whereToken + Group(whereExpression).setResultsName("where")) +
		   Optional( intoToken + Group(fileName).setResultsName("into")) +
		   Optional( orderbyToken + Group( columnName + Optional(orderseq) ).setResultsName("orderby") ) + 
		   Optional( limitToken +Group( Optional(limitoffset + ",") + limitcount ).setResultsName("limit") )
		   )



simpleSQL = selectStmt

# define Oracle comment format, and ignore them
oracleSqlComment = "--" + restOfLine
simpleSQL.ignore( oracleSqlComment )

def quoted_str( str ):
	return "(" + str + ")"


def gen_filter_in( where_token ):
	state = ""
	if type(where_token) == str:
		if where_token == "":
			state = quoted_str("objectclass=*")
		else:
			state = where_token
		return state
	if len(where_token) == 1:
		state = gen_filter_in( where_token[0] )
	if len(where_token) == 3:
		token1 = gen_filter_in(where_token[0])
		token2 = gen_filter_in(where_token[2])
		if where_token[1] == 'and':
			state = "&" + quoted_str(token1) +  quoted_str(token2)
		if where_token[1] == 'or':
			state = "|" + quoted_str(token1) + quoted_str(token2)
		if where_token[1] in ['=','~=',]:
			state = token1 + where_token[1] + token2
		if where_token[1] in ['<','<=']:
			state = token1 + '<=' + token2
		if where_token[1] in ['>','>=']:
			state = token1 + '>=' + token2
		if where_token[1] in ['<>', '!=']:
			state = '!' + quoted_str(token1 + '=' + token2)
		if where_token[1] == 'like':
			state = token1 + '=' + token2.replace("%","*")
	if len(where_token) == 2:
		token2 = gen_filter_in(where_token[1])
		if where_token[0] == 'not':
			state = "!" + quoted_str(token2)

	return state 

def gen_filter( where_token ):
	if len(where_token) == 0:
		return "(objectclass=*)"
	return quoted_str( gen_filter_in( where_token ))

def test( str ):
	print str,"->"
	try:
		tokens = simpleSQL.parseString( str )
#        print "tokens = ",        tokens
	        print "tokens.columns =", tokens.columns
	        print "tokens.basedn =",  tokens.basedn[0]
		filter = gen_filter( tokens.where )
		print "tokens.where =", tokens.where
		print "filter = ",      filter
		print "into =   ",      tokens.into
		print "orderby = ",     tokens.orderby
		print "limit = ",       tokens.limit
	except ParseException, err:
		print " "*err.loc + "^\n" + err.msg
		print err
		print

def main():
	import readline
	readline.parse_and_bind("tab: complete")
	while 1:
		try:
			s = raw_input('>')
			if s == "exit":
				break
		except EOFError:
			print
			break
#		s = "select * from dc=aa " + s
		test( s )

if __name__ == "__main__":
	main()

Comments

(Anonymous)

Nice work!

Do you mind if I link to your example from the pyparsing "Who's Using Pyparsing?" page?

Also, look at the latest release's feature of "operatorPrecedence", which should simplify your inclusion of arithmetic expression support (demonstrated in the simpleArith.py example script).

Thanks for using pyparsing - I'm glad it was helpful!

-- Paul McGuire

Re: Nice work!

Of cource,be in honor.And pyparsing is a very excellenct job!!

Re: Nice work!

lol.

I guess I should say "of course not",which was extactly what I meant :-)

Just a polular asian english learners' mistake.:-P

(Anonymous)

Re: Nice work!

Thanks - pyparsing seems to be catching on. :)

Could you please e-mail me (or post) some sample queries and corresponding output, to include with the description of your parser? The code is good, but I like to be able to show examples of just what it does.

-- Paul

Re: Nice work!

This module parsed a SQL-like statements,and can convert Where expressions into LDAP queries filters. The only different was that the FROM statements, was supposed to be a DN string in LDAP queries, but not tables.
>SELECT abc FROM cn=fakename,dn=fakedn.org WHERE name=gnap INTO 'output.txt' ORDER BY cn ASC LIMIT 3

the output would be:
SELECT abc FROM cn=fakename,dn=fakedn.org WHERE name=gnap INTO 'output.txt' ORDER BY cn ASC LIMIT 3 ->
tokens.columns = [['abc']]
tokens.basedn = cn=fakename,dn=fakedn.org
tokens.where = [['name', '=', 'gnap']]
filter =  (name=gnap)
into =    ['output.txt']
orderby =  ['cn', 'asc']
limit =  ['3']

The complete programme was in eyou.com's release schedule, So I can only post the content of pysqldap.py here and Some output of it. I also took a screenshot of it, which was put in the top of the topic.

Re: Nice work!

lol.

I guess I should say "of course not",which was extactly what I meant :-)

Just a polular asian english learners' mistake.:-P

(Anonymous)

Ldaptor may be of interest, too

Also, check out Tommi Virtanen's ldaptor package (http://packages.qa.debian.org/l/ldaptor.html), which uses parse actions to compile an LDAP query into a Python-callable structure.

-- Paul

Re: Ldaptor may be of interest, too

Thank you ,Paul.Very much.