pyparsing library,a completely new expierence of text processing
Last winter vacation I wrote sqldap for eyou.com,implementing access to LDAP direcotries in SQL-like statements.It was one project but I required to write two seperated version, each in python or c.

In the C implementation I picked lex/yacc to parse a subset of SQL grammer,and in the python implementation I chosed Pyparing,which helped me finished all features in less than 15 days.Below was the hellow world example from the offical site.
# greeting.py
#
# Demonstration of the parsing module, on the prototypical "Hello, World!" example
#
# Copyright 2003, by Paul McGuire
#
from pyparsing import Word, alphas
# define grammar
greet = Word( alphas ) + "," + Word( alphas ) + "!"
# input string
hello = "Hello, World!"
# parse input string
print hello, "->", greet.parseString( hello )
and here was part of my own work:
In the C implementation I picked lex/yacc to parse a subset of SQL grammer,and in the python implementation I chosed Pyparing,which helped me finished all features in less than 15 days.Below was the hellow world example from the offical site.
# greeting.py
#
# Demonstration of the parsing module, on the prototypical "Hello, World!" example
#
# Copyright 2003, by Paul McGuire
#
from pyparsing import Word, alphas
# define grammar
greet = Word( alphas ) + "," + Word( alphas ) + "!"
# input string
hello = "Hello, World!"
# parse input string
print hello, "->", greet.parseString( hello )
and here was part of my own work:
#!/usr/bin/env python
# ------------------------------------------------------------------------------
# pysqldap.py
#
# SQL to LDAP parser, parses LDAP like SQL statements.
# $Id: pysqldap.py,v 1.10 2006/02/05 10:16:55 gnap Exp $
# $Date: 2006/02/05 10:16:55 $
# $Log: pysqldap.py,v $
# Revision 1.10 2006/02/05 10:16:55 gnap
# added some BUG fixes.these was done couple of days ago,but not commited.
#
# Revision 1.9 2006/01/13 14:43:46 gnap
# WHERE statement operators was completed,LDAP filter operator `~=' was added
# for fuctionality, although it's not in SQL.
#
# Revision 1.8 2006/01/12 12:09:04 gnap
# select limit implemented, with very few code.
#
# Revision 1.7 2006/01/11 14:21:57 gnap
# readline support was added to the tool. it is much easier to test and use,
# have fun.
#
# Revision 1.6 2006/01/11 10:10:25 gnap
# where statements requred fully inplemented,but maybe not robust yet.well
# tests are needed.now currently the only job to do is LIMIT implementation,
# which should be a piece of cake.
#
# Revision 1.5 2006/01/11 06:39:42 gnap
# Sorting was supported in this commit,so only a little damn job to do on
# the rest of the features required.
#
# Revision 1.4 2006/01/08 06:38:03 gnap
# where statements partially supported,not completed yet!however,it still can do
# some job if you want to use it to surf some ldap sever.
#
# Revision 1.3 2006/01/06 06:29:44 gnap
# This the first excutable version of this project.Currently only the SELECT
# and the FROM statement was supported.future WHERE grammer will be added
# soon.
#
#
# ------------------------------------------------------------------------------
from pyparsing import Literal, CaselessLiteral, Word, delimitedList, Optional, \
Combine, Group, alphas, nums, alphanums, ParseException, Forward, oneOf, quotedString, \
ZeroOrMore, restOfLine, Keyword, removeQuotes
# define SQL tokens
selectStmt = Forward()
selectToken = Keyword("select", caseless=True)
fromToken = Keyword("from", caseless=True)
whereToken = Keyword("where", caseless=True)
intoToken = Keyword("into", caseless=True)
orderbyToken = Keyword("order by", caseless=True)
limitToken = Keyword("limit", caseless=True)
ident = Word( alphas, alphanums + "_$" ).setName("identifier")
columnName = delimitedList( ident, ".", combine=True )
columnNameList = Group( delimitedList( columnName ) )
#baseIdnet as replacement of tables,this can match LDAP DN syntax sucessfully,but not surely be robust
baseIdent = Word( alphas, alphanums + "=." ).setName("identifier")
baseName = delimitedList( baseIdent, ",", combine=True)
baseNameList = Group( delimitedList( baseName ) )
#likeExpression fore SQL LIKE expressions
likeExpr = Word( alphas + alphanums + "_$%" ).setName("identifier")
#whereExpression will be modified later...
whereExpression = Forward()
and_ = Keyword("and", caseless=True)
or_ = Keyword("or", caseless=True)
not_ = Keyword("not", caseless=True)
E = CaselessLiteral("E")
binop = oneOf("= >= <= ~= < > <> !=", caseless=True)
likeop = oneOf("like", caseless=True)
lpar = Literal("(").suppress()
rpar = Literal(")").suppress()
arithSign = Word("+-",exact=1)
realNum = Combine( Optional(arithSign) + ( Word( nums ) + "." + Optional( Word(nums) ) |
( "." + Word(nums) ) ) +
Optional( E + Optional(arithSign) + Word(nums) ) )
intNum = Combine( Optional(arithSign) + Word( nums ) +
Optional( E + Optional("+") + Word(nums) ) )
columnRval = realNum | intNum | quotedString.setParseAction( removeQuotes ) | columnName # need to add support for alg expressions
whereCondition = Group(
( columnName + binop + columnRval ) |
( columnName + likeop + likeExpr ) |
( lpar + whereExpression + rpar )
)
whereExpression << ZeroOrMore( not_ ) + (whereCondition + ZeroOrMore( ( and_ | or_ ) + whereExpression ) )
fileName = quotedString.setParseAction( removeQuotes )
orderseq = oneOf("asc desc", caseless=True)
limitoffset = intNum
limitcount = intNum
fucName = Word( alphas, alphanums ).setName("identifier")
selectExpr = Group(
columnNameList |
'*'
)
# define the grammar
selectStmt << ( selectToken +
( 'count(*)' | selectExpr ).setResultsName( "columns" ) +
fromToken +
baseNameList.setResultsName( "basedn" ) +
Optional( whereToken + Group(whereExpression).setResultsName("where")) +
Optional( intoToken + Group(fileName).setResultsName("into")) +
Optional( orderbyToken + Group( columnName + Optional(orderseq) ).setResultsName("orderby") ) +
Optional( limitToken +Group( Optional(limitoffset + ",") + limitcount ).setResultsName("limit") )
)
simpleSQL = selectStmt
# define Oracle comment format, and ignore them
oracleSqlComment = "--" + restOfLine
simpleSQL.ignore( oracleSqlComment )
def quoted_str( str ):
return "(" + str + ")"
def gen_filter_in( where_token ):
state = ""
if type(where_token) == str:
if where_token == "":
state = quoted_str("objectclass=*")
else:
state = where_token
return state
if len(where_token) == 1:
state = gen_filter_in( where_token[0] )
if len(where_token) == 3:
token1 = gen_filter_in(where_token[0])
token2 = gen_filter_in(where_token[2])
if where_token[1] == 'and':
state = "&" + quoted_str(token1) + quoted_str(token2)
if where_token[1] == 'or':
state = "|" + quoted_str(token1) + quoted_str(token2)
if where_token[1] in ['=','~=',]:
state = token1 + where_token[1] + token2
if where_token[1] in ['<','<=']:
state = token1 + '<=' + token2
if where_token[1] in ['>','>=']:
state = token1 + '>=' + token2
if where_token[1] in ['<>', '!=']:
state = '!' + quoted_str(token1 + '=' + token2)
if where_token[1] == 'like':
state = token1 + '=' + token2.replace("%","*")
if len(where_token) == 2:
token2 = gen_filter_in(where_token[1])
if where_token[0] == 'not':
state = "!" + quoted_str(token2)
return state
def gen_filter( where_token ):
if len(where_token) == 0:
return "(objectclass=*)"
return quoted_str( gen_filter_in( where_token ))
def test( str ):
print str,"->"
try:
tokens = simpleSQL.parseString( str )
# print "tokens = ", tokens
print "tokens.columns =", tokens.columns
print "tokens.basedn =", tokens.basedn[0]
filter = gen_filter( tokens.where )
print "tokens.where =", tokens.where
print "filter = ", filter
print "into = ", tokens.into
print "orderby = ", tokens.orderby
print "limit = ", tokens.limit
except ParseException, err:
print " "*err.loc + "^\n" + err.msg
print err
print
def main():
import readline
readline.parse_and_bind("tab: complete")
while 1:
try:
s = raw_input('>')
if s == "exit":
break
except EOFError:
print
break
# s = "select * from dc=aa " + s
test( s )
if __name__ == "__main__":
main()
(Anonymous)
Nice work!
Also, look at the latest release's feature of "operatorPrecedence", which should simplify your inclusion of arithmetic expression support (demonstrated in the simpleArith.py example script).
Thanks for using pyparsing - I'm glad it was helpful!
-- Paul McGuire
Re: Nice work!
Re: Nice work!
I guess I should say "of course not",which was extactly what I meant :-)
Just a polular asian english learners' mistake.:-P
(Anonymous)
Re: Nice work!
Could you please e-mail me (or post) some sample queries and corresponding output, to include with the description of your parser? The code is good, but I like to be able to show examples of just what it does.
-- Paul
Re: Nice work!
the output would be:
The complete programme was in eyou.com's release schedule, So I can only post the content of pysqldap.py here and Some output of it. I also took a screenshot of it, which was put in the top of the topic.
Re: Nice work!
I guess I should say "of course not",which was extactly what I meant :-)
Just a polular asian english learners' mistake.:-P
(Anonymous)
Ldaptor may be of interest, too
-- Paul
Re: Ldaptor may be of interest, too