- From: Uche Ogbuji <uche@ogbuji.net>
- Date: Sat, 29 Sep 2012 23:59:47 -0600
- To: public-microxml@w3.org
- Message-ID: <CAPJCua1ucjs+jEcQx97Wg546+vHa7koZ8kRJweywcEywEmRLSQ@mail.gmail.com>
Folks,

It has been a maniacal couple of weeks, with a major project deadline last week leading into a long trip to San Jose this week. I've seen a flurry of activity in MicroXML, which is great. Here's my little bit.

On the plane back home I got most of the way through a lexer for MicroXML, written for the PLY parser generator on Python 3. (It requires Python 3.3 for fixes to Unicode handling; 3.3 is currently in its third release candidate.)

https://github.com/uogbuji/amara3/tree/master/lib/uxml

A brief example:

    $ cat test1.uxml
    <a b="1&amp;2">3<!--4-->5<b>spam</b></a>
    $ python3 ~/dev/amara3/lib/uxml/lex.py "`cat test1.uxml`"
    LexToken(STARTTAG_LEAD,'<a ',1,0)
    LexToken(NAME,'b',1,3)
    LexToken(EQ,'=',1,4)
    LexToken(DBL_QUOTE,'"',1,5)
    LexToken(CHARDATA,'1',1,6)
    LexToken(AMP_ENT,'&',1,7)
    LexToken(CHARDATA,'2',1,12)
    LexToken(DBL_QUOTE,'"',1,13)
    LexToken(GT,'>',1,14)
    LexToken(CHARDATA,'3',1,15)
    LexToken(COMMENT,'<!--4-->',1,16)
    LexToken(CHARDATA,'5',1,24)
    LexToken(STARTTAG_LEAD,'<b',1,25)
    LexToken(GT,'>',1,27)
    LexToken(CHARDATA,'spam',1,28)
    LexToken(ENDTAG,'</b>',1,32)
    LexToken(ENDTAG,'</a>',1,36)

I already have a simple parser that wraps that lexer and completes the picture, and I should have that checked in as well, but I figure this might be useful to others, especially the set of token regexes worked up from the spec.

A couple of notes:

* It's definitely experimental and there are a couple of bugs I'm aware of. We should start putting together a test suite we can all use to bring up compliance across the various implementations.
* Error messages are rather imprecise, as is not unusual for regex-based lexers.
* Performance is likely to be so-so for large input. I hope to switch to a DFA-based lexer soon to address this.
* PLY is here: http://www.dabeaz.com/ply/

--
Uche Ogbuji http://uche.ogbuji.net
Founding Partner, Zepheira http://zepheira.com
http://wearekin.org
http://www.thenervousbreakdown.com/author/uogbuji/
http://copia.ogbuji.net
http://www.linkedin.com/in/ucheogbuji
http://twitter.com/uogbuji
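
For readers who have not worked with regex-based lexers, here is a minimal sketch of the general idea in plain Python. It is not the amara3 code and does not use PLY. The token names are borrowed from the output in the message, but the `Token` type, the `RULES` table, the three state names, and every pattern are simplified, ASCII-only assumptions made for illustration, not the regexes worked up from the spec.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    type: str
    value: str
    pos: int


# Simplified, ASCII-only name pattern (an assumption for this sketch; the
# MicroXML spec allows a much larger set of name characters).
NAME = r"[A-Za-z_][A-Za-z0-9_.-]*"

# One rule table per lexer state: (token type, pattern, state to switch to).
# The state names "content", "tag" and "attr" are hypothetical.
RULES = {
    "content": [
        ("COMMENT", r"<!--.*?-->", None),
        ("ENDTAG", r"</" + NAME + r"\s*>", None),
        ("STARTTAG_LEAD", r"<" + NAME + r"\s*", "tag"),
        ("AMP_ENT", r"&amp;", None),
        ("CHARDATA", r"[^<&]+", None),
    ],
    "tag": [
        ("NAME", NAME, None),
        ("EQ", r"=", None),
        ("DBL_QUOTE", r'"', "attr"),
        ("GT", r">", "content"),
        ("WS", r"\s+", None),
    ],
    "attr": [
        ("DBL_QUOTE", r'"', "tag"),
        ("AMP_ENT", r"&amp;", None),
        ("CHARDATA", r'[^<&"]+', None),
    ],
}

# Combine each state's rules into one alternation of named groups.
COMPILED = {
    state: re.compile(
        "|".join(f"(?P<{name}>{pat})" for name, pat, _ in rules), re.DOTALL
    )
    for state, rules in RULES.items()
}
NEXT_STATE = {
    state: {name: nxt for name, _, nxt in rules if nxt}
    for state, rules in RULES.items()
}


def tokenize(text: str) -> Iterator[Token]:
    """Yield tokens with their character offsets; skip whitespace inside tags."""
    state, pos = "content", 0
    while pos < len(text):
        m = COMPILED[state].match(text, pos)
        if m is None:
            # All the lexer knows is that no rule matched at this offset.
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind = m.lastgroup
        if kind != "WS":
            yield Token(kind, m.group(), pos)
        state = NEXT_STATE[state].get(kind, state)
        pos = m.end()


if __name__ == "__main__":
    for tok in tokenize('<a b="1&amp;2">3<!--4-->5<b>spam</b></a>'):
        print(tok)
```

Run on the sample document, with the ampersand written as `&amp;`, this yields the same sequence of token types and offsets as the listing in the message. The only failure mode is "no rule matched at this offset", which illustrates why error messages from lexers of this kind tend to be imprecise.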
Received on Sunday, 30 September 2012 06:00:15 UTC