- From: Alex Hall <alexhall@revelytix.com>
- Date: Thu, 23 Feb 2012 15:47:32 -0500
- To: Henry Story <henry.story@bblfish.net>
- Cc: public-rdf-comments@w3.org
- Message-ID: <CAFq2bizKFhaGc3d09GTpNo+eFBPzT2T+bFWznqArM4WTyYY7tQ@mail.gmail.com>
On Thu, Feb 23, 2012 at 12:01 PM, Henry Story <henry.story@bblfish.net> wrote:

> I can't quite work out what the delimiters between tokens are.
>

Well, that's because there are no delimiters between tokens, per se. You can always use whitespace to separate tokens that would otherwise be parsed as a single token, but this isn't necessary if there's only one way to parse a given character sequence.

> The following seems to be correct N3 (cwm parses it)
>
> @prefix : <>.
> :me</knows>:her,:him.
>

Yes, this is correct N3 (and Turtle). Without getting into the gory details of parsing theory, what the parser is doing is:

1. Starting with the first character in the input sequence, determine what type of token it's looking at, by looking ahead as many characters as necessary to eliminate any alternatives.

2. Once it determines the token type, it then goes back to the beginning of the input and advances character by character, building up an internal state along the way, until it reaches the end of the token or finds a character that is not allowed as part of the current token.

3. If the current token is well-formed, then emit it and start over at (1) to find the next token. If the token is malformed, throw an exception and stop parsing.

So, for line 2 above, the parser guesses from looking at the ':' that it's looking at a prefixed name (because no other token can start with a ':'). It reads up to the '<', which is not allowed as part of a prefixed name, so it emits ':me' as a prefixed name. Then, from the '<' it knows it's looking at an IRI, so it reads forward to the '>' (which signals the end of an IRI token) and emits '</knows>' as an IRI, and so on.

> cwm even is able to parse
>
> @prefix foaf: <http://xmlns.com/foaf/0.1> .
> :me foaf:knows:her.
>
> Anyway, it's not so easy to implement some of this.
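
The three tokenizing steps described above can be sketched in a few lines of Python. This is only a minimal illustration, not how cwm or any real Turtle parser is written. The token names (PREFIX_DECL, IRI, PNAME, PUNCT), the tokenize function, and the ASCII-only patterns are all assumptions made for the example, and the sketch assumes ':' is not allowed inside a local name, as in the examples quoted above. Instead of explicit lookahead, it tries each pattern at the current position and takes the first one that matches.

```python
import re

# Simplified token patterns (assumptions for illustration only; the real
# grammar allows many more characters in each token type).
TOKEN_PATTERNS = [
    ("PREFIX_DECL", re.compile(r"@prefix")),
    ("IRI", re.compile(r"<[^<>\s]*>")),
    # Optional prefix, ':', optional local part that may contain '.'
    # but may not end with one.
    ("PNAME", re.compile(
        r"(?:[A-Za-z][A-Za-z0-9_-]*)?:"
        r"(?:[A-Za-z0-9_](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?)?")),
    ("PUNCT", re.compile(r"[.,;]")),
]


def tokenize(text):
    """Split text into (kind, lexeme) pairs; whitespace is optional."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace may separate tokens but is never required
            continue
        # Step 1: decide which kind of token starts at this position.
        for kind, pattern in TOKEN_PATTERNS:
            match = pattern.match(text, pos)
            if match:
                # Steps 2 and 3: the match stops at the first character the
                # token cannot contain; emit it and start over after it.
                tokens.append((kind, match.group()))
                pos = match.end()
                break
        else:
            # No token can start here: stop with an error.
            raise ValueError(
                f"unexpected character {text[pos]!r} at offset {pos}")
    return tokens


if __name__ == "__main__":
    for token in tokenize(":me</knows>:her,:him."):
        print(token)
```

With these patterns, ':me</knows>:her,:him.' splits into ':me', '</knows>', ':her', ',', ':him' and '.' without any whitespace, because each token ends at the first character it cannot contain.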
> For some context, I have been working on this in Scala, and writing up
> some thoughts here:
>
> https://bitbucket.org/pchiusano/nomo/issue/6/complex-ebnf-rules

Yes, parsers are notoriously difficult to implement. I'm no expert on parsing theory, but I see that you're using a parser combinator to implement yours. I think the grammar was designed to be parsed using more traditional LL(1) parsers with a separate lexical analyzer to handle the terminal productions. This might account for some of the difficulty that you're having.

FWIW, I had trouble implementing the same PN_PREFIX rule that you cite above using Antlr, and had to use Antlr's predicated production feature to work around the greediness. So I rewrote the rule as:

fragment PN_LOCAL_CHARS
  : '.' | PN_CHARS
  ;

fragment PN_CHARS_SEQ
  : ( // '.' is not allowed at the end -- only match them if they're
      // followed by another valid char
      ('.' PN_LOCAL_CHARS)=> '.'
    | PN_CHARS
    )*
  ;

fragment PN_PREFIX
  : PN_CHARS_BASE PN_CHARS_SEQ
  ;

Regards,
Alex

> Henry
>
> Social Web Architect
> http://bblfish.net/
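
The greediness problem with PN_PREFIX, and the lookahead workaround, can also be shown with ordinary regular expressions. The following is a rough Python sketch, not a translation of the Antlr grammar. The character classes are cut down to ASCII, and the names GREEDY, PREDICATED and match_prefix are made up for the example. The lookahead `(?=...)` plays the role of the predicate: a '.' is consumed only if another allowed character follows it.

```python
import re

# Simplified character classes (assumption: ASCII only, for illustration).
PN_CHARS_BASE = r"[A-Za-z]"
PN_CHARS = r"[A-Za-z0-9_-]"

# Naive rule: treats '.' like any other character, so it swallows the
# trailing '.' that should terminate the statement.
GREEDY = re.compile(rf"{PN_CHARS_BASE}(?:{PN_CHARS}|\.)*")

# Predicated rule: take a '.' only when the next character is another '.'
# or a PN_CHARS character. Note that this one-character lookahead, as
# written in this sketch, would still leave a trailing '.' after a run of
# dots such as 'a..'.
PREDICATED = re.compile(
    rf"{PN_CHARS_BASE}(?:\.(?=\.|{PN_CHARS})|{PN_CHARS})*")


def match_prefix(pattern, text):
    """Return the longest match of pattern at the start of text, or None."""
    match = pattern.match(text)
    return match.group() if match else None


if __name__ == "__main__":
    print(match_prefix(GREEDY, "ex.org."))      # trailing '.' is consumed
    print(match_prefix(PREDICATED, "ex.org."))  # trailing '.' is left over
```

A backtracking regex engine could also express the rule as "any characters, then one that is not a '.'", but the lookahead form is closer to how a predicated lexer rule makes the decision one character at a time.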
Received on Thursday, 23 February 2012 20:48:21 UTC