- From: David Robillard <d@drobilla.net>
- Date: Sun, 17 Feb 2013 17:43:53 -0500
- To: public-rdf-comments@w3.org
- Message-ID: <1361141033.16176.30.camel@verne.drobilla.net>
Hi,

I recently got a bug report from a user who has encountered dots in prefixed names in "Turtle" found in the wild, which my parser does not yet support. So I looked at the draft with a view to implementing this.

Unfortunately, it looks like a can of worms for a simple recursive-descent parser. The previous specification could be implemented with 1 character of lookahead, but I don't think this one can. Since a PrefixedName can contain a dot, if the next character while reading a PrefixedName is a dot, it is ambiguous whether the dot is part of the PrefixedName or the end of the statement. To determine this, you need to check whether the character after the dot is a valid PrefixedName character, and until this is known, neither the dot nor the character after it can be 'eaten'.

The significance is that *1* character of "lookahead" isn't really lookahead; you just need a peek(). Anything greater requires some kind of real lookahead implementation, or at least some crafty case-specific kludges to get around it.

This is not necessarily a spec problem, and two-character lookahead is not an onerous requirement in general, but compared to 1 it is. I just thought it was worth mentioning that there is a considerable new implementation requirement here. I will have to pay a price in throughput for this as well. It's clear, though, that dots in prefixed names are desirable.

Ideally, tokens, including the delimiters (i.e. '.' and ';'), would be whitespace-delimited, so reading a PrefixedName would simply stop when whitespace is encountered and this problem would not exist. Perhaps not realistic given existing practice, but it would certainly be nice.

Cheers,

-dr
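
The two-character check described in the message can be sketched roughly as follows. This is only an illustration in Python, not code from the parser under discussion: `is_name_char` and `read_prefixed_name` are hypothetical names, and the accepted character set is a deliberately simplified stand-in for the Turtle grammar (ASCII letters, digits, '_' and '-' only, no escapes).

```python
def is_name_char(c):
    """Simplified assumption: ASCII letters, digits, '_' and '-' only.
    The real Turtle grammar accepts a much wider set of characters."""
    return c.isascii() and (c.isalnum() or c in "_-")


def read_prefixed_name(text, pos=0):
    """Read a prefixed name starting at text[pos].

    Returns (name, new_pos), where new_pos indexes the first
    character that was not consumed.
    """
    start = pos
    n = len(text)
    while pos < n:
        c = text[pos]
        if is_name_char(c) or c == ":":
            # One character of lookahead is enough here: peek and eat.
            pos += 1
        elif c == ".":
            # The dot cannot be eaten until the character after it is
            # known, which is the second character of lookahead.
            if pos + 1 < n and is_name_char(text[pos + 1]):
                pos += 1  # the dot is part of the name
            else:
                break     # the dot ends the statement; leave it unread
        else:
            break
    return text[start:pos], pos


if __name__ == "__main__":
    print(read_prefixed_name("ex:a.b ."))    # dot inside the name is kept
    print(read_prefixed_name("ex:a. ex:b"))  # dot terminates the statement
```

In this sketch a dot is accepted only when a name character immediately follows it, so the reader must inspect `text[pos + 1]` before deciding what to do with `text[pos]`. With a stream that offers only a single peek(), that extra character would have to be buffered somewhere, which is the added cost the message refers to.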
Received on Sunday, 17 February 2013 22:44:21 UTC