- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Tue, 15 May 2012 16:17:21 +0100
- To: public-rdf-wg@w3.org
On 15/05/12 16:01, Gavin Carothers wrote:
> On Tue, May 15, 2012 at 7:46 AM, Peter F. Patel-Schneider
> <pfpschneider@gmail.com> wrote:
>> The Turtle editor's draft says that WS is needed to prevent
>> mis-recognition of tokens, but doesn't explicitly define a token.
>
> token, terminal, whatever ;)
>
>>
>> If parsing Turtle depends on tokenizing, then there needs to be
>> an explicit definition of what a token is, and, further, what
>> mis-recognizing a token means.
>
> Proposed new language:
>
> White space (production WS) is used to separate two terminals which
> would otherwise be (mis-)recognized as one terminal. Rule names
> below in capitals indicate where white space is significant; these
> form a possible choice of terminals for constructing a Turtle
> parser.
>
> White space is significant in terminal IRIREF and the production
> String.
>
> ---
>
> Also, split that grammar table and identify all terminals with more
> than just all caps.
>
>>
>> For example, is
>>
>> @prefixprefix:<foo>.
>>
>> a valid Turtle statement?
>
> Yes.

Err - somewhat tricky to integrate with common tokenizer/parser
approaches, given the wonders of language tags.

Please ban it ("it" = a lack of WS after @prefix).

[disclosure - my parsers don't care but, for speed, they are handwritten,
and that means content sensitive tokenizing or messing with pushback onto
the input stream are doable - using tokenizer/parser toolkits may make
this messy]

I note that the Turtle submission bans it.

[4] prefixID ::= '@prefix' ws+ prefixName? ':' uriref

>
>>
>> Things would get even worse if the @ was allowed to be dropped,
>> which is a good reason to vote against allowing dropping of @.

Not true! The presence of the ":" is enough to distinguish bareword
keywords and prefixes.

If @ were not also used for language tags, a single leading char might
make a parser writer's life easier. But we have language tags starting
with @, and @prefix is a legal language tag (just unregistered).
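To make the tokenizer problem concrete, here is a rough sketch of a
toolkit-style, regex-driven tokenizer. It is only an illustration: the
token names (AT_WORD and so on) and the simplified patterns are my own
assumptions, not the terminals of the Turtle grammar.

```python
import re

# Hypothetical, simplified token classes for illustration only;
# these are NOT the terminals defined in the Turtle grammar.
TOKEN_SPEC = [
    # One class for everything starting with "@": @prefix, @base,
    # and language tags such as @en or @en-GB.
    ("AT_WORD", r"@[A-Za-z]+(?:-[A-Za-z0-9]+)*"),
    ("IRIREF", r"<[^<>\s]*>"),
    ("PNAME_NS", r"[A-Za-z][A-Za-z0-9]*:|:"),
    ("DOT", r"\."),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Split text into (kind, lexeme) pairs, dropping white space."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


no_ws = tokenize("@prefixprefix:<foo>.")
with_ws = tokenize("@prefix prefix:<foo>.")
```

Without white space the greedy match swallows "@prefixprefix" as a single
token, so the keyword and the prefix name cannot be told apart without
extra lookahead or pushback. With white space the same rules give
"@prefix" followed by "prefix:", and the parser can decide from context
whether an AT_WORD is a directive or a language tag.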
Pragmatically, require WS after @prefix and @base. Then we can have a
token type that is "@alpha-alphanumericsanddash".

	Andy

> Yes, while it is possible to create a grammar that allows this, in
> general removing @ is likely to reduce human readability.
>
>>
>> peter
Received on Tuesday, 15 May 2012 15:17:51 UTC