Re: simplified patches against last night's grammar from Eric Prud'hommeaux on 2012-06-16 (public-rdf-wg@w3.org from June 2012)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Sat, 16 Jun 2012 13:59:22 -0400
To: Gavin Carothers <gavin@carothers.name>
Cc: Gregg Kellogg <gregg@kellogg-assoc.com>, RDF-WG WG <public-rdf-wg@w3.org>
Message-ID: <20120616175921.GB29385@w3.org>
* Gavin Carothers <gavin@carothers.name> [2012-06-16 09:37-0700]
> Now on list to elicit more feedback.
> 
> On Sat, Jun 16, 2012 at 7:20 AM, Eric Prud'hommeaux <eric@w3.org> wrote:
> > typo, perhaps:
> > -[12]    object                ::= iri | blank | predicateObjectList | literal
> > +[12]    object                ::= iri | blank | blankNodePropertyList | literal
> >
> > string misalignment:
> > -[155s]  STRING_LITERAL1       ::= '"' ([^#x22#x5C#xA#xD] | ECHAR | UCHAR)* '"'
> > -[156s]  STRING_LITERAL2       ::= "'" ([^#x27#x5C#xA#xD] | ECHAR | UCHAR)* "'"
> > -[157s]  STRING_LITERAL_LONG1  ::= "'''" (("'" | "''")? [^'\] | ECHAR | UCHAR)* "'''"
> > -[158s]  STRING_LITERAL_LONG2  ::= '"""' (('"' | '""')? [^"\] | ECHAR | UCHAR)* '"""'
> 
> Okay, now I'm just going crazy. That's the way they were BEFORE when
> someone said they were reversed.

There have been three changes of late:
  1 align the LITERAL1/2 with single/double quote in SPARQL.
  2 make sure that the excluded characters #x22 and #x27 correspond to the double and single quote respectively.
  3 preserve a grouping around "[^"\] | ECHAR | UCHAR"
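
As a rough check of these three changes, here is a sketch that transcribes the proposed productions [155s]-[158s] (quoted just below) into Python regular expressions. The ECHAR and UCHAR patterns are an assumption about how those terminals expand, and names such as LONG1_UNGROUPED are made up for illustration; none of this is taken from an actual parser.

```python
import re

# Assumed expansions of the ECHAR and UCHAR terminals (not quoted in this thread).
ECHAR = r"""\\[tbnrf"'\\]"""
UCHAR = r"(?:\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8})"

# Changes 1 and 2: LITERAL1 is single-quoted and excludes #x27,
# LITERAL2 is double-quoted and excludes #x22.
STRING_LITERAL1 = re.compile(rf"'(?:[^\x27\x5C\x0A\x0D]|{ECHAR}|{UCHAR})*'")
STRING_LITERAL2 = re.compile(rf'"(?:[^\x22\x5C\x0A\x0D]|{ECHAR}|{UCHAR})*"')

# Change 3: the optional one or two quotes apply to the whole
# "[^'\] | ECHAR | UCHAR" group, not only to its first alternative.
STRING_LITERAL_LONG1 = re.compile(
    rf"'''(?:(?:'|'')?(?:[^'\\]|{ECHAR}|{UCHAR}))*'''"
)
STRING_LITERAL_LONG2 = re.compile(
    rf'"""(?:(?:"|"")?(?:[^"\\]|{ECHAR}|{UCHAR}))*"""'
)

# Hypothetical transcription of the earlier, ungrouped form of LONG1, for comparison.
LONG1_UNGROUPED = re.compile(
    rf"'''(?:(?:'|'')?[^'\\]|{ECHAR}|{UCHAR})*'''"
)

# A long string in which a single quote is directly followed by an escape.
sample = "'''x'\\ty'''"
print(bool(STRING_LITERAL_LONG1.fullmatch(sample)))  # grouped form
print(bool(LONG1_UNGROUPED.fullmatch(sample)))       # ungrouped form
```

Under these assumptions, the grouped form accepts a quote followed by an escape inside a long string, while the ungrouped form does not, which is the reason for preserving the grouping.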

> > +[155s]  STRING_LITERAL1       ::= "'" ([^#x27#x5C#xA#xD] | ECHAR | UCHAR)* "'"
> > +[156s]  STRING_LITERAL2       ::= '"' ([^#x22#x5C#xA#xD] | ECHAR | UCHAR)* '"'
> > +[157s]  STRING_LITERAL_LONG1  ::= "'''" (("'" | "''")? ([^'\] | ECHAR | UCHAR))* "'''"
> > +[158s]  STRING_LITERAL_LONG2  ::= '"""' (('"' | '""')? ([^"\] | ECHAR | UCHAR))* '"""'
> 
> No, that can't be right. Those aren't what is in there now. The
> current grammar has [23] [24] numbered productions. Check
> http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-turtle/turtle.bnf
> before I go totally crazy (perhaps we shouldn't call them 1,2 and make
> clearer what's going on here as you, me, Andy, and Gregg Kellogg all
> seem to have at one point or another confused this).
> 
> >
> > two \s in IRIREF:
> > -[138s]  IRIREF                ::= '<' ([^#x00-#x20<>\"{}|^`\] | UCHAR)* '>'
> > +[138s]  IRIREF                ::= '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>' # no UCHAR in SPARQL
> 
> Changed!
> 
> >
> > simplification and whitespace:
> > -[24]    DECIMAL               ::= [+-]? ([0-9]* '.' [0-9]+)
> > +[24]    DECIMAL               ::= [+-]? [0-9]* '.' [0-9]+
> > -[168s]  PN_LOCAL              ::= (PN_CHARS_U | [0-9] | PLX) ((PN_CHARS | '.' | PLX)* PN_CHARS | PLX)?
> > +[168s]  PN_LOCAL              ::= (PN_CHARS_U | [0-9] | PLX) ((PN_CHARS | '.' | PLX)* (PN_CHARS | PLX))?
> 
> These are hopeless and are the result of the method used in bnf2html.
> Unless you have a VERY VERY strong opinion I'm going to leave these
> alone, as last time I tried to fix them I broke most of the other
> nesting/precedence rules. They are correct but have slightly too many
> ()s.

fair enough. we can tweak them by hand for PR and REC.

> > the usual prefix/base thing:
> > -[4]     prefixID              ::= '@prefix' PNAME_NS IRIREF
> > +[4]     prefixID              ::= PREFIX PNAME_NS IRIREF
> > -[5]     base                  ::= '@base' IRIREF
> > +[5]     base                  ::= BASE IRIREF
> > -[128s]  RDFLiteral            ::= String (LANGTAG | '^^' iri)?
> > +[17]    RDFLiteral            ::= String (LanguageTag | '^^' iri)?
> > +[18]    LanguageTag           ::= BASE | PREFIX | LANGTAG
> > +[20]    BASE                  ::= '@base'
> > +[21]    PREFIX                ::= '@prefix'
> 
> Ugh, I'm not sure this is any better. This clearly doesn't solve the
> issue, as this is what we had before and what Gregg used to create the
> RDF.rb turtle parser, which didn't work correctly :( It is also the same as
> what was used to create Raptor, which again has the same issue. We need to
> be clearer somehow on what should happen with "literal"@base and
> "literal"@prefix.

I believe that
  [18]    LanguageTag           ::= BASE | PREFIX | LANGTAG
makes it explicit that the tokens for BASE and PREFIX can be interpreted as a language tag, if that is indeed our intention.
<http://w3.org/brief/MjY2> shows a standard lexer returning tokens for "@base" and "@prefix" and the parser accepting them as directives and as language tags.
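
For readers without access to that demo, the idea can be sketched roughly as follows. This is a heavily simplified, hypothetical lexer, not the one behind the link: the token patterns (STRING, IRIREF, PNAME_NS) are cut down for brevity, and tokenize and classify are illustrative names. It only shows a lexer emitting BASE and PREFIX tokens and a parser-side check accepting them either as directives or as language tags, as production [18] would allow.

```python
import re

# Simplified token patterns (assumptions, much narrower than the real grammar).
TOKEN_SPEC = [
    ("BASE", r"@base(?![A-Za-z0-9-])"),
    ("PREFIX", r"@prefix(?![A-Za-z0-9-])"),
    ("LANGTAG", r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*"),
    ("STRING", r'"[^"\\\n\r]*"'),
    ("IRIREF", r"<[^<>\s]*>"),
    ("PNAME_NS", r"(?:[A-Za-z][A-Za-z0-9]*)?:"),
    ("DOT", r"\."),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

# [18] LanguageTag ::= BASE | PREFIX | LANGTAG
LANGUAGE_TAG_KINDS = {"BASE", "PREFIX", "LANGTAG"}


def tokenize(text):
    """Return (kind, lexeme) pairs, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def classify(tokens):
    """Decide, by position, whether each '@' token is a directive or a language tag."""
    roles = []
    for i, (kind, lexeme) in enumerate(tokens):
        if kind not in LANGUAGE_TAG_KINDS:
            continue
        if i > 0 and tokens[i - 1][0] == "STRING":
            roles.append((lexeme, "language tag"))
        elif kind in ("BASE", "PREFIX"):
            roles.append((lexeme, "directive"))
        else:
            raise SyntaxError(f"language tag {lexeme} without a preceding string")
    return roles


doc = '@base <http://example/> . @prefix ex: <http://example/ns#> . ex: ex: "literal"@base .'
for lexeme, role in classify(tokenize(doc)):
    print(lexeme, "->", role)
```

The lexer itself never decides what "@base" means; the choice is made from the token's position in the parse, which is the behaviour the LanguageTag production is meant to spell out.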
-- 
-ericP
Received on Saturday, 16 June 2012 17:59:53 UTC