Re: SPARQL and Turtle Prefix Placement from Eric Prud'hommeaux on 2012-06-15 (public-rdf-wg@w3.org from June 2012)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Fri, 15 Jun 2012 12:48:29 -0400
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: public-rdf-wg@w3.org
Message-ID: <20120615164829.GA16424@w3.org>
* Eric Prud'hommeaux <eric@w3.org> [2012-06-15 10:00-0400]
> * Eric Prud'hommeaux <eric@w3.org> [2012-06-15 05:32-0400]
> > * Andy Seaborne <andy.seaborne@epimorphics.com> [2012-06-15 09:25+0100]
> > > 
> > > >btw, i've been updating the grammar to deal with some LL(1)/LALR(1) and other conflicts. should be synched soon.
> > > 
> > > As this is very close to LC, could you point out the changes being made?
> > 
> > Indeed. There are three kinds of changes:
> >   1 get rid of extra ()s, à la "(statement)*"
> >   2 make explicit that turtle parses '"ab"@base' as a literal with a language tag.
> >   3 fix lalr(1)/ll(1) conflict in
> >     [6] triples ::= subject predicateObjectList | blankNodePropertyList predicateObjectList?
> >     by moving blankNodePropertyList from [14] blank to [12] object.
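
The conflict behind change 3 can be checked mechanically. Below is a minimal Python sketch, assuming a simplified token-level view of the grammar: the terminal names `IRI` and `BNODE`, the placeholder `objects`, and the helpers `first` and `triples_conflict` are illustrative and not taken from the spec. It compares the FIRST sets of the two alternatives of [6] triples before and after blankNodePropertyList moves out of [14] blank.

```python
def first(symbol, grammar):
    """FIRST set of a symbol; assumes no production here derives the empty string."""
    if symbol not in grammar:              # anything without a rule is a terminal
        return {symbol}
    result = set()
    for alternative in grammar[symbol]:
        result |= first(alternative[0], grammar)
    return result

# Productions shared by both versions (simplified, hypothetical terminals).
COMMON = {
    "subject": [["iri"], ["blank"]],
    "iri": [["IRI"]],
    "BlankNode": [["BNODE"]],
    "blankNodePropertyList": [["[", "predicateObjectList", "]"]],
    "collection": [["(", "objects", ")"]],
}
# [14] blank before and after the change.
BEFORE = dict(COMMON, blank=[["BlankNode"], ["blankNodePropertyList"], ["collection"]])
AFTER = dict(COMMON, blank=[["BlankNode"], ["collection"]])

def triples_conflict(grammar):
    """Tokens that can start both alternatives of [6] triples."""
    return first("subject", grammar) & first("blankNodePropertyList", grammar)

print(triples_conflict(BEFORE))  # {'['}
print(triples_conflict(AFTER))   # set()
```

With the old [14], a parser with one token of lookahead cannot tell on '[' which alternative of triples to take; after the move the two FIRST sets are disjoint.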
> 
> r442 (just committed) removed some spurious \s (there's no escaping in <http://www.w3.org/TR/REC-xml/#sec-notation>) and entity-encoded the <>s in the excluded lists in IRIREF.
> 
> Andy, rq25's IRIREF
>   [138] IRIREF ::= '<' ([^<>"{}|^`\]-[#x00-#x20])* '>'
> uses the A - B notation. When we added "| UCHAR", we ran into an ambiguity in the notation because there's no relative precedence between - and | in A - B | C. This could have been solved with ()s, but we decided to collapse the excluded range minus a range into a larger excluded range. Is SPARQL going to add UCHARs (\uXXXX notation)? If so, we can share that production.
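
As a sanity check on the collapse described above, here is a small Python sketch (the names `A`, `B`, `COLLAPSED` and `a_minus_b` are mine) that compares rq25's "A - B" character class with the single larger excluded range over the Basic Multilingual Plane. Python's `re` has no set subtraction, so A - B is emulated with two patterns; that emulation is an assumption of the sketch, not part of either grammar.

```python
import re

# rq25's form: [^<>"{}|^`\] minus [#x00-#x20]
A = re.compile(r'[^<>"{}|^`\\]')
B = re.compile(r'[\x00-\x20]')
# Collapsed form: the control range is folded into the excluded list.
COLLAPSED = re.compile(r'[^\x00-\x20<>"{}|^`\\]')

def a_minus_b(ch):
    """True if ch is in A but not in B."""
    return bool(A.fullmatch(ch)) and not B.fullmatch(ch)

# Code points where the two formulations disagree.
mismatches = [cp for cp in range(0x10000)
              if a_minus_b(chr(cp)) != bool(COLLAPSED.fullmatch(chr(cp)))]
print(mismatches)  # []
```

The collapsed class leaves no A - B term, so adding "| UCHAR" next to it raises no precedence question.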
> 
> =Turtle re-using SPARQL productions=
> The intro currently says "The two grammars share production and terminal names where possible." To make this absolutely true, we used to use SPARQL's:
> 
> NumericLiteral         ::= NumericLiteralUnsigned
>                          | NumericLiteralPositive
>                          | NumericLiteralNegative
> NumericLiteralUnsigned ::= <INTEGER>
>                          | <DECIMAL>
>                          | <DOUBLE>
> NumericLiteralPositive ::= <INTEGER_POSITIVE>
>                          | <DECIMAL_POSITIVE>
>                          | <DOUBLE_POSITIVE>
> NumericLiteralNegative ::= <INTEGER_NEGATIVE>
>                          | <DECIMAL_NEGATIVE>
>                          | <DOUBLE_NEGATIVE>
> 
> <INTEGER>              ::= ([0-9])+
> <DECIMAL>              ::= ([0-9])* "." ([0-9])+
> <DOUBLE>               ::= ([0-9])+ "." ([0-9])* EXPONENT
>                          | "." ([0-9])+ EXPONENT
>                          | ([0-9])+ EXPONENT
> <INTEGER_POSITIVE>     ::= "+" INTEGER
> <DECIMAL_POSITIVE>     ::= "+" DECIMAL
> <DOUBLE_POSITIVE>      ::= "+" DOUBLE
> <INTEGER_NEGATIVE>     ::= "-" INTEGER
> <DECIMAL_NEGATIVE>     ::= "-" DECIMAL
> <DOUBLE_NEGATIVE>      ::= "-" DOUBLE
> 
> A just slightly terser but non-parallel representation is:
> 
> NumericLiteral ::= <INTEGER> | <DECIMAL> | <DOUBLE>
> <INTEGER>      ::= [+-]? [0-9]+
> <DECIMAL>      ::= [+-]? ([0-9]* '.' [0-9]+)
> <DOUBLE>       ::= [+-]? (([0-9]+ '.' [0-9]* <EXPONENT>)
>                  | ('.' [0-9]+ <EXPONENT>)
>                  | ([0-9]+ <EXPONENT>))
> 
> I presume the intention is the latter but I'd like to confirm before I eradicate this complex markup:
>   [61s] NumericLiteral         ::= NumericLiteralUnsigned | NumericLiteralPositive | NumericLiteralNegative
>   [62s] NumericLiteralUnsigned ::= INTEGER | DECIMAL | DOUBLE
>   [63s] NumericLiteralPositive ::= INTEGER_POSITIVE | DECIMAL_POSITIVE | DOUBLE_POSITIVE
>   [64s] NumericLiteralNegative ::= INTEGER_NEGATIVE | DECIMAL_NEGATIVE | DOUBLE_NEGATIVE

I timed out waiting for confirmation and implemented this:
[[
@@ -14,26 +14,22 @@
 [14]  blank  ::=  BlankNode | collection
 [15]  blankNodePropertyList  ::=  '[' predicateObjectList ']'
 [16]  collection  ::=  '(' object* ')'
-[60s]  RDFLiteral  ::=  String (LANGTAG | '^^' iri)?
-[61s]  NumericLiteral  ::=  NumericLiteralUnsigned | NumericLiteralPositive | NumericLiteralNegative
-[62s]  NumericLiteralUnsigned  ::=  INTEGER | DECIMAL | DOUBLE
-[63s]  NumericLiteralPositive  ::=  INTEGER_POSITIVE | DECIMAL_POSITIVE | DOUBLE_POSITIVE
-[64s]  NumericLiteralNegative  ::=  INTEGER_NEGATIVE | DECIMAL_NEGATIVE | DOUBLE_NEGATIVE
+[17]  RDFLiteral  ::=  INTEGER | DECIMAL | DOUBLE
 [65s]  BooleanLiteral  ::=  'true' | 'false'
 [66s]  String  ::=  STRING_LITERAL1 | STRING_LITERAL2 | STRING_LITERAL_LONG1 | STRING_LITERAL_LONG2
 [67s]  iri  ::=  IRIREF | PrefixedName
 [68s]  PrefixedName  ::=  PNAME_LN | PNAME_NS
 [69s]  BlankNode  ::=  BLANK_NODE_LABEL | ANON
-[17]  BASE  ::=  '@base'
-[18]  PREFIX  ::=  '@prefix'
-[132s]  IRIREF  ::=  '<' ([^#x00-#x20<>\"{}|^`\\] | UCHAR)* '>'
+[18]  BASE  ::=  '@base'
+[19]  PREFIX  ::=  '@prefix'
+[132s]  IRIREF  ::=  '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'
 [133s]  PNAME_NS  ::=  PN_PREFIX? ':'
 [134s]  PNAME_LN  ::=  PNAME_NS PN_LOCAL
 [135s]  BLANK_NODE_LABEL  ::=  '_:' (PN_CHARS_U | [0-9]) ((PN_CHARS | '.')* PN_CHARS)?
-[19]  LANGTAG  ::=  BASE | PREFIX | '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
-[20]  INTEGER  ::=  [+-]? [0-9]+
-[21]  DECIMAL  ::=  [+-]? ([0-9]* '.' [0-9]+)
-[22]  DOUBLE  ::=  [+-]? (([0-9]+ '.' [0-9]* EXPONENT) | ('.' [0-9]+ EXPONENT) | ([0-9]+ EXPONENT))
+[20]  LANGTAG  ::=  BASE | PREFIX | '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
+[21]  INTEGER  ::=  [+-]? [0-9]+
+[22]  DECIMAL  ::=  [+-]? ([0-9]* '.' [0-9]+)
+[23]  DOUBLE  ::=  [+-]? (([0-9]+ '.' [0-9]* EXPONENT) | ('.' [0-9]+ EXPONENT) | ([0-9]+ EXPONENT))
 [148s]  EXPONENT  ::=  [eE] [+-]? [0-9]+
 [149s]  STRING_LITERAL1  ::=  '"' ([^#x27#x5C#xA#xD] | ECHAR | UCHAR)* '"'
 [150s]  STRING_LITERAL2  ::=  "'" ([^#x22#x5C#xA#xD] | ECHAR | UCHAR)* "'"
]]
tested with
[[
[] <p> <o1>, "o2", [ <p2> _:o3 ] ;
   <p3> ( <o4> "o5"@base "o5"@prefix _:o6 [ <p4> <o8> ] ), <o9> .
[ <p5> """o10
line 2""", '''o11
line 3'''^^<integer> ;
  <p6> 12, +12, -12,                   # [+-]? [0-9]+
       13.0, +13.0, -13.0,             # [+-]? [0-9]* '.' [0-9]+ with *=2
       .0, +.0, -.0,                   # [+-]? [0-9]* '.' [0-9]+ with *=0
       14.E0, +14.E0, -14.E0,          # [+-]? [0-9]+ '.' [0-9]* EXPONENT with *=0
       14.0E0, +14.0E0, -14.0E0,       # [+-]? [0-9]+ '.' [0-9]* EXPONENT with *=1
       .14E2, +.14E2, -.14E2,          # [+-]? '.' [0-9]+ EXPONENT
       1.4E1, +1.4E1, -1.4E1,          # [+-]? [0-9]+ '.' [0-9]* EXPONENT, one digit each side
       14e0, 14e+0, 14e-0              # [0-9]+ EXPONENT, EXPONENT ::= [eE] [+-]? [0-9]+
].
]]
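
For anyone who wants to re-check the numeric cases without a full Turtle parser, here is a rough Python sketch that transcribes INTEGER, DECIMAL and DOUBLE from the diff above into regular expressions and classifies individual lexemes. The `classify` helper and the sample list are illustrative only; real tokenization also has to handle the statement-terminating '.', which this ignores.

```python
import re

EXPONENT = r'[eE][+-]?[0-9]+'
INTEGER = re.compile(r'[+-]?[0-9]+')
DECIMAL = re.compile(r'[+-]?[0-9]*\.[0-9]+')
DOUBLE = re.compile(
    rf'[+-]?(?:[0-9]+\.[0-9]*{EXPONENT}|\.[0-9]+{EXPONENT}|[0-9]+{EXPONENT})'
)

def classify(lexeme):
    """Return the terminal whose pattern matches the whole lexeme, else None."""
    for name, pattern in (("INTEGER", INTEGER), ("DECIMAL", DECIMAL), ("DOUBLE", DOUBLE)):
        if pattern.fullmatch(lexeme):
            return name
    return None

# A few lexemes taken from the test document above.
for lexeme in ["12", "-12", "+13.0", ".0", "14.E0", "-.14E2", "1.4E1", "14e-0"]:
    print(lexeme, classify(lexeme))
```

The three terminals are mutually exclusive (INTEGER has no '.', DECIMAL has a '.' but no exponent, DOUBLE always has an exponent), so the order of the checks does not matter.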

> Another issue is that by stating that we use the same productions as SPARQL, we have to synchronize with SPARQL. In principle, this is easily resolved by a bit of friendly competition on the part of the editors: whichever makes it to PR second has to tweak their foreign production numbers (e.g. "[132s]" or "[17t]") to reference the winner. (Some specs simply don't include referenced productions, e.g. Namespaces in XML's reference to XML in "[4] NCName ::= Name - (Char* ':' Char*)", but for something as intimate as SPARQL and Turtle, I think that would be hard on readers.) I'm comfortable with changing the production numbers up to PR as it is clearly not a change to the language. Maybe a little at-risk-like text could warn readers of this volatility.
> 
> 
> > 3 is the biggest change, necessitated by the addition of " | blankNodePropertyList predicateObjectList?" to [6] triples. I believe I properly chased down the grammar combos and tested them with <http://w3.org/brief/MjY0>, but I'd like a second.
> > 
> > [[
> > -[1]     turtleDoc              ::= (statement)*
> > +[1]     turtleDoc              ::= statement*
> > -[2]     statement              ::= (directive '.') | (triples '.')
> > +[2]     statement              ::= directive '.' | triples '.'
> >  [3]     directive              ::= prefixID | base
> > -[4]     prefixID               ::= '@prefix' PNAME_NS IRIREF
> > +[4]     prefixID               ::= PREFIX PNAME_NS IRIREF
> > -[5]     base                   ::= '@base' IRIREF
> > +[5]     base                   ::= BASE IRIREF
> > -[6]     triples                ::= (subject predicateObjectList) | (blankNodePropertyList (predicateObjectList)?)
> > +[6]     triples                ::= subject predicateObjectList | blankNodePropertyList predicateObjectList?
> >  [7]     predicateObjectList    ::= verb objectList (';' verb objectList)* (';')?
> >  [8]     objectList             ::= object (',' object)*
> >  [9]     verb                   ::= predicate | 'a'
> >  [10]    subject                ::= iri | blank
> >  [11]    predicate              ::= iri
> > -[12]    object                 ::= iri | blank | literal
> > +[12]    object                 ::= iri | blank | blankNodePropertyList | literal
> >  [13]    literal                ::= RDFLiteral | NumericLiteral | BooleanLiteral
> > -[14]    blank                  ::= BlankNode | blankNodePropertyList | collection
> > +[14]    blank                  ::= BlankNode | collection
> >  [15]    blankNodePropertyList  ::= '[' predicateObjectList ']'
> > -[16]    collection             ::= '(' (object)* ')'
> > +[16]    collection             ::= '(' object* ')'
> > -[60s]   RDFLiteral             ::= String (LANGTAG | ('^^' iri))?
> > +[60s]   RDFLiteral             ::= String (LANGTAG | '^^' iri)?
> >  [61s]   NumericLiteral         ::= NumericLiteralUnsigned | NumericLiteralPositive | NumericLiteralNegative
> >  [62s]   NumericLiteralUnsigned ::= INTEGER | DECIMAL | DOUBLE
> >  [63s]   NumericLiteralPositive ::= INTEGER_POSITIVE | DECIMAL_POSITIVE | DOUBLE_POSITIVE
> > @@ -24,24 +24,26 @@
> >  [67s]   iri                    ::= IRIREF | PrefixedName
> >  [68s]   PrefixedName           ::= PNAME_LN | PNAME_NS
> >  [69s]   BlankNode              ::= BLANK_NODE_LABEL | ANON
> > +[17]    BASE                   ::= '@base'
> > +[18]    PREFIX                 ::= '@prefix'
> >  [132s]  IRIREF                 ::= '<' ([^#x00-#x20<>\"{}|^`\\] | UCHAR)* '>'
> > -[133s]  PNAME_NS               ::= (PN_PREFIX)? ':'
> > +[133s]  PNAME_NS               ::= PN_PREFIX? ':'
> >  [134s]  PNAME_LN               ::= PNAME_NS PN_LOCAL
> >  [135s]  BLANK_NODE_LABEL       ::= '_:' (PN_CHARS_U | [0-9]) ((PN_CHARS | '.')* PN_CHARS)?
> > -[19]    LANGTAG                ::= '@' ([a-zA-Z])+ ('-' ([a-zA-Z0-9])+)*
> > +[19]    LANGTAG                ::= BASE | PREFIX | '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
> > -[20]    INTEGER                ::= ([+-])? ([0-9])+
> > +[20]    INTEGER                ::= [+-]? [0-9]+
> > -[21]    DECIMAL                ::= ([+-])? (([0-9])* '.' ([0-9])+)
> > +[21]    DECIMAL                ::= [+-]? ([0-9]* '.' [0-9]+)
> > -[22]    DOUBLE                 ::= ([+-])? ((([0-9])+ '.' ([0-9])* EXPONENT) | ('.' ([0-9])+ EXPONENT) | (([0-9])+ EXPONENT))
> > +[22]    DOUBLE                 ::= [+-]? (([0-9]+ '.' [0-9]* EXPONENT) | ('.' [0-9]+ EXPONENT) | ([0-9]+ EXPONENT))
> > -[148s]  EXPONENT               ::= [eE] ([+-])? ([0-9])+
> > +[148s]  EXPONENT               ::= [eE] [+-]? [0-9]+
> >  [149s]  STRING_LITERAL1        ::= '"' ([^#x27#x5C#xA#xD] | ECHAR | UCHAR)* '"'
> >  [150s]  STRING_LITERAL2        ::= "'" ([^#x22#x5C#xA#xD] | ECHAR | UCHAR)* "'"
> >  [151s]  STRING_LITERAL_LONG1   ::= "'''" (("'" | "''")? ([^'\] | ECHAR | UCHAR))* "'''"
> >  [152s]  STRING_LITERAL_LONG2   ::= '"""' (('"' | '""')? ([^"\] | ECHAR | UCHAR))* '"""'
> > -[19]    UCHAR                  ::= ('\u' HEX HEX HEX HEX) | ('\U' HEX HEX HEX HEX HEX HEX HEX HEX)
> > +[23]    UCHAR                  ::= ('\u' HEX HEX HEX HEX) | ('\U' HEX HEX HEX HEX HEX HEX HEX HEX)
> >  [153s]  ECHAR                  ::= '\' [tbnrf\"']
> > -[154s]  NIL                    ::= '(' (WS)* ')'
> > +[154s]  NIL                    ::= '(' WS* ')'
> >  [155s]  WS                     ::= #x20 | #x9 | #xD | #xA
> > -[156s]  ANON                   ::= '[' (WS)* ']'
> > +[156s]  ANON                   ::= '[' WS* ']'
> >  [157s]  PN_CHARS_BASE          ::= [A-Z] | [a-z] | [#00C0-#00D6] | [#00D8-#00F6] | [#00F8-#02FF] | [#0370-#037D] | [#037F-#1FFF] | [#200C-#200D] | [#2070-#218F] | [#2C00-#2FEF] | [#3001-#D7FF] | [#F900-#FDCF] | [#FDF0-#FFFD] | [#10000-#EFFFF]
> >  [158s]  PN_CHARS_U             ::= PN_CHARS_BASE | '_' | ':'
> >  [160s]  PN_CHARS               ::= PN_CHARS_U | '-' | [0-9] | #00B7 | [#0300-#036F] | [#203F-#2040]
> > ]]
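
To illustrate why [19] LANGTAG now lists BASE and PREFIX: a longest-match lexer that knows the '@base' and '@prefix' keywords will emit those tokens even after a string, so the grammar has to accept them where a language tag is expected. The Python sketch below is an assumption-laden toy: the token table, the simplified STRING pattern, and the helpers `tokenize` and `is_lang_literal` are hypothetical and much cruder than the real terminals.

```python
import re

# Earlier entries win ties; longer matches win overall.
TOKENS = [(name, re.compile(pattern)) for name, pattern in [
    ("BASE", r'@base'),
    ("PREFIX", r'@prefix'),
    ("LANGTAG", r'@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*'),
    ("STRING", r'"[^"\\\n\r]*"'),       # simplified: no ECHAR/UCHAR
    ("WS", r'[ \t\r\n]+'),
]]

def tokenize(text):
    """Longest-match tokenizer; whitespace tokens are dropped."""
    pos, out = 0, []
    while pos < len(text):
        best = None
        for name, pattern in TOKENS:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f"no token at offset {pos}")
        name, m = best
        if name != "WS":
            out.append((name, m.group()))
        pos = m.end()
    return out

def is_lang_literal(tokens):
    """String followed by LANGTAG, where LANGTAG ::= BASE | PREFIX | '@' ..."""
    return (len(tokens) == 2 and tokens[0][0] == "STRING"
            and tokens[1][0] in {"BASE", "PREFIX", "LANGTAG"})

print(tokenize('"ab"@base'))                    # [('STRING', '"ab"'), ('BASE', '@base')]
print(is_lang_literal(tokenize('"ab"@base')))   # True
```

Without the BASE | PREFIX alternatives, the second token would not be accepted after the string and '"ab"@base' would fail to parse as a language-tagged literal.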
> > 
> > I haven't changed
> > -[22]    DOUBLE                 ::= [+-]? (([0-9]+ '.' [0-9]* EXPONENT) | ('.' [0-9]+ EXPONENT) | ([0-9]+ EXPONENT)) to
> > +[22]    DOUBLE                 ::= [+-]? ( [0-9]+ '.' [0-9]* EXPONENT  |  '.' [0-9]+ EXPONENT  |  [0-9]+ EXPONENT )
> > 'cause I wasn't sure if others found the former more readable (though I personally prefer fewer ()s (they get a bit oppressive (when used in excess))).
> > 
> > 
> > > This does not make it very easy to see any material changes:
> > > http://dvcs.w3.org/hg/rdf/rev/8b47a7006c8c
> > > 
> > > The hg log shows changes to the HTML, and it's very hard to see the
> > > real changes when it has:
> > > 
> > >  1.7 -    <td>[1]<td>
> > >  1.8 -    <td><code>turtleDoc</code><td>
> > >  1.9 +    <td>[1]</td>
> > > 1.10 +    <td><code>turtleDoc</code></td>
> > > 
> > > - - - - - - - - - - - - -
> > > 
> > > I noticed there are 2 * 17's:
> > > 
> > > [17]  BASE  ::=  '@base'
> > > [17]  PREFIX  ::=  '@prefix'
> > > 
> > >  Thanks
> > >  Andy
> > > 
> > > 
> > 
> > -- 
> > -ericP
> 
> -- 
> -ericP

-- 
-ericP
Received on Friday, 15 June 2012 16:49:02 UTC