- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Fri, 15 Jun 2012 12:48:29 -0400
- To: Andy Seaborne <andy.seaborne@epimorphics.com>
- Cc: public-rdf-wg@w3.org
* Eric Prud'hommeaux <eric@w3.org> [2012-06-15 10:00-0400]
> * Eric Prud'hommeaux <eric@w3.org> [2012-06-15 05:32-0400]
> > * Andy Seaborne <andy.seaborne@epimorphics.com> [2012-06-15 09:25+0100]
> > >
> > > >btw, i've been updating the grammar to deal with some LL(1)/LALR(1)
> > > >and other conflicts. should be synched soon.
> > >
> > > As this is very close to LC, could you point out the changes being made?
> >
> > Indeed. There are three kinds of changes:
> > 1 get rid of extra ()s, à la "(statement)*"
> > 2 make explicit that turtle parses '"ab"@base' as a literal with a language tag.
> > 3 fix lalr(1)/ll(1) conflict in
> >   [6] triples ::= subject predicateObjectList | blankNodePropertyList predicateObjectList?
> >   by moving blankNodePropertyList from [14] blank to [12] object.
>
> r442 (just committed) removed some spurious \s (there's no escaping in
> <http://www.w3.org/TR/REC-xml/#sec-notation>) and entity-encoded the <>s
> in the excluded lists in IRIREF.
>
> Andy, rq25's IRIREF
>   [138] IRIREF ::= '<' ([^<>"{}|^`\]-[#x00-#x20])* '>'
> uses the A - B notation. When we added "| UCHAR", we ran into an
> ambiguity in the notation because there's no relative precedence
> between - and | in A - B | C. This could have been solved with ()s,
> but we decided to collapse the excluded range minus a range into a
> larger excluded range. Is SPARQL going to add UCHARs (\uXXXX notation)?
> If so, we can share that production.
>
> =Turtle re-using SPARQL productions=
> The intro currently says "The two grammars share production and
> terminal names where possible." To make this absolutely true, we used
> to use SPARQL's:
>
>   NumericLiteral ::= NumericLiteralUnsigned
>                    | NumericLiteralPositive
>                    | NumericLiteralNegative
>   NumericLiteralUnsigned ::= <INTEGER>
>                            | <DECIMAL>
>                            | <DOUBLE>
>   NumericLiteralPositive ::= <INTEGER_POSITIVE>
>                            | <DECIMAL_POSITIVE>
>                            | <DOUBLE_POSITIVE>
>   NumericLiteralNegative ::= <INTEGER_NEGATIVE>
>                            | <DECIMAL_NEGATIVE>
>                            | <DOUBLE_NEGATIVE>
>
>   <INTEGER> ::= ([0-9])+
>   <DECIMAL> ::= ([0-9])* "." ([0-9])+
>   <DOUBLE> ::= ([0-9])+ "." ([0-9])* EXPONENT
>              | "." ([0-9])+ EXPONENT
>              | ([0-9])+ EXPONENT
>   <INTEGER_POSITIVE> ::= "+" INTEGER
>   <DECIMAL_POSITIVE> ::= "+" DECIMAL
>   <DOUBLE_POSITIVE> ::= "+" DOUBLE
>   <INTEGER_NEGATIVE> ::= "-" INTEGER
>   <DECIMAL_NEGATIVE> ::= "-" DECIMAL
>   <DOUBLE_NEGATIVE> ::= "-" DOUBLE
>
> A just slightly terser but non-parallel representation is:
>
>   NumericLiteral ::= <INTEGER> | <DECIMAL> | <DOUBLE>
>   <INTEGER> ::= [+-]? [0-9]+
>   <DECIMAL> ::= [+-]? ([0-9]* '.' [0-9]+)
>   <DOUBLE> ::= [+-]? (([0-9]+ '.' [0-9]* <EXPONENT>)
>                     | ('.' [0-9]+ <EXPONENT>)
>                     | ([0-9]+ <EXPONENT>))
>
> I presume the intention is the latter but I'd like to confirm before I
> eradicate this complex markup:
> [61s] NumericLiteral ::= NumericLiteralUnsigned | NumericLiteralPositive | NumericLiteralNegative
> [62s] NumericLiteralUnsigned ::= INTEGER | DECIMAL | DOUBLE
> [63s] NumericLiteralPositive ::= INTEGER_POSITIVE | DECIMAL_POSITIVE | DOUBLE_POSITIVE
> [64s] NumericLiteralNegative ::= INTEGER_NEGATIVE | DECIMAL_NEGATIVE | DOUBLE_NEGATIVE

timed out and implemented this:
[[
@@ -14,26 +14,22 @@
 [14] blank ::= BlankNode | collection
 [15] blankNodePropertyList ::= '[' predicateObjectList ']'
 [16] collection ::= '(' object* ')'
-[60s] RDFLiteral ::= String (LANGTAG | '^^' iri)?
-[61s] NumericLiteral ::= NumericLiteralUnsigned | NumericLiteralPositive | NumericLiteralNegative
-[62s] NumericLiteralUnsigned ::= INTEGER | DECIMAL | DOUBLE
-[63s] NumericLiteralPositive ::= INTEGER_POSITIVE | DECIMAL_POSITIVE | DOUBLE_POSITIVE
-[64s] NumericLiteralNegative ::= INTEGER_NEGATIVE | DECIMAL_NEGATIVE | DOUBLE_NEGATIVE
+[17] RDFLiteral ::= INTEGER | DECIMAL | DOUBLE
 [65s] BooleanLiteral ::= 'true' | 'false'
 [66s] String ::= STRING_LITERAL1 | STRING_LITERAL2 | STRING_LITERAL_LONG1 | STRING_LITERAL_LONG2
 [67s] iri ::= IRIREF | PrefixedName
 [68s] PrefixedName ::= PNAME_LN | PNAME_NS
 [69s] BlankNode ::= BLANK_NODE_LABEL | ANON
-[17] BASE ::= '@base'
-[18] PREFIX ::= '@prefix'
-[132s] IRIREF ::= '<' ([^#x00-#x20<>\"{}|^`\\] | UCHAR)* '>'
+[18] BASE ::= '@base'
+[19] PREFIX ::= '@prefix'
+[132s] IRIREF ::= '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'
 [133s] PNAME_NS ::= PN_PREFIX? ':'
 [134s] PNAME_LN ::= PNAME_NS PN_LOCAL
 [135s] BLANK_NODE_LABEL ::= '_:' (PN_CHARS_U | [0-9]) ((PN_CHARS | '.')* PN_CHARS)?
-[19] LANGTAG ::= BASE | PREFIX | '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
-[20] INTEGER ::= [+-]? [0-9]+
-[21] DECIMAL ::= [+-]? ([0-9]* '.' [0-9]+)
-[22] DOUBLE ::= [+-]? (([0-9]+ '.' [0-9]* EXPONENT) | ('.' [0-9]+ EXPONENT) | ([0-9]+ EXPONENT))
+[20] LANGTAG ::= BASE | PREFIX | '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
+[21] INTEGER ::= [+-]? [0-9]+
+[22] DECIMAL ::= [+-]? ([0-9]* '.' [0-9]+)
+[23] DOUBLE ::= [+-]? (([0-9]+ '.' [0-9]* EXPONENT) | ('.' [0-9]+ EXPONENT) | ([0-9]+ EXPONENT))
 [148s] EXPONENT ::= [eE] [+-]? [0-9]+
 [149s] STRING_LITERAL1 ::= '"' ([^#x27#x5C#xA#xD] | ECHAR | UCHAR)* '"'
 [150s] STRING_LITERAL2 ::= "'" ([^#x22#x5C#xA#xD] | ECHAR | UCHAR)* "'"
]]
tested with
[[
[] <p> <o1>, "o2", [ <p2> _:o3 ] ;
   <p3> ( <o4> "o5"@base "o5"@prefix _:o6 [ <p4> <o8> ] ), <o9> .
[ <p5> """o10
line 2""", '''o11
line 3'''^^<integer> ;
  <p6> 12, +12, -12,              # [+-]? [0-9]+
       13.0, +13.0, -13.0,        # [+-]? [0-9]* '.' [0-9]+ with *=2
       .0, +.0, -.0,              # [+-]? [0-9]* '.' [0-9]+ with *=0
       14.E0, +14.E0, -14.E0,     # [+-]? [0-9]+ '.' [0-9]* EXPONENT with *=0
       14.0E0, +14.0E0,           # [+-]? [0-9]+ '.' [0-9]* EXPONENT with *=1
       .14E2, +.14E2, -.14E2, -14.0E0, # [+-]? '.' [0-9]+ EXPONENT
       1.4E1, +1.4E1, -1.4E1,     # [+-]? [0-9]+ EXPONENT)
       14e0, 14e+0, 14e-0         # [eE] [+-]? [0-9]+
 ].
]]

> Another issue is that by stating that we use the same productions as
> SPARQL, we have to synchronize with SPARQL. In principle, this is
> easily resolved by a bit of friendly competition on the part of the
> editors: whichever makes it to PR second has to tweak their foreign
> production numbers (e.g. "[132s]" or "[17t]") to reference the winner.
> (Some specs simply don't include referenced productions, e.g.
> Namespaces in XML's reference to XML in
> "[4] NCName ::= Name - (Char* ':' Char*)", but for something as
> intimate as SPARQL and Turtle, I think that would be hard on readers.)
> I'm comfortable with changing the production numbers up to PR as it is
> clearly not a change to the language. Maybe a little at-risk-like text
> could warn readers of this volatility.
> >
> > 3 is the biggest change, necessitated by the addition of
> > " | blankNodePropertyList predicateObjectList?" to [6] triples. I
> > believe I properly chased down the grammar combos and tested them
> > with <http://w3.org/brief/MjY0>, but I'd like a second.
> >
> > [[
> > -[1] turtleDoc ::= (statement)*
> > +[1] turtleDoc ::= statement*
> > -[2] statement ::= (directive '.') | (triples '.')
> > +[2] statement ::= directive '.' | triples '.'
> >  [3] directive ::= prefixID | base
> > -[4] prefixID ::= '@prefix' PNAME_NS IRIREF
> > +[4] prefixID ::= PREFIX PNAME_NS IRIREF
> > -[5] base ::= '@base' IRIREF
> > +[5] base ::= BASE IRIREF
> > -[6] triples ::= (subject predicateObjectList) | (blankNodePropertyList (predicateObjectList)?)
> > +[6] triples ::= subject predicateObjectList | blankNodePropertyList predicateObjectList?
> >  [7] predicateObjectList ::= verb objectList (';' verb objectList)* (';')?
> >  [8] objectList ::= object (',' object)*
> >  [9] verb ::= predicate | 'a'
> >  [10] subject ::= iri | blank
> >  [11] predicate ::= iri
> > -[12] object ::= iri | blank | literal
> > +[12] object ::= iri | blank | blankNodePropertyList | literal
> >  [13] literal ::= RDFLiteral | NumericLiteral | BooleanLiteral
> > -[14] blank ::= BlankNode | blankNodePropertyList | collection
> > +[14] blank ::= BlankNode | collection
> >  [15] blankNodePropertyList ::= '[' predicateObjectList ']'
> > -[16] collection ::= '(' (object)* ')'
> > +[16] collection ::= '(' object* ')'
> > -[60s] RDFLiteral ::= String (LANGTAG | ('^^' iri))?
> > +[60s] RDFLiteral ::= String (LANGTAG | '^^' iri)?
> >  [61s] NumericLiteral ::= NumericLiteralUnsigned | NumericLiteralPositive | NumericLiteralNegative
> >  [62s] NumericLiteralUnsigned ::= INTEGER | DECIMAL | DOUBLE
> >  [63s] NumericLiteralPositive ::= INTEGER_POSITIVE | DECIMAL_POSITIVE | DOUBLE_POSITIVE
> > @@ -24,24 +24,26 @@
> >  [67s] iri ::= IRIREF | PrefixedName
> >  [68s] PrefixedName ::= PNAME_LN | PNAME_NS
> >  [69s] BlankNode ::= BLANK_NODE_LABEL | ANON
> > +[17] BASE ::= '@base'
> > +[18] PREFIX ::= '@prefix'
> >  [132s] IRIREF ::= '<' ([^#x00-#x20<>\"{}|^`\\] | UCHAR)* '>'
> > -[133s] PNAME_NS ::= (PN_PREFIX)? ':'
> > +[133s] PNAME_NS ::= PN_PREFIX? ':'
> >  [134s] PNAME_LN ::= PNAME_NS PN_LOCAL
> >  [135s] BLANK_NODE_LABEL ::= '_:' (PN_CHARS_U | [0-9]) ((PN_CHARS | '.')* PN_CHARS)?
> > -[19] LANGTAG ::= '@' ([a-zA-Z])+ ('-' ([a-zA-Z0-9])+)*
> > +[19] LANGTAG ::= BASE | PREFIX | '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
> > -[20] INTEGER ::= ([+-])? ([0-9])+
> > +[20] INTEGER ::= [+-]? [0-9]+
> > -[21] DECIMAL ::= ([+-])? (([0-9])* '.' ([0-9])+)
> > +[21] DECIMAL ::= [+-]? ([0-9]* '.' [0-9]+)
> > -[22] DOUBLE ::= ([+-])? ((([0-9])+ '.' ([0-9])* EXPONENT) | ('.' ([0-9])+ EXPONENT) | (([0-9])+ EXPONENT))
> > +[22] DOUBLE ::= [+-]? (([0-9]+ '.' [0-9]* EXPONENT) | ('.' [0-9]+ EXPONENT) | ([0-9]+ EXPONENT))
> > -[148s] EXPONENT ::= [eE] ([+-])? ([0-9])+
> > +[148s] EXPONENT ::= [eE] [+-]? [0-9]+
> >  [149s] STRING_LITERAL1 ::= '"' ([^#x27#x5C#xA#xD] | ECHAR | UCHAR)* '"'
> >  [150s] STRING_LITERAL2 ::= "'" ([^#x22#x5C#xA#xD] | ECHAR | UCHAR)* "'"
> >  [151s] STRING_LITERAL_LONG1 ::= "'''" (("'" | "''")? ([^'\] | ECHAR | UCHAR))* "'''"
> >  [152s] STRING_LITERAL_LONG2 ::= '"""' (('"' | '""')? ([^"\] | ECHAR | UCHAR))* '"""'
> > -[19] UCHAR ::= ('\u' HEX HEX HEX HEX) | ('\U' HEX HEX HEX HEX HEX HEX HEX HEX)
> > +[23] UCHAR ::= ('\u' HEX HEX HEX HEX) | ('\U' HEX HEX HEX HEX HEX HEX HEX HEX)
> >  [153s] ECHAR ::= '\' [tbnrf\"']
> > -[154s] NIL ::= '(' (WS)* ')'
> > +[154s] NIL ::= '(' WS* ')'
> >  [155s] WS ::= #x20 | #x9 | #xD | #xA
> > -[156s] ANON ::= '[' (WS)* ']'
> > +[156s] ANON ::= '[' WS* ']'
> >  [157s] PN_CHARS_BASE ::= [A-Z] | [a-z] | [#00C0-#00D6] | [#00D8-#00F6] | [#00F8-#02FF] | [#0370-#037D] | [#037F-#1FFF] | [#200C-#200D] | [#2070-#218F] | [#2C00-#2FEF] | [#3001-#D7FF] | [#F900-#FDCF] | [#FDF0-#FFFD] | [#10000-#EFFFF]
> >  [158s] PN_CHARS_U ::= PN_CHARS_BASE | '_' | ':'
> >  [160s] PN_CHARS ::= PN_CHARS_U | '-' | [0-9] | #00B7 | [#0300-#036F] | [#203F-#2040]
> > ]]
> >
> > I haven't changed
> > -[22] DOUBLE ::= [+-]? (([0-9]+ '.' [0-9]* EXPONENT) | ('.' [0-9]+ EXPONENT) | ([0-9]+ EXPONENT))
> > to
> > +[22] DOUBLE ::= [+-]? ( [0-9]+ '.' [0-9]* EXPONENT | '.' [0-9]+ EXPONENT | [0-9]+ EXPONENT )
> > 'cause I wasn't sure if others found the former more readable (though
> > I personally prefer fewer ()s (they get a bit oppressive (when used in
> > excess))).
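
Whether the two spellings of DOUBLE above accept the same tokens can be checked mechanically. What follows is only a sketch, assuming Python's re module as a stand-in for a Turtle lexer and assuming that the EBNF terminals translate to regular expressions as written here. The names DOUBLE_NESTED, DOUBLE_FLAT, classify and forms_agree are hypothetical and belong to no Turtle or SPARQL tool. The tokens are the numeric literals from the test document in this thread.

```python
import re

# Python transliterations of the terminals quoted in this thread.
# The regex spellings are this sketch's assumption, not spec text.
EXPONENT = r"[eE][+-]?[0-9]+"
INTEGER = re.compile(r"[+-]?[0-9]+")
DECIMAL = re.compile(r"[+-]?(?:[0-9]*\.[0-9]+)")

# DOUBLE with every alternative wrapped in its own ()s.
DOUBLE_NESTED = re.compile(
    rf"[+-]?(?:(?:[0-9]+\.[0-9]*{EXPONENT})"
    rf"|(?:\.[0-9]+{EXPONENT})"
    rf"|(?:[0-9]+{EXPONENT}))"
)

# DOUBLE with the inner ()s dropped; concatenation binds tighter than '|'.
DOUBLE_FLAT = re.compile(
    rf"[+-]?(?:[0-9]+\.[0-9]*{EXPONENT}|\.[0-9]+{EXPONENT}|[0-9]+{EXPONENT})"
)


def classify(token, double=DOUBLE_FLAT):
    """Return the name of the terminal matching the whole token, or None."""
    if double.fullmatch(token):
        return "DOUBLE"
    if DECIMAL.fullmatch(token):
        return "DECIMAL"
    if INTEGER.fullmatch(token):
        return "INTEGER"
    return None


# Numeric tokens copied from the test document in this thread.
TOKENS = [
    "12", "+12", "-12",
    "13.0", "+13.0", "-13.0",
    ".0", "+.0", "-.0",
    "14.E0", "+14.E0", "-14.E0",
    "14.0E0", "+14.0E0", "-14.0E0",
    ".14E2", "+.14E2", "-.14E2",
    "1.4E1", "+1.4E1", "-1.4E1",
    "14e0", "14e+0", "14e-0",
]


def forms_agree(tokens=TOKENS):
    """True if both DOUBLE spellings classify every token identically."""
    return all(
        classify(t, DOUBLE_NESTED) == classify(t, DOUBLE_FLAT) for t in tokens
    )


if __name__ == "__main__":
    for t in TOKENS:
        print(t, classify(t))
    print("forms agree:", forms_agree())
```

Under these assumptions the two spellings classify every listed token the same way, so the choice between them is one of readability only. Because each token must match in full, the three terminals cannot overlap: DECIMAL needs a '.', DOUBLE needs an exponent, and INTEGER allows neither.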
> >
> >
> > > This does not make it very easy to see any material changes:
> > > http://dvcs.w3.org/hg/rdf/rev/8b47a7006c8c
> > >
> > > The hg log is to changes of the HTML and it's very hard to see the
> > > real changes when it has:
> > >
> > > 1.7 - <td>[1]<td>
> > > 1.8 - <td><code>turtleDoc</code><td>
> > > 1.9 + <td>[1]</td>
> > > 1.10 + <td><code>turtleDoc</code></td>
> > >
> > > - - - - - - - - - - - - -
> > >
> > > I noticed there are 2 * 17's:
> > >
> > > [17] BASE ::= '@base'
> > > [17] PREFIX ::= '@prefix'
> > >
> > > Thanks
> > > Andy
> > >
> > >
> >
> > --
> > -ericP
>
> --
> -ericP

--
-ericP
Received on Friday, 15 June 2012 16:49:02 UTC