- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Wed, 20 Jun 2012 18:50:09 +0100
- To: RDF-WG <public-rdf-wg@w3.org>
- Message-ID: <4FE20D51.1070001@epimorphics.com>
ACTION-175
== Summary
predicateObjectList ::= verb objectList (";" (predicateObjectList)?)?
== The problem
We want to accept
S P O .
S P O ; P O .
S P O ; .
S P O ; P O ; .
but not
S P O ; ; P O .
S P O ; ; .
S P O ; P O ; ; .
== The details
The rule in the Turtle draft, and in the submission, is:
[7]
predicateObjectList ::= verb objectList ( ';' verb objectList )* ( ';')?
For a parser that works simply from left-to-right, looking no more than
one token ahead, it reads "verb objectList" then sees (lookahead of one)
a ";". What does it do?
Does it go into
( ';' verb objectList )*
and process "verb objectList"
or skip this because * can be zero occurrences, and go on to:
( ';')?
and exit the rule?
The next token is going to be "." in legal Turtle but that is 2 tokens
of lookahead, ";" and ".".
== A solution
predicateObjectList ::= verb objectList (";" (predicateObjectList)?)?
which is recursive on predicateObjectList.
After the first "verb objectList", if the parser sees ";", then it
enters the outer ()?
";" (predicateObjectList)?
consuming one ";" and tries again.
A predicateObjectList can't be empty, (predicateObjectList)? can be.
The next token is going to be "." in legal Turtle.
The recursion can be unrolled into a loop. javacc is an LL(1) does not
do this - it generates a recursive call and on a very large file can run
out of stackspace. There is a test in the turtle test suite for this
test-16.ttl.
== SPARQL
SPARQL does not impose the one trailing semi-colon. Recursion was not
popular.
PropertyListNotEmpty ::= Verb ObjectList ( ';' ( Verb ObjectList )? )*
which allows
S P O ;;;; .
After the first Verb ObjectList, if it sees a ";" it goes into the ()*
and then it may allow another Verb ObjectList. It loops back if it see
a ";" next.
It is not recursive -- it's a loop and compiler generators do generate a
loop so no problems at scale.
It (SPARQLs approach) is different from deployed Turtle. I personally
don't see this as an important issue as there other changes like local
part of prefix names, specifically leading digit, which are far more
likely in new data sent to old parser.
== Experiment
Attached is a JavaCC grammar file (LL(1), the default for javacc) for a
little language covering tokens that are S<number> P<number> O<number>
so you can see what is accepted where.
T := ( <S> POL "." )* <EOF>
VO := <P> <O>
POL := VO ( ";" ( POL )? )?
TOKENs
P:= "P" ["0"-"9"]
S:= "S" ["0"-"9"]
O:= "O" ["0"-"9"]
so VO is a verbObjectList and POL is predicateObjectList
== For Eric
The generated code for POL is: my comments using ##
final public void POL() throws ParseException {
## POL := VO ( ";" ( POL )? )?
## Debug mode - pritns what it is doing.
trace_call("POL");
try {
## unconditionally do VO
VO();
## Look for ";" or "."
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
case 2:
## the case of ";"
## Enter ( ";" ( POL )? )?
jj_consume_token(2);
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
case P:
## It's a P so it must be POL from (POL)?
POL();
break;
default:
## Not a P - end (POL)?
jj_la1[1] = jj_gen;
;
}
break;
default:
## Not a ";" do not enter ( ";" ... )?
jj_la1[2] = jj_gen;
;
}
} finally {
## Debug.
trace_return("POL");
}
}
I can send the entire parser ...
Attachments
- text/plain attachment: T.jj
Received on Wednesday, 20 June 2012 17:50:39 UTC