
Re: mapping from Turtle grammar to RDF graph

From: Eric Prud'hommeaux <eric@w3.org>
Date: Wed, 3 Feb 2010 10:35:29 -0500
To: Dave Beckett <dave@dajobe.org>
Cc: pfps@research.bell-labs.com, semantic-web@w3.org
Message-ID: <20100203153525.GC32619@w3.org>
* Eric Prud'hommeaux <eric@w3.org> [2010-02-02 16:50-0500]
> * Dave Beckett <dave@dajobe.org> [2010-02-02 07:54-0800]
> > Eric Prud'hommeaux wrote:
> > > Peter, all, anyone interested in debugging a mapping from a turtle
> > > grammar to triple production rules?
> > >   http://www.w3.org/2010/01/31-Turtle#⋈
> > > 
> > > I still need to stick encoding issues in there (like \"),
> > > but this should serve as a start.
> > 
> > I'm interested and it seems the right direction but I'm finding this a
> > little hard to understand.
> I'm certainly sympathetic to that. Any ideas gratefully investigated.
> >                              I'd hope that we can get out a strong
> > mapping (like this) which is sufficiently formal that it addresses the
> > concerns Peter raised in 2008 [1]
> yeah, that's what motivated this. pfps outlines a recipe and i need to
> test my recipe against his. his target is ntriples, while i prefer to
> map to RDF terms and count on the ntriples spec to turn escaped URIs
> into IRIs.

Comparing pfps's recipe [1] against the recipe in [2], which unescapes a
set of terminals and defines the production of RDF terms from those
unescaped terminals:

[pfps] 0/ Handle escape characters and white space
[pfps] 0.2/ Turn each uriref into a URI reference, handling escaping as in
[pfps]      S3.3 (and removing the enclosing <>).

[[
The characters between "<" and ">" are the unicode string of the
IRI. Relative IRI resolution is then performed per Relative IRI
]] — http://www.w3.org/2010/01/31-Turtle#handle-IRI_REF
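
A rough sketch of that IRI_REF handling, in Python. The helper names
(unescape_uchar, handle_iri_ref) are made up for illustration, the escape
set is assumed to be just \uXXXX and \UXXXXXXXX, and urljoin is only an
approximation of relative IRI resolution (it follows the URI rules), so
treat this as a reading aid rather than the spec text:

```python
import re
from urllib.parse import urljoin

# Assumption: only \uXXXX and \UXXXXXXXX escapes occur inside <...>.
_UCHAR = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def unescape_uchar(text: str) -> str:
    """Replace numeric escapes with the characters they denote."""
    return _UCHAR.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)


def handle_iri_ref(token: str, base: str) -> str:
    """Hypothetical handler: strip the <>, unescape, resolve against base."""
    if not (token.startswith("<") and token.endswith(">")):
        raise ValueError("not an IRI_REF token: " + repr(token))
    # The characters between "<" and ">" are the string of the IRI.
    reference = unescape_uchar(token[1:-1])
    # urljoin stands in for relative IRI resolution here.
    return urljoin(base, reference)
```

For example, handle_iri_ref("<foo>", "http://example.org/dir/doc") resolves
against the base, while an absolute IRI passes through unchanged.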

[pfps] 0.3/ Turn each quotedString into a Normal Form C Unicode string,
[pfps]      handling escaping as in S3.3 (and removing the enclosing " or """).

(quoting 1 of 4 terminals for lexical forms) [[
The characters between the outermost "'"s are the unicode string of a
lexical form.
]] — http://www.w3.org/2010/01/31-Turtle#handle-STRING_LITERAL1
* as with SPARQL, this does not mandate normalization during parsing.
  A validating parser could, of course, do more.
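
To make the "no normalization" point concrete, here is a minimal sketch of
turning a string terminal into a lexical form. The function name and the
escape table are assumptions for illustration (the table follows the usual
SPARQL-style escapes); note that nothing in it applies NFC:

```python
import re

# Assumption: SPARQL-style single-character escapes plus numeric escapes.
_ECHAR = {"t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
          '"': '"', "'": "'", "\\": "\\"}
_ESCAPE = re.compile(r"\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|(.))", re.DOTALL)


def _replace(match):
    if match.group(1) or match.group(2):
        return chr(int(match.group(1) or match.group(2), 16))
    if match.group(3) not in _ECHAR:
        raise ValueError("unknown escape: " + repr(match.group(0)))
    return _ECHAR[match.group(3)]


def lexical_form(token: str) -> str:
    """Hypothetical handler for the four string terminals."""
    # Try the long (triple-quoted) delimiters before the short ones.
    for quote in ('"""', "'''", '"', "'"):
        if (len(token) >= 2 * len(quote)
                and token.startswith(quote) and token.endswith(quote)):
            # The characters between the outermost quotes, unescaped,
            # are the lexical form. No Unicode normalization is applied.
            return _ESCAPE.sub(_replace, token[len(quote):-len(quote)])
    raise ValueError("not a string literal token: " + repr(token))
```

A decomposed "e" followed by a combining acute accent comes out as two
characters, exactly as it went in; a validating parser could layer a
normalization check on top of this.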

[pfps] 0.4/ Discard any ws
[pfps] 1/ Turn each qname and URI reference into an RDF URI reference.
[pfps] 1.1/ Turn each URI reference into an RDF URI reference, as in S3.4.

I've copied the relative resolution code from SPARQL into
(quoting 1 of 3 terminals for URI production) [[
Relative IRI resolution is then performed per Relative IRI
]] — http://www.w3.org/2010/01/31-Turtle#handle-IRI_REF

[pfps] 1.2/ Expand each qname into a uriref as in S2.1, which will be an
[pfps]      RDF URI reference (because all relative URIs have been dealt with
[pfps]      already). 
[pfps] 1.3/ Replace each occurrence of 'a' as a verb with the RDF URI reference 
[pfps] 	rdf:type

[[
If token matched was "a", curPredicate is bound to the IRI
http://www.w3.org/1999/02/22-rdf-syntax-ns#type (test: aVerb1).
]] — http://www.w3.org/2010/01/31-Turtle#curPredicate
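
Steps 1.2 and 1.3 together might look something like the following. This
is only a sketch: resolve_verb and the prefixes dictionary are hypothetical
names, and IRI references are assumed to have been unescaped and resolved
already.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def resolve_verb(token: str, prefixes: dict) -> str:
    """Hypothetical helper: bind curPredicate from a verb token."""
    if token == "a":
        # The keyword "a" in verb position stands for rdf:type.
        return RDF_TYPE
    if token.startswith("<") and token.endswith(">"):
        # Assumption: the IRI inside <> is already unescaped and absolute.
        return token[1:-1]
    prefix, colon, local = token.partition(":")
    if not colon:
        raise ValueError("not a verb token: " + repr(token))
    if prefix not in prefixes:
        raise ValueError("undeclared prefix: " + repr(prefix))
    # A qname expands to the namespace IRI followed by the local part.
    return prefixes[prefix] + local
```

So resolve_verb("a", {}) yields the rdf:type IRI, and a qname such as
foaf:name is expanded against whatever namespace the document declared
for foaf.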

[pfps] 1.4/ Discard any directive and trailing .
[pfps] 2/ Turn each literal into an RDF literal.  The only non-obvious part is
[pfps]    to add the appropriate datatype to integer, double, decimal, and
[pfps]    boolean.

[[
The literal has a lexical form of the input string, and a datatype of
]] — http://www.w3.org/2010/01/31-Turtle#handle-INTEGER
SPARQL parsing doesn't demand either canonicalization or validation.
Similar treatment for DECIMAL, DOUBLE, BooleanLiteral.
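
As an illustration of "lexical form of the input string" without
canonicalization, a small sketch. The regular expressions are my
approximation of the SPARQL-style numeric terminals with an optional sign,
and typed_literal is a made-up name:

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumption: approximations of the INTEGER, DECIMAL, DOUBLE and
# BooleanLiteral terminals, each allowing an optional leading sign.
_PATTERNS = [
    ("integer", re.compile(r"[+-]?[0-9]+")),
    ("decimal", re.compile(r"[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+)")),
    ("double", re.compile(r"[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)[eE][+-]?[0-9]+")),
    ("boolean", re.compile(r"true|false")),
]


def typed_literal(token: str):
    """Return (lexical form, datatype IRI) for a bare literal token."""
    for name, pattern in _PATTERNS:
        if pattern.fullmatch(token):
            # The lexical form is the input string, untouched: "007"
            # stays "007" rather than being canonicalized to "7".
            return (token, XSD + name)
    raise ValueError("not a numeric or boolean token: " + repr(token))
```

Only the datatype is derived from which terminal matched; the token text
itself is carried through as is.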

[pfps] From now on the process is working with a sequence of processed
[pfps] occurrences of the triples production, i.e., pieces of the occurrences may
[pfps] have been replaced with abstract objects.
[pfps] 3/ Handle blank nodes
[pfps] 3.1/ For each name used in a nodeID in the document select a fresh blank
[pfps]      node and replace any occurrence of nodeID of the form _:name with
[pfps]      that blank node.  This processes each of the occurrences.
[pfps] 3.2/ Recursively, until no unprocessed blank is left in the document,
[pfps]      select an unprocessed blank that does not contain an unprocessed
[pfps]      blank, select a fresh blank node, and process the blank as follows:
[pfps]      a) If blank is of the form [] replace it with the fresh blank node.
[pfps]      b) If blank is of the form [ predicateObjectList ] replace it with
[pfps] 	fresh blank node and add a new triples consisting of the fresh
[pfps] 	blank node (as subject)  and the predicateObjectlist. 
[pfps]      c) If blank is of the form () replace it with the RDF URI
[pfps] 	reference rdf:nil
[pfps]      e) If blank is of the form ( object1 ... objectn ) for n>=1
[pfps] 	- select n fresh nodes, node1, ...., noden, 
[pfps] 	- replace the blank with node1,
[pfps] 	- add 2n-2 triples with triple 2i-1 having subject nodei,
[pfps] 	  verb rdf:first, and object objecti and triple 2i having
[pfps] 	  subject nodei, verb rdf:rest, and object nodei+1, and
[pfps] 	- add two triples with the first having subject noden, verb
[pfps] 	  rdf:first, and object objectn and the second having subject
[pfps] 	  noden, verb rdf:rest, and object rdf:nil  (Yes, this is being
[pfps] 	  a bit sloppy.)
[pfps] 4/ Handle ; constructs
[pfps] 4.1/ Recursively replace any subject verb1 objectlist1 ; verb2 objectlist2
[pfps]    with subject verb1 objectlist1 . subject verb2 objectlist2
[pfps] 4.2/ Remove any remaining ;
[pfps] 5/ Handle , constructs
[pfps] 5.1/ Recursively replace any subject verb object1 , object2
[pfps]      with subject verb object1 . subject verb object2
[pfps] 6/ Turn each subject verb object . into an RDF triple.
[pfps] Selecting a fresh blank node means to select a blank node (from the
[pfps] infinite collection of blank nodes available) that has not yet been used
[pfps] in the process so far.

I took a different path here, specifying productions which generate
the subject, predicate and object of each triple.

[[
Each GraphNode in the document produces an RDF triple of the
curSubject, curPredicate and the GraphNode.
]] — http://www.w3.org/2010/01/31-Turtle#triples
Once we find an acceptable style for this, I'll add list generation.
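
To show how the curSubject/curPredicate style replaces the ";" and ","
rewriting steps, here is a toy sketch. It assumes the input has already
been reduced to a flat list of resolved terms and punctuation tokens, with
no blank node or list syntax; triples_from_tokens is a made-up name and
the error handling is deliberately thin:

```python
def triples_from_tokens(tokens):
    """Emit one triple per object, tracking curSubject and curPredicate."""
    triples = []
    cur_subject = cur_predicate = None
    expect = "subject"
    for tok in tokens:
        if tok == ".":
            # End of statement: both bindings are dropped.
            cur_subject = cur_predicate = None
            expect = "subject"
        elif tok == ";":
            # Same subject, new predicate follows.
            cur_predicate = None
            expect = "predicate"
        elif tok == ",":
            # Same subject and predicate, another object follows.
            expect = "object"
        elif expect == "subject":
            cur_subject, expect = tok, "predicate"
        elif expect == "predicate":
            cur_predicate, expect = tok, "object"
        elif expect == "object":
            # Each object yields a triple of curSubject, curPredicate and itself.
            triples.append((cur_subject, cur_predicate, tok))
            expect = "punctuation"
        else:
            raise ValueError("unexpected token: " + repr(tok))
    return triples
```

Feeding it the tokens for "s p1 o1 , o2 ; p2 o3 ." gives three triples
sharing the subject s, without ever rewriting the input into separate
statements.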

> > It also might be worth starting to consider whether to align the terminals
> > (qnames) more with sparql first.
> the productions ref'd in http://www.w3.org/2010/01/31-Turtle#⋈ are
> from a yacker mockup of "TurtleS" (Turtle using SPARQL terminals and
> productions, where applicable). it may still be too liberal -- needs
> some thought and testing against bad-\d\d.ttl.
> > Dave
> > 
> > [1] http://lists.w3.org/Archives/Public/semantic-web/2008Jan/0128.html
> > via my Turtle issue list
> > http://github.com/dajobe/turtle/blob/master/ISSUES.md
[2] http://www.w3.org/2010/01/31-Turtle#⋈

> -- 
> -ericP

Received on Wednesday, 3 February 2010 15:36:06 UTC
