- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Wed, 3 Feb 2010 10:35:29 -0500
- To: Dave Beckett <dave@dajobe.org>
- Cc: pfps@research.bell-labs.com, semantic-web@w3.org
* Eric Prud'hommeaux <eric@w3.org> [2010-02-02 16:50-0500]
> * Dave Beckett <dave@dajobe.org> [2010-02-02 07:54-0800]
> > Eric Prud'hommeaux wrote:
> > > Peter, all, anyone interested in debugging a mapping from a turtle
> > > grammar to triple production rules?
> > >   http://www.w3.org/2010/01/31-Turtle#⋈
> > >
> > > I still need to stick encoding issues in there (like \"),
> > > but this should serve as a start.
> >
> > I'm interested and it seems the right direction but I'm finding this a
> > little hard to understand.
>
> I'm certainly sympathetic to that. Any ideas gratefully investigated.
>
> > I'd hope that we can get out a strong
> > mapping (like this) which is sufficiently formal that it addresses the
> > concerns Peter raised in 2008 [1]
>
> yeah, that's what motivated this. pfps outlines a recipe and i need to
> test my recipe against his. his target is ntriples, while i prefer to
> map to RDF terms and count on the ntriples spec to turn escaped URIs
> into IRIs.

Comparing pfps's recipe [1] against the recipe in [2], which unescapes a
set of terminals and defines the production of RDF terms from those
unescaped terminals:

[pfps] 0/ Handle escape characters and white space
[pfps]  0.2/ Turn each uriref into a URI reference, handling escaping as in
[pfps]       S3.3 (and removing the enclosing <>).

[[
The characters between "<" and ">" are the unicode string of the IRI.
Relative IRI resolution is then performed per Relative IRI Resolution.
]] — http://www.w3.org/2010/01/31-Turtle#handle-IRI_REF

[pfps]  0.3/ Turn each quotedString into a Normal Form C Unicode string,
[pfps]       handling escaping as in S3.3 (and removing the enclosing " or """).

(quoting 1 of 4 terminals for lexical forms)
[[
The characters between the outermost "'"s are the unicode string of a
lexical form.
]] — http://www.w3.org/2010/01/31-Turtle#handle-STRING_LITERAL1

* As with SPARQL, this does not mandate normalization during parsing. A
  validating parser could, of course, do more.
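To make steps 0.2/0.3 and the two quoted rules concrete, here is a rough Python sketch. It is not code from the draft, and the function names are made up. It assumes \uXXXX and \UXXXXXXXX are the only escapes allowed inside <...>, that string literals additionally take SPARQL's ECHAR escapes, and it leans on Python's urllib.parse.urljoin for relative resolution, which only resolves against schemes Python treats as hierarchical (http, https, file and the like) and hands back the reference untouched otherwise. Like the draft, it does no NFC normalization.

```python
import re
from urllib.parse import urljoin

# Assumption: single-character string escapes follow SPARQL's ECHAR terminal.
ECHAR = {"t": "\t", "b": "\b", "n": "\n", "r": "\r",
         "f": "\f", '"': '"', "'": "'", "\\": "\\"}

# Assumption: IRI_REF only allows the numeric escapes.
UCHAR = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")
# String literals: numeric escapes, or a backslash plus one character
# that is then checked against ECHAR.
STRING_ESCAPE = re.compile(
    r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})|\\(.)", re.S)


def _numeric(m) -> str:
    # \uXXXX or \UXXXXXXXX -> the code point it names
    return chr(int(m.group(1) or m.group(2), 16))


def _string_escape(m) -> str:
    if m.group(1) or m.group(2):
        return _numeric(m)
    ch = m.group(3)
    if ch not in ECHAR:
        raise ValueError(f"illegal escape \\{ch}")
    return ECHAR[ch]


def handle_iri_ref(token: str, base: str) -> str:
    """<...> token -> absolute IRI: strip <>, unescape, resolve against base."""
    if not (token.startswith("<") and token.endswith(">")):
        raise ValueError(f"not an IRI_REF: {token!r}")
    return urljoin(base, UCHAR.sub(_numeric, token[1:-1]))


def handle_string_literal1(token: str) -> str:
    """'...' token -> unicode lexical form (no NFC normalization)."""
    if not (len(token) >= 2 and token[0] == token[-1] == "'"):
        raise ValueError(f"not a STRING_LITERAL1: {token!r}")
    return STRING_ESCAPE.sub(_string_escape, token[1:-1])
```

A validating parser that wants the NFC check from pfps's 0.3 could compare the result with unicodedata.normalize("NFC", ...) rather than silently rewriting it.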
[pfps]  0.4/ Discard any ws
[pfps] 1/ Turn each qname and URI reference into an RDF URI reference.
[pfps]  1.1/ Turn each URI reference into an RDF URI reference, as in S3.4.

I've copied the relative resolution code from SPARQL into
http://www.w3.org/2010/01/31-Turtle#⋈

(quoting 1 of 3 terminals for URI production)
[[
Relative IRI resolution is then performed per Relative IRI Resolution⋈.
]] — http://www.w3.org/2010/01/31-Turtle#handle-IRI_REF

[pfps]  1.2/ Expand each qname into a uriref as in S2.1, which will be an
[pfps]       RDF URI reference (because all relative URIs have been dealt with
[pfps]       already).
[pfps]  1.3/ Replace each occurrence of 'a' as a verb with the RDF URI reference
[pfps]       rdf:type

[[
If token matched was "a", curPredicate is bound to the IRI
http://www.w3.org/1999/02/22-rdf-syntax-ns#type (test: aVerb1).
]] — http://www.w3.org/2010/01/31-Turtle#curPredicate

[pfps]  1.4/ Discard any directive and trailing .
[pfps] 2/ Turn each literal into an RDF literal. The only non-obvious part is
[pfps]    to add the appropriate datatype to integer, double, decimal, and
[pfps]    boolean.

[[
The literal has a lexical form of the input string, and a datatype of
xsd:integer.
]] — http://www.w3.org/2010/01/31-Turtle#handle-INTEGER

SPARQL parsing doesn't demand either canonicalization or validation.
DECIMAL, DOUBLE and BooleanLiteral get the same treatment with their own
datatypes.

[pfps] From now on the process is working with a sequence of processed
[pfps] occurrences of the triples production, i.e., pieces of the occurrences
[pfps] may have been replaced with abstract objects.
[pfps]
[pfps] 3/ Handle blank nodes
[pfps]  3.1/ For each name used in a nodeID in the document select a fresh
[pfps]       blank node and replace any occurrence of nodeID of the form
[pfps]       _:name with that blank node. This processes each of the
[pfps]       occurrences.
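Steps 1.2 through 3.1 are mostly table lookups. A rough Python sketch of them follows; the IRI/Literal/BNode classes and the function names are hypothetical, qnames are taken as prefix:local with the @prefix map already holding absolute IRIs (per pfps's remark in 1.2), and the numeric patterns assume SPARQL 1.0-style INTEGER/DECIMAL/DOUBLE terminals with an optional sign folded in. As in the draft, the lexical form is kept exactly as written.

```python
import itertools
import re
from dataclasses import dataclass
from typing import Dict, Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[IRI] = None


@dataclass(frozen=True)
class BNode:
    id: int


def expand_qname(qname: str, prefixes: Dict[str, str]) -> IRI:
    """1.2: prefix:local -> IRI via the (already absolute) @prefix map."""
    prefix, colon, local = qname.partition(":")
    if not colon or prefix not in prefixes:
        raise ValueError(f"undeclared prefix in {qname!r}")
    return IRI(prefixes[prefix] + local)


def verb(token: str, prefixes: Dict[str, str]) -> IRI:
    """1.3: the keyword 'a' in verb position means rdf:type."""
    return IRI(RDF + "type") if token == "a" else expand_qname(token, prefixes)


# 2: assumed SPARQL 1.0-style terminals. Under fullmatch these patterns
# never overlap, so the order of the list does not matter.
_EXP = r"[eE][+-]?[0-9]+"
_TYPED = [
    (re.compile(r"[+-]?[0-9]+"), "integer"),
    (re.compile(r"[+-]?([0-9]+\.[0-9]*|\.[0-9]+)"), "decimal"),
    (re.compile(rf"[+-]?([0-9]+\.[0-9]*{_EXP}|\.[0-9]+{_EXP}|[0-9]+{_EXP})"),
     "double"),
    (re.compile(r"true|false"), "boolean"),
]


def typed_literal(token: str) -> Literal:
    """Lexical form is the input string: no canonicalization or validation."""
    for pattern, name in _TYPED:
        if pattern.fullmatch(token):
            return Literal(token, IRI(XSD + name))
    raise ValueError(f"not an INTEGER/DECIMAL/DOUBLE/boolean token: {token!r}")


class BlankNodes:
    """3.1: one fresh node per _:name within a document, never reused."""

    def __init__(self) -> None:
        self._ids = itertools.count()
        self._by_label: Dict[str, BNode] = {}

    def fresh(self) -> BNode:
        return BNode(next(self._ids))

    def labelled(self, name: str) -> BNode:
        if name not in self._by_label:
            self._by_label[name] = self.fresh()
        return self._by_label[name]
```

Keeping "fresh" as a counter scoped to one BlankNodes instance is one way to read pfps's "not yet been used in the process so far"; a parser handling several documents would use one instance per document.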
[pfps]  3.2/ Recursively, until no unprocessed blank is left in the document,
[pfps]       select an unprocessed blank that does not contain an unprocessed
[pfps]       blank, select a fresh blank node, and process the blank as follows:
[pfps]       a) If blank is of the form [] replace it with the fresh blank node.
[pfps]       b) If blank is of the form [ predicateObjectList ] replace it with
[pfps]          the fresh blank node and add a new triples consisting of the
[pfps]          fresh blank node (as subject) and the predicateObjectList.
[pfps]       c) If blank is of the form () replace it with the RDF URI
[pfps]          reference rdf:nil
[pfps]       e) If blank is of the form ( object_1 ... object_n ) for n>=1
[pfps]          - select n fresh nodes, node_1, ..., node_n,
[pfps]          - replace the blank with node_1,
[pfps]          - add 2n-2 triples, for i = 1 .. n-1, with triple 2i-1 having
[pfps]            subject node_i, verb rdf:first, and object object_i and
[pfps]            triple 2i having subject node_i, verb rdf:rest, and object
[pfps]            node_(i+1), and
[pfps]          - add two triples with the first having subject node_n, verb
[pfps]            rdf:first, and object object_n and the second having subject
[pfps]            node_n, verb rdf:rest, and object rdf:nil (Yes, this is being
[pfps]            a bit sloppy.)
[pfps] 4/ Handle ; constructs
[pfps]  4.1/ Recursively replace any subject verb1 objectlist1 ; verb2 objectlist2
[pfps]       with subject verb1 objectlist1 . subject verb2 objectlist2
[pfps]  4.2/ Remove any remaining ;
[pfps] 5/ Handle , constructs
[pfps]  5.1/ Recursively replace any subject verb object1 , object2
[pfps]       with subject verb object1 . subject verb object2
[pfps] 6/ Turn each subject verb object . into an RDF triple.
[pfps]
[pfps] Selecting a fresh blank node means to select a blank node (from the
[pfps] infinite collection of blank nodes available) that has not yet been used
[pfps] in the process so far.

I took a different path here, specifying productions which generate the
subject, predicate and object of each triple.
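A rough Python sketch of that production-driven style, not taken from the draft. It assumes the parser has already grouped each predicateObjectList into (verb, objects) pairs; the emitter then walks them holding curSubject and curPredicate, so ';' just moves to the next pair and ',' to the next object, with no rewriting into separate statements as in pfps's 4/ and 5/. Since the draft doesn't generate lists yet, the collection branch follows pfps's step 3.2 e). The Emitter, PropertyList and Collection names are hypothetical, and IRIs are plain strings to keep it short.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


@dataclass(frozen=True)
class BNode:
    id: int


@dataclass
class PropertyList:
    """[ predicateObjectList ]; an empty list of pairs stands for []."""
    pairs: List[Tuple[str, list]]


@dataclass
class Collection:
    """( object_1 ... object_n ); an empty list of items stands for ()."""
    items: list


class Emitter:
    """Hypothetical helper: emits triples while walking pre-grouped syntax."""

    def __init__(self) -> None:
        self.triples: List[tuple] = []
        self._ids = count()

    def fresh(self) -> BNode:
        # A blank node never handed out before by this emitter.
        return BNode(next(self._ids))

    def node(self, obj):
        """Return the RDF term for an object, emitting any triples it implies."""
        if isinstance(obj, PropertyList):
            b = self.fresh()
            self.predicate_object_list(b, obj.pairs)
            return b
        if isinstance(obj, Collection):
            return self.collection(obj.items)
        return obj  # already an RDF term (IRI string, literal or blank node)

    def collection(self, items: list):
        """pfps 3.2 c) and e): () is rdf:nil, otherwise n nodes, 2n triples."""
        if not items:
            return RDF + "nil"
        nodes = [self.fresh() for _ in items]
        for i, item in enumerate(items):
            rest = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
            self.triples.append((nodes[i], RDF + "first", self.node(item)))
            self.triples.append((nodes[i], RDF + "rest", rest))
        return nodes[0]

    def predicate_object_list(self, cur_subject, pairs) -> None:
        for cur_predicate, objects in pairs:  # ';' moves to the next pair
            for obj in objects:               # ',' moves to the next object
                self.triples.append(
                    (cur_subject, cur_predicate, self.node(obj)))
```

Fed `ex:s ex:p ex:o1, ex:o2 ; ex:q ( ex:a ex:b ) .` as two pairs, this yields the two ex:p triples, one ex:q triple pointing at the list head, and four rdf:first/rdf:rest triples.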
[[
Each GraphNode in the document produces an RDF triple of the curSubject,
curPredicate and the GraphNode.
]] — http://www.w3.org/2010/01/31-Turtle#triples

Once we find an acceptable style for this, I'll add list generation.

> > It also might be worth starting to consider whether to align the terminals
> > (qnames) more with sparql first.
>
> the productions ref'd in http://www.w3.org/2010/01/31-Turtle#⋈ are
> from a yacker mockup of "TurtleS" (Turtle using SPARQL terminals and
> productions, where applicable). it may still be too liberal -- needs
> some thought and testing against bad-\d\d.ttl.
>
> > Dave
> >
> > [1] http://lists.w3.org/Archives/Public/semantic-web/2008Jan/0128.html
> >     via my Turtle issue list
> >     http://github.com/dajobe/turtle/blob/master/ISSUES.md

[2] http://www.w3.org/2010/01/31-Turtle#⋈

> --
> -ericP

--
-ericP
Received on Wednesday, 3 February 2010 15:36:06 UTC