comments on the turtle team submission

From: Peter F. Patel-Schneider <pfps@research.bell-labs.com>
Subject: Re: comments on the N3 team submission
Date: Wed, 16 Jan 2008 16:31:50 -0500 (EST)

> From: Peter F. Patel-Schneider <pfps@research.bell-labs.com>
> Subject: comments on the N3 and Turtle team submissions
> Date: Wed, 16 Jan 2008 09:08:22 -0500 (EST)
> 
> > From: Sandro Hawke <sandro@w3.org>
> > Subject: Re: ISSUE-93 (Language tags): RFC 3066 - Tags for the Identification of Languages 
> > Date: Wed, 16 Jan 2008 09:29:48 -0500
> 
> [...]
> 
> > > If you've actually read through the documents [the N3 and Turtle
> > > team submissions], it'd be great if you'd
> > > send along any constructive comments you have.
> > > 
> > >      - s
> > 
> > As I've said in some other contexts, I'm willing to provide what I
> > consider to be constructive comments on many documents.  However I don't
> > guarantee that the authors will like my comments.
> > 
> > peter


What is Turtle?  

From the document:

	This document defines a textual syntax for RDF called Turtle
	that allows RDF graphs to be completely written in a compact and
	natural text form, with abbreviations for common usage patterns
	and datatypes.

	This document defines Turtle, the Terse RDF Triple Language, a
	concrete syntax for RDF as defined in the RDF Concepts and
	Abstract Syntax ([RDF-CONCEPTS]) W3C Recommendation. 

	A Turtle document allows writing down an RDF graph in a compact
	textual form. It consists of a sequence of directives,
	triple-generating statements [and] blank lines. 

I was thus expecting to find not only a grammar for Turtle, but also a
normative mechanism for turning a Turtle document into an RDF graph.
Unfortunately there is no such mechanism in the document, even if parts
of the informative Section 2 are promoted to being normative.

The document contains portions of what is needed, but not a complete
mechanism.  The flaws are four-fold:

1/ There is nothing said about how to process some parts of the Turtle
   language, e.g., collections with other than zero or two components or
   even the [] construct.

2/ There is no overall guidance on the order in which actions needed to
   convert a Turtle document into an RDF graph are performed.  For
   example, dealing with [] after , can give different results from
   dealing with [] before ,.

3/ Some statements in the document are not correct.  For example,
   triple-generating statements are not defined in the document, but
   this notion is used in the document.

4/ The target of the conversion is not specified.  The target could be
   N-Triples or RDF graphs.  If the target is N-Triples, then Turtle
   documents have to be carefully processed to turn any non-ASCII
   characters into ASCII.
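
To illustrate point 2, here is a minimal Python sketch of how the order
of handling [] and , can matter.  The statement `[] :p :a , :b` and the
two helper functions are assumptions made for this illustration only;
they work on a flat token list and are not taken from the Turtle
document.

```python
import itertools


def expand_commas(tokens):
    """Rewrite `s v o1 , o2` as separate [s, v, o] token lists (purely textual)."""
    subject, verb, *objects = tokens
    return [[subject, verb, obj] for obj in objects if obj != ","]


def replace_blanks(statements, counter):
    """Replace every occurrence of the token `[]` with a fresh blank node label."""
    return [
        [f"_:b{next(counter)}" if tok == "[]" else tok for tok in stmt]
        for stmt in statements
    ]


statement = ["[]", ":p", ":a", ",", ":b"]

# Order A: expand ',' first, then handle '[]'.
# The copied '[]' subject becomes two different blank nodes.
comma_first = replace_blanks(expand_commas(statement), itertools.count())

# Order B: handle '[]' first, then expand ','.
# Both triples share one blank node as subject.
(blank_first_stmt,) = replace_blanks([statement], itertools.count())
blank_first = expand_commas(blank_first_stmt)

print(comma_first)
print(blank_first)
```

Under order A the two triples have different subjects; under order B
they share a subject, so the resulting graphs differ.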


To make the document complete requires a processing model for Turtle.  

I suggest something like:

A Turtle document is turned into an RDF graph in a multi-stage process.
This process turns a sequence of Unicode characters into an abstract
object.  Implementing the process requires more than just character
replacement, as intermediate results are abstract objects like URI
references.  Throughout, rdf:XXX will mean the RDF URI reference of the
form http://www.w3.org/1999/02/22-rdf-syntax-ns#XXX for XXX any name.

0/ Handle escape characters and white space
0.2/ Turn each uriref into a URI reference, handling escaping as in
     S3.3 (and removing the enclosing <>).
0.3/ Turn each quotedString into a Normal Form C Unicode string,
     handling escaping as in S3.3 (and removing the enclosing " or """).
0.4/ Discard any ws
1/ Turn each qname and URI reference into an RDF URI reference.
1.1/ Turn each URI reference into an RDF URI reference, as in S3.4.
1.2/ Expand each qname into a uriref as in S2.1, which will be an
     RDF URI reference (because all relative URIs have been dealt with
     already). 
1.3/ Replace each occurrence of 'a' as a verb with the RDF URI reference 
	rdf:type
1.4/ Discard any directive and trailing .
2/ Turn each literal into an RDF literal.  The only non-obvious part is
   to add the appropriate datatype to integer, double, decimal, and
   boolean.
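
Steps 1.2, 1.3 and 2 could be sketched in Python roughly as follows.
The function names, the prefix table, and the regular expressions are
assumptions for illustration; the patterns only approximate the integer,
double, and decimal productions and ignore escaping and relative URIs.

```python
import re

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD_NS = "http://www.w3.org/2001/XMLSchema#"


def expand_qname(qname, prefixes):
    """Step 1.2: expand prefix:local using the prefixes declared so far."""
    prefix, _, local = qname.partition(":")
    return prefixes[prefix] + local


def expand_verb(token, prefixes):
    """Step 1.3: 'a' in verb position stands for rdf:type."""
    if token == "a":
        return RDF_NS + "type"
    return expand_qname(token, prefixes)


def datatype_of(token):
    """Step 2: pick the datatype for an unquoted literal token."""
    if token in ("true", "false"):
        return XSD_NS + "boolean"
    if re.fullmatch(r"[+-]?[0-9]+", token):
        return XSD_NS + "integer"
    if re.fullmatch(r"[+-]?([0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)[eE][+-]?[0-9]+", token):
        return XSD_NS + "double"
    if re.fullmatch(r"[+-]?([0-9]+\.[0-9]*|\.[0-9]+)", token):
        return XSD_NS + "decimal"
    raise ValueError(f"not an abbreviated literal: {token!r}")


prefixes = {"ex": "http://example.org/"}  # hypothetical prefix table
print(expand_verb("a", prefixes))
print(expand_qname("ex:thing", prefixes))
print(datatype_of("1.5e3"))
```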

From now on the process is working with a sequence of processed
occurrences of the triples production, i.e., pieces of the occurrences
may have been replaced with abstract objects.

3/ Handle blank nodes
3.1/ For each name used in a nodeID in the document select a fresh blank
     node and replace any occurrence of nodeID of the form _:name with
     that blank node.  This processes each of the occurrences.
3.2/ Recursively, until no unprocessed blank is left in the document,
     select an unprocessed blank that does not contain an unprocessed
     blank, select a fresh blank node, and process the blank as follows:
     a) If blank is of the form [] replace it with the fresh blank node.
     b) If blank is of the form [ predicateObjectList ] replace it with
	the fresh blank node and add a new triples occurrence consisting
	of the fresh blank node (as subject) and the predicateObjectList.
     c) If blank is of the form () replace it with the RDF URI
	reference rdf:nil
     d) If blank is of the form ( object1 ... objectn ) for n>=1
	- select n fresh blank nodes, node1, ..., noden,
	- replace the blank with node1,
	- add 2n-2 triples, for 1<=i<=n-1, with triple 2i-1 having subject
	  nodei, verb rdf:first, and object objecti and triple 2i having
	  subject nodei, verb rdf:rest, and object nodei+1, and
	- add two triples with the first having subject noden, verb
	  rdf:first, and object objectn and the second having subject
	  noden, verb rdf:rest, and object rdf:nil  (Yes, this is being
	  a bit sloppy.)
4/ Handle ; constructs
4.1/ Recursively replace any subject verb1 objectlist1 ; verb2 objectlist2
     with subject verb1 objectlist1 . subject verb2 objectlist2
4.2/ Remove any remaining ;
5/ Handle , constructs
5.1/ Recursively replace any subject verb object1 , object2
     with subject verb object1 . subject verb object2
6/ Turn each subject verb object . into an RDF triple.
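
Steps 4 to 6 amount to flattening a nested structure.  A minimal Python
sketch, assuming a processed triples occurrence is already represented
as a subject plus a list of (verb, object list) pairs (that
representation and the name `flatten` are assumptions, not part of the
Turtle document):

```python
def flatten(subject, predicate_object_list):
    """Expand the ';' and ',' abbreviations into a set of RDF triples."""
    triples = set()
    for verb, objects in predicate_object_list:   # step 4: one group per ';'
        for obj in objects:                        # step 5: one object per ','
            triples.add((subject, verb, obj))      # step 6: a plain triple
    return triples


# Corresponds to:  s p a , b ; q c .
graph = flatten("s", [("p", ["a", "b"]), ("q", ["c"])])
print(sorted(graph))
```

A set is used because an RDF graph is a set of triples, so repeated
triples collapse into one.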

Selecting a fresh blank node means selecting a blank node (from the
infinite collection of blank nodes available) that has not yet been used
in the process so far.
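
As a rough Python sketch of fresh blank node selection and of the
collection handling in step 3.2, under the assumption that blank nodes
are modelled as objects with a running counter (the class `BlankNode`
and the function `expand_collection` are hypothetical names used only
here):

```python
import itertools

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class BlankNode:
    """Each instance is a fresh blank node: it never equals an earlier one."""

    _ids = itertools.count()

    def __init__(self):
        self.id = next(BlankNode._ids)

    def __repr__(self):
        return f"_:g{self.id}"


def expand_collection(objects):
    """Return (replacement, triples) for ( object1 ... objectn )."""
    if not objects:
        # The form () is replaced by rdf:nil and adds no triples.
        return RDF_NS + "nil", []
    nodes = [BlankNode() for _ in objects]
    triples = []
    for i, (node, obj) in enumerate(zip(nodes, objects)):
        # The last node's rdf:rest is rdf:nil; the others point at the next node.
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF_NS + "nil"
        triples.append((node, RDF_NS + "first", obj))
        triples.append((node, RDF_NS + "rest", rest))
    return nodes[0], triples


head, triples = expand_collection(["x", "y", "z"])
for triple in triples:
    print(triple)
```

For n objects this yields 2n triples in total: the 2n-2 triples for the
first n-1 nodes plus the two triples for the last node.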

The process results in an RDF graph as in
http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-rdf-graph



Some typographical and wording problems:

triple-generating statements or blank lines. -> triple-generating statements and blank lines. 
in the RDF Concepts and -> in RDF Concepts and
keeping it in the RDF model. -> keeping within the RDF model.
separated by whitespace and terminated by '.' after each triple. -> ???
repeated URIs -> URIs
any legal URI form (full or qualified) : full URI has not been defined
a blank node either from the given nodeID. -> a blank node with the given nodeID. 
A generated blank node -> An anonymous blank node ??
be made with [] -> written as []
Boolean may be -> Booleans may be
an relative -> a relative

Received on Thursday, 17 January 2008 09:17:47 UTC