[Turtle] Some initial thoughts

== Relationship of languages

An N-Triples document is a syntactically valid Turtle document.
An N-Triples document is a valid N-Quads document.

A Turtle document is not a valid TriG document.
An N-Quads document is not a valid TriG document.

I find that a bit strange.

== Tokens

It would be useful to split the Turtle grammar more clearly into tokens 
and grammar rules.

Having one set of tokens that is reused across all Turtle-related 
languages would avoid surprises for application writers 
(e.g. what's allowed in a prefixed name).

It also means implementers can use a single (performance-tuned) 
tokenizer, but the benefit to app writers is more important.

We could add tokens for variables, keywords, and the symbols "{" and 
"}", so that one set of tokens covers evolutions of N3, N-Triples, 
N-Quads, Turtle, TriG and SPARQL, as well as being a possible starting 
point for any other languages of the same style (a rules format; a 
CSV-like results format, or RDF-Tuples; domain-specific formats).
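
To make the idea concrete, here is a rough sketch of what one shared 
tokenizer might look like. The token names and the regular expressions 
are my own simplifications for illustration; they are not the real 
Turtle or SPARQL productions (prefixed names in particular are much 
cruder here than in any of the grammars).

```python
import re

# Hypothetical, simplified token set shared by N-Triples, Turtle, TriG and
# SPARQL-style languages. The patterns are illustrative, not the real grammar.
TOKEN_SPEC = [
    ("IRIREF",  r'<[^<>"{}|^`\\\s]*>'),
    ("STRING",  r'"(?:[^"\\\n]|\\.)*"'),
    ("VAR",     r"[?$][A-Za-z_][A-Za-z0-9_]*"),
    ("KEYWORD", r"@prefix|@base"),
    ("BNODE",   r"_:[A-Za-z0-9_]+"),
    ("PNAME",   r"(?:[A-Za-z][A-Za-z0-9_-]*)?:[A-Za-z0-9_-]*"),
    ("LBRACE",  r"\{"),
    ("RBRACE",  r"\}"),
    ("DOT",     r"\."),
    ("WS",      r"\s+"),
]

# One combined pattern; each alternative is a named group.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token_name, lexeme) pairs, skipping whitespace."""
    pos = 0
    tokens = []
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


# The same tokenizer handles an N-Triples line, a Turtle directive and a
# SPARQL-like group; only the grammar layered on top would differ.
for sample in (
    "<http://example.org/s> <http://example.org/p> _:b1 .",
    "@prefix ex: <http://example.org/> .",
    '{ ?s ex:p "v" }',
):
    print(tokenize(sample))
```

Each language would then be a set of grammar rules over these tokens, 
so a question like "what's allowed in a prefixed name" gets answered 
once, in the PNAME token, rather than separately per language.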

This is neutral to decisions about future language features for named 
graphs, graph literals or whatever.  It's just establishing the 
groundwork.

The details of prefix names will cause some debate :-)


== Charset

All UTF-8. People do write "UTF-8 N-Triples".  This is a change to 
N-Triples and N-Quads that is backwards compatible.
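
A small check of the backwards-compatibility point, assuming existing 
files stick to the original ASCII-only N-Triples with \u escapes: every 
such file is already valid UTF-8 and decodes to the same text, while a 
file with raw non-ASCII characters is only readable as UTF-8. The sample 
data and the helper name are made up for illustration.

```python
def decodes_as(data, encoding):
    """Return True if the bytes decode cleanly in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# ASCII-only N-Triples line using a \u escape (the backslash is literal here).
ascii_line = b'<http://example.org/s> <http://example.org/p> "caf\\u00E9" .\n'

# The same statement with the character written directly (e-acute), as UTF-8.
utf8_line = '<http://example.org/s> <http://example.org/p> "caf\u00e9" .\n'.encode("utf-8")

# Old ASCII files read identically under either decoding.
print(ascii_line.decode("ascii") == ascii_line.decode("utf-8"))

# A "UTF-8 N-Triples" file is not readable by a strict ASCII reader.
print(decodes_as(utf8_line, "ascii"), decodes_as(utf8_line, "utf-8"))
```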

== N-Triples/N-Quads as data

The N-Triples format was designed for testing, so

<s> <p> <o> .

could mean the IRIs "s" etc., not resolved against the base, so an 
N-Triples file can mean something different from the same bytes read 
as Turtle.

A data-format N-Triples / N-Quads would be a subset of Turtle, with the 
same IRI resolution rules and the same syntax for IRI tokens.  And in 
UTF-8.
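
A sketch of the difference, assuming a hypothetical base IRI of 
http://example.org/data/ and using Python's urljoin as a stand-in for 
IRI resolution (it implements URL reference resolution, which is close 
enough for this illustration). The helper name is made up.

```python
from urllib.parse import urljoin

BASE = "http://example.org/data/"  # hypothetical base IRI of the document


def read_iri_token(token, base=None):
    """Strip the angle brackets; resolve against the base only if one is given."""
    iri = token[1:-1]
    return urljoin(base, iri) if base is not None else iri


# Test-format reading: the token is taken as-is, with no resolution.
as_test_format = read_iri_token("<s>")

# Turtle-style reading: the same bytes, resolved against the base.
as_turtle = read_iri_token("<s>", base=BASE)

print(as_test_format)  # s
print(as_turtle)       # http://example.org/data/s
```

With shared resolution rules, both readings of the file would give the 
second result, and absolute IRIs are unaffected either way.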

As these formats are used as dump formats, pinning down details would be 
a help to data publishers and consumers.

 Andy

Received on Monday, 28 February 2011 23:19:30 UTC