
[Turtle] Misc initial thoughts

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Mon, 28 Feb 2011 20:36:13 +0000
Message-ID: <4D6C073D.4010008@epimorphics.com>
To: RDF-WG <public-rdf-wg@w3.org>
These are mostly useful things to do; some are quite mundane.

== Relationship of languages

An N-Triples document is a syntactically valid Turtle document.
An N-Triples document is a valid N-Quads document.

A Turtle document is not a valid TriG document.
An N-Quads document is not a valid TriG document.

I find that a bit strange.
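The first two relationships can be checked by counting the terms on a statement line. N-Triples needs exactly three terms. N-Quads allows an optional fourth term, the graph label, so every N-Triples line also passes the N-Quads check. The Python sketch below uses simplified term patterns that are an assumption, not the spec grammar. It does not check term positions, so a literal in subject position would pass.

```python
import re
from typing import List, Optional

# Simplified term patterns (assumption: a subset of the real N-Triples grammar;
# IRI characters and escape sequences are not validated).
TERM = re.compile(
    r'\s*('
    r'<[^<>"\s]*>'                                        # IRIREF
    r'|_:[A-Za-z0-9_]+'                                   # blank node label
    r'|"(?:[^"\\]|\\.)*"'                                 # quoted literal ...
    r'(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^<[^<>"\s]*>)?'  # ... optional lang tag or datatype
    r')'
)

def split_terms(line: str) -> Optional[List[str]]:
    """Return the terms before the final '.', or None if the line is malformed."""
    body = line.strip()
    if not body.endswith("."):
        return None
    body = body[:-1]
    terms, pos = [], 0
    while body[pos:].strip():
        m = TERM.match(body, pos)
        if m is None:
            return None
        terms.append(m.group(1))
        pos = m.end()
    return terms

def is_ntriples_statement(line: str) -> bool:
    terms = split_terms(line)
    return terms is not None and len(terms) == 3

def is_nquads_statement(line: str) -> bool:
    # The graph label is optional, so a three-term line is also a valid quad line.
    terms = split_terms(line)
    return terms is not None and len(terms) in (3, 4)
```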

== Tokens

It would be useful to split the Turtle grammar more clearly into tokens 
and grammar rules.

Having a set of tokens that can be reused across all Turtle-related 
languages would mean no surprises for application writers 
(e.g. what's allowed in a prefixed name).

It also means implementers can use one (performance-tuned) tokenizer, but 
the benefit to application writers is more important.

We could add tokens for variables, keywords, and the symbols "{" and 
"}" so that one set of tokens covers evolutions of N3, N-Triples, 
N-Quads, Turtle, TriG and SPARQL, and could be a starting point 
for other languages of the same style (a rules format; a CSV-like 
results format, or RDF-Tuples; domain-specific formats).

This is neutral with respect to decisions about the language future for named 
graphs/graph literals/whatever.  It just establishes some groundwork.

The details of prefixed names will cause some debate :-)

== N-Triples/N-Quads as data

The N-Triples format is designed for testing, so
<s> <p> <o> .

could mean the IRIs "s", "p" and "o" exactly as written, not resolved 
against the base, so an N-Triples file can mean something different from 
the same bytes read as Turtle.
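Here is a small Python illustration of the divergence. It assumes `urllib.parse.urljoin` is an adequate stand-in for RFC 3986 reference resolution in this simple case. The two function names are hypothetical and only label the two readings.

```python
from urllib.parse import urljoin

def turtle_iri(iri_ref: str, base: str) -> str:
    """Turtle reading: a relative IRI reference is resolved against the base."""
    return urljoin(base, iri_ref)

def ntriples_test_iri(iri_ref: str) -> str:
    """Test-suite reading: the IRI is taken exactly as written."""
    return iri_ref

base = "http://example.org/data/file.ttl"
for term in ("s", "p", "o"):
    # Same bytes, two different IRIs depending on which language reads them.
    print(ntriples_test_iri(term), "->", turtle_iri(term, base))
```

Absolute IRIs come out the same under both readings. Only relative ones like `<s>` diverge.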

A data-format N-Triples/N-Quads would be a subset of Turtle, with the 
same IRI resolution rules and the same syntax for IRI tokens, and in UTF-8.

As these formats are used as dump formats, pinning down the details would 
help data publishers and consumers.

A MIME type other than text/plain would be helpful.

== Charset

All UTF-8. People already write "UTF-8 N-Triples".  This change to 
N-Triples and N-Quads is backwards compatible: existing N-Triples files 
are 7-bit US-ASCII, which is a subset of UTF-8, so they remain valid.
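Here is a small Python check of the backwards-compatibility argument. The escape handling is simplified: it covers only \uXXXX and \UXXXXXXXX and ignores an escaped backslash before a "u". An ASCII-only file containing a \u escape decodes the same way as ASCII or as UTF-8. After the escape is expanded, it carries the same literal as a UTF-8 file that writes the character directly.

```python
import re

def unescape_uchar(s: str) -> str:
    """Expand \\uXXXX and \\UXXXXXXXX escapes (simplified: '\\\\' escapes are not handled)."""
    return re.sub(
        r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})",
        lambda m: chr(int(m.group(1) or m.group(2), 16)),
        s,
    )

# Old-style ASCII N-Triples: the non-ASCII character is written as an escape.
ascii_bytes = b'<http://example.org/s> <http://example.org/p> "caf\\u00E9" .\n'
# UTF-8 N-Triples: the same character written directly.
utf8_bytes = '<http://example.org/s> <http://example.org/p> "caf\u00e9" .\n'.encode("utf-8")

# An ASCII-only file decodes identically whether read as ASCII or UTF-8 ...
same_text = ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")
# ... and after expanding escapes both files carry the same literal.
same_literal = unescape_uchar(ascii_bytes.decode("utf-8")) == utf8_bytes.decode("utf-8")
```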

(Apologies if this is a repeat: I thought I'd sent it some hours ago 
and nothing has appeared, which is usually a bad sign.)

Received on Monday, 28 February 2011 20:36:49 UTC
