Turtle datatypes preceding literals

A few situations I have encountered have me thinking that datatypes
always following literals in Turtle is an unfortunate limitation.  The
problem is, in general, you don't know what type of data you are reading
while you are reading it.

Examples of this problem are:

* Editors are unable to syntax highlight literals, e.g. XML literals in
Turtle documents, because the type is not known until the end of the
literal.  If the type came first, this would be possible, and would
make editing such documents much nicer.

* While parsing literals that must themselves be parsed, the
"sub-parsing" cannot start until the entire literal is read.  For
example, if you have xsd:base64Binary data in a literal, often you would
want to decode it as you go.  Currently, you have to read the
entire literal before doing so.  If the type came first, you could do
this in a streaming fashion.  For big literals, this is very significant
(for really big literals in DB dumps or something it could even be a
show-stopper).  It also has security and performance implications for
servers that accept data from clients.
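
To make the streaming point concrete, here is a minimal Python sketch of
what a parser could do if it knew up front that a literal is base64: it
decodes each chunk as it arrives and never holds the whole literal in
memory.  The class name StreamingBase64Decoder and its feed/finish
methods are hypothetical names for illustration, not part of any
existing Turtle parser; only the standard base64 module is assumed.

```python
import base64


class StreamingBase64Decoder:
    """Decode base64 text incrementally, one chunk at a time (illustrative)."""

    def __init__(self):
        # Characters that do not yet form a complete 4-character group.
        self._pending = ""

    def feed(self, chunk: str) -> bytes:
        """Accept the next piece of the literal and return the bytes decodable so far."""
        # Drop whitespace, which may appear inside long base64 literals.
        data = self._pending + "".join(chunk.split())
        # Base64 decodes in groups of 4 characters; keep the remainder for later.
        usable = len(data) - (len(data) % 4)
        self._pending = data[usable:]
        return base64.b64decode(data[:usable])

    def finish(self) -> bytes:
        """Call at the end of the literal; leftover characters mean truncated input."""
        if self._pending:
            raise ValueError("truncated base64 data")
        return b""


def decode_in_chunks(text: str, size: int) -> bytes:
    """Simulate a parser handing over the literal `size` characters at a time."""
    decoder = StreamingBase64Decoder()
    parts = [decoder.feed(text[i:i + size]) for i in range(0, len(text), size)]
    parts.append(decoder.finish())
    return b"".join(parts)
```

With the current suffix syntax a parser cannot safely pick this code
path until it has already buffered the entire literal, which is exactly
the limitation described above.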

There are lots of examples, particularly if you are streaming data.
Because of these kinds of issues, it would be very nice if the datatype
could come before the literal in Turtle.

As for syntax, after batting the idea around in #swig for a bit, I think
the best syntax to do so would be adding an "as" keyword, so, for
example:

<person> eg:bio "<p>...</p>"^^rdf:XMLLiteral .

becomes:

<person> eg:bio as rdf:XMLLiteral "<p>...</p>" .

Timbl proposed using an "of" sort of like that from N3, like:

<person> eg:bio rdf:XMLLiteral of "<p>...</p>" .

but I don't think there is a way of defining the semantics for "of" that
works in both these cases within RDF (in N3 it deals with predicates and
objects, not datatypes and literals), so using a new term is better.
"as" reads more naturally in this case anyway.

Using any of the available punctuation characters would be much uglier
IMO; a keyword is best.  Suggestions welcome in any case.  Naturally the
existing syntax would remain.
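
As a rough illustration of the two forms coexisting, the following
Python sketch recognizes the object of a triple written either with the
existing "^^" suffix or with the proposed "as" prefix, and returns the
same (lexical form, datatype) pair for both.  It is a toy under stated
assumptions: the function name parse_typed_literal is hypothetical, only
prefixed names and short double-quoted strings are handled, and this is
not the real Turtle grammar.

```python
import re

# Simplified prefixed name such as rdf:XMLLiteral (not the full Turtle PNAME rule).
_PNAME = r"[A-Za-z][\w-]*:[\w-]+"
# Short double-quoted string with backslash escapes; the group captures the body.
_STRING = r'"((?:[^"\\]|\\.)*)"'

# Existing syntax: "lexical"^^datatype
_SUFFIX_FORM = re.compile(rf"\s*{_STRING}\^\^({_PNAME})\s*$")
# Proposed syntax: as datatype "lexical"
_PREFIX_FORM = re.compile(rf"\s*as\s+({_PNAME})\s+{_STRING}\s*$")


def parse_typed_literal(text):
    """Return (lexical_form, datatype) for either spelling of a typed literal."""
    match = _SUFFIX_FORM.match(text)
    if match:
        return match.group(1), match.group(2)
    match = _PREFIX_FORM.match(text)
    if match:
        # In the prefix form the datatype is captured before the string.
        return match.group(2), match.group(1)
    raise ValueError("not a typed literal in either form: %r" % text)
```

Both spellings yield the same pair, so the abstract syntax is
unchanged; the only difference is that with the prefix form the
datatype is known before the first character of the literal is read.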

I realize proposing any Turtle addition is an uphill battle, but it
seems worth some thought.  The specific syntax is not so important, but
I think the ability to specify datatypes before literals would be a
considerable improvement, for deeper reasons than mere syntactic sugar.

Thoughts?

-dr

Received on Monday, 30 January 2012 00:31:35 UTC