How to distinguish among Turtle family members; was Re: TriG being disjoint from Turtle from Sandro Hawke on 2013-05-21 (public-rdf-comments@w3.org from May 2013)

From: Sandro Hawke <sandro@w3.org>
Date: Tue, 21 May 2013 13:10:35 -0400
To: Jan Wielemaker <J.Wielemaker@vu.nl>
CC: Gavin Carothers <gavin@carothers.name>, Andy Seaborne <andy.seaborne@epimorphics.com>, "public-rdf-comments@w3.org" <public-rdf-comments@w3.org>
Message-ID: <519BAA8B.9090708@w3.org>

On 05/17/2013 11:36 AM, Jan Wielemaker wrote:
> On 05/17/2013 05:12 PM, Gavin Carothers wrote:
>>
>> On Fri, May 17, 2013 at 8:02 AM, Jan Wielemaker <J.Wielemaker@vu.nl> wrote:
>>
>>     I'm sure this will eventually sort itself out as the old versions of
>>     these formats die away and everybody complies with the latest
>>     standard. That might take a while though :-( Also, nobody says this
>>     is the last revision of RDF serialization syntax.
>>
>>
>>
>> The goal of this Turtle standardization effort was to NOT change the
>> parsing of any existing (non-pathological) Turtle document. If you are
>> aware of any changes we made that do change existing Turtle data please
>> tell us. All existing Turtle documents should parse to exactly the same
>> RDF Graph (with the exception of changes in RDF Concepts 1.1, such as
>> plain literals becoming xsd:strings). Parsers need updating to deal with
>> interop issues, but documents and data shouldn't.
>
> The only case I came across while running my new parser on the
> (very) old test cases was this one (test-28.ttl, #-comment by me):
>
> <http://example.org/foo>
>     <http://example.org/bar>
>         2.345,
>         1,
>         1.0,
> #       1.,                     (no longer valid)
>         1.000000000,
>     ...
>
> Whether that is pathological or not is a bit of a border case I'd say.
>
> Otherwise, I think you are right about the validity of data across
> versions. Two things remain. Firstly, if the file uses a version that is
> newer than what my parser supports, I'd much rather report this right
> away than generate hard-to-understand error messages. Secondly, we have a
> lot of different formats, most of which produce triples and some quads
> (with multiple graphs), and they all look alike. Which parser do I use if
> the extension/MIME type is missing, wrong, or lost? If the answer is the
> most generic one (TriG/quads), I'm beginning to wonder why we have all
> these other ones ...
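To make the test-28.ttl case concrete: the Turtle 1.1 grammar has three numeric terminals (INTEGER, DECIMAL, DOUBLE), and DECIMAL requires at least one digit after the period. The sketch below transcribes those three productions as regular expressions and classifies the tokens from the quoted test; the function name `numeric_datatype` is ours, for illustration only. The `1.` token is the only one that matches none of the productions. A tokenizer doing longest-match would instead read `1` as an INTEGER and then treat `.` as the end of the statement, so the following `,` becomes a syntax error.

```python
import re

# Terminal productions from the Turtle 1.1 grammar, transcribed as regexes.
EXPONENT = r"[eE][+-]?[0-9]+"
INTEGER = re.compile(r"[+-]?[0-9]+")
DECIMAL = re.compile(r"[+-]?[0-9]*\.[0-9]+")  # needs a digit after '.'
DOUBLE = re.compile(
    r"[+-]?(?:[0-9]+\.[0-9]*" + EXPONENT
    + r"|\.[0-9]+" + EXPONENT
    + r"|[0-9]+" + EXPONENT + r")"
)


def numeric_datatype(token):
    """Return the XSD datatype a whole numeric token maps to, or None."""
    if INTEGER.fullmatch(token):
        return "xsd:integer"
    if DECIMAL.fullmatch(token):
        return "xsd:decimal"
    if DOUBLE.fullmatch(token):
        return "xsd:double"
    return None  # not a Turtle 1.1 numeric literal on its own


# Tokens from test-28.ttl, plus one DOUBLE for contrast.
for tok in ["2.345", "1", "1.0", "1.", "1.000000000", "1.e0"]:
    print(tok, numeric_datatype(tok))
```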

My practical advice (and what I expect to do myself) is to use a parser 
that handles the superset of Turtle, TriG, N-Triples, N-Quads, and maybe 
the obvious subset of SPARQL. (That's what I hoped TriG would be.)
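For the two line-based formats, the "superset" idea is easy to see: an N-Quads statement is an N-Triples statement with an optional fourth (graph) term, so one reader can accept both and only decide afterwards whether the input was a graph or a dataset. The sketch below assumes simplified term patterns (it does not unescape literals or validate IRIs, so it is not a conforming parser), and `parse_line` and `read_statements` are hypothetical names chosen for illustration.

```python
import re

# Simplified term: <IRI>, _:blank, or "literal" with optional @lang or ^^<type>.
# Escapes inside literals are kept verbatim; this is a sketch, not a conforming parser.
TERM = r'(<[^>]*>|_:[A-Za-z0-9_.-]+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
# Three terms, an optional fourth (graph) term, then the final period.
LINE = re.compile(
    r"\s*" + TERM + r"\s+" + TERM + r"\s+" + TERM + r"(?:\s+" + TERM + r")?\s*\.\s*$"
)


def parse_line(line):
    """Parse one N-Triples or N-Quads line; the graph is None for a triple."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None  # blank line or comment
    m = LINE.match(line)
    if m is None:
        raise ValueError("not N-Triples/N-Quads: " + line)
    return m.groups()  # (subject, predicate, object, graph-or-None)


def read_statements(lines):
    """Parse all lines; the input counts as a dataset once any line has a graph."""
    quads = [q for q in (parse_line(l) for l in lines) if q is not None]
    is_dataset = any(q[3] is not None for q in quads)
    return quads, is_dataset
```

Deciding "graph or dataset" after parsing, rather than from the file extension, sidesteps the missing or wrong MIME type problem for these two formats.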

For output of graphs I'd use a conservative subset of Turtle (a space 
before each final period, only a small set of characters allowed in 
prefixed names), and for datasets I'm not sure, but I'll probably use 
N-Quads unless TriG gets better. Or I'll make up something new and hope 
it catches on.  :)
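A conservative writer along those lines might look like the sketch below. The set of "safe" local-name characters is an assumption (deliberately much narrower than what Turtle 1.1 permits), and any IRI whose local part falls outside it is written out in full as `<...>`. It assumes the IRIs are already valid and need no escaping; the function names are hypothetical.

```python
import re

# Assumed "safe" local names: letters, digits, '_' and '-', not starting
# with a digit or '-'. Deliberately narrower than Turtle 1.1's PN_LOCAL.
SAFE_LOCAL = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")


def write_prefixes(prefixes):
    """Emit @prefix declarations, each with a space before the period."""
    return "\n".join("@prefix %s: <%s> ." % (p, ns) for p, ns in prefixes.items())


def write_iri(iri, prefixes):
    """Abbreviate with a prefix only when the local part is 'safe'."""
    for prefix, ns in prefixes.items():
        local = iri[len(ns):]
        if iri.startswith(ns) and SAFE_LOCAL.fullmatch(local):
            return prefix + ":" + local
    return "<" + iri + ">"  # fall back to a full IRI reference


def write_triple(s, p, o, prefixes):
    """Emit one all-IRI triple with a space before the final period."""
    return " ".join(write_iri(t, prefixes) for t in (s, p, o)) + " ."
```

For example, with `{"ex": "http://example.org/"}`, the object `http://example.org/1.x` is written as `<http://example.org/1.x>` rather than as a prefixed name, because its local part starts with a digit and contains a period.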

(I'm also thinking about using reification to fit datasets into graphs, 
but I wouldn't dare mention that in email like this.)
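For readers curious about the reification aside: RDF does define `rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object`, but it has no standard property linking a statement to a named graph. The sketch below therefore uses a made-up predicate, `http://example.org/inGraph`, to show one way a quad could be folded into plain triples. Terms are plain strings, and the statement identifier is supplied by the caller.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical predicate: RDF defines no standard "in graph" property.
IN_GRAPH = "http://example.org/inGraph"


def reify_quad(s, p, o, g, stmt_id):
    """Turn one quad into five triples describing it as an rdf:Statement."""
    return [
        (stmt_id, RDF + "type", RDF + "Statement"),
        (stmt_id, RDF + "subject", s),
        (stmt_id, RDF + "predicate", p),
        (stmt_id, RDF + "object", o),
        (stmt_id, IN_GRAPH, g),  # the graph name, via the made-up predicate
    ]
```

The cost is visible in the output: each quad becomes five triples, and consumers must know the invented predicate to recover the dataset.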

Would that work for you, or do you want to be validating -- giving 
people warnings about data that's not allowed by some spec?

      -- Sandro

>
>     Cheers --- Jan
>
> P.s.    The most common issue I've come across is about
>     handling %XX in RDF data.  I think the standards are
>     clear, but the daily experience is no fun :-(
>
>
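On the %XX point: the Turtle 1.1 specification says that %-encoded sequences in local names stand for those same three characters and are not decoded during processing, whereas a backslash escape such as `\-` (PN_LOCAL_ESC) is removed. A minimal sketch of prefixed-name expansion that follows both rules (the function name `expand_pname` is ours, for illustration):

```python
import re

# PN_LOCAL_ESC from the Turtle 1.1 grammar: a backslash before one of these
# characters is dropped when the local name is expanded.
LOCAL_ESC = re.compile(r"\\([_~.\-!$&'()*+,;=/?#@%])")


def expand_pname(pname, prefixes):
    """Expand a prefixed name to an IRI; %XX sequences are kept, not decoded."""
    prefix, local = pname.split(":", 1)
    return prefixes[prefix] + LOCAL_ESC.sub(r"\1", local)
```

So `ex:a%20b` expands to `http://example.org/a%20b` and `ex:a\-b` to `http://example.org/a-b`. Running `urllib.parse.unquote` over the first result would give `http://example.org/a b`, which is no longer the same IRI string, and that kind of accidental decoding is one common source of %XX trouble.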

Received on Tuesday, 21 May 2013 17:10:53 UTC