Re: looks like it should be turtle Re: Oracle's stand regarding N-TRIPLES

On Mon, Aug 22, 2011 at 1:49 PM, Zhe Wu <alan.wu@oracle.com> wrote:
> Hi Gavin,
>
> I just did a quick test against that
>
> http://id.loc.gov/vocabulary/iso639-1/nn.nt
>
> If we read the file as NTRIPLES, then raptor complains.
>
> raptor2-1.9.0/utils/rapper -i ntriples ./tests/iso639-1-nn.nt -o ntriples >
> /tmp/rapper.nt_readAsNTRIPLES
> lt-rapper: Parsing URI file:///...iso639-1-nn.nt with parser ntriples
> lt-rapper: Serializing with serializer ntriples
> lt-rapper: Error - URI file:///...iso639-1-nn.nt:5 column 101 -
> Non-printable ASCII character 195 (0xC3) found.

Correct, raptor does not implement UTF-8 handling of N-Triples.
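
For context, 0xC3 is the lead byte of a two-byte UTF-8 sequence (for
example, "å" encodes as C3 A5), so the error above is what an ASCII-only
reader would report on the first non-ASCII character it meets. Below is a
minimal Python sketch of that kind of check. The function name and the
sample triple are made up for illustration; this is not raptor's code, and
raptor may count columns differently.

```python
def first_non_ascii(data: bytes):
    """Return (line, column, byte) of the first byte above 0x7F, or None.

    Lines and columns are 1-based; columns are counted in bytes.
    """
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte > 0x7F:
                return line_no, col, byte
    return None


# Hypothetical triple with a UTF-8 literal, similar in spirit to the LOC data.
sample = (
    '<http://example.org/s> <http://example.org/p> "bokmål"@nn .\n'
).encode("utf-8")

hit = first_non_ascii(sample)
if hit is not None:
    line_no, col, byte = hit
    print(f"line {line_no} column {col}: non-ASCII byte {byte} (0x{byte:02X})")
```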

> lt-rapper: Parsing returned 16 triples
>
>
> If we read the file as Turtle, everything seems fine.
>
> raptor2-1.9.0/utils/rapper -i turtle ./tests/iso639-1-nn.nt -o ntriples >
> /tmp/rapper.nt_readAsTurtle
> lt-rapper: Parsing URI file:///...iso639-1-nn.nt with parser turtle
> lt-rapper: Serializing with serializer ntriples
> lt-rapper: Parsing returned 76 triples
>
> As far as I can tell, LOC is serving turtle.  That filename is slightly
> confusing.

Nope, the MIME type is clearly text/plain, and if we look at the HTML
version of that resource, http://id.loc.gov/vocabulary/iso639-1/nn.html,
we see it naming the link N-Triples.
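
As a rough illustration of reading the media type and charset out of a
Content-Type header like the one the LOC sends, here is a small Python
sketch using only the standard library. The helper name is made up for
this example, and the header value is passed in as a string rather than
fetched from the server.

```python
from email.message import Message


def parse_content_type(value: str):
    """Split a Content-Type header value into (media type, charset)."""
    msg = Message()
    msg["Content-Type"] = value
    # Both accessors normalize their result to lower case.
    return msg.get_content_type(), msg.get_content_charset()


media_type, charset = parse_content_type("text/plain; charset=UTF-8")
print(media_type, charset)
```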

Of course, as you point out, an N-Triples (UTF-8) file can be considered
a subset of Turtle.
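
For a tool that only accepts ASCII N-Triples, one possible workaround is
to rewrite non-ASCII characters as \u / \U escapes before parsing. The
sketch below is an assumption-laden illustration in Python, not something
raptor or the LOC provides: the function name is made up, and it escapes
every non-ASCII character on the line without distinguishing literals
from IRIs.

```python
def escape_non_ascii(text: str) -> str:
    """Replace each non-ASCII character with a \\uXXXX or \\UXXXXXXXX escape."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)               # plain ASCII passes through
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)   # Basic Multilingual Plane
        else:
            out.append("\\U%08X" % cp)   # supplementary planes
    return "".join(out)


# Hypothetical triple; the subject and predicate are placeholders.
line = '<http://example.org/s> <http://example.org/p> "bokmål"@nn .'
print(escape_non_ascii(line))
```

Going the other way (unescaping to UTF-8) needs more care, since a
backslash in a literal may itself be escaped.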

--Gavin

>
> Thanks,
>
> Zhe
>
>
> On 8/22/2011 11:53 AM, Gavin Carothers wrote:
>
> On Mon, Aug 22, 2011 at 11:14 AM, Zhe Wu <alan.wu@oracle.com> wrote:
>
> Hi Pat,
>
> Actually, no. It is just plain better for all but a tiny fraction of human
> readers, anywhere on the planet. This tiny fraction includes some software
> engineers. I personally will simply ignore any string that contains \u
> escapes, and immediately cease using any software that shows them to me. And
> I suspect that more people share my instincts than share yours.
>
> I don't think N-TRIPLES is an end-user-oriented format. It was originally
> designed for test cases, as pointed out by Jeremy. It
> happens to be used (quite well, actually) for large-scale machine-to-machine
> communication, as pointed out by Richard. I would
> dare say that the chance of seeing \u in the user interface of a semantic web
> application is very low.
>
> The chances of coming across UTF-8 N-Triples are rather high.
>
> http://id.loc.gov/vocabulary/iso639-1/nn.nt
>
> In fact, all of the Library of Congress N-Triples documents are served
> with a perfectly reasonable
>
> Content-type: text/plain; charset=UTF-8
>
> If a vendor expects to work with the LOC Subject Headings or any other
> ontology published by the LOC and wants to use N-Triples they will
> need to support UTF-8.
>
> Cheers,
> Gavin
>
>
>

Received on Monday, 22 August 2011 21:04:57 UTC