
Re: on Gavin's proposal concerning NTriples mimetype

From: Ivan Herman <ivan@w3.org>
Date: Thu, 8 Mar 2012 09:43:16 +0100
Cc: RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <2A46FC7F-2194-4A1F-B786-D4BF77AAC458@w3.org>
To: Jeremy Carroll <jeremy@topquadrant.com>

Just a tiny addition/modification. If, instead of application/ntriples, we use text/ntriples, then the default encoding is ASCII. In other words, an ntriples file that is sent back to the consumer will ex officio refer to the current, i.e., ASCII, version of ntriples; I would expect that any software using ntriples today will work out of the box. To use UTF-8, the file has to be sent with

text/ntriples; charset=utf-8

This is in contrast to the application/* tree, where the default is utf-8, i.e., the sender has to set ascii explicitly.

I.e., with the text/ntriples media type, the transition to a newer version of ntriples is even smoother...
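
To make the consumer side concrete, here is a rough sketch of how a client might pick the decoder from the Content-Type header under this scheme. The class and method names (NTriplesCharset, resolve) are made up for illustration, and the ASCII fallback simply encodes the default assumed above for text/ntriples; it is not taken from any existing parser.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class NTriplesCharset {

    /**
     * Returns the charset named by the "charset" parameter of a
     * Content-Type value, or US-ASCII when the parameter is absent.
     * The ASCII fallback is the assumption made in this message for
     * text/ntriples.
     */
    static Charset resolve(String contentType) {
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters follow it.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue; // not a name=value parameter
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (name.equals("charset")) {
                String value = param.substring(eq + 1).trim();
                // Parameter values may be quoted, e.g. charset="utf-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return Charset.forName(value);
            }
        }
        return StandardCharsets.US_ASCII;
    }

    public static void main(String[] args) {
        System.out.println(resolve("text/ntriples"));
        System.out.println(resolve("text/ntriples; charset=utf-8"));
    }
}
```

A consumer that ignores the parameter altogether keeps reading plain ASCII files exactly as it does today, which is the point of the smoother transition.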


On Mar 7, 2012, at 20:19 , Jeremy Carroll wrote:

> In the telecon Gavin made the following proposal
> PROPOSAL N-Triple has content type 'application/ntriples' and uses the content type parameter charset with allowed values of utf-8 or ascii
> My understanding was that Souri's "+1 to Zhe's -1" indicated that this matter might raise a formal objection.
> For other reasons this morning I had to track down the Oracle documentation for their RDF support, and I glanced at the N-Triple support.
> I found:
> http://docs.oracle.com/cd/E11882_01/appdev.112/e25609/sdo_rdf_concepts.htm#CHDIHAGI
> that does not make any reference to loading from a URL and hence any change to mimetypes would not be visible, and the threatened formal objection appears ill-founded
> It does also contain:
> [[
> To load an N-Triple file with a character set different from the default, specify the JVM property -Dcharset=<charsetName>. For example, -Dcharset="UTF-8" will recognize UTF-8 encoding. However, for UTF-8 characters to be stored properly in the N-Triple file, the Oracle database must be configured to use a corresponding universal character set, such as AL32UTF8.
> ]]
> which seems to suggest that Oracle's older software does already support UTF-8 and the arguments made to the WG in 08-31-11 to continue to permit NTriples ascii were perhaps ill-thought out and that the impact on Oracle and Oracle customers of moving to UTF-8 only for N-Triples would be slight.
> A further piece of documentation that seemed relevant was
> http://docs.oracle.com/cd/E11882_01/appdev.112/e25609/sem_jena.htm#BGBCHIED
> At first blush this is more worrying from a backward compatibility point of view since the data is passed to the prepareBulk method as an input stream (i.e. bytes) rather than as decoded chars
> graph.getBulkUpdateHandler().prepareBulk(
>         is,                    // input stream
>         "http://example.com",  // base URI
>         "RDF/XML",             // data file type: can be RDF/XML, N-TRIPLE, etc.
>         "SEMTS",               // tablespace
>         null,                  // flags
>         null,                  // listener
>         null                   // staging table name.
>         );
> However, judging from the data file type I assume this is going through the Jena RDFReader interface, and the default implementation of the N-Triples reader has the following code:
>     public void read(Model model, InputStream in, String base)
>     {
>         // N-Triples must be in ASCII, we permit UTF-8.
>         read(model, FileUtils.asUTF8(in), base);
>     }
> i.e. again UTF-8 N-Triples data, when loaded following Oracle's documented procedure, will already work!!
> So I am curious as to the actual basis for Oracle's dogmatic objection to any change to N-Triples, as opposed to any change that demonstrably negatively impacts their customers
> Jeremy

Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Thursday, 8 March 2012 08:43:02 UTC
