on Gavin's proposal concerning NTriples mimetype from Jeremy Carroll on 2012-03-07 (public-rdf-wg@w3.org from March 2012)

From: Jeremy Carroll <jeremy@topquadrant.com>
Date: Wed, 07 Mar 2012 11:19:10 -0800
To: RDF Working Group WG <public-rdf-wg@w3.org>
Message-ID: <4F57B4AE.6020704@topquadrant.com>

In the telecon, Gavin made the following proposal:

PROPOSAL N-Triple has content type 'application/ntriples' and uses the 
content type parameter charset with allowed values of utf-8 or ascii

My understanding was that Souri's "+1 to Zhe's -1" indicated that this 
matter might raise a formal objection.

For other reasons this morning I had to track down the Oracle 
documentation for their RDF support, and I glanced at the N-Triple support.

I found:
http://docs.oracle.com/cd/E11882_01/appdev.112/e25609/sdo_rdf_concepts.htm#CHDIHAGI

that does not make any reference to loading from a URL. Hence any 
change to mimetypes would not be visible to that loader, and the 
threatened formal objection appears ill-founded.


It does also contain:
[[
To load an N-Triple file with a character set different from the 
default, specify the JVM property |-Dcharset=<charsetName>|. For 
example, |-Dcharset="UTF-8"| will recognize UTF-8 encoding. However, for 
UTF-8 characters to be stored properly in the N-Triple file, the Oracle 
database must be configured to use a corresponding universal character 
set, such as AL32UTF8.
]]

which seems to suggest that Oracle's older software does already support 
UTF-8, that the arguments made to the WG on 08-31-11 to continue to 
permit ASCII N-Triples were perhaps ill thought out, and that the impact 
on Oracle and Oracle customers of moving to UTF-8 only for N-Triples 
would be slight.

A further piece of documentation that seemed relevant was

http://docs.oracle.com/cd/E11882_01/appdev.112/e25609/sem_jena.htm#BGBCHIED

At first blush this is more worrying from a backward compatibility point 
of view, since the data is passed to the prepareBulk method as an input 
stream (i.e. bytes) rather than as decoded chars:

graph.getBulkUpdateHandler().prepareBulk(
         is,                    // input stream
         "http://example.com",  // base URI
         "RDF/XML",             // data file type: can be RDF/XML, N-TRIPLE, etc.
         "SEMTS",               // tablespace
         null,                  // flags
         null,                  // listener
         null                   // staging table name.
         );


However, judging from the data file type I assume this is going through 
the Jena RDFReader interface, and the default implementation of the 
N-Triples reader has the following code:

     public void read(Model model, InputStream in, String base)
     {
         // N-Triples must be in ASCII, we permit UTF-8.
         read(model, FileUtils.asUTF8(in), base);
     }

i.e. again UTF-8 N-Triples data, when loaded following Oracle's 
documented procedure, will already work!!
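
To illustrate why a reader that decodes bytes as UTF-8 handles both the 
old ASCII form and the UTF-8 form, here is a minimal sketch using only 
the Java standard library. The class name Utf8ReaderSketch and the 
helpers asUtf8 and readAll are hypothetical; asUtf8 is only assumed to 
behave like the FileUtils.asUTF8 call quoted above, and the example.com 
triples are made up for the illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class Utf8ReaderSketch {

    // Hypothetical stand-in for a helper such as FileUtils.asUTF8
    // (assumption): wrap a byte stream in a Reader that decodes UTF-8.
    static Reader asUtf8(InputStream in) {
        return new InputStreamReader(in, StandardCharsets.UTF_8);
    }

    // Drain the stream through the UTF-8 reader into a String.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new BufferedReader(asUtf8(in))) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // ASCII-only form: the non-ASCII character is written as a \u escape
        // inside the N-Triples text, so every byte is below 0x80.
        String asciiLine =
            "<http://example.com/s> <http://example.com/p> \"caf\\u00E9\" .\n";
        // UTF-8 form: the same literal with the character written directly.
        String utf8Line =
            "<http://example.com/s> <http://example.com/p> \"caf\u00E9\" .\n";

        byte[] asciiBytes = asciiLine.getBytes(StandardCharsets.US_ASCII);
        byte[] utf8Bytes = utf8Line.getBytes(StandardCharsets.UTF_8);

        // ASCII is a subset of UTF-8, so both inputs decode without loss.
        System.out.println(
            readAll(new ByteArrayInputStream(asciiBytes)).equals(asciiLine));
        System.out.println(
            readAll(new ByteArrayInputStream(utf8Bytes)).equals(utf8Line));
    }
}
```

Both comparisons should hold, since every ASCII byte sequence is also 
valid UTF-8. The sketch only covers decoding; it says nothing about how 
the decoded \u escape is later interpreted by an N-Triples parser.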

So I am curious as to the actual basis for Oracle's dogmatic objection 
to any change to N-Triples, as opposed to an objection only to changes 
that demonstrably negatively impact their customers.

Jeremy

Received on Wednesday, 7 March 2012 19:19:34 UTC