- From: Jeremy Carroll <jeremy@topquadrant.com>
- Date: Wed, 07 Mar 2012 11:19:10 -0800
- To: RDF Working Group WG <public-rdf-wg@w3.org>
- Message-ID: <4F57B4AE.6020704@topquadrant.com>
In the telecon Gavin made the following proposal:

    PROPOSAL: N-Triple has content type 'application/ntriples' and uses the content type parameter charset with allowed values of utf-8 or ascii

My understanding was that Souri's "+1 to Zhe's -1" indicated that this matter might raise a formal objection.

For other reasons this morning I had to track down the Oracle documentation for their RDF support, and I glanced at the N-Triple support. I found:

http://docs.oracle.com/cd/E11882_01/appdev.112/e25609/sdo_rdf_concepts.htm#CHDIHAGI

That page makes no reference to loading from a URL, so any change to mimetypes would not be visible, and the threatened formal objection appears ill-founded.

It does also contain:

[[
To load an N-Triple file with a character set different from the default, specify the JVM property |-Dcharset=<charsetName>|. For example, |-Dcharset="UTF-8"| will recognize UTF-8 encoding. However, for UTF-8 characters to be stored properly in the N-Triple file, the Oracle database must be configured to use a corresponding universal character set, such as AL32UTF8.
]]

This seems to suggest that Oracle's older software already supports UTF-8, that the arguments made to the WG on 08-31-11 for continuing to permit ASCII N-Triples were perhaps ill-thought out, and that the impact on Oracle and Oracle customers of moving to UTF-8 only for N-Triples would be slight.

A further piece of documentation that seemed relevant was:

http://docs.oracle.com/cd/E11882_01/appdev.112/e25609/sem_jena.htm#BGBCHIED

At first blush this is more worrying from a backward compatibility point of view, since the data is passed to the prepareBulk method as an input stream (i.e. bytes) rather than as decoded chars:

    graph.getBulkUpdateHandler().prepareBulk(
        is,                    // input stream
        "http://example.com",  // base URI
        "RDF/XML",             // data file type: can be RDF/XML, N-TRIPLE, etc.
        "SEMTS",               // tablespace
        null,                  // flags
        null,                  // listener
        null                   // staging table name.
    );

However, judging from the data file type, I assume this goes through the Jena RDFReader interface, and the default implementation of the N-Triples reader has the following code:

    public void read(Model model, InputStream in, String base) {
        // N-Triples must be in ASCII, we permit UTF-8.
        read(model, FileUtils.asUTF8(in), base);
    }

i.e. again, UTF-8 N-Triples data, when loaded following Oracle's documented procedure, will already work!!

So I am curious as to the actual basis for Oracle's dogmatic objection to any change to N-Triples, as opposed to any change that demonstrably negatively impacts their customers.

Jeremy
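A note on why a UTF-8 read path should be backward compatible with ASCII data: every ASCII byte sequence is also valid UTF-8 and decodes to the same characters. The following is a minimal sketch of that property using only the JDK; the class name NTriplesDecodeSketch and the method firstLine are hypothetical names chosen for illustration, and this is not the Jena or Oracle implementation.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not Jena or Oracle code.
class NTriplesDecodeSketch {

    // Decode the first line of a byte stream, always treating the bytes as UTF-8.
    static String firstLine(InputStream in) {
        try {
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8));
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // One ASCII-only triple and one triple with a non-ASCII literal.
        String asciiLine = "<http://example.com/s> <http://example.com/p> \"plain\" .";
        String utf8Line = "<http://example.com/s> <http://example.com/p> \"caf\u00E9\" .";

        byte[] asciiBytes = asciiLine.getBytes(StandardCharsets.US_ASCII);
        byte[] utf8Bytes = utf8Line.getBytes(StandardCharsets.UTF_8);

        // Both streams go through the same UTF-8 decoder and round-trip unchanged.
        System.out.println(asciiLine.equals(firstLine(new ByteArrayInputStream(asciiBytes))));
        System.out.println(utf8Line.equals(firstLine(new ByteArrayInputStream(utf8Bytes))));
    }
}
```

A reader that always decodes as UTF-8, as the comment in the Jena snippet suggests it does, would therefore accept existing ASCII-only N-Triples files without any change on the producer side.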
Received on Wednesday, 7 March 2012 19:19:34 UTC