- From: Jeremy Carroll <jeremy@topquadrant.com>
- Date: Wed, 07 Mar 2012 11:19:10 -0800
- To: RDF Working Group WG <public-rdf-wg@w3.org>
- Message-ID: <4F57B4AE.6020704@topquadrant.com>
In the telecon, Gavin made the following proposal:
PROPOSAL N-Triple has content type 'application/ntriples' and uses the
content type parameter charset with allowed values of utf-8 or ascii
My understanding was that Souri's "+1 to Zhe's -1" indicated that this
matter might raise a formal objection.
For other reasons this morning I had to track down the Oracle
documentation for their RDF support, and I glanced at the N-Triple support.
I found:
http://docs.oracle.com/cd/E11882_01/appdev.112/e25609/sdo_rdf_concepts.htm#CHDIHAGI
which does not make any reference to loading from a URL; hence any
change to MIME types would not be visible, and the threatened formal
objection appears ill-founded.
It does also contain:
[[
To load an N-Triple file with a character set different from the
default, specify the JVM property |-Dcharset=<charsetName>|. For
example, |-Dcharset="UTF-8"| will recognize UTF-8 encoding. However, for
UTF-8 characters to be stored properly in the N-Triple file, the Oracle
database must be configured to use a corresponding universal character
set, such as AL32UTF8.
]]
which seems to suggest that Oracle's older software already supports
UTF-8, that the arguments made to the WG on 08-31-11 to continue to
permit ASCII N-Triples were perhaps ill thought out, and that the impact
on Oracle and Oracle customers of moving to UTF-8 only for N-Triples
would be slight.
A further piece of documentation that seemed relevant was:
http://docs.oracle.com/cd/E11882_01/appdev.112/e25609/sem_jena.htm#BGBCHIED
At first blush this is more worrying from a backward compatibility point
of view, since the data is passed to the prepareBulk method as an input
stream (i.e. bytes) rather than as decoded chars:
    graph.getBulkUpdateHandler().prepareBulk(
        is,                   // input stream
        "http://example.com", // base URI
        "RDF/XML",            // data file type: can be RDF/XML, N-TRIPLE, etc.
        "SEMTS",              // tablespace
        null,                 // flags
        null,                 // listener
        null                  // staging table name.
    );
However, judging from the data file type I assume this is going through
the Jena RDFReader interface, and the default implementation of the
N-Triples reader has the following code:
    public void read(Model model, InputStream in, String base)
    {
        // N-Triples must be in ASCII, we permit UTF-8.
        read(model, FileUtils.asUTF8(in), base);
    }
i.e. again UTF-8 N-Triples data, when loaded following Oracle's
documented procedure, will already work!!
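
To make the compatibility argument concrete, here is a minimal sketch (standard library only, not Jena's or Oracle's actual code) of why a reader that decodes its input stream as UTF-8 accepts both legacy ASCII N-Triples and UTF-8 N-Triples. The class name Utf8NTriplesDemo and the helpers asUtf8 and readFirstLine are hypothetical; asUtf8 only assumes that a helper such as FileUtils.asUTF8 wraps the stream in a UTF-8 decoder.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class Utf8NTriplesDemo {

    // Hypothetical stand-in for a helper like FileUtils.asUTF8:
    // wrap the raw byte stream in a reader that decodes UTF-8.
    static Reader asUtf8(InputStream in) {
        return new InputStreamReader(in, StandardCharsets.UTF_8);
    }

    // Decode the stream as UTF-8 and return its first line.
    static String readFirstLine(InputStream in) {
        try (BufferedReader reader = new BufferedReader(asUtf8(in))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Legacy style: pure ASCII, non-ASCII character written as an N-Triples \\u escape.
        String asciiTriple =
            "<http://example.com/s> <http://example.com/p> \"caf\\u00E9\" .";
        // UTF-8 style: the same literal with the character written directly.
        String utf8Triple =
            "<http://example.com/s> <http://example.com/p> \"caf\u00E9\" .";

        byte[] asciiBytes = asciiTriple.getBytes(StandardCharsets.US_ASCII);
        byte[] utf8Bytes = utf8Triple.getBytes(StandardCharsets.UTF_8);

        // ASCII is a strict subset of UTF-8, so both byte streams decode
        // back to exactly the text that was encoded.
        System.out.println(readFirstLine(new ByteArrayInputStream(asciiBytes)).equals(asciiTriple));
        System.out.println(readFirstLine(new ByteArrayInputStream(utf8Bytes)).equals(utf8Triple));
    }
}
```

Because every ASCII byte sequence is also valid UTF-8 with the same meaning, a UTF-8-decoding reader handles existing ASCII files unchanged; only the reverse direction (an ASCII-only consumer receiving UTF-8 data) is a compatibility risk.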
So I am curious as to the actual basis for Oracle's dogmatic objection
to any change to N-Triples, as opposed to any change that demonstrably
negatively impacts their customers.
Jeremy
Received on Wednesday, 7 March 2012 19:19:34 UTC