Re: character encoding in RDF

On Wed, 29 Oct 2003 08:36:01 -0500 (EST), "Peter F. Patel-Schneider" <pfps@research.bell-labs.com> wrote:

> 
> In the abstract of RDF/XML Syntax Specification Revised (W3C Working Draft
> 10 October 2003) it is stated that the actions generate ``triples of the
> RDF graph as defined in RDF Concepts and Abstract Syntax'' ``written using
> the N-Triples RDF graph serializing format'' defined in RDF Test Cases.
> 
> In RDF Test Cases, Section 3.1, it is stated that ``[a]n N-Triples document
> is a sequence of US-ASCII characters''.  In Section 3, it is further
> specified that N-Triples documents are to be encoded as ``7-bit US-ASCII''.
> It is further specified in Section 3.1 that the only allowable characters
> in absoluteURIs and strings are the characters represented by code points
> from decimal 32 to decimal 126.  Characters outside of this range (and a
> few within it) are encoded using a non-standard encoding.
> 
> However, the strings allowed in RDF/XML documents are defined from Unicode
> strings.  This leads to a number of problems.
> 
> Section 6.1.6 of RDF/XML Syntax Specification Revised states that ``[t]he
> <>-quoted identifier accessor value [of a URI Reference Event] must use the
> N-Triple escapes for URI references ...''.  This statement, along with the
> way that these events are created seems to indicate that URI references in
> RDF/XML documents must use the N-Triple character encoding for Unicode, not
> any of the more usual encodings, such as UTF-8.

RDF/XML is defined on the syntax data model events, which are created
from Unicode strings.  The string-value accessors exist only for writing
the events out as strings, in N-Triples; they do not affect how the
events are created from the XML input.

> Section 6.1.8 of RDF/XML Syntax Specification Revised states that ``[t]he
> double-quoted literal-value accessor value [of a plain literal event] must
> use the N-Triples escapes for strings ...''.  Again, this statement, along
> with the way that these events are created seems to indicate that URI
> references in RDF/XML documents must use the N-Triple character encoding
> for Unicode, not any of the more usual encodings, such as UTF-8.

Again, the escapes play no part in creating the events, and no content
encoding issues are involved: there are only Unicode strings (from the
XML Infoset items).  N-Triples is an output form only, used to describe
the test cases and the grammar, and implementations are not required
to support it.
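
To illustrate that only Unicode strings reach the events, here is a
minimal sketch using Python's standard xml.etree.ElementTree (an XML
parser, not an RDF/XML parser).  The <title> element and the function
name element_text are hypothetical stand-ins for a property element
with a plain literal value.  The same text is serialized in three
document encodings; the bytes differ, but the parser returns the same
Unicode string each time, and that string is what a literal event
would be built from.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the value of a plain-literal property element.
TEXT = "Caf\u00e9 cr\u00e8me"


def element_text(xml_bytes: bytes) -> str:
    """Parse a complete XML document and return the root element's text.

    The parser honours the encoding declaration, so the result is a
    Unicode string however the document was encoded as bytes.
    """
    return ET.fromstring(xml_bytes).text


utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8"?><title>' + TEXT + "</title>"
).encode("utf-8")
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><title>' + TEXT + "</title>"
).encode("iso-8859-1")
# Pure ASCII bytes: the non-ASCII characters are written as XML
# character references, which the parser resolves.
ascii_doc = (
    b'<?xml version="1.0" encoding="US-ASCII"?>'
    b"<title>Caf&#233; cr&#232;me</title>"
)

# Three different byte sequences ...
assert len({utf8_doc, latin1_doc, ascii_doc}) == 3
# ... but one and the same Unicode string after parsing.
for doc in (utf8_doc, latin1_doc, ascii_doc):
    assert element_text(doc) == TEXT
```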

> Similar problems occur with Attribute Events.
> 
> Similar problems occur with Typed Literal Events and Plain Literal Events,
> indicating that typed literals and plain literals must be written in
> RDF/XML documents using the N-Triple character encoding for Unicode.

I don't follow how you conclude there are problems in any of these sections.

Take URI reference events as an example.  These are constructed from
a string value (a Unicode string) used as an RDF URI reference.  What an
RDF URI reference is, including any limits on the characters allowed,
is defined in RDF Concepts, which is linked where that event is first
defined.

When those events are written out as N-Triples, they clearly have to
conform to the N-Triples syntax rules, but that is solely a way of
writing the Unicode string in N-Triples; it does not limit in any way
the range of characters in an RDF URI reference.  RDF Concepts defines
that range, and RDF Concepts does not depend on N-Triples.
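
To make the "solely a way of writing the Unicode string" point
concrete, here is a minimal sketch of N-Triples string escaping and its
inverse.  The function names are hypothetical, and the escape table is
my reading of the string escapes in RDF Test Cases (\\, \", \n, \r, \t,
\uHHHH, \UHHHHHHHH, with other characters from 32 to 126 written as
themselves); check it against the spec before relying on it.  The
escaped form uses only US-ASCII 32..126, yet unescaping recovers the
original string exactly, so no character is excluded.

```python
# Simple escapes, as read from the RDF Test Cases string rules (assumption).
SIMPLE_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
# Reverse map: the character after the backslash -> the original character.
SIMPLE_UNESCAPES = {v[1]: k for k, v in SIMPLE_ESCAPES.items()}


def ntriples_escape(s: str) -> str:
    """Encode any Unicode string using only US-ASCII characters 32..126."""
    out = []
    for ch in s:
        cp = ord(ch)
        if ch in SIMPLE_ESCAPES:  # checked first: \ and " are printable ASCII
            out.append(SIMPLE_ESCAPES[ch])
        elif 0x20 <= cp <= 0x7E:
            out.append(ch)  # other printable ASCII is written as itself
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")  # 4 upper-case hex digits
        else:
            out.append(f"\\U{cp:08X}")  # 8 upper-case hex digits
    return "".join(out)


def ntriples_unescape(s: str) -> str:
    """Invert ntriples_escape (no error handling for malformed input)."""
    out = []
    i = 0
    while i < len(s):
        if s[i] != "\\":
            out.append(s[i])
            i += 1
        elif s[i + 1] == "u":
            out.append(chr(int(s[i + 2 : i + 6], 16)))
            i += 6
        elif s[i + 1] == "U":
            out.append(chr(int(s[i + 2 : i + 10], 16)))
            i += 10
        else:
            out.append(SIMPLE_UNESCAPES[s[i + 1]])
            i += 2
    return "".join(out)


original = 'caf\u00e9 "\U0001F600"\n'
escaped = ntriples_escape(original)
print(escaped)  # caf\u00E9 \"\U0001F600\"\n
assert all(32 <= ord(c) <= 126 for c in escaped)  # US-ASCII 32..126 only
assert ntriples_unescape(escaped) == original  # nothing was lost
```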

The same holds for the other events: writing the RDF Concepts terms in
N-Triples does not limit the alphabets of those terms.

> I suggest that the wording in question should be changed to something like:
> 
> 	... encodes the same Unicode character string as ... but using the
> 	string encoding in N-Triples ...

At present I don't think I understand the problem you describe.

I'm also not sure where you are proposing the wording change; I can't
see that wording in any of the sections you mention.  Do you mean the
abstract?  I would not expect the abstract to give the fine detail of
the document, and this change may be exactly that kind of detail.

Dave

Received on Tuesday, 4 November 2003 06:12:32 UTC