character encoding in RDF from Peter F. Patel-Schneider on 2003-10-29 (www-rdf-comments@w3.org from October to December 2003)

From: Peter F. Patel-Schneider <pfps@research.bell-labs.com>
Date: Wed, 29 Oct 2003 08:36:01 -0500 (EST)
To: www-rdf-comments@w3.org
Message-Id: <20031029.083601.04936209.pfps@research.bell-labs.com>

In the abstract of RDF/XML Syntax Specification Revised (W3C Working Draft
10 October 2003) it is stated that the actions generate ``triples of the
RDF graph as defined in RDF Concepts and Abstract Syntax'' ``written using
the N-Triples RDF graph serializing format'' defined in RDF Test Cases.

In RDF Test Cases, Section 3.1, it is stated that ``[a]n N-Triples document
is a sequence of US-ASCII characters''.  In Section 3, it is further
specified that N-Triples documents are to be encoded as ``7-bit US-ASCII''.
It is further specified in Section 3.1 that the only allowable characters
in absoluteURIs and strings are the characters represented by code points
from decimal 32 to decimal 126.  Characters outside of this range (and a
few within it) are encoded using a non-standard encoding.
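
To make the restriction concrete, here is a minimal sketch of the string
escaping as I read it from RDF Test Cases: printable US-ASCII passes through
unchanged, a few characters get short escapes, and everything else becomes
\uXXXX or \UXXXXXXXX. The function name ntriples_escape is hypothetical, and
the sketch covers literal strings only, not URI references.

```python
def ntriples_escape(text: str) -> str:
    """Escape a Unicode string into the 7-bit form used by N-Triples.

    Illustrative sketch only; the name and structure are assumptions,
    not taken from any specification or library.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\":
            out.append("\\\\")          # backslash is always escaped
        elif ch == '"':
            out.append('\\"')           # double quote is always escaped
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\r":
            out.append("\\r")
        elif ch == "\t":
            out.append("\\t")
        elif 0x20 <= cp <= 0x7E:
            out.append(ch)              # printable US-ASCII, decimal 32..126
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)  # four uppercase hex digits
        else:
            out.append("\\U%08X" % cp)  # eight uppercase hex digits
    return "".join(out)


escaped = ntriples_escape("caf\u00e9")
print(escaped)            # caf\u00E9
print(escaped.isascii())  # True
```

The output contains only US-ASCII characters, which is what lets an
N-Triples document be ``a sequence of US-ASCII characters''.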

However, the strings allowed in RDF/XML documents are defined from Unicode
strings.  This leads to a number of problems.

Section 6.1.6 of RDF/XML Syntax Specification Revised states that ``[t]he
<>-quoted identifier accessor value [of a URI Reference Event] must use the
N-Triple escapes for URI references ...''.  This statement, along with the
way that these events are created, seems to indicate that URI references in
RDF/XML documents must use the N-Triples character encoding for Unicode, not
any of the more usual encodings, such as UTF-8.

Section 6.1.8 of RDF/XML Syntax Specification Revised states that ``[t]he
double-quoted literal-value accessor value [of a plain literal event] must
use the N-Triples escapes for strings ...''.  Again, this statement, along
with the way that these events are created, seems to indicate that plain
literals in RDF/XML documents must use the N-Triples character encoding
for Unicode, not any of the more usual encodings, such as UTF-8.

Similar problems occur with Attribute Events.

Similar problems occur with Typed Literal Events and Plain Literal Events,
indicating that typed literals and plain literals must be written in
RDF/XML documents using the N-Triples character encoding for Unicode.

I suggest that the wording in question should be changed to something like:

	... encodes the same Unicode character string as ... but using the
	string encoding in N-Triples ...
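
The intent of this wording can be sketched as follows: the string in the
RDF/XML document and the string in the N-Triples output are different
character sequences, but they denote the same Unicode string once the
N-Triples escapes are decoded. The helper name ntriples_unescape is
hypothetical, and for simplicity it also accepts lowercase hex digits, which
is more lenient than N-Triples itself.

```python
import re

# One N-Triples string escape: \uXXXX, \UXXXXXXXX, or a short escape.
_ESCAPE = re.compile(r'\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|([tnr"\\]))')
_SHORT = {"t": "\t", "n": "\n", "r": "\r", '"': '"', "\\": "\\"}


def ntriples_unescape(text: str) -> str:
    """Decode N-Triples string escapes back into a Unicode string.

    Illustrative sketch only; the name is an assumption.
    """
    def replace(match):
        if match.group(1):
            return chr(int(match.group(1), 16))
        if match.group(2):
            return chr(int(match.group(2), 16))
        return _SHORT[match.group(3)]

    return _ESCAPE.sub(replace, text)


# What an XML parser would hand over for a literal in a UTF-8 document.
literal_in_rdfxml = "caf\u00e9"
# How the same literal is written between the quotes in N-Triples.
literal_in_ntriples = "caf\\u00E9"

print(literal_in_rdfxml == literal_in_ntriples)                     # False
print(ntriples_unescape(literal_in_ntriples) == literal_in_rdfxml)  # True
```

Under this reading the RDF/XML document is free to use UTF-8 or any other
XML-supported encoding; only the N-Triples rendering of the string uses the
escapes.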

Peter F. Patel-Schneider

Received on Wednesday, 29 October 2003 08:39:22 UTC