Re: character encoding in RDF

On Tue, 04 Nov 2003 08:11:32 -0500 (EST), "Peter F. Patel-Schneider" <pfps@research.bell-labs.com> wrote:

> 
> From: Dave Beckett <dave.beckett@bristol.ac.uk>
> Subject: Re: character encoding in RDF
> Date: Tue, 4 Nov 2003 11:08:31 +0000
> 
> > On Wed, 29 Oct 2003 08:36:01 -0500 (EST), "Peter F. Patel-Schneider" <pfps@research.bell-labs.com> wrote:

<snip/>

> Section 6 of RDF/XML Syntax Specification Revised says that the grammar
> action, ``[t]aken together [...] define a transformation from any
> syntactically well-formed RDF/XML into an RDF graph represented in the
> N-Triples language''.  It is the ``represented in ...'' phrase that causes
> the problems, because it does bring in issues related to the character
> encoding in N-Triples documents.  If the reference to N-Triples was
> removed, then this problem would be eliminated.

I don't see how "represented" has any overloaded meaning here.  It is hard
to produce a mapping that is machine testable without writing down, or
representing, the output of the mapping in some syntax.  The rules for
writing N-Triples obviously have to be followed, since that is the form
we chose for the output, but we did not mandate that you implement it.

> > > Section 6.1.8 of RDF/XML Syntax Specification Revised states that ``[t]he
> > > double-quoted literal-value accessor value [of a plain literal event] must
> > > use the N-Triples escapes for strings ...''.  Again, this statement, along
> > > with the way that these events are created seems to indicate that URI
> > > references in RDF/XML documents must use the N-Triple character encoding
> > > for Unicode, not any of the more usual encodings, such as UTF-8.
> > 
> > Again, not in creation and there are no content encoding issues
> > involved. Only Unicode strings (from the XML infoset items).  N-Triples
> > is an output form only, in order to describe the test cases and grammar
> > and not required to implement.
> 
> There are definitely content encoding issues involved.  The grammar actions
> are supposed to emit N-Triples, which brings content encoding issues to the
> fore.  If grammar actions were of the form

The grammar actions emit RDF triples written down in N-Triples.

> 	Add a triple with subject ..., predicate ..., and object ... to the
> 	graph. 
> 
> instead of
> 
> 	... the following statement is added to the graph:
> 	... ... ... .

I don't see that as needing changing (unless replacing "statement" with
"triple", which is editorial).  The context is clear and the document has
already explained the relationship between the XML syntax, triples, the
graph and N-Triples, and made several references to it for each event
definition.

> then there would not be any issues of content encoding.
> 
> > > Similar problems occur with Attribute Events.
> > > 
> > > Similar problems occur with Typed Literal Events and Plain Literal Events,
> > > indicating that typed literals and plain literals must be written in
> > > RDF/XML documents using the N-Triple character encoding for Unicode.
> > 
> > I don't follow how you conclude there are problems in any of these sections.
> > 
> > Taking URI reference events as an example.   These are constructed from
> > a string value (a Unicode string) used as an RDF reference, the definition
> > of which and limitations on the characters allowed are all defined in
> > RDF Concepts, linked when that event is first defined.
> 
> Agreed.
> 
> > When those events are written out as N-Triples, they clearly have to
> > conform to the N-Triples syntax rules, but that is solely a way to write
> > the Unicode string in N-Triples, it does not limit in any way the range
> > of characters in an RDF URI reference.  RDF Concepts defines that, and
> > RDF Concepts does not depend on N-Triples.
> 
> I agree that they have to conform to the N-Triples syntax rules, and this
> is the problem that I see.  The grammar actions directly place Unicode
> strings, for example Unicode strings that are part of Plain Literal Events,
> into the N-Triples document, without any possibility of encoding.  This
> means that this string must be in the form required by N-Triples, which is
> the problem that I have seen.

The document mentions that the string-value must use the N-Triples encoding
so this point is already covered in
  http://www.w3.org/TR/2003/WD-rdf-syntax-grammar-20031010/#section-literal-node

The Unicode strings (sequences of Unicode characters) are not put
directly into the N-Triples document; they are written using the
N-Triples encoding, which is linked directly from the section at the URL
above and from all the other events that have a string-value.

The table it points to in RDF Test Cases section 3.2 was added in the
last version after a previous comment and suggestion from you, providing
a straightforward description of the Unicode character to N-Triples encoding.
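
A minimal Python sketch of the escaping that table describes, assuming
the usual N-Triples string escapes (\t, \n, \r, \", \\, \uXXXX and
\UXXXXXXXX with uppercase hex).  The function names ntriples_escape and
plain_literal_string_value are made up for illustration and do not come
from any specification or library; language tags and datatypes are left
out.

```python
def ntriples_escape(text: str) -> str:
    """Escape a Unicode string using the N-Triples string escapes."""
    # Characters that have their own short escape.
    named = {0x09: "\\t", 0x0A: "\\n", 0x0D: "\\r", 0x22: '\\"', 0x5C: "\\\\"}
    out = []
    for ch in text:
        cp = ord(ch)
        if cp in named:
            out.append(named[cp])
        elif 0x20 <= cp <= 0x7E:
            out.append(ch)               # printable ASCII is written as-is
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")   # control chars, DEL, rest of the BMP
        else:
            out.append(f"\\U{cp:08X}")   # characters beyond the BMP
    return "".join(out)


def plain_literal_string_value(literal_value: str) -> str:
    """Hypothetical helper: string-value of a plain literal, no language tag."""
    return '"' + ntriples_escape(literal_value) + '"'


print(plain_literal_string_value("caf\u00e9"))
```

The input here is an ordinary Unicode string, as it would come from the
XML infoset; the escapes appear only in the written N-Triples form.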

> > Similarly for the other events.  The RDF Concepts terms when written in
> > N-Triples do not limit the alphabets of the terms.
> > 
> > > I suggest that the wording in question should be changed to something like:
> > > 
> > > 	... encodes the same Unicode character string as ... but using the
> > > 	string encoding in N-Triples ...
> > 
> > At present I think I don't understand your problem.  
> 
> The problem is that there is no place in the grammar actions for the
> encoding used by N-Triples.  In the absence of this transformation, the
> character encoding used by N-Triples is pushed back into the RDF/XML document.

I still don't see a problem.  You need to consider the N-Triples encoding
only if you are writing N-Triples (and this is optional, as the section 6
introduction describes); otherwise you can generate the triples inside
your application without dealing with such details.  The RDF/XML WD
defines a mapping where the output triples are encoded in N-Triples.
It does not mandate that you implement the mapping to N-Triples, or use
the N-Triples encoding:

  "The model given here illustrates one way to create a representation of
  an RDF Graph from an RDF/XML document. It does not mandate any
  implementation method -- any other method that results in a
  representation of the same RDF Graph may be used."
  -- http://www.w3.org/TR/2003/WD-rdf-syntax-grammar-20031010/#section-Data-Model

but I'm sure you are familiar with that.

In terms of implementations, as far as I'm aware, this is what most of
them do: they do not write N-Triples; the output of the mapping is
always some other result form (typically a software object).
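
A rough Python sketch of that kind of implementation, under the
assumption that the graph is simply a set of in-memory tuples.  The
class and function names (URIRef, PlainLiteral, add_triple) and the
example.org URI are hypothetical, and blank nodes and typed literals
are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple, Union


@dataclass(frozen=True)
class URIRef:
    value: str  # raw Unicode string; no N-Triples escapes are stored


@dataclass(frozen=True)
class PlainLiteral:
    value: str  # raw Unicode string, exactly as read from the XML
    language: Optional[str] = None


Node = Union[URIRef, PlainLiteral]
Triple = Tuple[URIRef, URIRef, Node]


def add_triple(graph: Set[Triple], s: URIRef, p: URIRef, o: Node) -> None:
    """Grammar-action analogue: add a triple to the in-memory graph."""
    graph.add((s, p, o))


graph: Set[Triple] = set()
add_triple(
    graph,
    URIRef("http://example.org/doc"),
    URIRef("http://purl.org/dc/elements/1.1/title"),
    PlainLiteral("Caf\u00e9"),
)
```

Nothing in this graph is N-Triples encoded; the encoding would only
come into play in a separate, optional step that writes the triples out.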

> > I'm also not sure where you are proposing wording change; I can't see
> > that in any of the sections you mention.  Do you mean the abstract? I
> > would think that isn't required to give the fine detail of the document,
> > which this might be.
> 
> I meant the various bits of the document that I quoted.
> 
> > Dave
> 
> On further reflection, it would be better to change the grammar actions as
> shown above.  This might be too big of a change at this stage, so I would
> be satisfied with changes to the various bits of Section 6 having to do
> with string-value accessors.

Those sections already tell you to use the N-Triples encoding.
Take the URI Reference Event, for example.  It says:
[[
string-value

    The value is the concatenation of "<", the value of the identifier accessor and ">"

    The <>-quoted identifier accessor value must use the N-Triples
    escapes for URI references as described in 3.3 URI References. 
]] -- http://www.w3.org/TR/2003/WD-rdf-syntax-grammar-20031010/#section-identifier-node

That tells you how to turn the identifier accessor value into an
encoded N-Triples URI reference for output purposes.  There is no direct
copying of Unicode strings into N-Triples without encoding.  The other
events have similar wording and links.

At this point, the only change I see here is an editorial one: changing
'statement' to 'triple' in the grammar action descriptions, which would
probably be more accurate.

It could be "the following triple encoded in N-Triples is added to the
RDF graph" but that's a mouthful and already covered by the earlier
definition of the actions:

"The grammar action may include generating new triples to the graph, written in N-Triples format."
-- http://www.w3.org/TR/2003/WD-rdf-syntax-grammar-20031010/#section-Infoset-Grammar-Notation

Dave

Received on Tuesday, 4 November 2003 09:39:38 UTC