Re: character encoding in RDF (including some new related issues)

On Thu, 06 Nov 2003 08:59:05 -0500 (EST), "Peter F. Patel-Schneider" <pfps@research.bell-labs.com> wrote:

> From: Dave Beckett <dave.beckett@bristol.ac.uk>
> Subject: Re: character encoding in RDF
> Date: Thu, 6 Nov 2003 10:42:33 +0000
> 
> > It has been suggested off list that you might be satisfied with the editorial
> > changes suggested by Jeremy Carroll in
> >   http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Nov/0006.html
> 
> I view these changes as a variation of the changes I suggested in my
> initial message on this topic.  These changes do indeed capture the intent
> of the situation, as opposed to the wording in the current document.
> 
....

> These changes would indeed provide an acceptable disposition, provided that
> they are made in all the appropriate places.  I identified Section 6.1.6,
> 6.1.7, 6.1.8, and 6.1.9 in my initial message; Jeremy only proposes three
> changes, not including the one for blank node identifiers.  This difference
> indicates that there should be another effort to identify all the places
> where this sort of change needs to be made.

So if we change 6.1.6, 6.1.8 and 6.1.9 as Jeremy outlines that's part
of an answer - read further on for more.

We changed the 6.1.7 blank node description from your comments in
earlier WDs and I haven't seen you mention it in this thread.  I'm
not proposing any changes there since it already says how the entire
value MUST match an N-Triples production.


> Upon further analysis, I note that the URI and string-value for attribute
> events as well as the URI for element events can be placed directly in a
> triple (as in Section 7.2.11) and so need a similar treatment.  Any grammar
> action that has a <...> in it probably suffers from this problem.
> 
> However, the string-value of attribute events is used in the sections
> above, so just making a variation of Jeremy's proposed change is
> insufficient, as it would end up specifying double escaping.  My proposed
> change would be somewhat better at avoiding double escaping, but it still
> could be read as requiring double escaping.

Yes, that seems like something we should fix.

I think the best way to do this would be, as you suggest, to remove all
  <X.URI> <X.string-value>
from the N-Triples actions (for X=e, a, i.e. element and attribute
events) and to create new accessors on both the element and attribute
events for use when making URI strings for N-Triples (similar to the
6.1.6 URI Reference Event).


So, this would add

[[
  URI-string-value

The value is the concatenation of the following in this order "<",
the escaped value of the *URI* accessor and ">".

The escaping of the *URI* accessor uses the N-Triples escapes for
URI references as described in 3.3 URI References.
]]

to both 6.1.2 Element Event and 6.1.4 Attribute Event.
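
As a rough illustration of what the proposed URI-string-value accessor
would compute, here is a minimal sketch in Python.  The function names
are hypothetical, and the escape rules are my reading of the N-Triples
string escapes (backslash escapes plus \uXXXX / \UXXXXXXXX for anything
outside printable ASCII), not wording from the draft:

```python
def ntriples_escape(text: str) -> str:
    """Escape a Unicode string using N-Triples style string escapes (assumed set)."""
    named = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in named:
            out.append(named[ch])
        elif 0x20 <= cp <= 0x7E:
            out.append(ch)                  # printable ASCII passes through
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)      # BMP character
        else:
            out.append("\\U%08X" % cp)      # supplementary-plane character
    return "".join(out)


def uri_string_value(uri: str) -> str:
    """Hypothetical URI-string-value accessor: "<", escaped URI, ">"."""
    return "<" + ntriples_escape(uri) + ">"
```

Because the escaping happens once, inside the accessor, a grammar action
that uses it has no reason to escape again.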

Read on for further changes


> Also, I believe that the treatment in the second actions of Section 7.2.11
> and Section 7.2.21 are insufficient, as they neither check that the type
> URI is in the form required of a URI in an RDF Graph nor do any escaping.
> I expect that using a URI Reference Event as an intermediary would both
> solve all of these problems as well as part of the problem above.


After the above changes, these could be the consequent changes:

7.2.11 
[[ 
If there is an attribute a in propertyAttr with a.URI == rdf:type
then u:=uri(identifier:=resolve(a.string-value))
and the following triple is added to the graph:

e.subject.string-value <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> u.string-value . 

]]

7.2.21
[[
If a.URI == rdf:type then u:=uri(identifier:=resolve(a.string-value))
and the following triple is added to the graph:

r.string-value <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> u.string-value .

]]
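
To make the intent of these two actions concrete, here is a small
Python sketch of the rdf:type case.  It is only an illustration: the
function names are made up, urljoin stands in for resolve() (it
implements RFC 3986 resolution, which may differ in corner cases from
what the draft requires), and the escaping is a simplified assumption
about the N-Triples escapes.

```python
from urllib.parse import urljoin

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def nt_escape(text: str) -> str:
    """Simplified N-Triples escaping (assumed): backslash, quote, non-printable-ASCII."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if ch in '\\"':
            parts.append("\\" + ch)
        elif 0x20 <= cp <= 0x7E:
            parts.append(ch)
        elif cp <= 0xFFFF:
            parts.append("\\u%04X" % cp)
        else:
            parts.append("\\U%08X" % cp)
    return "".join(parts)


def type_triple(subject_string_value: str, attr_value: str, base_uri: str) -> str:
    """Build the rdf:type triple via an intermediate URI reference value."""
    identifier = urljoin(base_uri, attr_value)      # stands in for resolve()
    u_string_value = "<" + nt_escape(identifier) + ">"
    return "%s %s %s ." % (subject_string_value, RDF_TYPE, u_string_value)
```

The point is that the attribute value never reaches the triple directly;
it always goes through the URI reference step first.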


The other places that would need changing from <X.URI> to
X.URI-string-value (that is, anywhere "<"..">" appears in a grammar
action without a hardcoded URI reference) are:
  7.2.11 <e.URI> and <a.URI>
  7.2.15 <e.URI>
  7.2.16 <e.URI>
  7.2.17 <e.URI>
  7.2.18 <e.URI>
  7.2.19 <e.URI> twice
  7.2.21 <e.URI> twice, <a.URI> once


> Further, the wording in 7.2.32 is rather suspect.  What does it mean for a
> string to represent an RDF URI reference?  

Amusingly, those words are from the original RDF M&S BNF, updated for
later notation changes, and it may be that they aren't needed.

The choices I see are
 1 remove the URI-reference term, replacing it with string where it was used
 2 change the wording to just say "An RDF URI Reference"
 3 change the wording to just say "A Unicode string"

I'm favouring #2 since it is handy to see where in the grammar we
know RDF URI references appear, and we already enforce elsewhere
(in URI Reference Event 6.1.6) that those Unicode strings must be RDF
URI References.


> I also worry about the details of escaping in URI references in RDF/XML.
> My understanding is that URI references are supposed to be in escaped form,
> and that downstream applications are not supposed to perform escaping,
> except perhaps for the escaping for non-ASCII Unicode in IRIs.  I think
> that RDF/XML takes a different and inconsistent stance on this, sometimes
> allowing the escaping of certain ASCII characters when they appear in
> RDF/XML.
> 
> To illustrate this point
> 
> 	http://www.w3.org/foo{bar}
> 
> is not a legal URI (or IRI).  However, it is a legal RDF URI reference,
> because it is a Unicode string that turns into a legal absolute URI with
> optional fragment identifier when subject to the encoding in Section 6.4 of
> RDF Concepts.

I think the above changes mean that all URIs in RDF/XML will either
pass through the URI Reference Event - and are thus required to be RDF
URI references - or are hard coded RDF URI references such as
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>

Can you give an RDF/XML example that demonstrates otherwise?
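
For reference, the encoding step in your foo{bar} illustration can be
sketched as follows in Python.  This is only my approximation of the
Section 6.4 encoding in RDF Concepts: the function name is hypothetical,
and the set of octets to %-escape (non-ASCII, controls, space, and
<>"{}|\^`) is an assumption rather than a quotation from the draft.

```python
# ASCII octets assumed to need %-escaping ('#', '%', '[' and ']' are left alone).
DISALLOWED = set(b'<>"{}|\\^`')


def percent_encode(ref: str) -> str:
    """UTF-8 encode the string, then %-escape octets assumed not permitted in a URI."""
    out = []
    for octet in ref.encode("utf-8"):
        if octet < 0x21 or octet >= 0x7F or octet in DISALLOWED:
            out.append("%%%02X" % octet)
        else:
            out.append(chr(octet))
    return "".join(out)


print(percent_encode("http://www.w3.org/foo{bar}"))
# prints http://www.w3.org/foo%7Bbar%7D
```

The Unicode string itself stays unescaped in the graph; the encoding is
only applied to check that it would turn into a legal URI.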


> I note that various ``3.3 URI References'' pointers are to another document
> and thus should probably be in a different form.  Besides which, the
> relevant section (in RDF Tests) is mostly a pointer to another place, which
> should probably be referred to directly.

I'd like to keep that pointer to N-Triples URI references encoding
since there are other editorial changes I want to make at 3.3, not
relevant to this discussion.

> 
> > Thanks
> > 
> > Dave
> 
> I await a revised, fully-worked-out proposal for the actual changes.

You've raised some more things each time for us to answer so you'll
have to let me know.

Dave

Received on Thursday, 6 November 2003 10:28:39 UTC