Re: URI terminology demystified (I18N details)

From: Graham Klyne <Graham.Klyne@MIMEsweeper.com>
Date: Thu, 20 Sep 2001 15:33:19 +0100
Message-Id: <5.1.0.14.2.20010920153007.038f1bb0@joy.songbird.com>
To: Dan Connolly <connolly@w3.org>
Cc: Jeremy Carroll <jjc@hplb.hpl.hp.com>, w3c-rdfcore-wg@w3.org
FWIW, I'm having a separate discussion with Martin Duerst about this issue 
with respect to CC/PP (an application of RDF);  Martin seems to think the 
XML system identifier rules should apply to URI values in RDF -- I'm 
pressing for clarity about why this is so, given that URIs per se cannot 
contain non-US-ASCII characters.

(I think part of the motivation is to prepare the ground for deployment of 
IRIs.)

I had been planning to report the outcome of the discussion back to this 
group, as it relates to some wording in the existing RDF spec.

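For reference, the wording in question (quoted below) says that a non-ASCII
character is represented in UTF-8 and each resulting byte is escaped as %HH.
A minimal Python sketch of that convention, for illustration only; the
function name and the example.org URI are made up here and come from no spec:

```python
def escape_non_ascii(uri_ref):
    """Percent-escape the UTF-8 bytes of every non-ASCII character.

    ASCII characters, including any existing '%', are passed through
    untouched; this sketch does not validate the result as a URI.
    """
    pieces = []
    for ch in uri_ref:
        if ord(ch) < 0x80:
            # Plain US-ASCII: keep as-is.
            pieces.append(ch)
        else:
            # Non-ASCII: one %HH triplet per UTF-8 byte.
            pieces.extend("%{:02X}".format(byte) for byte in ch.encode("utf-8"))
    return "".join(pieces)


# U+00E9 is the two bytes C3 A9 in UTF-8.
print(escape_non_ascii("http://example.org/caf\u00e9"))
# prints: http://example.org/caf%C3%A9
```

Whether an RDF/XML processor should apply this to rdf:resource, rdf:about and
rdf:ID values is exactly the point under discussion.
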
#g
--

At 09:11 AM 9/20/01 -0500, Dan Connolly wrote:
>Jeremy Carroll wrote:
> >
> > Hmmm, I was just examining the XML specs concerning system identifiers
> > ....
> >
> > See:
> >
> > http://www.w3.org/XML/xml-V10-2e-errata#E4
> >
> > Your quote from the old RDF spec:
> >
> > Dan Connolly wrote:
> > >
> > >   Note: Although non-ASCII characters in URIs are not allowed by [URI], [XML]
> > >   specifies a convention to avoid unnecessary incompatibilities in extended URI
> > >   syntax. Implementors of RDF are encouraged to avoid further incompatibility and
> > >   use the XML convention for system identifiers. Namely, that a non-ASCII character
> > >   in a URI be represented in UTF-8 as one or more bytes, and then these bytes be
> > >   escaped with the URI escaping mechanism (i.e., by converting each byte to %HH,
> > >   where HH is the hexadecimal notation of the byte value).
> > >
> >
> > This seems to be a misinterpretation of the XML spec, which the erratum
> > clarifies.
>
>Strictly speaking, it's not; system identifiers only occur
>in things like <!ENTITY ...> declarations. The value of
>an rdf:resource attribute isn't a system identifier (unless
>we change RDF 1.0 to say that it is for some reason).
>
>
> > We should, IMO, hence go along with the clarification, and the RDF/XML
> > processor is responsible for escaping non-permitted characters in
> > URI-refs.
>
>It's not XML 1.0 that compels us to go with the
>Unicode->URI escaping in resource/about/ID,
>but the history of HTML 4.0 href, the text from RDF 1.0
>excerpted above, the precedent of the XLink REC (xlink:href),
>and the recent opinion of the I18N WG expressed
>in the charmod spec.
>
> > I also note that this is consistent with our test case:
> >
> > http://www.w3.org/2000/10/rdf-tests/rdfcore/rdfms-difference-between-ID-and-about/test2.nt
> >
> > http://www.w3.org/2000/10/rdf-tests/rdfcore/rdfms-difference-between-ID-and-about/test2.rdf
> >
> > which has not been approved, seems to suggest the following
> >
> > 1: ID's are subject to the same URI encoding rule.
>
>Yup. (that is: values of rdf:ID attributes.)
>
> > 2: N-triple URIs are in US-ASCII and must be already encoded.
>
>Yes; to be crystal clear: All URIs are in US-ASCII.
>URIs appear in N-triple syntax as-is, with no further encoding.
>
> > These seem like good things.
>
>Agreed.
>
> > Dan - do you know about namespace declarations?
> >     - are the URIs in Unicode (needing escaping) or US-ASCII?
>
>I think namespace declarations must use URI references as-is;
>i.e. you're not allowed to put non-uri characters in them.
>This follows from
>         (a) a literal reading of the namespaces REC,
>         which says that the value of an xmlns attribute
>         is a namespace name and a namespace name *is* a URI reference
>         (not something that can be decoded into a URI reference).
>         Nobody has suggested changing/clarifying this
>         aspect of the namespace spec, to my knowledge.
>
>         (b) my own observation that the XML infrastructure
>         treats namespace names as plain old strings, and
>         never decodes or otherwise mangles them (other
>         than normal XML attribute value literal interpretation).
>
>It's at least worth a health-warning to say "if you
>put non-URI characters in your namespace names, LOOK OUT!
>We know of no software that's going to help you!"
>
>And it's worth a test case or two. Care to cook some up?
>
>--
>Dan Connolly, W3C http://www.w3.org/People/Connolly/

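Dan's point (b) can be checked with any namespace-aware parser. A small
Python sketch using the standard library's xml.etree.ElementTree; the
example.org namespace names and the namespace_of helper are hypothetical,
chosen only for illustration:

```python
import xml.etree.ElementTree as ET


def namespace_of(element):
    """Return the namespace name from an ElementTree '{ns}local' tag."""
    return element.tag[1:].split("}")[0]


# The same "intended" namespace, once already escaped and once with a raw
# non-ASCII character (U+00E9) in the xmlns attribute value.
escaped = ET.fromstring('<doc xmlns="http://example.org/caf%C3%A9"/>')
raw = ET.fromstring('<doc xmlns="http://example.org/caf\u00e9"/>')

# The parser hands back each namespace name as the plain string it was
# given: no %HH decoding, no escaping of the non-ASCII character.
print(namespace_of(escaped) == "http://example.org/caf%C3%A9")   # True
print(namespace_of(raw) == "http://example.org/caf\u00e9")       # True
print(namespace_of(escaped) == namespace_of(raw))                # False
```

So the two documents end up in different namespaces as far as this parser
is concerned, which is the sort of surprise the proposed health-warning
would cover.
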
------------------------------------------------------------
Graham Klyne                    MIMEsweeper Group
Strategic Research              <http://www.mimesweeper.com>
<Graham.Klyne@MIMEsweeper.com>
------------------------------------------------------------
Received on Thursday, 20 September 2001 10:41:07 EDT