Re: URI terminology demystified from Jeremy Carroll on 2001-09-20 (w3c-rdfcore-wg@w3.org from September 2001)

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Thu, 20 Sep 2001 14:24:41 +0100
To: w3c-rdfcore-wg@w3.org
Message-ID: <3BA9EE19.BFC38FFB@hplb.hpl.hp.com>

Hmmm, I was just examining the XML specs concerning system identifiers
...

See:

http://www.w3.org/XML/xml-V10-2e-errata#E4

Your quote from the old RDF spec:

Dan Connolly wrote:
> 
>   Note: Although non-ASCII characters in URIs are not allowed by [URI], [XML]
>   specifies a convention to avoid unnecessary incompatibilities in extended URI
>   syntax. Implementors of RDF are encouraged to avoid further incompatibility and
>   use the XML convention for system identifiers. Namely, that a non-ASCII character
>   in a URI be represented in UTF-8 as one or more bytes, and then these bytes be
>   escaped with the URI escaping mechanism (i.e., by converting each byte to %HH,
>   where HH is the hexadecimal notation of the byte value).
> 

This seems to be a misinterpretation of the XML spec, which the erratum
clarifies.
We should, IMO, go along with the clarification: the RDF/XML
processor is responsible for escaping non-permitted characters in
URI-refs.
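
As a rough illustration of the escaping convention quoted above (each
non-ASCII character is encoded as UTF-8 and each resulting byte is written
as %HH), here is a minimal Python sketch. The function name
escape_non_ascii and the example.org URI are hypothetical, and the sketch
handles only non-ASCII characters; any other characters the erratum treats
as non-permitted are left untouched here.

```python
def escape_non_ascii(uri_ref: str) -> str:
    """Percent-escape each non-ASCII character as its UTF-8 bytes.

    Hypothetical helper for illustration only: ASCII characters,
    including '#' and '%', are passed through unchanged.
    """
    out = []
    for ch in uri_ref:
        if ord(ch) < 128:
            # ASCII characters are copied as-is in this sketch.
            out.append(ch)
        else:
            # Encode the character as UTF-8, then write each byte as %HH.
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


if __name__ == "__main__":
    print(escape_non_ascii("http://example.org/caf\u00e9#r\u00e9sum\u00e9"))
```

Under this reading, the processor would apply such a step to the Unicode
URI-ref it reads from the RDF/XML before emitting a US-ASCII form.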

I also note that this is consistent with our test case:

http://www.w3.org/2000/10/rdf-tests/rdfcore/rdfms-difference-between-ID-and-about/test2.nt

http://www.w3.org/2000/10/rdf-tests/rdfcore/rdfms-difference-between-ID-and-about/test2.rdf

That test case has not been approved, but it seems to suggest the following:

1: IDs are subject to the same URI encoding rule.
2: N-triple URIs are in US-ASCII and must already be encoded.

These seem like good things.

Dan - do you know about namespace declarations?
    - are the URIs in Unicode (needing escaping) or US-ASCII?

Jeremy

Received on Thursday, 20 September 2001 09:20:31 UTC