Re: URI terminology demystified (I18N details)

At 05:04 PM 9/20/01 +0100, Jeremy Carroll wrote:

> > FWIW, I'm having a separate discussion with Martin Duerst about this issue
> > with respect to CC/PP (an application of RDF);  Martin seems to think the
> > XML system identifier rules should apply to URI values in RDF -- I'm
> > pressing for clarity about why this is so, given that URIs per se cannot
> > contain non-US-ASCII characters.
> >
>
>I think this is a no-brainer from an internationalization point of view.
>
>When a non-English speaker wishes to write a meaningful rdf:about or
>rdf:ID value then they will use non-US ASCII characters.
>
>Since URIs are US ASCII somewhere someone has to do the conversion, and
>the %HH encoding of UTF-8 is the correct conversion to do.
>
>It is necessary for a spec to say who does the conversion, and given
>that RDF/XML is meant to be (barely) end-user readable, it should be in
>their language. Hence the RDF/XML processor needs to do the conversion.
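A minimal Python sketch of the conversion described above. It assumes the processor escapes only non-US-ASCII characters, converting each one to UTF-8 and writing every resulting octet as %HH. Whether a given spec also requires some ASCII characters to be escaped is left open here. The function name `escape_non_ascii` and the example URI are hypothetical and are not part of any RDF/XML processor's API.

```python
# Hypothetical helper: escape_non_ascii is an illustrative name only.

def escape_non_ascii(value: str) -> str:
    """Replace each non-US-ASCII character with the %HH escapes of its
    UTF-8 octets, leaving US-ASCII characters untouched."""
    out = []
    for ch in value:
        if ord(ch) < 0x80:
            out.append(ch)  # already US-ASCII: keep as-is
        else:
            # Encode the single character to UTF-8, then escape each octet.
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


if __name__ == "__main__":
    # prints "http://example.org/caf%C3%A9#%C5%81%C3%B3d%C5%BA"
    print(escape_non_ascii("http://example.org/café#Łódź"))
```

Doing the escaping character by character keeps the author-visible text in the user's own script, while the value handed on as a URI contains only US-ASCII.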

I have no problem with that position.  I just don't think it's clear from 
the XML spec (for system identifiers), the XML namespace spec (for 
namespace URIs), or the current RDF spec (for URI-valued attributes, etc.).

I think that when one says a piece of text has URI syntax, and that it may 
also contain non-US-ASCII characters, then the latter has to be stated very 
clearly.  This is not completely at odds with RFC2396, which talks about 
characters -> octets -> URI-character mappings.  But I do think that when 
one talks of URIs in data streams, what is usually meant is a sequence of 
URI-characters;  i.e. the US-ASCII subset used by URIs.
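The characters -> octets -> URI-characters layering can be illustrated with Python's standard `urllib.parse` module. This sketch assumes UTF-8 for the character-to-octet step. The helper names and the set of ASCII delimiters left unescaped (`SAFE`) are hypothetical and chosen only for illustration.

```python
from urllib.parse import quote, unquote_to_bytes

# Assumption: which ASCII delimiters stay unescaped is application-specific;
# this set is illustrative only.
SAFE = "/:#?=&"


def to_uri_characters(text: str) -> str:
    octets = text.encode("utf-8")        # characters -> octets (UTF-8 assumed)
    return quote(octets, safe=SAFE)      # octets -> URI-characters (%HH escapes)


def from_uri_characters(uri: str) -> str:
    octets = unquote_to_bytes(uri)       # URI-characters -> octets
    return octets.decode("utf-8")        # octets -> characters


def is_uri_character_sequence(s: str) -> bool:
    # A "URI in a data stream" in the narrow sense: only US-ASCII characters.
    return all(ord(c) < 0x80 for c in s)
```

For example, `to_uri_characters("café")` yields `"caf%C3%A9"`, which passes `is_uri_character_sequence`, while the original `"café"` does not. This is the distinction drawn above between text that has URI syntax and a strict sequence of URI-characters.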

#g


------------------------------------------------------------
Graham Klyne                    MIMEsweeper Group
Strategic Research              <http://www.mimesweeper.com>
<Graham.Klyne@MIMEsweeper.com>
------------------------------------------------------------

Received on Thursday, 20 September 2001 14:01:28 UTC