Are system identifiers already IRIs? from Norman Walsh on 2007-07-16 (public-xml-core-wg@w3.org from July 2007)

From: Norman Walsh <ndw@nwalsh.com>
Date: Mon, 16 Jul 2007 15:32:22 -0400
To: public-xml-core-wg@w3.org
Message-ID: <87y7hgywjd.fsf@nwalsh.com>

In one of the messages in the thread about HRRIs, Martin wrote:

> At least for the XML spec itself, there may be a point of
> view that simply saying "it's an IRI" won't change anything.
> I'll try to explain this below.
> 
> Looking at the definition of a PubidLiteral for a moment
> (http://www.w3.org/TR/REC-xml/#NT-PubidLiteral), it just
> specifies a range of characters that can be used, nothing
> more in terms of syntax, although it could be argued that
> some syntaxes (those with several // included) are much
> more likely, or even highly expected to make a PubidLiteral
> usable in a wider context (which 'Public' suggests in the
> first place).
> 
> Likewise, the syntax for SystemLiteral is specified simply
> as a string of characters (from a much wider repertoire).
> To say that this is an IRI does not restrict this syntax.
> 
> It is a well acknowledged fact that URI and IRI syntax are
> very difficult to check (because there are scheme-dependent
> restrictions, and so on) and that therefore, any strict
> checking (in the way e.g. the XML syntax is checked for
> well-formedness) is not appropriate for URIs or IRIs.
> 
> The rest (namely conversion of unallowed characters to
> %hh-encoding) seems to already be covered under the following
> paragraph from the IRI spec:
> 
>    Systems accepting IRIs MAY also deal with the printable characters in
>    US-ASCII that are not allowed in URIs, namely "<", ">", '"', space,
>    "{", "}", "|", "\", "^", and "`", in step 2 above.  If these
>    characters are found but are not converted, then the conversion
>    SHOULD fail.  Please note that the number sign ("#"), the percent
>    sign ("%"), and the square bracket characters ("[", "]") are not part
>    of the above list and MUST NOT be converted.  Protocols and formats
>    that have used earlier definitions of IRIs including these characters
>    MAY require percent-encoding of these characters as a preprocessing
>    step to extract the actual IRI from a given field.  This
>    preprocessing MAY also be used by applications allowing the user to
>    enter an IRI.
> 
> I'm not saying that this interpretation is the only one possible,
> and I'm not sure how it would apply to XLink and others, but
> I wanted to show it here as one point of view.
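
To make the conversion Martin describes concrete, here is a rough sketch
of what a processor might do with a system identifier under this reading.
It is only an illustration: the function name escape_system_literal and
the constant EXTRA_ESCAPES are made up for this example, and the sketch
assumes that percent-encoding the UTF-8 octets of each affected character
is all that is wanted. It ignores host name (IDN) handling, normalization,
and control characters.

```python
# The ten printable US-ASCII characters listed in the quoted paragraph.
# "#", "%", "[" and "]" are deliberately absent: they must not be converted.
EXTRA_ESCAPES = set('<>" {}|\\^`')


def escape_system_literal(literal: str) -> str:
    """Percent-encode non-ASCII characters and the listed ASCII characters.

    Hypothetical helper, not taken from any specification or library.
    Everything else, including existing %hh escapes, is left untouched.
    """
    out = []
    for ch in literal:
        if ch in EXTRA_ESCAPES or ord(ch) > 0x7F:
            # Encode the character as UTF-8 and escape each octet as %HH.
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)
```

With this sketch, a system identifier such as "my doc.dtd" would become
"my%20doc.dtd", while "a#b%20[c]" would pass through unchanged.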

How does the XML Core WG feel about this interpretation?

Certainly, if we can comfortably conclude that everywhere we're
thinking of using HRRIs (and everywhere we can imagine wanting to in
the future) we can already say "it's an IRI", that simplifies things.

                                        Be seeing you,
                                          norm

-- 
Norman Walsh <ndw@nwalsh.com> | A great deal may be done by severity,
http://nwalsh.com/            | more by love, but most by clear
                              | discernment and impartial
                              | justice.--Goethe
Received on Monday, 16 July 2007 19:32:32 UTC