Re: Fwd: Re: HRRIs, IRIs, etc

Hello John,

At 06:42 07/06/21, John Cowan wrote:
>Addison Phillips scripsit:
>
>> I'm concerned about this discussion. I note that it has been a long 
>> standing (perhaps mythological) belief by many of us in the 
>> internationalization activity that XLink, XML Base, et al, represented 
>> an instance of IRI. 
>
>It's always been true that random ASCII characters that are forbidden
>in URI/IRIs have "worked" in XML system identifiers, as well as the
>other things derived from it.  That didn't turn out to be what IRIs
>are -- they have the same restrictions within the ASCII repertoire
>as IRIs.

I guess you meant "URIs" in the last line.

This is true, and is also true for HTML.

There are several ways to explain this:

- Implementers carefully implemented the spec.

- Implementers did what worked with the least effort.

- Implementers understood that it's a well-held principle for URIs
  and IRIs that there shouldn't (or can't) be any detailed syntax checks.
  About the only thing you can check reliably without going down the
  scheme specific road is that if it contains a ':', then the characters
  before the first ':' need to match the scheme production:
       scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
  I think the XML Schema WG tried to come up with a regexp, but they
  gave up. Please also see the following note at
  http://www.w3.org/TR/xmlschema-2/#anyURI
  Note:  Each URI scheme imposes specialized syntax rules for URIs in that
     scheme, including restrictions on the syntax of allowed fragment identifiers.
     Because it is impractical for processors to check that a value is a
     context-appropriate URI reference, this specification follows the lead of
     [RFC 2396] (as amended by [RFC 2732]) in this matter: such rules and
     restrictions are not part of type validity and are not checked by
     ·minimally conforming· processors. Thus in practice the above definition
     imposes only very modest obligations on ·minimally conforming· processors.
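
As a rough illustration of the scheme-only check described in the last point,
here is a minimal Python sketch. The function name scheme_check is made up
for this example. One refinement is an assumption not taken from the message:
a ':' that appears after a '/', '?' or '#' is treated as part of a relative
reference rather than as a scheme delimiter, which is how RFC 3986 allows
colons in later path segments, queries and fragments.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def scheme_check(reference: str) -> bool:
    """Return False only when the text before the first ':' cannot be a scheme.

    Nothing else is checked: no scheme-specific syntax and no
    character-level restrictions such as forbidden ASCII characters.
    """
    head, colon, _rest = reference.partition(":")
    if not colon:
        # No ':' at all, so there is no scheme to check.
        return True
    if any(c in head for c in "/?#"):
        # Assumption (not from the message): this ':' sits in a path,
        # query or fragment of a relative reference, so it is not a
        # scheme delimiter.
        return True
    return SCHEME.fullmatch(head) is not None
```

With this sketch, "http://www.w3.org/TR/xmlschema-2/#anyURI" and
"mailto:duerst@it.aoyama.ac.jp" pass, "1http://example.org/" fails because a
scheme must start with a letter, and a string such as "a b c" passes because
it contains no ':' and nothing else is examined.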

>This is quite independent of the status of SPACE.

Can you explain how this is independent? Isn't space just one of these
characters?

Regards,    Martin.



#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     

Received on Friday, 22 June 2007 10:40:28 UTC