Re: IDN test, another potential validator bug

Martin Duerst wrote:

> assuming that what you wanted to say above was that an IRI really containing non-ASCII
> characters wasn't permitted as a system identifier. What makes you so sure? How would
> you deduce this from http://www.w3.org/TR/REC-xml/#dt-sysid?

| The characters to be escaped are the control characters [...], as well as all characters 
| above #x7F. 

"Above #x7F" means "non-ASCII", doesn't it ?  

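For illustration, the escaping rule quoted above can be sketched in Python. This is only a rough sketch, not the validator's actual code: the function name escape_system_identifier is made up here, and it handles only the two cases visible in the quoted sentence (control characters and everything above #x7F), not whatever the "[...]" elides. It assumes each escaped character is converted to UTF-8 and every resulting byte is written as %HH.

```python
def escape_system_identifier(sysid: str) -> str:
    """Percent-escape control characters and all characters above #x7F.

    Hypothetical helper: covers only the cases named in the quoted
    sentence, not the part elided as "[...]".
    """
    out = []
    for ch in sysid:
        cp = ord(ch)
        if cp <= 0x1F or cp >= 0x7F:
            # Encode the character as UTF-8 and escape each byte as %HH.
            out.append("".join("%{:02X}".format(b) for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)


escaped = escape_system_identifier(
    "http://испытание.boldlygoingnowhere.org/xhtml1-i18n.dtd"
)
print(escaped)
```

Applied to the system identifier from this thread, the sketch yields the same percent-encoded host name that shows up in the validator's "host not found" message.
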
>>SYSTEM "http://испытание.boldlygoingnowhere.org/xhtml1-i18n.dtd"
>>| cannot generate system identifier for document type "html".
 
> Despite the fact that the system identifier contains "boldlygoingnowhere",
> this IRI actually resolves. Therefore, this IS a validator bug.

But it contains unescaped non-ASCII characters, a Cyrillic label.  Isn't the
idea that a "system identifier" needs to be independent of the encoding?

>>| Line 2, Column 124: host 
>>| "%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5.boldlygoingnowhere.org"
>>| not found.
 
> This again is a validator bug, because the above is a perfectly legal
> (according to RFC 3986) URI.

Okay, it is a legal <reg-name>, but maybe not a perfect one.  RFC 3986 says
about DNS:

| A registered name intended for lookup in the DNS uses the syntax
| defined in Section 3.5 of [RFC1034] and Section 2.1 of [RFC1123].
| Such a name consists of a sequence of domain labels separated by ".",
| each domain label starting and ending with an alphanumeric character
| and possibly also containing "-" characters. 

In other words, "LDH" (letters, digits, hyphen) doesn't cover "%".  Later
RFC 3986 says that the "operating system of each application decides what
it will allow for the purpose of host identification".  The OS of the W3C
validator doesn't like this construct, so why do you think it's a validator
bug?
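
A rough Python sketch of the LDH rule from the RFC 3986 passage, for illustration only. The names LDH_LABEL, is_ldh_hostname and to_ascii_hostname are made up for this example, and the 63-character label limit is taken from the DNS specifications rather than from the text quoted here. The second helper is an assumption about what a resolver could do instead of looking up the percent-encoded name: it undoes the escaping and applies Python's built-in "idna" codec (IDNA 2003).

```python
import re
from urllib.parse import unquote

# One DNS label: starts and ends with an alphanumeric character,
# with "-" allowed only in between.
LDH_LABEL = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?$")


def is_ldh_hostname(host: str) -> bool:
    """Return True if every dot-separated label follows the LDH rule."""
    labels = host.rstrip(".").split(".")
    return all(
        len(label) <= 63 and LDH_LABEL.match(label) is not None
        for label in labels
    )


def to_ascii_hostname(host: str) -> str:
    """Hypothetical helper: decode %HH escapes as UTF-8, then apply IDNA."""
    return unquote(host).encode("idna").decode("ascii")


percent_host = (
    "%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5"
    ".boldlygoingnowhere.org"
)
print(is_ldh_hostname(percent_host))   # the "%" characters fail the LDH check
ace_host = to_ascii_hostname(percent_host)
print(ace_host, is_ldh_hostname(ace_host))
```

The percent-encoded form fails the check because of the "%" characters, while the IDNA form of the same name is plain LDH with an "xn--" prefix on the first label.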

 Frank

Received on Tuesday, 6 November 2007 14:01:22 UTC