Re: IDN test, another potential validator bug

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Tue, 06 Nov 2007 17:24:50 +0900
Message-Id: <6.0.0.20.2.20071106170927.0b4c07f0@localhost>
To: www-validator@w3.org

Frank Ellermann answered, apparently to himself:

>Frank Ellermann wrote:
>
>> I'm not exactly sure about this issue
>
>Meanwhile I'm sure that an unencoded UTF-8 IRI isn't permitted in a system identifier:

First, wrong terminology. XML, including system identifiers in
XML, is described in terms of characters, not in terms of any
fixed encoding.

Second, I assume that what you wanted to say above was that
an IRI actually containing non-ASCII characters isn't permitted
as a system identifier. What makes you so sure? How would you
deduce this from http://www.w3.org/TR/REC-xml/#dt-sysid?
There are known differences between system identifiers as
defined there and IRIs. The two main ones are that the XML
definition doesn't use the term IRI, and that it allows some
characters that the IRI spec doesn't allow. But these are
mostly fringe characters that shouldn't be used anyway.

>http://hmdmhdfmhdjmzdtjmzdtzktdkztdjz.googlepages.com/IDN-IRI-test.htm
>is identified as invalid.  It uses
>SYSTEM "http://испытание.boldlygoingnowhere.org/xhtml1-i18n.dtd"
>
>| cannot generate system identifier for document type "html".

Despite the fact that the system identifier contains "boldlygoingnowhere",
this IRI actually resolves. Therefore, this IS a validator bug.
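
As a rough illustration of why the two test documents quoted in this
message name the same resource, here is a minimal Python sketch of the
escaping step that turns the non-ASCII system identifier into its
percent-encoded form: each non-ASCII character is converted to UTF-8
and each resulting byte is written as %HH. The function name
escape_non_ascii is made up for this example; this is not the
validator's actual code.

```python
def escape_non_ascii(identifier: str) -> str:
    """Percent-encode every non-ASCII character as its UTF-8 bytes.

    ASCII characters, including existing %HH escapes, are left untouched.
    Hypothetical helper for illustration only.
    """
    out = []
    for ch in identifier:
        if ord(ch) < 128:
            out.append(ch)
        else:
            # One %HH triplet per UTF-8 byte of the character.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


iri = "http://испытание.boldlygoingnowhere.org/xhtml1-i18n.dtd"
print(escape_non_ascii(iri))
```

Applied to the system identifier above, this yields the
percent-encoded identifier used in the IDN-XML-test document.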

>Perfect.  What I really wanted was this:
>
>http://hmdmhdfmhdjmzdtjmzdtzktdkztdjz.googlepages.com/IDN-XML-test.htm
>is identified as invalid.  It uses
>SYSTEM "http://%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5.boldlygoingnowhere.org/xhtml1-i18n.dtd"
>
>| Line 2, Column 124: host 
>| "%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5.boldlygoingnowhere.org"
>| not found.

This again is a validator bug, because the above is a perfectly legal
(according to RFC 3986) URI.
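
To make the point concrete, here is a hedged sketch of how a client
could get from either form of the identifier to a host name that DNS
can look up: take the host component, undo the percent-encoding
(interpreting the bytes as UTF-8), and apply IDNA ToASCII. It uses
Python's standard urllib.parse and the built-in "idna" codec (which
implements IDNA 2003); host_for_dns is a hypothetical name, and the
validator's real resolution code may look quite different.

```python
from urllib.parse import urlsplit, unquote


def host_for_dns(identifier: str) -> str:
    """Return the ASCII (punycode) host name for a URI or IRI.

    Hypothetical helper: accepts both a raw non-ASCII host and a
    percent-encoded UTF-8 host, and maps both to the same name.
    """
    host = urlsplit(identifier).hostname or ""
    # Undo %HH escapes; a host without escapes passes through unchanged.
    host = unquote(host, encoding="utf-8", errors="strict")
    # IDNA ToASCII, label by label, via the standard "idna" codec.
    return host.encode("idna").decode("ascii")


raw = "http://испытание.boldlygoingnowhere.org/xhtml1-i18n.dtd"
escaped = ("http://%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5"
           ".boldlygoingnowhere.org/xhtml1-i18n.dtd")
print(host_for_dns(raw))
print(host_for_dns(escaped))
```

Under these assumptions both calls print the same "xn--" host name,
which is why neither document should fail with "host not found".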

>It's no host, no URI, and no IRI, I think it's an XML 1.0 (3rd ed. or 4th ed.) "system identifier".

The system identifier overall is a system identifier. But what
the validator is complaining about is that it can't find the host
that's part of this system identifier. If the validator were right
about that, the message might help you locate a typo in your system
identifier. So as an error message, there's nothing wrong with it.

Regards,     Martin.


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     
Received on Tuesday, 6 November 2007 08:26:44 GMT