- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Tue, 6 Nov 2007 13:52:42 +0100
- To: www-validator@w3.org
Martin Duerst wrote:

> they'd only clarify what they meant when they recommended,
> in http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1,
> to convert URIs with non-ASCII characters to UTF-8 and
> then to use percent-encoding

The given example states that it's *illegal*; after that it explains
a best-guess implementation, clearly written before you published
RFC 3987. It doesn't address IDNs: IDNs didn't exist in 1999. When
browsers try to guess what broken URIs mean they could run into the
recent flood of "XP with IE7" security issues.

>> for incompatible modifications we need new document types.

> The new document type would not at all differ in functionality
> from the old one. The only changes might be comments and
> the names of parameter entities, but as with programs, that
> doesn't change the functionality at all.

It's a "formally valid" experiment with unencoded IRIs in links,
which can be (legally) relevant for accessibility. It might also help
for an RFC 3987 implementation and interoperability report: "it's
illegal but it works" wouldn't be convincing (of course there would
still be atom and xmpp if all else fails). Maybe somebody creates a
corresponding schema that allows checking IRI syntax.

> %D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5
> would be just garbage for them.

The W3C validator apparently hates this in a system identifier:
http://hmdmhdfmhdjmzdtjmzdtzktdkztdjz.googlepages.com/IDN-XML-test.htm
(sorry, I can't read your unencoded ISO-2022-JP examples at the
moment)

> Formally valid means valid according to the DTD, I guess.

No, I meant the prose RFC 2396 specification of URI, not CDATA.

> Temporary accessibility issues are issues of the kind "The
> current screen readers/audio browsers/... only support foo,
> so in order to be accessible, use foo, not bar". Once the
> technology has caught up (and accessibility technology
> improves in the same way other technology improves), such
> a requirement may no longer apply.
"Temporary" can be a rather long time, RFC 2277 talks about 50 years wrt UTF-8. Worldwide upgrades take some time. The "real" IDN TLD test started less than four weeks ago, and on another list you argued that not much will happen before real IDN TLDs are introduced. > For some tests, please see > http://www.sw.it.aoyama.ac.jp/2005/iritest/, Thanks, Firefox 2.0.0.9 fails already in the Latin-1 "Bücher" test, of course it works for UTF-8. What I had in mind would be minimally harder, using "Bücher" in an unencoded IDN on a Web page using a legacy charset. Obviously I can forget this for now, if it doesn't work in an <ipath> it also won't work in an <ihost>. > URIs/IRIs are supposed to be very flexible. Actually I'm lost with LEIRIs, HRRIs, options allowing to use unencoded ASCII characters in IRIs not permitted in URIs, and the recent discussion about allowing unencoded square brackets outside of <IP-literal>. With URIs it's clear, if they're valid they must match the generic STD 66 syntax. No unencoded spaces etc. > If somebody came along tomorrow with a very great idea > for an extension to the URI syntax, and the community > agreed with that extension, even if it wouldn't fit the > current syntax definition, then this would lead to an > update of the URI spec. There's no "updates RFC 3986" in the URI template draft. > If you want to create some software that tries to spot > potential mistakes in an HTML document, I'd guess you'd > surely flag something like <a href='htpp://www.w3.org'... Flag and warn yes, but it's no STD 66 syntax error. The tool could restrict schemes to registered schemes and allow to configure additional unregistered schemes. > example, consider the following: > <img src='http://example.org/top.html'> > Again, this clearly looks like a mistake The "top.html" is just a name, admittedly a bad name if the resource is something that can be displayed as image. A legal URI. OTOH "bücher.html" isn't a legal URI. 

> Again, <img src='mailto:abc@example.com'> looks like
> nonsense, but again, it may make sense in the future.

"Syntactically valid" isn't the same as "makes sense", I think we
don't disagree about this. Where we might disagree is about
"syntactically invalid". Browsers are forced to make sense out of
(some kinds of) garbage, but a syntax check is supposed to report
syntax errors.

> if it took you several months to figure out why you
> need octet 128 rather than NCR &#128;, then at least
> at that point in time, you didn't really know much
> about the fundamentals of Web internationalization.

In 2001 I knew _nothing_ about it, I was armed with a Netscape 2.02
not supporting UTF-8 and treating &#128; as Euro, an O'Reilly book
with "XHTML" in its title published in 2000, the W3C validator for
online syntax checks, and a box with local codepages "850" + 437.

> Well, this is a circular argument. "Let's annoy users
> now so that we don't need to annoy them later." doesn't
> make sense if "Let's not annoy them at all." is the
> best option anyway.

"Let's not annoy them at all" won't fly if the STD 66 syntax is
checked later. It was good when the validator finally (2001-09-13)
informed me that &#128; is crap, it would have been better if it had
done that a few months earlier.

Frank
Received on Tuesday, 6 November 2007 13:08:50 UTC