- From: Liam Quinn <liam@htmlhelp.com>
- Date: Sat, 21 Apr 2001 13:21:30 -0400 (EDT)
- To: Terje Bless <link@tss.no>
- cc: "'gerald et al.'" <www-validator@w3.org>
On Sat, 21 Apr 2001, Terje Bless wrote:

> More worrying is the fact that we don't catch ISO-8859-1 in documents
> labelled as US-ASCII (see TODO #1 <URL:http://validator.w3.org/todo.html>)
> and I don't quite know why. Do any of you (Liam, Nick? Anyone?) have any
> ideas? What does Page Valet and the WDG Validator (and A Real Validator for
> that matter) do with that doc?

The WDG HTML Validator labels US-ASCII documents as ISO-8859-1 when
passing them off to lq-nsgmls, so it considers that example document
valid. And it is valid:

    "An XML document is valid if it has an associated document type
    declaration and if the document complies with the constraints
    expressed in it." [1]

The 8-bit character is an error, but it's an error in much the same way
as including <a href="foo bar"> in an HTML document. URIs can't contain
spaces, but HTML validators don't complain about them.

It would be nice if HTML validators could warn about invalid URI syntax
and character encoding problems, but it's not required.

I wouldn't know how to warn with Text::Iconv, but it should be possible
to report the problem with Unicode::Map8 by subclassing it and overriding
the unmapped_to8 method. But that still leaves errors in multi-byte
encodings unhandled...

[1] http://www.w3.org/TR/REC-xml#dt-valid

-- 
Liam Quinn
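For illustration only, here is a minimal sketch of the kind of warning discussed above, written in Python rather than Perl. It does not use Unicode::Map8 or Text::Iconv; Python's standard `codecs` machinery stands in for them. The names `charset_warnings` and `HrefSpaceChecker` are hypothetical, and this is not how the WDG HTML Validator or the W3C validator actually works. It assumes the raw document bytes and the declared charset label are already available. It reports byte positions that the declared charset cannot decode, which covers both 8-bit octets in a US-ASCII document and malformed multi-byte sequences (for example, in UTF-8). It also flags `href` values that contain a literal space.

```python
import codecs
from html.parser import HTMLParser


def charset_warnings(data: bytes, charset: str, limit: int = 10):
    """Return (line, column, message) tuples for bytes the declared charset rejects.

    For a single-byte charset such as US-ASCII, every octet >= 0x80 is
    reported; for a multi-byte charset such as UTF-8, malformed or truncated
    sequences are reported. Lines and columns are 1-based and count bytes.
    """
    codec = codecs.lookup(charset).name  # LookupError if the label is unknown
    warnings = []
    pos = 0
    while pos < len(data) and len(warnings) < limit:
        try:
            data[pos:].decode(codec)
            break  # the remainder decodes cleanly
        except UnicodeDecodeError as err:
            bad = pos + err.start  # absolute offset of the first bad byte
            line = data.count(b"\n", 0, bad) + 1
            col = bad - data.rfind(b"\n", 0, bad)  # rfind gives -1 on line 1
            octets = data[bad:pos + err.end].hex(" ")
            warnings.append((line, col, f"byte(s) {octets} not valid {charset}"))
            pos += err.end  # resume just after the offending bytes
    return warnings


class HrefSpaceChecker(HTMLParser):
    """Collect href values containing a literal space (not allowed in a URI)."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value is not None and " " in value:
                self.problems.append((self.getpos(), value))


if __name__ == "__main__":
    # Hypothetical document: labelled US-ASCII but containing an ISO-8859-1 byte.
    doc = b'<p><a href="foo bar">caf\xe9</a></p>\n'
    for line, col, msg in charset_warnings(doc, "US-ASCII"):
        print(f"line {line}, column {col}: {msg}")
    checker = HrefSpaceChecker()
    checker.feed(doc.decode("latin-1"))  # latin-1 decodes any byte, for the URI check only
    for (line, offset), href in checker.problems:
        print(f"line {line}: href {href!r} contains a space")
```

Decoding the remainder after each error keeps the sketch short. A real checker would probably use an incremental decoder instead of re-slicing the buffer.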
Received on Saturday, 21 April 2001 13:21:42 UTC