- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Tue, 28 Sep 2004 13:35:34 +0200
- To: Martin Duerst <duerst@w3.org>
- Cc: public-qa-dev@w3.org
* Martin Duerst wrote:
>>Well, I need the functionality for a number of other things than the
>>Validator
>
>What other things are these?

My HTML::Tidy module needs this, as HTML Tidy has only limited, highly
experimental and off-by-default functionality in this regard which won't
change too soon; I already mentioned the experimental AppC Validator, and
there will be a PerlSAX extension that annotates event streams with
additional information that depends on the availability of the source code
of the document in the form of a character string.

>What kind of trials and errors would this include? Are you thinking
>about some encoding detection heuristics, or something else?

I do not know yet. One example could be to choose a different encoding if
the encoding has been determined as X but the document is not legal X,
especially in case of conflicting declarations and/or specifications.

>I agree that this should be the goal. But the wish for the
>perfect now is the enemy of the good soon (such as release often).

That depends... As I wrote, it is often much simpler to address such
issues in an external module (possibly started from scratch) than by
messing with the code deeply buried in check.

>Yes, getting more knowledge of aliases and stuff into such
>a module would probably be something to do.

Encode::Alias and I18N::Charset, or modules building on top of those,
might be better places, though.

>>Just like people tend to disagree what
>>the encoding of a document http://www.example.org/
>>
>>  ...
>>  Content-Type: text/html
>>
>>  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
>>   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
>>  <html xmlns="http://www.w3.org/1999/xhtml"><head>
>>  <meta http-equiv = "Content-Type" content =
>>   "text/html;charset=iso-8859-2" />
>>  <title></title></head><body><p>...</p></body></html>
>>
>>would be.
>
>I know some people might claim that this is iso-8859-1,
>but they definitely would be wrong. If it's not for the
>specs, then for all the implementations out there.

Other claims are

  * UTF-8
  * ISO-8859-2
  * US-ASCII
  * implementation defined
  * ...

and implementations do disagree here. The Markup Validator, for example,
would consider it ISO-8859-2, while the W3C CSS Validator would consider
it UTF-8 encoded. But implementations do not seem very relevant here. I
know some people might claim that (if the type were text/xml) it is
US-ASCII, but they definitely would be wrong. If it's not for the specs,
then for all the implementations out there...
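
To illustrate the trial-and-error idea, here is a minimal sketch using the
Encode core module; the guess_encoding helper and the candidate list are
made up for the example and are not code from any of the modules mentioned
above:

  use strict;
  use warnings;
  use Encode ();

  # Check whether the octets are actually legal in the declared encoding
  # and fall back to other candidates if they are not.
  sub guess_encoding {
      my ($octets, @candidates) = @_;
      for my $label (@candidates) {
          # Encode::resolve_alias maps labels such as "latin2" to
          # "iso-8859-2" and returns false for unknown labels.
          my $name = Encode::resolve_alias($label) or next;
          # FB_CROAK makes decode() die on malformed input, so a
          # successful eval means the octets are legal in this encoding.
          my $copy = $octets;    # decode() may clobber its input
          my $ok = eval { Encode::decode($name, $copy, Encode::FB_CROAK); 1 };
          return $name if $ok;
      }
      return;    # nothing fit; a real tool would have to report this
  }

  binmode STDIN;
  my $octets  = do { local $/; <STDIN> } // '';
  my $charset = guess_encoding($octets, 'iso-8859-2', 'utf-8', 'windows-1252');
  print defined $charset ? "looks like $charset\n" : "no candidate fits\n";

The order of the candidates is of course exactly the kind of policy
question that would have to live outside such a helper.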
Received on Tuesday, 28 September 2004 11:36:17 UTC