
Re: pb with entities FOUND IT

From: Christophe Strobbe <christophe.strobbe@esat.kuleuven.ac.be>
Date: Wed, 04 Dec 2002 17:35:49 +0100
Message-Id: <5.0.2.1.2.20021204172641.028a2220@mailserv.esat.kuleuven.ac.be>
To: Riccardo Cohen <rcohen@dial.oleane.com>, html-tidy@w3.org

Hi Riccardo,


At 18:15 4/12/2002, Riccardo Cohen wrote:

>By the way, is this a bug in Tidy, or is it normal that without a DOCTYPE, 
>&eacute; can't
>be generated? (From my point of view this behavior is not normal, but I 
>don't know the standards very well.)
>Thanks

In HTML, the DOCTYPE is actually required; see 
http://www.w3.org/TR/html401/struct/global.html#idx-document_type_declaration-3:
"A valid HTML document declares what version of HTML is used in the 
document. The document type declaration names the document type definition 
(DTD) in use for the document (see [ISO8879])."
The DTD (whether strict, transitional, or frameset) references 
"HTMLlat1.ent", which defines the ISO Latin 1 character entities, so 
without the DOCTYPE, you can't always expect an HTML parser to know about 
named entities such as &eacute;.
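
To illustrate the general point (this is not Tidy itself, only a rough 
sketch using Python's standard library, and the helper names are made up 
for the example): a decoder with a built-in HTML entity table resolves 
&eacute; without any DOCTYPE, while a generic XML parser that has seen no 
DTD rejects the same named entity and accepts only the numeric form.

```python
import html
import xml.etree.ElementTree as ET


# Hypothetical helper names, chosen for this illustration only.
def decode_with_builtin_table(text: str) -> str:
    """Decode entities using Python's built-in HTML entity table.

    No DOCTYPE is involved: the Latin-1 names are hard-coded in the library.
    """
    return html.unescape(text)


def xml_parser_accepts(fragment: str) -> bool:
    """Return True if a parse with no DTD available succeeds."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        # Without a DTD, the parser has no declaration for names like eacute.
        return False


if __name__ == "__main__":
    # ascii() keeps the output printable on any terminal encoding.
    print(ascii(decode_with_builtin_table("caf&eacute;")))
    print(xml_parser_accepts("<p>caf&eacute;</p>"))  # named entity, no DTD
    print(xml_parser_accepts("<p>caf&#233;</p>"))    # numeric reference
```

Whether a given tool behaves like the first helper or the second depends on 
whether it ships its own entity table or relies on the declared DTD.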

This may not be an explanation of the behaviour of Tidy, but it's relevant 
to parsing generally.

Regards,

Christophe
Received on Wednesday, 4 December 2002 11:35:32 UTC
