Re: Problem with HTML Tidy: no encoding specified in XML output

On Mon, 4 Sep 2000, Bjoern Hoehrmann wrote:

> * "Mikael Ståldal" <d96-mst-ingen-reklam@d.kth.se> wrote:
> | When using HTML Tidy with the options -asxml -latin1, it doesn't output
> |
> | <?xml version="1.0" encoding="iso-8859-1"?>
> |
> | as it should in order to produce well-formed XML. Without the encoding
> | specification, an XML parser will assume UTF-8.
> 
> Use '--add-xml-decl yes', but I agree that Tidy should do this
> automatically if there are non-ASCII iso-8859-1 characters in the file.
> (If all chars are encoded as entities it isn't necessary,
> because the file is then us-ascii, and us-ascii is a subset of utf-8,
> the default encoding of XML files.)

I have modified AdjustConfig() in config.c and the misnamed
FixXMLPI() in lexer.c to deal with this. This feature will be
available in the next release, as further thought is needed
on dealing with, say, Microsoft Windows-specific encodings.
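
To illustrate the rule discussed above, here is a minimal sketch in C.
It is not the actual Tidy code: the function names has_non_ascii(),
needs_encoding_decl() and format_xml_decl() are made up for this
example, and it assumes an 8-bit output encoding whose name is passed
in lower case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of Tidy): returns 1 if buf contains
   any byte outside the US-ASCII range 0x00-0x7F, otherwise 0. */
int has_non_ascii(const unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; ++i)
        if (buf[i] > 0x7F)
            return 1;
    return 0;
}

/* Hypothetical helper: decides whether the XML declaration has to
   name the encoding. Pure US-ASCII output is already valid UTF-8,
   the XML default, so the encoding only matters when the output is
   not UTF-8 and contains at least one non-ASCII byte.
   Assumption: encoding is an 8-bit charset name in lower case. */
int needs_encoding_decl(const char *encoding,
                        const unsigned char *buf, size_t len)
{
    assert(encoding != NULL);
    if (strcmp(encoding, "utf-8") == 0)
        return 0;
    return has_non_ascii(buf, len);
}

/* Hypothetical helper: writes the XML declaration into out, with or
   without the encoding pseudo-attribute. Returns snprintf's result. */
int format_xml_decl(char *out, size_t outsize,
                    const char *encoding, int with_encoding)
{
    assert(out != NULL && encoding != NULL);
    if (with_encoding)
        return snprintf(out, outsize,
                        "<?xml version=\"1.0\" encoding=\"%s\"?>",
                        encoding);
    return snprintf(out, outsize, "<?xml version=\"1.0\"?>");
}
```

With -asxml -latin1 and a raw 0xE5 byte in the output, such a check
would ask for encoding="iso-8859-1"; if the same character is written
as &#229;, the output is pure ASCII and the plain declaration is enough.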

Regards,

-- Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
tel/fax: +44 122 578 3011 (or 2521) +44 778 532 0444 (mobile)
World Wide Web Consortium (on assignment from HP Labs)

Received on Saturday, 16 September 2000 13:48:52 UTC