Re: Problem with HTML Tidy: no encoding specified in XML output from Dave Raggett on 2000-09-16 (html-tidy@w3.org from July to September 2000)

From: Dave Raggett <dsr@w3.org>
Date: Sat, 16 Sep 2000 18:48:43 +0100 (GMT Daylight Time)
To: Bjoern Hoehrmann <derhoermi@gmx.net>
cc: Mikael Ståldal <d96-mst-ingen-reklam@d.kth.se>, html-tidy@w3.org
Message-ID: <Pine.WNT.4.10.10009161841190.-522721@hazel.hpl.hp.com>

On Mon, 4 Sep 2000, Bjoern Hoehrmann wrote:

> * "Mikael Ståldal" <d96-mst-ingen-reklam@d.kth.se> wrote:
> | When using HTML Tidy with the options -asxml -latin1, it doesn't output
> |
> | <?xml version="1.0" encoding="iso-8859-1"?>
> |
> | as it should in order to produce well-formed XML. Without the encoding
> | specification, an XML parser will assume UTF-8.
> 
> Use '--add-xml-decl yes', but I agree that Tidy should do this
> automatically (if there are iso-8859-1 characters in the file.
> If all non-ASCII chars are encoded as entities it isn't necessary,
> because the file is us-ascii and us-ascii is a subset of utf-8,
> the default encoding of XML files.)

I have modified AdjustConfig() in config.c and the misnamed
FixXMLPI() in lexer.c to deal with this. This feature will be
available in the next release, although further thought is needed
on dealing with, say, Microsoft Windows-specific encodings.
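For readers following along, here is a rough sketch of the decision Bjoern describes. It is NOT the actual change to AdjustConfig() or FixXMLPI(). The function names choose_xml_decl() and has_non_ascii() are hypothetical, as are the encoding name strings ("latin1", "utf8", "ascii"), which merely mirror Tidy's command-line options. Other encodings, such as the Windows code pages mentioned above, are deliberately left unresolved (NULL).

```c
#include <assert.h>
#include <string.h>

#define PLAIN_DECL  "<?xml version=\"1.0\"?>"
#define LATIN1_DECL "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>"

/* Hypothetical helper (not Tidy's code): returns 1 if the NUL-terminated
   text contains any byte outside 7-bit US-ASCII, else 0. */
static int has_non_ascii(const char *text)
{
    assert(text != NULL);
    for (const unsigned char *p = (const unsigned char *)text; *p != '\0'; p++) {
        if (*p > 0x7F)
            return 1;
    }
    return 0;
}

/* Hypothetical: pick the XML declaration for the serialized body.
   Pure US-ASCII output (e.g. every non-ASCII char written as an entity)
   needs no encoding attribute, because US-ASCII is a subset of UTF-8,
   the XML default. Returns NULL for encodings this sketch doesn't cover. */
static const char *choose_xml_decl(const char *out_encoding, const char *body)
{
    assert(out_encoding != NULL && body != NULL);
    if (strcmp(out_encoding, "latin1") == 0)
        return has_non_ascii(body) ? LATIN1_DECL : PLAIN_DECL;
    if (strcmp(out_encoding, "utf8") == 0 || strcmp(out_encoding, "ascii") == 0)
        return PLAIN_DECL;
    return NULL; /* e.g. Windows code pages: left to the caller */
}
```

For example, choose_xml_decl("latin1", "<p>caf&#233;</p>") yields the plain declaration, while a body containing a raw 0xE9 byte yields the iso-8859-1 one.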

Regards,

-- Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
tel/fax: +44 122 578 3011 (or 2521) +44 778 532 0444 (mobile)
World Wide Web Consortium (on assignment from HP Labs)

Received on Saturday, 16 September 2000 13:48:52 UTC