
Re: asxml produces invalid XML

From: Vaclav Barta <vbar@comp.cz>
Date: Mon, 23 Jun 2008 19:07:39 +0200
To: Bjoern Hoehrmann <derhoermi@gmx.net>
Cc: html-tidy@w3.org
Message-Id: <200806231907.40531.vbar@comp.cz>

On Monday 23 June 2008 18:19:56 Bjoern Hoehrmann wrote:
> * Vaclav Barta wrote:
> >which obviously not only isn't valid XHTML (and tidy knows that, warns
> > about proprietary attributes yet insists on the doctype and namespace
> >declarations), but isn't even XML - some synthesized attributes end with a
> >colon.
> This is actually allowed, it's only the Namespaces in XML Recommendation
> that considers this malformed.
Well yes, technically, but since XHTML does use namespaces, shouldn't tidy 
follow the recommendation?

> You may be able to turn namespace support
> off in your parser and strip the attributes, or ignore them. Further,
Not easily - in fact, I don't think I've ever used an XML library with 
configurable namespace support. I don't doubt they exist, but I don't think 
they're all that popular - most XML processing is AFAIK namespace-aware these 
days...

> you can use the --drop-proprietary-attributes (or whatever is called)
> option to drop them (and other attributes). Other than that Tidy has not
Well, I don't really want to drop all proprietary attributes - just the 
unparseable ones... I think the general problem is that HTML Tidy is meant to 
produce documents with standardized semantics, while I want just the XML 
syntax (as an input for capturing site-specific semantics later) from it - 
maybe I'm using the wrong tool...
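
For illustration, dropping only the unparseable attributes could be sketched in
Python roughly as follows. This is a minimal sketch, not anything tidy itself
provides: the names is_qname, DropMalformedAttrs and
strip_unparseable_attributes are hypothetical, the QName pattern is a
simplified approximation of the Namespaces in XML grammar, and the input is
assumed to contain no entities that need a DTD. It relies on xml.sax leaving
namespace processing off by default, so attribute names ending in a colon
still parse.

```python
import io
import re
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# Simplified approximation of NCName / QName (assumption: not the full grammar).
_NCNAME = r"[A-Za-z_][\w.\-]*"
_QNAME = re.compile(rf"{_NCNAME}(?::{_NCNAME})?")


def is_qname(name):
    """Return True if name looks like a namespace-well-formed QName."""
    return _QNAME.fullmatch(name) is not None


class DropMalformedAttrs(XMLFilterBase):
    """SAX filter that removes attributes whose names are not QNames."""

    def startElement(self, name, attrs):
        kept = {k: v for k, v in attrs.items() if is_qname(k)}
        super().startElement(name, AttributesImpl(kept))


def strip_unparseable_attributes(xml_text):
    """Re-serialize xml_text without attributes such as 'foo:'."""
    out = io.StringIO()
    # make_parser() is not namespace-aware unless feature_namespaces is set.
    reader = DropMalformedAttrs(xml.sax.make_parser())
    reader.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    reader.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()


if __name__ == "__main__":
    print(strip_unparseable_attributes('<p foo:="1" class="x">hi</p>'))
```

The result should then be acceptable to a namespace-aware parser, while all
other proprietary attributes are left in place.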

> so many choices here to produce better-formed XML, it could only strip
> the attributes. Perhaps that merits some configuration option though.
I think asxml should be a totally different option from asxhtml - while XHTML 
certainly is XML, tidy seems to assume the converse as well, treating any XML 
output as if it were XHTML...

	Bye
		Vasek
--
http://www.mangrove.cz/
Open Source integration
Received on Monday, 23 June 2008 17:08:18 GMT
