
Re: & really?

From: Dominique Hazael-Massieux <dom@w3.org>
Date: Thu, 24 Jul 2008 14:14:18 +0200
To: Jon Diamond <strategictech@gmail.com>
Cc: public-qa-dev@w3.org
Message-Id: <1216901658.7139.52.camel@localhost>

On Thursday, 24 July 2008 at 04:34 -0400, Jon Diamond wrote:
> Please don't mistake this as a suggestion for the markup validation
> tool. Although I would love to be able to feed multiple pages for
> evaluation... but for the
> http://www.w3.org/2003/12/semantic-extractor.html tool... why can't
> you just disregard invalid character entities?

This is because the semantic extractor tool is mostly an XSLT style
sheet, which means it relies on its input being well-formed XML. Since
the content is already supposed to be XML (XHTML is an application of
XML), the semantic extractor doesn't try to convert it into XML
beforehand, and so it fails on this well-formedness error.
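
To illustrate the point about well-formedness, here is a minimal sketch
in Python of how a strict XML parser reacts to an unescaped ampersand.
It is not the extractor's own code: the helper name is_well_formed and
the two sample strings are assumptions made up for this example, and
Python's standard parser stands in for whatever XML parser the XSLT
processor actually uses.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        # A bare "&" is not a valid entity reference, so parsing stops here.
        return False
    return True


# Illustrative samples only; they are not taken from the page under discussion.
bad = '<p><a href="page?a=1&b=2">Tom & Jerry</a></p>'
good = '<p><a href="page?a=1&amp;b=2">Tom &amp; Jerry</a></p>'

print(is_well_formed(bad))
print(is_well_formed(good))
```

An XML parser is required to stop at the first such error rather than
skip over it, which is why the stylesheet never gets a document to work
on.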

You can see what you would get with well-formed content at:
http://www.w3.org/2005/08/online_xslt/xslt?xmlfile=http%3A%2F%2Fcgi.w3.org%2Fcgi-bin%2Ftidy%3FdocAddr%3Dhttp%253A%252F%252Fwww.imageworksstudio.com%252F&xslfile=http%3A%2F%2Fwww.w3.org%2F2002%2F08%2Fextract-semantic.xsl
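
The URL above chains two services: the page is first passed through the
tidy CGI to clean it up, and that output is then handed to the online
XSLT service together with the extractor stylesheet. A rough sketch of
how such a URL could be assembled in Python follows. The endpoints and
the parameter names docAddr, xmlfile and xslfile are read off the URL
above; the function name extractor_url is hypothetical, and whether
these services still respond this way is an assumption.

```python
from urllib.parse import quote

# Endpoints and stylesheet location as they appear in the URL above.
TIDY = "http://cgi.w3.org/cgi-bin/tidy"
XSLT_SERVICE = "http://www.w3.org/2005/08/online_xslt/xslt"
STYLESHEET = "http://www.w3.org/2002/08/extract-semantic.xsl"


def extractor_url(page: str) -> str:
    """Build a tidy-then-XSLT URL for a page (hypothetical helper)."""
    # Inner layer: ask tidy to fetch the page and clean it up.
    tidied = f"{TIDY}?docAddr={quote(page, safe='')}"
    # Outer layer: the tidy URL is itself a parameter value, so it is
    # percent-encoded again (which is where %253A and %252F come from).
    return (
        f"{XSLT_SERVICE}?xmlfile={quote(tidied, safe='')}"
        f"&xslfile={quote(STYLESHEET, safe='')}"
    )


print(extractor_url("http://www.imageworksstudio.com/"))
```

Passing safe='' makes quote() encode "/" as well, which matches the
encoding used in the URL above.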

HTH,

Dom
Received on Thursday, 24 July 2008 12:15:35 GMT
