- From: Ville Skyttä <ville.skytta@iki.fi>
- Date: Fri, 11 Sep 2009 22:25:09 +0300
- To: public-qa-dev@w3.org
On Thursday 10 September 2009, Patrick Boens wrote:
> Hello,
>
> When I use the "Semantic Data Extractor" on <http://www.latosensu.be/>
>
> Using org.apache.xerces.parsers.SAXParser
>
> Exception net.sf.saxon.trans.XPathException: org.xml.sax.SAXParseException:
> The markup in the document following the root element must be well-formed.
>
> org.xml.sax.SAXParseException: The markup in the document following the
> root element must be well-formed.
>
> However, when I validate this page with the W3C validator, it seems that
> the document is perfectly well-formed.
>
> I don't know exactly why the parser blows up.

Neither do I, nor can I reproduce the problem at the moment. But here are some things to look into that I noticed when grabbing the above URL with wget and libwww-perl's HEAD tool:

- No charset parameter in the HTTP Content-Type header
- XHTML 1.1 served as text/html
- No XML declaration (IIRC this means XML processors will default to UTF-8)
- The document contents are ISO-8859-1

Unlike the markup validator, plain XML parsers quite likely will not do anything with the charset in the document's meta http-equiv tag.
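For what it's worth, here is a minimal sketch of that last point (not the Semantic Data Extractor's actual code; the class name and the tiny document are made up for illustration): an ISO-8859-1 document with no XML declaration and no charset from HTTP gets decoded as UTF-8 by a plain Java SAX parser, and any non-ASCII byte then breaks the parse.

    import java.io.ByteArrayInputStream;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.InputSource;
    import org.xml.sax.helpers.DefaultHandler;

    public class EncodingSniffDemo {

        // Parse the given bytes with a plain JAXP SAX parser and report the
        // outcome; no encoding hint is passed, so the parser has to rely on
        // the XML declaration, if any, or fall back to UTF-8.
        static void tryParse(String label, byte[] bytes) {
            try {
                SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
                parser.parse(new InputSource(new ByteArrayInputStream(bytes)),
                             new DefaultHandler());
                System.out.println(label + ": parsed OK");
            } catch (Exception e) {
                // Depending on the parser this is a SAXParseException or a
                // malformed-byte-sequence IOException; either way it fails.
                System.out.println(label + ": " + e);
            }
        }

        public static void main(String[] args) throws Exception {
            String body = "<html><body><p>caf\u00e9</p></body></html>";

            // ISO-8859-1 bytes, no XML declaration: the parser assumes UTF-8,
            // and the lone 0xE9 byte for "é" is not a valid UTF-8 sequence.
            tryParse("no declaration", body.getBytes("ISO-8859-1"));

            // Same bytes preceded by an XML declaration naming the encoding:
            // now the parser decodes them as ISO-8859-1 and succeeds.
            String declared = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body;
            tryParse("with declaration", declared.getBytes("ISO-8859-1"));
        }
    }

This does not reproduce the exact "markup following the root element" message quoted above, but it does show why the missing encoding information is worth fixing on its own, e.g. by adding an XML declaration or a charset parameter to the Content-Type header.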
Received on Friday, 11 September 2009 19:25:45 UTC