On Thursday 10 September 2009, Patrick Boens wrote:
> Hello,
>
> When I use the "Semantic Data Extractor" on <http://www.latosensu.be/>
>
> Using org.apache.xerces.parsers.SAXParser
>
> Exception net.sf.saxon.trans.XPathException: org.xml.sax.SAXParseException:
> The markup in the document following the root element must be well-formed.
>
> org.xml.sax.SAXParseException: The markup in the document following the
> root element must be well-formed.
>
> However, when I validate this page with the W3C validator, it seems that
> the document is perfectly well-formed.
>
> I don't know exactly why the parser blows up.

Neither do I, nor can I reproduce the problem at the moment. But here are
some things to look into that I noticed when grabbing the above URL with
wget and libwww-perl's HEAD tool:

- No charset parameter in HTTP Content-Type header
- XHTML 1.1 served as text/html
- No XML declaration (IIRC this means XML processors will default to UTF-8)
- The document contents are ISO-8859-1

Unlike the markup validator, plain XML parsers quite likely will not do
anything with the charset in the document's meta http-equiv tag.

Received on Friday, 11 September 2009 19:25:45 UTC
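
To illustrate the encoding points listed in the reply, here is a minimal sketch. It does not reproduce the Xerces error quoted above (the reply itself says the problem could not be reproduced), and it uses Python's standard-library expat-based parser rather than Xerces or Saxon, so the exact error message differs. The sample markup and the helper name `parses` are made up for the example. It only shows that a plain XML parser ignores the charset in a meta http-equiv element and, absent an XML declaration, treats the bytes as UTF-8.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page: the charset is only stated in a meta element,
# and the title contains a non-ASCII character (e with acute accent).
BODY = (
    "<html><head>"
    '<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>'
    "<title>caf\u00e9</title></head><body/></html>"
)


def parses(data: bytes) -> bool:
    """Return True if the bytes parse as well-formed XML, False otherwise."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# ISO-8859-1 bytes, no XML declaration: the parser assumes UTF-8 and
# rejects the lone 0xE9 byte, regardless of what the meta element says.
raw_latin1 = BODY.encode("iso-8859-1")

# Same bytes, but with the encoding stated in an XML declaration.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + raw_latin1

# Same text actually encoded as UTF-8, no declaration needed.
raw_utf8 = BODY.encode("utf-8")

print("latin-1, no declaration:", parses(raw_latin1))
print("latin-1, declared:      ", parses(declared))
print("utf-8, no declaration:  ", parses(raw_utf8))
```

If a check like this fails on the undeclared ISO-8859-1 bytes but succeeds on the other two, then declaring the encoding (in an XML declaration or the HTTP Content-Type charset parameter) or re-encoding the page as UTF-8 would be the things to try.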