Re: rdfa parsing issue -- was: fixed

On 6 Jan 2012, at 22:03, Jürgen Jakobitsch wrote:

> Henry,
> I had exactly the same problem with the RDFa parser from OpenRDF and the DTDs.

Do you know of a good way to reproduce this? I have tried with your profile

but when I run this locally with pretty much the same code base, I never get
the same error:

> ERROR [pool-3-thread-5] ( - 106 column 22): {E213} Unexpected end of file from server

So it's a bit difficult for me to work out if I am fixing the problem.

> What you want to do is:
> 1. Create a catalog (catalog.xml) and download all the DTDs.
> 2. Add the file "" to the classpath (in a Maven project in NetBeans you would simply put it in "other resources", so it gets packaged into the jar).
> 3. Modify the code so the XML reader uses that catalog.
> My parser looks roughly like this:
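>   import java.io.OutputStream;
>   import javax.xml.transform.*;
>   import javax.xml.transform.sax.SAXSource;
>   import javax.xml.transform.stream.*;
>   import org.apache.xml.resolver.CatalogManager;          // Apache xml-commons resolver
>   import org.apache.xml.resolver.tools.CatalogResolver;
>   import org.xml.sax.InputSource;
>   import org.xml.sax.XMLReader;
>   import org.xml.sax.helpers.XMLReaderFactory;
>
>   // fields of RDFaParser; XSLT is the classpath path of the stylesheet
>   CatalogResolver catRes;
>   Transformer transformer;
>
>   RDFaParser() throws TransformerConfigurationException {
>     TransformerFactory transFact = TransformerFactory.newInstance();
>     CatalogManager catMan = new CatalogManager("");
>     catRes = new CatalogResolver(catMan);
>     ClassLoader cl = RDFaParser.class.getClassLoader();
>     Templates cachedXSLT = transFact.newTemplates(new StreamSource(cl.getResourceAsStream(XSLT)));
>     transformer = cachedXSLT.newTransformer();
>     transformer.setURIResolver(catRes);   // URIs looked up by the stylesheet also go through the catalog
>   }
>
>   void parse(StreamSource source, OutputStream out) throws Exception {
>     XMLReader reader = XMLReaderFactory.createXMLReader();
>     reader.setEntityResolver(catRes);     // DTDs are resolved through the local catalog
>     reader.setFeature("", false);
>     Source sXML = new SAXSource(reader, new InputSource(source.getInputStream()));
>     transformer.transform(sXML, new StreamResult(out));
>   }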
> It took me some time to get this catalog thing up and running; here are some links:
> 1.  (see here for the overall trouble)
> 2.
> 3.
> 4.
> 5.
> 6. (download Apache's resolver)
> To save you some time:
> 1. Attached you will find the catalog.xml and the catalog in use by WebIDRealm (copy the contents of to /usr/share/catalogs/ and make sure the files are readable).
>    If you copy the contents elsewhere, you need to change the path in as well.
> 2. Also attached is the in use by WebIDRealm.
> The basic workflow would be:
> 1. CatalogManager reads and finds the path of catalog.xml.
> 2. When resolving, CatalogResolver looks in catalog.xml to see whether the doctype (or module) is mapped there, and tries to find the file at the
>    path specified in catalog.xml.
> If you have any questions regarding the catalog, feel free to ask.
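A minimal, stdlib-only sketch of the lookup in step 2, assuming the catalog is reduced to an in-memory map from a DTD's public identifier to its text. `ToyCatalogResolver` and `textOf` are hypothetical names for illustration; this is not the Apache resolver's `CatalogResolver` and it does not read catalog.xml. It only shows why a catalog hit means the DTD never has to be fetched from w3.org: the `&nbsp;` entity below is expanded from the local copy.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical toy catalog (NOT org.apache.xml.resolver): maps a DTD's public
// identifier to locally held DTD text, so the parser never downloads it.
class ToyCatalogResolver implements EntityResolver {
    private final Map<String, String> localDtds = new HashMap<>();

    // Register one catalog entry: public identifier -> DTD contents.
    void map(String publicId, String dtdText) {
        localDtds.put(publicId, dtdText);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String dtd = (publicId == null) ? null : localDtds.get(publicId);
        if (dtd == null) {
            return null; // not in the catalog: the parser falls back to the system id
        }
        InputSource local = new InputSource(new StringReader(dtd));
        local.setPublicId(publicId);
        local.setSystemId(systemId); // used only as a base URI, nothing is fetched
        return local;
    }

    // Parses xml with the given resolver installed and returns its character data.
    static String textOf(String xml, EntityResolver resolver) {
        StringBuilder text = new StringBuilder();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setEntityResolver(resolver);
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        ToyCatalogResolver catalog = new ToyCatalogResolver();
        catalog.map("-//W3C//DTD XHTML 1.0 Strict//EN", "<!ENTITY nbsp \"&#160;\">");
        String xhtml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\""
                + " \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
                + "<html><body><p>a&nbsp;b</p></body></html>";
        // &nbsp; is expanded from the local "DTD"; no request goes to w3.org.
        System.out.println(textOf(xhtml, catalog).equals("a\u00A0b"));
    }
}
```

A real catalog adds indirection (catalog.xml entries pointing at files on disk), but the resolver contract is the same: return local content for a known identifier, or `null` to let the parser proceed normally.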
> wkr j
> ----- Original Message -----
> From: "Henry Story" <>
> To: "Damian Steer" <>
> Cc: "Jürgen Jakobitsch" <>, " XG" <>
> Sent: Friday, January 6, 2012 9:10:58 PM
> Subject: Re: rdfa parsing issue -- was: fixed
> Thanks Damian,
> that was very helpful.
> I have now fixed a couple of issues on my side, and I see that Jürgen has updated his xhtml to be even closer to valid xhtml. So the tester should work with that resource in any case.
> Btw, I get the following in the logs:
> INF: [console logger] dispatch: GET /sea.jsp HTTP/1.1
> ERROR [pool-3-thread-5] ( - 106 column 22): {E213} Unexpected end of file from server
> It looks like the RDFa parser is following the DTDs. Is there a way to stop that? I guess the W3C does not serve those files.
> Henry
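On the question of stopping the parser from following DTDs: one common alternative to a catalog, sketched here with the JDK's SAX parser, is an entity resolver that answers every external DTD request with an empty document. `NoNetworkDtdResolver` and `countElements` are hypothetical names, and matching on a `.dtd` suffix is an assumption. The trade-off is that named entities such as `&nbsp;` declared in that DTD become undeclared, so a page using them will fail to parse; the catalog approach avoids that.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: answer every external DTD request with an empty document so the
// parser never contacts w3.org. Assumption: the page does not use named
// entities (e.g. &nbsp;) from that DTD; with this resolver they are undeclared.
class NoNetworkDtdResolver implements EntityResolver {
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId != null && systemId.endsWith(".dtd")) {
            return new InputSource(new StringReader("")); // empty external subset
        }
        return null; // anything else: default resolution
    }

    // Parses xhtml with this resolver installed and counts its elements.
    static int countElements(String xhtml) {
        int[] count = {0};
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setEntityResolver(new NoNetworkDtdResolver());
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    count[0]++;
                }
            });
            reader.parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xhtml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\""
                + " \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
                + "<html><body><div/><div></div></body></html>";
        System.out.println(countElements(xhtml)); // html, body, div, div
    }
}
```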
> On 6 Jan 2012, at 15:57, Damian Steer wrote:
>> Hi Henry and Jürgen,
>> On 06/01/12 12:49, Henry Story wrote:
>>> Shellac's parser in fact parses the xhtml correctly as xhtml, but
>>> when the html parser is used it comes to a different conclusion.
>> Yes, this is becoming a classic issue, and has nothing to do with RDFa
>> (although RDFa obscures the issue horribly).
>>> As I understand it, RDFa 1.0 is defined for xhtml only, so it is true that we
>>> are going beyond the spec by trying to parse html too. Perhaps
>>> this will be much simpler with RDFa 1.1, which can be made to work
>>> with html5.
>> Yes, RDFa 1.0 is only really defined for xhtml, although useful work was
>> done on html 5 at the time (there are some html 5 tests). RDFa 1.1 does
>> address html 5, but note that it doesn't change anything here.
>> The problem is this:
>>   <div rel="foaf:depiction" href=""/>
>>   <div rel="cert:key">
>> 	...
>>   </div>
>> An xml parser sees a closed div, followed by another div. An html parser
>> sees a broken div so repairs it as follows:
>>   <div rel="foaf:depiction" href="">
>>     <div rel="cert:key">
>>       ...
>>     </div>
>>   </div> <!-- close that div -->
>> i.e. one div now contains the other, and thus you find
>> <> cert:key ....
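To make the xml reading concrete, here is a small sketch using the JDK's DOM parser: it reports which element is the parent of the `cert:key` div. `DivNesting` and `parentOf` are hypothetical names, and `photo.jpg` is a placeholder because the href in the message is elided. No tag-soup parser is run here; the "repaired" string is simply the nesting an html parser produces, written out by hand.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustration: which element is the parent of the div carrying a given rel value?
class DivNesting {
    static String parentOf(String xml, String rel) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList divs = doc.getElementsByTagName("div");
            for (int i = 0; i < divs.getLength(); i++) {
                Element div = (Element) divs.item(i);
                if (rel.equals(div.getAttribute("rel"))) {
                    return div.getParentNode().getNodeName();
                }
            }
            return null; // no div with that rel
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "photo.jpg" is a placeholder; the href in the original message is elided.
        String selfClosed = "<body><div rel=\"foaf:depiction\" href=\"photo.jpg\"/>"
                + "<div rel=\"cert:key\">...</div></body>";
        // The structure an html tag-soup parser builds, written out by hand.
        String repaired = "<body><div rel=\"foaf:depiction\" href=\"photo.jpg\">"
                + "<div rel=\"cert:key\">...</div></div></body>";
        System.out.println(parentOf(selfClosed, "cert:key")); // body: siblings
        System.out.println(parentOf(repaired, "cert:key"));   // div: nested
    }
}
```

To an xml parser, `<div .../>` and `<div ...></div>` give the same tree, which is why writing the explicit close tag is safe for both kinds of parser.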
>> I ought to add a utility to switch the parser based on content type;
>> however, in practice there's so much broken xhtml out there that tag soup
>> parsing is much safer (although it does lead to issues like this).
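A rough sketch of what such a content-type switch might look like. `ParserKind` and `ParserChooser` are hypothetical names, not part of any existing RDFa library, and the set of media types treated as xml is an assumption. Following the point about broken xhtml in the wild, anything that is not explicitly an xml type falls back to tag-soup parsing.

```java
import java.util.Locale;

// Hypothetical helper (not part of any existing RDFa library): pick a parser
// from the Content-Type header of the fetched document.
enum ParserKind { XML, TAG_SOUP }

class ParserChooser {
    static ParserKind forContentType(String contentType) {
        if (contentType == null) {
            return ParserKind.TAG_SOUP; // no header: assume html in the wild
        }
        // Drop parameters such as "; charset=utf-8" and normalise case.
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        switch (mime) {
            case "application/xhtml+xml":
            case "application/xml":
            case "text/xml":
                return ParserKind.XML;
            default:
                return ParserKind.TAG_SOUP; // text/html and anything unrecognised
        }
    }

    public static void main(String[] args) {
        System.out.println(forContentType("application/xhtml+xml; charset=UTF-8")); // XML
        System.out.println(forContentType("text/html"));                            // TAG_SOUP
    }
}
```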
>> My advice would be to expect tag soup parsing in the wild and change the
>> html:
>>   <div rel="foaf:depiction" href=""></div>
>> Hope this makes sense,
>> Damian
> Social Web Architect
> --
> | Jürgen Jakobitsch,
> | Software Developer
> | Semantic Web Company GmbH
> | Mariahilfer Straße 70 / Neubaugasse 1, Top 8
> | A - 1070 Wien, Austria
> | Mob +43 676 62 12 710 | Fax +43.1.402 12 35 - 22
> |
> | web   :
> | foaf  :
> | skype : jakobitsch-punkt

Social Web Architect

Received on Saturday, 7 January 2012 13:46:48 UTC