Re: rdfa parsing issue -- was: fixed https://foafssl.org/test/WebId

From: Henry Story <henry.story@bblfish.net>
Date: Sun, 8 Jan 2012 19:55:10 +0100
Cc: "public-xg-webid@w3.org XG" <public-xg-webid@w3.org>, Damian Steer <pldms@mac.com>
Message-Id: <733D4B37-56D1-4C90-AB87-371DCB97E107@bblfish.net>
To: Jürgen Jakobitsch <j.jakobitsch@semantic-web.at>

On 8 Jan 2012, at 19:18, Jürgen Jakobitsch wrote:

> hi henry,
> 
> my rdfa test profile [1] now passes tests on your site [2].

Great. Even with the #j? ;-)

> it's also so fast that i infer the catalog is working, right?

In fact I have not added the catalog code yet. Instead I have added code to monitor outgoing connections, to see whether they were being made at all. I want to be able to reproduce the issue before fixing it; otherwise I have no way of knowing whether any fix I apply actually works.

So I'll keep monitoring the server until I can duplicate the problem, then apply the fix.
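One way to check whether the XML parser itself is fetching DTDs is to intercept its entity resolution and log each request. The sketch below is not the monitoring code mentioned above, which is not shown in this thread. It assumes the JDK's built-in SAX parser, and the class name `DtdFetchMonitor` is hypothetical. The monitor records every system id the parser asks for and returns an empty replacement, so nothing is fetched over the network.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical monitor: records every external entity (such as a DTD) that
// the parser asks for, without letting it open a network connection.
public class DtdFetchMonitor {
    private final List<String> requested = new ArrayList<>();

    public void parse(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver((publicId, systemId) -> {
            requested.add(systemId);
            System.err.println("parser wanted to fetch: " + systemId);
            // an empty replacement stops the parser from fetching the real file
            return new InputSource(new StringReader(""));
        });
        reader.parse(new InputSource(new StringReader(xml)));
    }

    public List<String> requested() {
        return new ArrayList<>(requested);
    }

    public static void main(String[] args) throws Exception {
        DtdFetchMonitor monitor = new DtdFetchMonitor();
        monitor.parse("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML+RDFa 1.0//EN\" "
            + "\"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd\">"
            + "<html><body/></html>");
        System.out.println(monitor.requested());
    }
}
```

If the empty replacement is swapped for a local copy of the DTD, this logger becomes a tiny hand-made catalog. The catalog approach suggested later in this thread automates exactly that.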

It could be that upgrading to the latest Jena from the Apache source repository fixed something...

Henry

> 
> wkr j
> 
> 
> [1] http://2sea.org/sea.jsp#j
> [2] https://foafssl.org/test/WebId
> 
> ----- Original Message -----
> From: "Henry Story" <henry.story@bblfish.net>
> To: "Jürgen Jakobitsch" <j.jakobitsch@semantic-web.at>
> Cc: "public-xg-webid@w3.org XG" <public-xg-webid@w3.org>, "Damian Steer" <pldms@mac.com>
> Sent: Saturday, January 7, 2012 2:46:10 PM
> Subject: Re: rdfa parsing issue -- was: fixed https://foafssl.org/test/WebId
> 
> 
> On 6 Jan 2012, at 22:03, Jürgen Jakobitsch wrote:
> 
>> henry,
>> 
>> i had exactly the same problem with the rdfa parser from openrdf and the DTDs.
> 
> Do you know of a good way to reproduce this? I have tried with your profile
> 
>   http://2sea.org/sea.jsp
> 
> but when I try this locally with pretty much the same code base I never get
> the same error:
> 
>> ERROR [pool-3-thread-5] (RDFDefaultErrorHandler.java:40) - http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd(line 106 column 22): {E213} Unexpected end of file from server
> 
> So it's a bit difficult for me to work out if I am fixing the problem.
> 
>> 
>> 
>> what you want to do is:
>> 
>> 1. create a catalog (a catalog.xml file) and download all the DTDs it refers to
>> 2. add the file "CatalogManager.properties" to the classpath (in a maven project in netbeans you would simply put it in "other resources", so it gets jar'd)
>> 3. modify the code so the xml reader uses that catalog.
>> 
>> my parser looks roughly like this:
>> 
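>>  import java.io.OutputStream;
>>  import javax.xml.transform.*;
>>  import javax.xml.transform.sax.SAXSource;
>>  import javax.xml.transform.stream.StreamResult;
>>  import javax.xml.transform.stream.StreamSource;
>>  import org.apache.xml.resolver.CatalogManager;
>>  import org.apache.xml.resolver.tools.CatalogResolver;
>>  import org.xml.sax.InputSource;
>>  import org.xml.sax.SAXException;
>>  import org.xml.sax.XMLReader;
>>  import org.xml.sax.helpers.XMLReaderFactory;
>> 
>>  public class RDFaParser {
>>      private final CatalogResolver catRes;
>>      private final Transformer transformer;
>> 
>>      // xslt: classpath name of the RDFa extraction stylesheet
>>      public RDFaParser(String xslt) throws TransformerConfigurationException {
>>          TransformerFactory transFact = TransformerFactory.newInstance();
>>          // reads the catalog location from CatalogManager.properties on the classpath
>>          CatalogManager catMan = new CatalogManager("CatalogManager.properties");
>>          catRes = new CatalogResolver(catMan);
>>          ClassLoader cl = RDFaParser.class.getClassLoader();
>>          Templates cachedXSLT = transFact.newTemplates(new StreamSource(cl.getResourceAsStream(xslt)));
>>          transformer = cachedXSLT.newTransformer();
>>          transformer.setURIResolver(catRes);   // URIs used by the XSLT also go through the catalog
>>      }
>> 
>>      public void parse(StreamSource source, OutputStream out) throws SAXException, TransformerException {
>>          XMLReader reader = XMLReaderFactory.createXMLReader();
>>          reader.setEntityResolver(catRes);     // DTDs are looked up in the local catalog, not on w3.org
>>          reader.setFeature("http://xml.org/sax/features/validation", false);
>>          Source sXML = new SAXSource(reader, new InputSource(source.getInputStream()));
>>          transformer.transform(sXML, new StreamResult(out));
>>      }
>>  }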
>> 
>> 
>> it took me some time to get this catalog setup working; here are some useful links:
>> 
>> 1. http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic/  (background on the overall problem)
>> 2. http://xml.apache.org/commons/components/resolver/resolver-article.html
>> 3. http://nwalsh.com/docs/articles/xml2003/
>> 4. http://xerces.apache.org/xerces2-j/faq-xcatalogs.html
>> 5. http://www.sagehill.net/docbookxsl/WriteCatalog.html
>> 6. http://xml.apache.org/mirrors.cgi (download apache's resolver)
>> 
>> to save you some time, i have attached:
>> 
>> 1. the catalog.xml and the catalog used by WebIDRealm (copy the contents of the attached zip to /usr/share/catalogs/ and make sure the files are readable).
>>  if you copy the contents elsewhere, you need to change the path in CatalogManager.properties as well.
>> 2. the CatalogManager.properties used by WebIDRealm.
>> 
>> the basic workflow is:
>> 
>> 1. CatalogManager reads CatalogManager.properties and finds the path of catalog.xml.
>> 2. when resolving, CatalogResolver looks in catalog.xml to see whether the doctype (or module) is mapped there and, if so,
>>  loads the file from the path specified in catalog.xml.
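The same two-step workflow (read the catalog, then resolve each DTD request against it) can be sketched with the OASIS catalog support built into the JDK (`javax.xml.catalog`, used here with Java 11 or later) instead of the Apache resolver used above. This is an illustration under stated assumptions, not the WebIDRealm setup. The catalog is generated in a temporary directory, and the "local DTD" is a one-line stand-in for the real downloaded file. The class name `LocalCatalogDemo` is hypothetical. Setting `RESOLVE` to `strict` makes an unmapped system id fail with an error instead of quietly going out to w3.org.

```java
import java.io.StringReader;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class LocalCatalogDemo {
    static final String RDFA_DTD = "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd";

    // Step 1: a catalog file mapping the remote DTD system id to a local copy.
    static Path writeCatalog(Path dir, Path localDtd) throws Exception {
        Path catalog = dir.resolve("catalog.xml");
        Files.writeString(catalog,
            "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">"
            + "<system systemId=\"" + RDFA_DTD + "\" uri=\"" + localDtd.toUri() + "\"/>"
            + "</catalog>");
        return catalog;
    }

    // Step 2: the parser asks the catalog resolver for every external entity.
    static Document parse(String xml, URI catalog) throws Exception {
        CatalogFeatures features = CatalogFeatures.builder()
            .with(CatalogFeatures.Feature.RESOLVE, "strict") // unmapped ids fail, no network fallback
            .build();
        CatalogResolver resolver = CatalogManager.catalogResolver(features, catalog);
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(resolver);
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("catalog-demo");
        // stand-in for the downloaded DTD: it declares one entity so we can see it was used
        Path dtd = dir.resolve("xhtml-rdfa-1.dtd");
        Files.writeString(dtd, "<!ENTITY origin \"local copy\">");
        Path catalog = writeCatalog(dir, dtd);
        Document doc = parse("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML+RDFa 1.0//EN\" \""
            + RDFA_DTD + "\"><html><body>&origin;</body></html>", catalog.toUri());
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

The entity declared in the stand-in DTD expands in the parsed document. That shows the DTD came from the local file named in the catalog rather than from www.w3.org.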
>> 
>> 
>> 
>> if you have any questions regarding the catalog feel free to ask.
>> 
>> wkr j
>> 
>> ----- Original Message -----
>> From: "Henry Story" <henry.story@bblfish.net>
>> To: "Damian Steer" <pldms@mac.com>
>> Cc: "Jürgen Jakobitsch" <j.jakobitsch@semantic-web.at>, "public-xg-webid@w3.org XG" <public-xg-webid@w3.org>
>> Sent: Friday, January 6, 2012 9:10:58 PM
>> Subject: Re: rdfa parsing issue -- was: fixed https://foafssl.org/test/WebId
>> 
>> Thanks Damian,
>> 
>> that was very helpful.
>> 
>> I have now fixed a couple of issues on my side, and I see that Jürgen has updated his page to be closer to valid xhtml. So the foafssl.org tester should work with that resource in any case.
>> 
>> Btw, I get the following in the logs
>> 
>> INF: [console logger] dispatch: 2sea.org GET /sea.jsp HTTP/1.1
>> ERROR [pool-3-thread-5] (RDFDefaultErrorHandler.java:40) - http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd(line 106 column 22): {E213} Unexpected end of file from server
>> 
>> It looks like the RDFa parser is following the DTDs. Is there a way to stop that? I guess the W3C does not serve those files.
>> 
>> Henry
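If the DTDs are not needed at all, a non-validating parser can usually be told not to load the external DTD. The sketch below assumes the JDK's built-in, Xerces-derived parser, which recognises the Xerces feature `http://apache.org/xml/features/nonvalidating/load-external-dtd`. This thread does not establish whether the RDFa parser discussed here exposes its underlying XML parser in this way. The class name `NoExternalDtd` is hypothetical. There is a trade-off: entities declared only in the DTD (such as `&nbsp;`) are then undefined and make the parse fail. That is one reason to prefer a local catalog.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NoExternalDtd {
    // Xerces feature recognised by the JDK's built-in parser; false means a
    // non-validating parser skips the external DTD instead of fetching it.
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    static Document parse(String xhtml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(false);
        dbf.setFeature(LOAD_EXTERNAL_DTD, false);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        // the DOCTYPE still names the W3C DTD, but it is never requested
        Document doc = parse("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML+RDFa 1.0//EN\" "
            + "\"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd\">"
            + "<html><body><div rel=\"cert:key\">...</div></body></html>");
        System.out.println(doc.getElementsByTagName("div").getLength());
    }
}
```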
>> 
>> 
>> On 6 Jan 2012, at 15:57, Damian Steer wrote:
>> 
>>> Hi Henry and Jürgen,
>>> 
>>> On 06/01/12 12:49, Henry Story wrote:
>>> 
>>>> Shellac's parser parses the xhtml correctly as xhtml in fact, but
>>>> when the html parser is used it comes to a different conclusion.
>>> 
>>> Yes, this is becoming a classic issue, and has nothing to do with RDFa
>>> (although RDFa obscures the issue horribly).
>>> 
>>>> RDFa 1.0 is, as I understand it, defined only for xhtml, so it is true that we
>>>> are going beyond the spec by trying to parse html too. Perhaps
>>>> this will be a lot simpler with RDFa 1.1, which can be made to work
>>>> with html5.
>>> 
>>> Yes, RDFa 1.0 is only really defined for xhtml, although useful work was
>>> done on html 5 at the time (there are some html 5 tests). RDFa 1.1 does
>>> address html 5, but note that it doesn't change anything here.
>>> 
>>> The problem is this:
>>> 
>>>  <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"/>
>>>  <div rel="cert:key">
>>> 	...
>>>  </div>
>>> 
>>> An xml parser sees a closed div, followed by another div. An html parser
>>> sees a broken (unclosed) div and repairs it as follows:
>>> 
>>>  <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png">
>>>    <div rel="cert:key">
>>>      ...
>>>    </div>
>>>  </div> <!-- close that div -->
>>> 
>>> i.e. one div contains another now, and thus you find
>>> 
>>> <http://2sea.org/2sealogo.png> cert:key ....
>>> 
>>> I ought to add a utility to switch the parser based on content type;
>>> however, in practice there's so much broken xhtml out there that tag soup
>>> parsing is much safer (although it does lead to issues like this).
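A content-type switch of the kind suggested here could look roughly like the following. It is a hypothetical helper (`ParserChoice`), not part of any existing parser. Only an explicit XML media type selects the strict XML parser. Everything else, including a missing header, falls back to tag soup, in line with the advice to expect tag soup parsing in the wild.

```java
import java.util.Locale;

public class ParserChoice {
    enum Mode { XML, TAG_SOUP }

    // Pick a parsing strategy from an HTTP Content-Type header value.
    static Mode choose(String contentType) {
        if (contentType == null) return Mode.TAG_SOUP;
        // drop parameters such as "; charset=utf-8" and normalise case
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        switch (mediaType) {
            case "application/xhtml+xml":
            case "application/xml":
            case "text/xml":
                return Mode.XML;
            default:
                return Mode.TAG_SOUP;
        }
    }

    public static void main(String[] args) {
        System.out.println(choose("application/xhtml+xml; charset=UTF-8")); // XML
        System.out.println(choose("text/html"));                            // TAG_SOUP
    }
}
```

Defaulting to tag soup is the conservative choice. A page served as `text/html` is parsed the way browsers would parse it, even if its author believes it is xhtml.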
>>> 
>>> My advice would be to expect tag soup parsing in the wild and change the
>>> html:
>>> 
>>>  <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"></div>
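The difference between the markups can be checked from the XML side with the JDK's DOM parser alone. This hypothetical sketch (`DivNesting`) reports which element contains the `cert:key` div. Both the self-closed `<div .../>` and the explicit `<div ...></div>` leave it directly under `<body>`. The nested form, which is what a tag soup parser makes of the self-closed version, puts it under the `foaf:depiction` div. The html parser itself is not shown, since the JDK has no tag soup parser suited to this.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DivNesting {
    // Parses the fragment as XML inside <html><body> and returns the rel of the
    // element containing the cert:key div, or its tag name if it has no rel.
    static String parentOfCertKey(String fragment) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader("<html><body>" + fragment + "</body></html>")));
        NodeList divs = doc.getElementsByTagName("div");
        for (int i = 0; i < divs.getLength(); i++) {
            Element div = (Element) divs.item(i);
            if ("cert:key".equals(div.getAttribute("rel"))) {
                Element parent = (Element) div.getParentNode();
                return parent.hasAttribute("rel") ? parent.getAttribute("rel") : parent.getTagName();
            }
        }
        throw new IllegalArgumentException("no cert:key div found");
    }

    public static void main(String[] args) throws Exception {
        String selfClosed = "<div rel=\"foaf:depiction\" href=\"http://2sea.org/2sealogo.png\"/>"
            + "<div rel=\"cert:key\">...</div>";
        String explicitClose = "<div rel=\"foaf:depiction\" href=\"http://2sea.org/2sealogo.png\"></div>"
            + "<div rel=\"cert:key\">...</div>";
        String tagSoupRepair = "<div rel=\"foaf:depiction\" href=\"http://2sea.org/2sealogo.png\">"
            + "<div rel=\"cert:key\">...</div></div>";
        System.out.println(parentOfCertKey(selfClosed));    // body
        System.out.println(parentOfCertKey(explicitClose)); // body
        System.out.println(parentOfCertKey(tagSoupRepair)); // foaf:depiction
    }
}
```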
>>> 
>>> Hope this makes sense,
>>> 
>>> Damian
>> 
>> Social Web Architect
>> http://bblfish.net/
>> 
>> 
>> 
>> --
>> | Jürgen Jakobitsch,
>> | Software Developer
>> | Semantic Web Company GmbH
>> | Mariahilfer Straße 70 / Neubaugasse 1, Top 8
>> | A - 1070 Wien, Austria
>> | Mob +43 676 62 12 710 | Fax +43.1.402 12 35 - 22
>> 
>> COMPANY INFORMATION
>> | http://www.semantic-web.at/
>> 
>> PERSONAL INFORMATION
>> | web   : http://www.turnguard.com
>> | foaf  : http://www.turnguard.com/turnguard
>> | skype : jakobitsch-punkt
>> <RDFACatalog.zip><CatalogManager.properties>
> 

Received on Sunday, 8 January 2012 18:55:43 GMT
