
Re: rdfa parsing issue -- was: fixed https://foafssl.org/test/WebId

From: Jürgen Jakobitsch <j.jakobitsch@semantic-web.at>
Date: Sat, 07 Jan 2012 15:03:08 +0100 (CET)
To: Henry Story <henry.story@bblfish.net>
Cc: "public-xg-webid@w3.org XG" <public-xg-webid@w3.org>, Damian Steer <pldms@mac.com>
Message-ID: <c861358c-cf35-4f4c-a309-a0a2e4d8c30b@zcs>
hi henry,

that might already be a good sign if you don't get the error.
what i did first was to make sure that the catalog is used at all.

to do so, i

1. created a class that extends apache's CatalogResolver [1] (named it XCatalogResolver)
2. implemented the resolveEntity method like so

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        logger.debug("RESOLVING "+publicId + " " + systemId);
        return super.resolveEntity(publicId, systemId);
    }

    - if you don't want to use log4j, you can also use System.out for testing purposes...
3. changed line 
   catRes = new CatalogResolver(catMan); (see below)
   to
   catRes = new XCatalogResolver(catMan);
4. then ran test-parsing.
   if you DO NOT see "RESOLVING...." in your logs or console, it is safe to say
   that the catalog is not in use.
5. if you DO see a list of "RESOLVING" lines in your logs or console, it is safe to say
   the catalog is used and everything is fine.
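
if you want to try the logging idea without apache's resolver on the classpath, here is a minimal sketch that uses only the JDK's SAX interfaces. the names LoggingResolverDemo, LoggingResolver and parseAndCollect are made up for this example, and the "offline" delegate is just a stand-in that answers every lookup with an empty entity; in the real setup the delegate would be the CatalogResolver.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class LoggingResolverDemo {

    /** Wraps any EntityResolver and records each lookup before delegating (hypothetical helper). */
    static class LoggingResolver implements EntityResolver {
        final List<String> seen = new ArrayList<>();
        private final EntityResolver delegate;

        LoggingResolver(EntityResolver delegate) {
            this.delegate = delegate;
        }

        @Override
        public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException {
            seen.add(publicId + " " + systemId);
            System.out.println("RESOLVING " + publicId + " " + systemId);
            return delegate.resolveEntity(publicId, systemId);
        }
    }

    /** Parses the XML and returns every "publicId systemId" pair the parser asked for. */
    static List<String> parseAndCollect(String xml) throws Exception {
        // Assumption: a stand-in delegate that returns an empty entity, so nothing is fetched.
        EntityResolver offline = (pub, sys) -> new InputSource(new StringReader(""));
        LoggingResolver logging = new LoggingResolver(offline);

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(logging);
        reader.setContentHandler(new DefaultHandler());
        reader.parse(new InputSource(new StringReader(xml)));
        return logging.seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML+RDFa 1.0//EN\" "
                + "\"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd\">"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body/></html>";
        System.out.println(parseAndCollect(xml));
    }
}
```

if the returned list stays empty for a document that has a DOCTYPE, the resolver was never consulted, which is the same signal as a missing "RESOLVING" line in the log.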

what can cause a catalog not being used :

1. CatalogManager doesn't find CatalogManager.properties
2. CatalogManager doesn't find catalog.xml from CatalogManager.properties (or is not allowed to read it)
3. in one post (i lost the url) i found a comment that the URIResolver was not used because there
   were no Templates in use.
   it looks like this (URIResolver is simply not used) can happen on different occasions, but you'll
   find out as soon as you have a "logging"-URIResolver in place.
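
the first two causes can be checked up front. a rough sketch, assuming the properties file uses the resolver's "catalogs" key with a semicolon-separated list, and assuming every entry is a plain filesystem path (entries that are URLs or relative paths would need extra handling). CatalogConfigCheck and check are made-up names for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class CatalogConfigCheck {

    /** Returns human-readable problems; an empty list means the basic checks passed. */
    static List<String> check(InputStream propsStream) throws IOException {
        List<String> problems = new ArrayList<>();
        if (propsStream == null) {
            // cause 1: the properties file is not on the classpath
            problems.add("CatalogManager.properties not found on the classpath");
            return problems;
        }
        Properties props = new Properties();
        props.load(propsStream);

        String catalogs = props.getProperty("catalogs");
        if (catalogs == null || catalogs.trim().isEmpty()) {
            problems.add("no 'catalogs' entry in CatalogManager.properties");
            return problems;
        }
        for (String entry : catalogs.split(";")) {
            // cause 2: the catalog file is missing or not readable
            Path p = Paths.get(entry.trim());
            if (!Files.isReadable(p)) {
                problems.add("catalog not readable: " + p);
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        ClassLoader cl = CatalogConfigCheck.class.getClassLoader();
        // getResourceAsStream returns null when the file is missing; check() reports that
        try (InputStream in = cl.getResourceAsStream("CatalogManager.properties")) {
            check(in).forEach(System.out::println);
        }
    }
}
```

this does not cover the third cause; for that the logging resolver is still the quickest test.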

wkr j


[1] org.apache.xml.resolver.tools.CatalogResolver


----- Original Message -----
From: "Henry Story" <henry.story@bblfish.net>
To: "Jürgen Jakobitsch" <j.jakobitsch@semantic-web.at>
Cc: "public-xg-webid@w3.org XG" <public-xg-webid@w3.org>, "Damian Steer" <pldms@mac.com>
Sent: Saturday, January 7, 2012 2:46:10 PM
Subject: Re: rdfa parsing issue -- was: fixed https://foafssl.org/test/WebId


On 6 Jan 2012, at 22:03, Jürgen Jakobitsch wrote:

> henry,
> 
> i had exactly the same problem with rdfa parser from openrdf and DTD.

Do you know of a good way to reproduce this? I have tried with your profile

   http://2sea.org/sea.jsp

but when I try this locally with pretty much the same code base I never get 
the same error

> ERROR [pool-3-thread-5] (RDFDefaultErrorHandler.java:40) - http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd(line 106 column 22): {E213} Unexpected end of file from server

So it's a bit difficult for me to work out if I am fixing the problem.

> 
> 
> what you wanna do is :
> 
> 1. create a catalog (catalog.xml and download all DTDs)
> 2. add file "CatalogManager.properties" to the classpath (in a maven project on netbeans you would simply put it in "other resources", so it gets jar'd)
> 3. modify the code so the xml reader uses that catalog.
> 
> my parser looks roughly like this :
> 
>   // fields
>   CatalogResolver catRes;
>   Transformer transformer;
> 
> constructor
> 
>   TransformerFactory transFact = TransformerFactory.newInstance();
>   CatalogManager catMan = new CatalogManager("CatalogManager.properties");
>   catRes = new CatalogResolver(catMan);
>   ClassLoader cl = RDFaParser.class.getClassLoader();
>   // compile the stylesheet once and reuse it
>   Templates cachedXSLT = transFact.newTemplates(new StreamSource(cl.getResourceAsStream(XSLT)));
>   transformer = cachedXSLT.newTransformer();
>   transformer.setURIResolver(catRes);
> 
> parserMethod (StreamSource source)
> 
>   XMLReader reader = XMLReaderFactory.createXMLReader();
>   // resolve DTDs through the catalog instead of fetching them from w3.org
>   reader.setEntityResolver(catRes);
>   reader.setFeature("http://xml.org/sax/features/validation", false);
>   Source sXML = new SAXSource(reader, new InputSource(source.getInputStream()));
>   transformer.transform(sXML, new StreamResult(out));
> 
> 
> it took me some time to get this catalog thing up and running, here are some links,
> 
> 1. http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic/  (see here for the overall trouble)
> 2. http://xml.apache.org/commons/components/resolver/resolver-article.html
> 3. http://nwalsh.com/docs/articles/xml2003/
> 4. http://xerces.apache.org/xerces2-j/faq-xcatalogs.html
> 5. http://www.sagehill.net/docbookxsl/WriteCatalog.html
> 6. http://xml.apache.org/mirrors.cgi (download apache's resolver)
> 
> to save you some time :
> 
> 1. find the catalog.xml and the catalog in use by WebIDRealm attached (copy the contents of catalog.zip to /usr/share/catalogs/ and make sure the files are readable)
>   if you copy the contents elsewhere you need to change the path in CatalogManager.properties as well.
> 2. find the CatalogManager.properties in use by WebIDRealm attached
> 
> the basic workflow would be :
> 
> 1. CatalogManager reads CatalogManager.properties and finds the path of catalog.xml
> 2. when resolving, CatalogResolver looks in catalog.xml to see if the docType (or module) is mapped there and tries to find the file in
>   the path specified in catalog.xml.
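
the lookup in step 2 can be pictured with a much simplified stand-in. this is not apache's CatalogResolver; it is only a sketch of the idea, using a plain map from public id to local file instead of a parsed catalog.xml. MiniCatalog and its map method are made-up names.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

class MiniCatalog implements EntityResolver {

    // public id -> local copy of the DTD or module (stands in for the entries of catalog.xml)
    private final Map<String, Path> byPublicId = new HashMap<>();

    void map(String publicId, Path localFile) {
        byPublicId.put(publicId, localFile);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        Path local = byPublicId.get(publicId);
        if (local == null || !Files.isReadable(local)) {
            // not mapped or not readable: returning null lets the parser
            // fall back to its default behaviour, i.e. opening the system id
            return null;
        }
        InputSource source = new InputSource(local.toUri().toString());
        source.setPublicId(publicId);
        return source;
    }

    public static void main(String[] args) throws Exception {
        Path dtd = Files.createTempFile("xhtml-rdfa-1", ".dtd");
        MiniCatalog catalog = new MiniCatalog();
        catalog.map("-//W3C//DTD XHTML+RDFa 1.0//EN", dtd);

        InputSource hit = catalog.resolveEntity(
                "-//W3C//DTD XHTML+RDFa 1.0//EN",
                "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd");
        System.out.println(hit.getSystemId());
    }
}
```

the fallback branch is the important part: an unmapped or unreadable entry does not fail loudly, the parser just goes back to the network.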
> 
> 
> 
> if you have any questions regarding the catalog feel free to ask.
> 
> wkr j
> 
> ----- Original Message -----
> From: "Henry Story" <henry.story@bblfish.net>
> To: "Damian Steer" <pldms@mac.com>
> Cc: "Jürgen Jakobitsch" <j.jakobitsch@semantic-web.at>, "public-xg-webid@w3.org XG" <public-xg-webid@w3.org>
> Sent: Friday, January 6, 2012 9:10:58 PM
> Subject: Re: rdfa parsing issue -- was: fixed https://foafssl.org/test/WebId
> 
> Thanks Damian,
> 
>  that was very helpful.
> 
> I have now fixed a couple of issues on my side, and I see that Jürgen has even updated his xhtml to be closer to xhtml. So the foafssl.org tester should work with that resource in any case.
> 
> Btw, I get the following in the logs
> 
> INF: [console logger] dispatch: 2sea.org GET /sea.jsp HTTP/1.1
> ERROR [pool-3-thread-5] (RDFDefaultErrorHandler.java:40) - http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd(line 106 column 22): {E213} Unexpected end of file from server
> 
> It looks like the RDFa parser is following the DTDs. Is there a way to stop that? I guess the W3C does not serve those files.
> 
> Henry
> 
> 
> On 6 Jan 2012, at 15:57, Damian Steer wrote:
> 
>> Hi Henry and Jürgen,
>> 
>> On 06/01/12 12:49, Henry Story wrote:
>> 
>>> Shellac's parser parses the xhtml correctly as xhtml in fact, but
>>> when the html parser is used it comes to a different conclusion.
>> 
>> Yes, this is becoming a classic issue, and has nothing to do with RDFa
>> (although RDFa obscures the issue horribly).
>> 
>>> RDFA 1 is defined in xhtml only I understand, so it is true that we
>>> are going beyond the spec by trying to parse html too. Perhaps
>>> this will be a lot simplified with rdfa1.1 which can be made to work
>>> with html5.
>> 
>> Yes, RDFa 1.0 is only really defined for xhtml, although useful work was
>> done on html 5 at the time (there are some html 5 tests). RDFa 1.1 does
>> address html 5, but note that it doesn't change anything here.
>> 
>> The problem is this:
>> 
>>   <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"/>
>>   <div rel="cert:key">
>> 	...
>>   </div>
>> 
>> An xml parser sees a closed div, followed by another div. An html parser
>> sees a broken div so repairs it as follows:
>> 
>>   <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png">
>>     <div rel="cert:key">
>>       ...
>>     </div>
>>   </div> <!-- close that div -->
>> 
>> i.e. one div contains another now, and thus you find
>> 
>> <http://2sea.org/2sealogo.png> cert:key ....
>> 
>> I ought to add a utility to switch the parser based on content type,
>> however in practice there's so much broken xhtml out there that tag soup
>> parsing is much safer (although it does lead to issues like this).
>> 
>> My advice would be to expect tag soup parsing in the wild and change the
>> html:
>> 
>>   <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"></div>
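
The difference between the two trees can be checked with the JDK's own XML parser. This is only a sketch: the JDK has no tag soup parser, so the second input is written out by hand in the repaired form shown above to stand in for what an html parser would build. SelfClosingDivDemo and parentRelOfKeyDiv are hypothetical names, and the body wrapper element is added only to make each snippet a well-formed document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SelfClosingDivDemo {

    /** Returns the rel value on the parent of the cert:key div, or null if the parent has none. */
    static String parentRelOfKeyDiv(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList divs = doc.getElementsByTagName("div");
        for (int i = 0; i < divs.getLength(); i++) {
            Element div = (Element) divs.item(i);
            if ("cert:key".equals(div.getAttribute("rel"))) {
                Node parent = div.getParentNode();
                if (parent instanceof Element) {
                    String rel = ((Element) parent).getAttribute("rel");
                    return rel.isEmpty() ? null : rel;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // As written in the profile: an xml parser sees two sibling divs.
        String asWritten = "<body>"
                + "<div rel=\"foaf:depiction\" href=\"http://2sea.org/2sealogo.png\"/>"
                + "<div rel=\"cert:key\"></div>"
                + "</body>";
        // Hand-written stand-in for the tree an html parser builds after repair.
        String asRepaired = "<body>"
                + "<div rel=\"foaf:depiction\" href=\"http://2sea.org/2sealogo.png\">"
                + "<div rel=\"cert:key\"></div>"
                + "</div>"
                + "</body>";

        System.out.println(parentRelOfKeyDiv(asWritten));  // prints null
        System.out.println(parentRelOfKeyDiv(asRepaired)); // prints foaf:depiction
    }
}
```

In the repaired tree the cert:key div sits inside the foaf:depiction div, which is why the key ends up attached to the logo.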
>> 
>> Hope this makes sense,
>> 
>> Damian
> 
> Social Web Architect
> http://bblfish.net/
> 
> 
> 
> --
> | Jürgen Jakobitsch,
> | Software Developer
> | Semantic Web Company GmbH
> | Mariahilfer Straße 70 / Neubaugasse 1, Top 8
> | A - 1070 Wien, Austria
> | Mob +43 676 62 12 710 | Fax +43.1.402 12 35 - 22
> 
> COMPANY INFORMATION
> | http://www.semantic-web.at/
> 
> PERSONAL INFORMATION
> | web   : http://www.turnguard.com
> | foaf  : http://www.turnguard.com/turnguard
> | skype : jakobitsch-punkt
> <RDFACatalog.zip><CatalogManager.properties>

Social Web Architect
http://bblfish.net/



Received on Saturday, 7 January 2012 14:03:42 GMT
