- From: Henry Story <henry.story@bblfish.net>
- Date: Fri, 6 Jan 2012 23:08:23 +0100
- To: Jürgen Jakobitsch <j.jakobitsch@semantic-web.at>
- Cc: "public-xg-webid@w3.org XG" <public-xg-webid@w3.org>, Damian Steer <pldms@mac.com>, Carvalho Melvin <mel@mel.vn>
Wow. This is really helpful Jürgen. If I could give out bonus points for good answers this one should be it...
... Well I was about to do that using Melvin's OpenTabs webid enabled service. I added you to my foaf file then logged on to
http://opentabs.data.fm/ and was about to give you 50% of my wealth ($5) but it looks like you don't appear in my friends list there! Melvin can you give Jürgen 10 bonus points (at least)?
---
Btw. I think it is also Damian's code that is in Sesame (and I think it is I who ported that code to Sesame a while ago, to tell the truth! :-| )
Now I wonder if this is something that should be incorporated into a new RDFa release. What do you think Damian? It's probably worth fixing this at the source.
Do these DTDs change constantly? I mean do new ones appear regularly? I found it odd to find DTDs with ruby in the title there... Looks like people are making up names in the w3c namespace.
Henry
On 6 Jan 2012, at 22:03, Jürgen Jakobitsch wrote:
> henry,
>
> i had exactly the same problem with rdfa parser from openrdf and DTD.
>
>
> what you wanna do is :
>
> 1. create a catalog (catalog.xml) and download all DTDs
> 2. add file "CatalogManager.properties" to the classpath (in a maven project on netbeans you would simply put it in "other resources", so it gets jar'd)
> 3. modify the code so the xml reader uses that catalog.
>
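> my parser looks roughly like this :
>
> import java.io.OutputStream;
> import javax.xml.transform.*;
> import javax.xml.transform.sax.SAXSource;
> import javax.xml.transform.stream.StreamResult;
> import javax.xml.transform.stream.StreamSource;
> import org.apache.xml.resolver.CatalogManager;
> import org.apache.xml.resolver.tools.CatalogResolver;
> import org.xml.sax.InputSource;
> import org.xml.sax.SAXException;
> import org.xml.sax.XMLReader;
> import org.xml.sax.helpers.XMLReaderFactory;
>
> public class RDFaParser {
>     // XSLT: classpath location of the stylesheet (constant not shown here)
>     private final CatalogResolver catRes;
>     private final Transformer transformer;
>
>     public RDFaParser() throws TransformerConfigurationException {
>         TransformerFactory transFact = TransformerFactory.newInstance();
>         // CatalogManager.properties is looked up on the classpath
>         CatalogManager catMan = new CatalogManager("CatalogManager.properties");
>         catRes = new CatalogResolver(catMan);
>         ClassLoader cl = RDFaParser.class.getClassLoader();
>         Templates cachedXSLT = transFact.newTemplates(new StreamSource(cl.getResourceAsStream(XSLT)));
>         transformer = cachedXSLT.newTransformer();
>         // resolve stylesheet includes through the catalog as well
>         transformer.setURIResolver(catRes);
>     }
>
>     // out: assumed to be an OutputStream supplied by the caller
>     public void parse(StreamSource source, OutputStream out) throws SAXException, TransformerException {
>         XMLReader reader = XMLReaderFactory.createXMLReader();
>         // resolve DTDs through the local catalog instead of fetching them from w3.org
>         reader.setEntityResolver(catRes);
>         reader.setFeature("http://xml.org/sax/features/validation", false);
>         Source sXML = new SAXSource(reader, new InputSource(source.getInputStream()));
>         transformer.transform(sXML, new StreamResult(out));
>     }
> }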
>
>
> it took me some time to get this catalog thing up and running, here are some links,
>
> 1. http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic/ (see here for the overall trouble)
> 2. http://xml.apache.org/commons/components/resolver/resolver-article.html
> 3. http://nwalsh.com/docs/articles/xml2003/
> 4. http://xerces.apache.org/xerces2-j/faq-xcatalogs.html
> 5. http://www.sagehill.net/docbookxsl/WriteCatalog.html
> 6. http://xml.apache.org/mirrors.cgi (download apache's resolver)
>
> to save you some time :
>
> 1. find the catalog.xml and the catalog in use by WebIDRealm attached (copy the contents of catalog.zip to /usr/share/catalogs/ and make sure the files are readable)
> if you copy the contents elsewhere you need to change the path in CatalogManager.properties as well.
> 2. find the CatalogManager.properties in use by WebIDRealm attached
>
> the basic workflow would be :
>
> 1. CatalogManager reads CatalogManager.properties and finds path of catalog.xml
> 2. when resolving, CatalogResolver looks in catalog.xml to see if the docType (or module) is mapped there and tries to find the file in
> the path specified in catalog.xml.
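To make the lookup step concrete, here is a minimal sketch of the same idea using only the JDK instead of the Apache resolver. It is not the WebIDRealm code: the class name LocalDtdResolver, the in-memory map, and the one-line stub DTD are all assumptions made for illustration. A real catalog would map the identifiers to the downloaded DTD files on disk.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in for a catalog: maps DTD system IDs to local copies. */
final class LocalDtdResolver implements EntityResolver {
    private final Map<String, String> localCopies = new HashMap<>();
    private int hits = 0;

    /** Registers a local copy (here: the DTD text itself) for a system ID. */
    void map(String systemId, String dtdText) {
        localCopies.put(systemId, dtdText);
    }

    /** Number of lookups answered from the local map. */
    int hits() {
        return hits;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String dtd = localCopies.get(systemId);
        if (dtd == null) {
            return null; // not mapped: the parser falls back to its default (network) behaviour
        }
        hits++;
        InputSource local = new InputSource(new StringReader(dtd));
        local.setSystemId(systemId);
        return local;
    }

    /** Parses xml with the given resolver and returns the name of the root element. */
    static String rootElementOf(String xml, EntityResolver resolver) {
        final String[] root = new String[1];
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setEntityResolver(resolver);
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    if (root[0] == null) {
                        root[0] = qName;
                    }
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return root[0];
    }

    public static void main(String[] args) {
        String dtdUrl = "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd";
        LocalDtdResolver resolver = new LocalDtdResolver();
        // Stub DTD for the demo only; a real setup would point at the downloaded file.
        resolver.map(dtdUrl, "<!ENTITY nbsp \"&#160;\">");
        String page = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML+RDFa 1.0//EN\" \"" + dtdUrl + "\">"
                + "<html><body>a&nbsp;b</body></html>";
        System.out.println(rootElementOf(page, resolver) + ", local lookups: " + resolver.hits());
    }
}
```

The point of the sketch is only that the parser asks the resolver before going to the network, so anything mapped locally never reaches w3.org.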
>
>
>
> if you have any questions regarding the catalog feel free to ask.
>
> wkr j
>
> ----- Original Message -----
> From: "Henry Story" <henry.story@bblfish.net>
> To: "Damian Steer" <pldms@mac.com>
> Cc: "Jürgen Jakobitsch" <j.jakobitsch@semantic-web.at>, "public-xg-webid@w3.org XG" <public-xg-webid@w3.org>
> Sent: Friday, January 6, 2012 9:10:58 PM
> Subject: Re: rdfa parsing issue -- was: fixed https://foafssl.org/test/WebId
>
> Thanks Damian,
>
> that was very helpful.
>
> I have now fixed a couple of issues on my side, and I see that Jürgen has even updated his xhtml to be closer to proper xhtml. So the foafssl.org tester should work with that resource in any case.
>
> Btw, I get the following in the logs
>
> INF: [console logger] dispatch: 2sea.org GET /sea.jsp HTTP/1.1
> ERROR [pool-3-thread-5] (RDFDefaultErrorHandler.java:40) - http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd(line 106 column 22): {E213} Unexpected end of file from server
>
> It looks like the RDFa parser is following the DTDs. Is there a way to stop that? I guess the W3C does not serve those files.
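One possible way to stop the fetching, sketched here under assumptions: on Xerces-based parsers (including the one bundled with the JDK) the load-external-dtd feature can be switched off. The class name NoDtdFetch is made up for this example, and I have not checked how the RDFa parser in question builds its XML reader, so treat this as an illustration of the feature rather than a patch.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: parses XML without loading the external DTD. */
final class NoDtdFetch {
    // Xerces-specific feature; other parser implementations may not recognise it.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Returns the character content of the document, parsed with DTD loading disabled. */
    static String textOf(String xml) {
        final StringBuilder text = new StringBuilder();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false);
            factory.setFeature(LOAD_EXTERNAL_DTD, false);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String page = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML+RDFa 1.0//EN\" "
                + "\"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd\">"
                + "<html><body>hello</body></html>";
        System.out.println(textOf(page));
    }
}
```

The trade-off is that entities declared only in the DTD (for example &nbsp;) are then unknown to the parser, so documents using them may fail to parse or lose those references. A local copy of the DTDs avoids that.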
>
> Henry
>
>
> On 6 Jan 2012, at 15:57, Damian Steer wrote:
>
>> Hi Henry and Jürgen,
>>
>> On 06/01/12 12:49, Henry Story wrote:
>>
>>> Shellac's parser parses the xhtml correctly as xhtml in fact, but
>>> when the html parser is used it comes to a different conclusion.
>>
>> Yes, this is becoming a classic issue, and has nothing to do with RDFa
>> (although RDFa obscures the issue horribly).
>>
>>> RDFa 1 is defined in xhtml only I understand, so it is true that we
>>> are going beyond the spec by trying to parse html too. Perhaps
>>> this will be much simpler with rdfa1.1 which can be made to work
>>> with html5.
>>
>> Yes, RDFa 1.0 is only really defined for xhtml, although useful work was
>> done on html 5 at the time (there are some html 5 tests). RDFa 1.1 does
>> address html 5, but note that it doesn't change anything here.
>>
>> The problem is this:
>>
>> <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"/>
>> <div rel="cert:key">
>> ...
>> </div>
>>
>> An xml parser sees a closed div, followed by another div. An html parser
>> sees a broken div so repairs it as follows:
>>
>> <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png">
>> <div rel="cert:key">
>> ...
>> </div>
>> </div> <!-- close that div -->
>>
>> i.e. one div contains another now, and thus you find
>>
>> <http://2sea.org/2sealogo.png> cert:key ....
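The XML half of this can be checked with a small sketch using the JDK's DOM parser. The class name SelfClosedDiv, the wrapping body element, and the span standing in for the elided "..." content are assumptions for the example. The HTML (tag soup) side is not shown, since the JDK has no comparable HTML5 parser.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper: inspects how an XML parser nests a self-closed div. */
final class SelfClosedDiv {
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    /** Counts the direct element children of a node, ignoring text and comments. */
    static int countElementChildren(Node parent) {
        int count = 0;
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    /** Number of elements directly under the root element. */
    static int topLevelElements(String xml) {
        return countElementChildren(root(xml));
    }

    /** Number of elements nested inside the first element under the root. */
    static int elementsInsideFirst(String xml) {
        NodeList kids = root(xml).getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == Node.ELEMENT_NODE) {
                return countElementChildren(kids.item(i));
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        String xml = "<body>"
                + "<div rel=\"foaf:depiction\" href=\"http://2sea.org/2sealogo.png\"/>"
                + "<div rel=\"cert:key\"><span/></div>" // span is a placeholder for the elided content
                + "</body>";
        System.out.println("siblings under body: " + topLevelElements(xml));
        System.out.println("elements inside the first div: " + elementsInsideFirst(xml));
    }
}
```

With an XML parser the two divs come out as siblings and the first one is empty, so cert:key is never attached to the logo. A tag soup parser would nest the second div inside the first, which is what produces the stray triple.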
>>
>> I ought to add a utility to switch the parser based on content type,
>> however in practice there's so much broken xhtml out there that tag soup
>> parsing is much safer (although it does lead to issues like this).
>>
>> My advice would be to expect tag soup parsing in the wild and change the
>> html:
>>
>> <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"></div>
>>
>> Hope this makes sense,
>>
>> Damian
>
> Social Web Architect
> http://bblfish.net/
>
>
>
> --
> | Jürgen Jakobitsch,
> | Software Developer
> | Semantic Web Company GmbH
> | Mariahilfer Straße 70 / Neubaugasse 1, Top 8
> | A - 1070 Wien, Austria
> | Mob +43 676 62 12 710 | Fax +43.1.402 12 35 - 22
>
> COMPANY INFORMATION
> | http://www.semantic-web.at/
>
> PERSONAL INFORMATION
> | web : http://www.turnguard.com
> | foaf : http://www.turnguard.com/turnguard
> | skype : jakobitsch-punkt
> <RDFACatalog.zip><CatalogManager.properties>
Social Web Architect
http://bblfish.net/
Received on Friday, 6 January 2012 22:08:56 UTC