- From: Henry Story <henry.story@bblfish.net>
- Date: Sun, 8 Jan 2012 19:55:10 +0100
- To: Jürgen Jakobitsch <j.jakobitsch@semantic-web.at>
- Cc: "public-xg-webid@w3.org XG" <public-xg-webid@w3.org>, Damian Steer <pldms@mac.com>
On 8 Jan 2012, at 19:18, Jürgen Jakobitsch wrote:

> hi henry,
>
> my rdfa test profile [1] now passes tests on your site [2].

Great. Even with the #j ? ;-)

> it's also so fast that i infer that the catalog is working, right?

In fact I have not added the catalog code. I have added code to monitor outward connections, to see whether those were being made at all, because I wanted to be able to duplicate the issue before fixing it: otherwise there is no way for me to know whether any fix I apply actually fixes anything. So I'll keep monitoring the server until I can duplicate the problem, then apply the fix. It could be that upgrading to the latest Jena from the Apache source code fixed something...

Henry

> wkr j
>
> [1] http://2sea.org/sea.jsp#j
> [2] https://foafssl.org/test/WebId
>
> ----- Original Message -----
> From: "Henry Story" <henry.story@bblfish.net>
> To: "Jürgen Jakobitsch" <j.jakobitsch@semantic-web.at>
> Cc: "public-xg-webid@w3.org XG" <public-xg-webid@w3.org>, "Damian Steer" <pldms@mac.com>
> Sent: Saturday, January 7, 2012 2:46:10 PM
> Subject: Re: rdfa parsing issue -- was: fixed https://foafssl.org/test/WebId
>
> On 6 Jan 2012, at 22:03, Jürgen Jakobitsch wrote:
>
>> henry,
>>
>> i had exactly the same problem with the rdfa parser from openrdf and DTDs.
>
> Do you know of a good way to reproduce this? I have tried with your profile
>
> http://2sea.org/sea.jsp
>
> but when I try this locally with pretty much the same code base I never get
> the same error
>
>> ERROR [pool-3-thread-5] (RDFDefaultErrorHandler.java:40) - http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd(line 106 column 22): {E213} Unexpected end of file from server
>
> So it's a bit difficult for me to work out if I am fixing the problem.
>
>> what you wanna do is:
>>
>> 1. create a catalog (catalog.xml, and download all the DTDs)
>> 2. add the file "CatalogManager.properties" to the classpath (in a maven project in netbeans you would simply put it in "other resources", so it gets jar'd)
>> 3. modify the code so the xml reader uses that catalog.
>>
>> my parser looks roughly like this:
>>
>>     // CatalogManager, CatalogResolver: Apache xml-commons resolver (org.apache.xml.resolver)
>>     private final CatalogResolver catRes;
>>     private final Transformer transformer;
>>
>>     public RDFaParser() throws TransformerConfigurationException {
>>         TransformerFactory transFact = TransformerFactory.newInstance();
>>         CatalogManager catMan = new CatalogManager("CatalogManager.properties");
>>         catRes = new CatalogResolver(catMan);
>>         ClassLoader cl = RDFaParser.class.getClassLoader();
>>         // XSLT: classpath name of the RDFa stylesheet (defined elsewhere)
>>         Templates cachedXSLT = transFact.newTemplates(new StreamSource(cl.getResourceAsStream(XSLT)));
>>         transformer = cachedXSLT.newTransformer();
>>         // document() and xsl:include lookups also go through the catalog
>>         transformer.setURIResolver(catRes);
>>     }
>>
>>     public void parse(StreamSource source, OutputStream out) throws SAXException, TransformerException {
>>         XMLReader reader = XMLReaderFactory.createXMLReader();
>>         // DTDs and DTD modules are served from the catalog, not fetched from www.w3.org
>>         reader.setEntityResolver(catRes);
>>         reader.setFeature("http://xml.org/sax/features/validation", false);
>>         Source sXML = new SAXSource(reader, new InputSource(source.getInputStream()));
>>         transformer.transform(sXML, new StreamResult(out));
>>     }
>>
>> it took me some time to get this catalog thing up and running, here are some links:
>>
>> 1. http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic/ (see here for the overall trouble)
>> 2. http://xml.apache.org/commons/components/resolver/resolver-article.html
>> 3. http://nwalsh.com/docs/articles/xml2003/
>> 4. http://xerces.apache.org/xerces2-j/faq-xcatalogs.html
>> 5. http://www.sagehill.net/docbookxsl/WriteCatalog.html
>> 6. http://xml.apache.org/mirrors.cgi (download apache's resolver)
>>
>> to save you some time:
>>
>> 1. find attached the catalog.xml and the catalog in use by WebIDRealm (copy the contents of catalog.zip to /usr/share/catalogs/ and make sure the files are readable);
>>    if you copy the contents elsewhere you need to change the path in CatalogManager.properties as well.
>> 2. find attached the CatalogManager.properties in use by WebIDRealm
>>
>> the basic workflow would be:
>>
>> 1. CatalogManager reads CatalogManager.properties and finds the path of catalog.xml
>> 2. when resolving, CatalogResolver looks in catalog.xml to see if the docType (or module) is mapped there, and tries to find the file in
>>    the path specified in catalog.xml.
>>
>> if you have any questions regarding the catalog feel free to ask.
>>
>> wkr j
>>
>> ----- Original Message -----
>> From: "Henry Story" <henry.story@bblfish.net>
>> To: "Damian Steer" <pldms@mac.com>
>> Cc: "Jürgen Jakobitsch" <j.jakobitsch@semantic-web.at>, "public-xg-webid@w3.org XG" <public-xg-webid@w3.org>
>> Sent: Friday, January 6, 2012 9:10:58 PM
>> Subject: Re: rdfa parsing issue -- was: fixed https://foafssl.org/test/WebId
>>
>> Thanks Damian,
>>
>> that was very helpful.
>>
>> I have now fixed a couple of issues on my side, and I see that Jürgen has updated his page to be even closer to xhtml. So the foafssl.org tester should work with that resource in any case.
>>
>> Btw, I get the following in the logs:
>>
>> INF: [console logger] dispatch: 2sea.org GET /sea.jsp HTTP/1.1
>> ERROR [pool-3-thread-5] (RDFDefaultErrorHandler.java:40) - http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd(line 106 column 22): {E213} Unexpected end of file from server
>>
>> It looks like the RDFa parser is following the DTDs. Is there a way to stop that? I guess the W3C does not serve those files.
>>
>> Henry
>>
>> On 6 Jan 2012, at 15:57, Damian Steer wrote:
>>
>>> Hi Henry and Jürgen,
>>>
>>> On 06/01/12 12:49, Henry Story wrote:
>>>
>>>> Shellac's parser parses the xhtml correctly as xhtml in fact, but
>>>> when the html parser is used it comes to a different conclusion.
>>>
>>> Yes, this is becoming a classic issue, and has nothing to do with RDFa
>>> (although RDFa obscures the issue horribly).
>>>
>>>> RDFa 1 is defined in xhtml only, I understand, so it is true that we
>>>> are going beyond what the spec covers by trying to parse html too.
>>>> Perhaps this will be a lot simpler with RDFa 1.1, which can be made
>>>> to work with html5.
>>>
>>> Yes, RDFa 1.0 is only really defined for xhtml, although useful work was
>>> done on html 5 at the time (there are some html 5 tests). RDFa 1.1 does
>>> address html 5, but note that it doesn't change anything here.
>>>
>>> The problem is this:
>>>
>>> <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"/>
>>> <div rel="cert:key">
>>> ...
>>> </div>
>>>
>>> An xml parser sees a closed div, followed by another div. An html parser
>>> sees a broken div so repairs it as follows:
>>>
>>> <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png">
>>> <div rel="cert:key">
>>> ...
>>> </div>
>>> </div> <!-- close that div -->
>>>
>>> i.e. one div contains another now, and thus you find
>>>
>>> <http://2sea.org/2sealogo.png> cert:key ....
>>>
>>> I ought to add a utility to switch the parser based on content type,
>>> however in practice there's so much broken xhtml out there that tag soup
>>> parsing is much safer (although it does lead to issues like this).
>>>
>>> My advice would be to expect tag soup parsing in the wild and change the
>>> html:
>>>
>>> <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"></div>
>>>
>>> Hope this makes sense,
>>>
>>> Damian
>>
>> Social Web Architect
>> http://bblfish.net/
>>
>> --
>> | Jürgen Jakobitsch,
>> | Software Developer
>> | Semantic Web Company GmbH
>> | Mariahilfer Straße 70 / Neubaugasse 1, Top 8
>> | A - 1070 Wien, Austria
>> | Mob +43 676 62 12 710 | Fax +43.1.402 12 35 - 22
>>
>> COMPANY INFORMATION
>> | http://www.semantic-web.at/
>>
>> PERSONAL INFORMATION
>> | web : http://www.turnguard.com
>> | foaf : http://www.turnguard.com/turnguard
>> | skype : jakobitsch-punkt
>> <RDFACatalog.zip><CatalogManager.properties>
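Damian's point about the self-closed div can be checked with a short sketch that uses only the JDK's DOM parser. The class `DivNestingCheck` and the wrapping `<body>` element are illustrative assumptions, not part of any parser discussed in the thread. The sketch reports whether the `cert:key` div ends up inside the `foaf:depiction` div. The JDK ships no tag-soup HTML parser, so the HTML side is represented by feeding in the repaired (nested) markup that Damian describes, rather than by running an HTML parser.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Illustrative check (hypothetical class): parse a markup fragment as XML and
// report whether the rel="cert:key" div sits inside the rel="foaf:depiction" div.
class DivNestingCheck {

    static boolean keyNestedInDepiction(String fragment) {
        Document doc;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Wrap the fragment in <body> so the XML has a single root element.
            doc = builder.parse(new InputSource(new StringReader("<body>" + fragment + "</body>")));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML: " + e.getMessage(), e);
        }
        NodeList divs = doc.getElementsByTagName("div");
        for (int i = 0; i < divs.getLength(); i++) {
            Element div = (Element) divs.item(i);
            if ("cert:key".equals(div.getAttribute("rel"))) {
                Node parent = div.getParentNode();
                // getAttribute returns "" when the attribute is absent (e.g. on <body>).
                return parent instanceof Element
                        && "foaf:depiction".equals(((Element) parent).getAttribute("rel"));
            }
        }
        throw new IllegalArgumentException("no div with rel=\"cert:key\"");
    }

    public static void main(String[] args) {
        String depiction = "<div rel=\"foaf:depiction\" href=\"http://2sea.org/2sealogo.png\"";
        String key = "<div rel=\"cert:key\">...</div>";
        // XML reading of the original markup: empty-element tag, so the divs are siblings.
        System.out.println("self-closed:    " + keyNestedInDepiction(depiction + "/>" + key));
        // Damian's fix: an explicit end tag, so the divs stay siblings under XML and HTML parsing alike.
        System.out.println("explicit close: " + keyNestedInDepiction(depiction + "></div>" + key));
        // The tree an HTML parser builds from the self-closed form: nested divs.
        System.out.println("html-repaired:  " + keyNestedInDepiction(depiction + ">" + key + "</div>"));
    }
}
```

Running `main` prints `false` for the first two fragments and `true` for the third. Only the explicit end tag yields the same tree regardless of which parser a consumer picks, and that is the reasoning behind Damian's advice.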
Received on Sunday, 8 January 2012 18:55:43 UTC
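In the thread, Henry asks whether the parser can be kept from fetching the DTDs, and Jürgen answers with an XML catalog built on Apache's resolver. The following is a JDK-only sketch of the same idea. It is not Jürgen's code and not the Apache API. A hypothetical `LocalDtdResolver` implements SAX's `EntityResolver`, serves known DTD system identifiers from local copies, and, as an assumed policy, refuses unmapped `www.w3.org` identifiers instead of going to the network. The inline DTD string is a stand-in for the real downloaded `xhtml-rdfa-1.dtd` and its modules. Apache's `CatalogResolver` reads such mappings, including public identifiers, from `catalog.xml` instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Sketch only: a JDK-only stand-in for a catalog resolver. The class name, the
// in-memory map and the "refuse www.w3.org" policy are illustrative assumptions.
class LocalDtdResolver implements EntityResolver {
    // Absolute system id -> DTD text. A real setup would load DTDs and modules
    // downloaded once (e.g. from the classpath); inline strings keep this standalone.
    private final Map<String, String> localCopies;
    // System ids that were served locally, so callers can see what was requested.
    final List<String> resolvedSystemIds = new ArrayList<>();

    LocalDtdResolver(Map<String, String> localCopies) {
        this.localCopies = localCopies;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws SAXException {
        String dtd = localCopies.get(systemId);
        if (dtd != null) {
            resolvedSystemIds.add(systemId);
            InputSource source = new InputSource(new StringReader(dtd));
            // Keep the original system id so relative module references inside the
            // DTD resolve to absolute w3.org URLs and come back through this method.
            source.setSystemId(systemId);
            return source;
        }
        if (systemId != null && systemId.startsWith("http://www.w3.org/")) {
            // Fail fast instead of fetching an unmapped DTD from www.w3.org.
            throw new SAXException("no local copy of " + systemId);
        }
        return null; // anything else: let the parser open the URI itself
    }

    // Non-validating parse (as in Jürgen's snippet) with the resolver installed:
    // the external DTD subset is still read, but only from the local copy.
    static void parse(String xml, EntityResolver resolver) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setEntityResolver(resolver);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("parse failed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        String dtdUrl = "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd";
        String doc = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML+RDFa 1.0//EN\" \"" + dtdUrl + "\">\n"
                + "<html><body><p>a&nbsp;b</p></body></html>";
        // One-line stand-in for the real DTD: just enough to declare &nbsp;.
        LocalDtdResolver resolver = new LocalDtdResolver(Map.of(dtdUrl, "<!ENTITY nbsp \"&#160;\">"));
        parse(doc, resolver);
        System.out.println("served locally: " + resolver.resolvedSystemIds);
    }
}
```

The document parses only because `&nbsp;` is declared in the locally served DTD, so a successful parse shows that the local copy was used. With a real DTD, the modules it pulls in arrive as absolute URLs under `http://www.w3.org/MarkUp/DTD/`, and each one needs its own entry. This matches Jürgen's remark that the doctype "or module" has to be mapped.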