- From: Christoph LANGE <math.semantic.web@gmail.com>
- Date: Thu, 10 Oct 2013 16:54:00 +0100
- To: public-rdfa@w3.org
Dear RDFa community,

I am writing in the role of technical editor of the CEUR-WS.org open access publishing service (http://ceur-ws.org/), which many of you have used before. We provide a tool that allows proceedings editors to include RDFa annotations in their tables of contents (https://github.com/clange/ceur-make). FYI: roughly 1 in 6 proceedings volumes has been using RDFa recently.

We may now be running into a problem because we changed the official URLs of our volume pages from, e.g., http://ceur-ws.org/Vol-994/ to http://ceur-ws.org/Vol-994, i.e. we dropped the trailing slash.

In short, RDFa requested from http://ceur-ws.org/Vol-994 contains broken URIs in outgoing links, as RDFa clients don't seem to follow the "HTTP 301 Moved Permanently", which points from the slash-less URL to the slashed URL (which still exists, as our server-side directory layout hasn't changed). I'm wondering whether that's something we should expect an RDFa client to do, or whether we need to fix our RDFa instead.

Our rationale for dropping the trailing slash was the following:

1. While at the moment all papers inside our volumes are PDF files, e.g. http://ceur-ws.org/Vol-994/paper-01.pdf, we are thinking about other content types (see http://ceurws.wordpress.com/2013/09/25/is-a-paper-just-a-pdf-file/), in particular directories containing accompanying data such as original research data. The main entry point to such a paper could then be another HTML page in a subdirectory.

2. As the user (here we mean a human using a browser) should not be responsible for knowing whether a paper, or a volume, is a file or a directory, we thought we'd use slash-less URLs throughout, and then let the server tell the browser (and thus the user) when some resource actually is a directory.

(Do these considerations make sense?)
This behaviour is implemented as follows (irrelevant headers stripped):

    $ wget -O /dev/null -S http://ceur-ws.org/Vol-1010
    --2013-10-10 16:33:57--  http://ceur-ws.org/Vol-1010
    Resolving ceur-ws.org... 137.226.34.227
    Connecting to ceur-ws.org|137.226.34.227|:80... connected.
    HTTP request sent, awaiting response...
      HTTP/1.1 301 Moved Permanently
      Location: http://ceur-ws.org/Vol-1010/
    Location: http://ceur-ws.org/Vol-1010/ [following]
    --2013-10-10 16:33:57--  http://ceur-ws.org/Vol-1010/
    Reusing existing connection to ceur-ws.org:80.
    HTTP request sent, awaiting response...
      HTTP/1.1 200 OK

But now RDFa clients don't seem to respect this redirect. Please try for yourself with http://www.w3.org/2012/pyRdfa/ and http://linkeddata.uriburner.com/. These are the two freely accessible RDFa extractors I could think of, and I think they are based on different implementations. (Am I right?)

When you enter a slashed URI, e.g. http://ceur-ws.org/Vol-1010/, you get correct RDFa, in particular outgoing links to, e.g., http://ceur-ws.org/Vol-1010/paper-01.pdf.

When you enter the same URI without a slash, the relative URIs that point from index.html to the papers, like <ol rel="dcterms:hasPart"><li about="paper-01.pdf">, resolve to http://ceur-ws.org/paper-01.pdf.

Now I have the following questions:

- Are these RDFa clients broken?

- If they are not broken, what is broken on our side, and how can we fix it?

- Is it acceptable that RDFa retrieved from a slash-less URL is broken, whereas RDFa from the slashed URL works?

- Is it OK to say that the "canonical URL" of something should be slash-less, whereas the "semantic identifier" of the same thing (if that's what we mean by its RDFa URI) should have a slash? Or should both be the same? (Note: I am well aware of the difference between information resources and non-information resources, but IMHO this difference doesn't apply here, as we publish online proceedings. http://ceur-ws.org/Vol-1010 _is_ the workshop volume, which has editors and contains papers; it is not just a page that describes the workshop volume.)

- Is there an acceptable way of indicating in my RDFa that the slashed version of the URL is to be preferred? It would be easy for us to put an explicit about="http://ceur-ws.org/Vol-1010/" into all index.html files. But this would still leave relative about="..." links broken when RDFa is requested from the slash-less URL, as these are resolved against the then slash-less base URI of the document.

- Or do we ultimately have to make all outgoing RDFa links more explicit, e.g. by using about="/Vol-1010/paper-01.pdf"? That wouldn't be much of a problem, as the RDFa is generated by a script anyway, but it would once more make the script's output less readable.

Cheers, and many thanks in advance for your advice,

Christoph

--
Christoph Lange, School of Computer Science, University of Birmingham
http://cs.bham.ac.uk/~langec/, Skype duke4701

→ Mathematics in Computer Science Special Issue on “Enabling Domain Experts to use Formalised Reasoning”; submission until 31 October.
http://cs.bham.ac.uk/research/projects/formare/pubs/mcs-doform/
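
For illustration, the resolution difference described in the message can be reproduced with a minimal sketch. It uses Python's standard urllib.parse.urljoin, which performs generic relative-reference resolution. Python and the helper name resolve_all are assumptions made for this example only. They are not part of ceur-make or of either RDFa extractor mentioned above. The sketch also assumes that a client takes the URL it was given, rather than the redirect target, as the document's base URI.

```python
from urllib.parse import urljoin

# The two base URIs discussed in the message.
SLASHLESS = "http://ceur-ws.org/Vol-1010"
SLASHED = "http://ceur-ws.org/Vol-1010/"


def resolve_all(base, refs):
    """Resolve each reference against the given base URI (hypothetical helper)."""
    return {ref: urljoin(base, ref) for ref in refs}


# A document-relative reference, as in about="paper-01.pdf", and a
# root-relative one, as in about="/Vol-1010/paper-01.pdf".
refs = ["paper-01.pdf", "/Vol-1010/paper-01.pdf"]

for base in (SLASHLESS, SLASHED):
    for ref, target in resolve_all(base, refs).items():
        print(f"{base}  +  {ref}  ->  {target}")
```

Against the slash-less base, the last path segment ("Vol-1010") is treated as a file name and replaced, so paper-01.pdf ends up directly under the server root. The root-relative form resolves to the same target from either base, which is why about="/Vol-1010/paper-01.pdf" would work regardless of how the page was requested.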
Received on Thursday, 10 October 2013 15:54:17 UTC