- From: Christoph LANGE <math.semantic.web@gmail.com>
- Date: Thu, 17 Oct 2013 15:39:30 +0100
- To: public-rdfa@w3.org
Dear all,

let me try again; this time I have phrased the subject of this email in a
catchier way.

I believe that when an open access publisher that is a big player, at least
in the field of computer science workshops, introduces RDFa, this has the
potential to become a very interesting use case for RDFa. (Please also see
our blog at http://ceurws.wordpress.com/ for further planned innovations.)

While I think I have a very good knowledge of RDFa, we are at an early stage
of implementing RDFa in the specific setting of CEUR-WS.org. We would
therefore greatly appreciate any input on how to get our RDFa implementation
right. Please see below for the gory technical details.

Cheers, and thanks in advance,

Christoph (CEUR-WS.org technical editor)

On 2013-10-10 16:54, Christoph LANGE wrote:
> Dear RDFa community,
>
> I am writing in the role of technical editor of the CEUR-WS.org open
> access publishing service (http://ceur-ws.org/), which many of you have
> used before.
>
> We provide a tool that allows proceedings editors to include RDFa
> annotations in their tables of contents
> (https://github.com/clange/ceur-make). FYI: roughly 1 in 6 proceedings
> volumes has been using RDFa recently.
>
> We are now possibly running into a problem because we have changed the
> official URLs of our volume pages from, e.g., http://ceur-ws.org/Vol-994/
> to http://ceur-ws.org/Vol-994, i.e. dropped the trailing slash. In short,
> RDFa requested from http://ceur-ws.org/Vol-994 contains broken URIs in
> outgoing links, as RDFa clients don't seem to follow the "HTTP 301 Moved
> Permanently" redirect that points from the slash-less URL to the slashed
> URL (which still exists, as our server-side directory layout hasn't
> changed). I'm wondering whether that's something we should expect an RDFa
> client to do, or whether we need to fix our RDFa instead.
>
> Our rationale for dropping the trailing slash was the following:
>
> 1. While at the moment all papers inside our volumes are PDF files, e.g.
>    http://ceur-ws.org/Vol-994/paper-01.pdf, we are thinking about other
>    content types (see
>    http://ceurws.wordpress.com/2013/09/25/is-a-paper-just-a-pdf-file/), in
>    particular directories containing accompanying data such as original
>    research data, and the main entry point to such a paper could then be
>    another HTML page in a subdirectory.
>
> 2. As the user (here we mean a human using a browser) should not be
>    responsible for knowing whether a paper, or a volume, is a file or a
>    directory, we thought we'd use slash-less URLs throughout and then let
>    the server tell the browser (and thus the user) when some resource
>    actually is a directory.
>
> (Do these considerations make sense?)
>
> This behaviour is implemented as follows (irrelevant headers stripped):
>
> $ wget -O /dev/null -S http://ceur-ws.org/Vol-1010
> --2013-10-10 16:33:57-- http://ceur-ws.org/Vol-1010
> Resolving ceur-ws.org... 137.226.34.227
> Connecting to ceur-ws.org|137.226.34.227|:80... connected.
> HTTP request sent, awaiting response...
>   HTTP/1.1 301 Moved Permanently
>   Location: http://ceur-ws.org/Vol-1010/
> Location: http://ceur-ws.org/Vol-1010/ [following]
> --2013-10-10 16:33:57-- http://ceur-ws.org/Vol-1010/
> Reusing existing connection to ceur-ws.org:80.
> HTTP request sent, awaiting response...
>   HTTP/1.1 200 OK
>
> But now RDFa clients don't seem to respect this redirect. Please try it
> for yourself with http://www.w3.org/2012/pyRdfa/ and
> http://linkeddata.uriburner.com/. These are the two freely accessible RDFa
> extractors I could think of, and I think they are based on different
> implementations. (Am I right?)
>
> When you enter a slashed URI, e.g. http://ceur-ws.org/Vol-1010/, you get
> correct RDFa, in particular outgoing links to, e.g.,
> http://ceur-ws.org/Vol-1010/paper-01.pdf.
> When you enter the same URI without a slash, the relative URIs that point
> from index.html to the papers, as in
> <ol rel="dcterms:hasPart"><li about="paper-01.pdf">, resolve to
> http://ceur-ws.org/paper-01.pdf.
>
> Now I have the following questions:
>
> Are these RDFa clients broken?
>
> If they are not broken, what is broken on our side, and how can we fix it?
>
> Is it acceptable that RDFa retrieved from a slash-less URL is broken,
> whereas RDFa from the slashed URL works?
>
> Is it OK to say that the "canonical URL" of something should be
> slash-less, whereas the "semantic identifier" of the same thing (if
> that's what we mean by its RDFa URI) should have a slash? Or should
> both be the same? (Note: I am well aware of the difference between
> information resources and non-information resources, but IMHO this
> difference doesn't apply here, as we publish online proceedings.
> http://ceur-ws.org/Vol-1010 _is_ the workshop volume, which has editors
> and contains papers; it is not just a page that describes the workshop
> volume.)
>
> Is there an acceptable way of indicating in my RDFa that the slashed
> version of the URL is to be preferred? It would be easy for us to put
> an explicit about="http://ceur-ws.org/Vol-1010/" into all index.html
> files. But this would still leave relative about="..." links broken
> when RDFa is requested from the slash-less URL, as these are resolved
> against the then slash-less base URI of the document.
>
> Or do we ultimately have to make all outgoing RDFa links more explicit,
> e.g. by using about="/Vol-1010/paper-01.pdf"? That wouldn't be much of
> a problem, as the RDFa is generated by a script anyway, but it would
> once again make the script's output less readable.
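A small Python sketch can make the resolution difference concrete. It assumes only that a processor resolves a relative `about` value against the document's base URL using RFC 3986 reference resolution, which the standard-library `urllib.parse.urljoin` implements; it is an illustration, not the code of pyRdfa, URIBurner or any other RDFa processor. The markup is a hypothetical fragment modelled on the table of contents described in the message: `paper-01.pdf` comes from the message, while `/Vol-1010/paper-02.pdf` is an invented second entry written in the more explicit absolute-path form the message proposes.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AboutCollector(HTMLParser):
    """Collect raw `about` attribute values in document order.

    Illustration only: this is not an RDFa processor; it just gathers the
    attribute values so that their resolution can be inspected.
    """

    def __init__(self):
        super().__init__()
        self.abouts = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "about" and value is not None:
                self.abouts.append(value)


def resolve_abouts(html, base):
    """Resolve every `about` value in `html` against `base` (RFC 3986)."""
    collector = AboutCollector()
    collector.feed(html)
    collector.close()
    return [urljoin(base, ref) for ref in collector.abouts]


# Hypothetical fragment: paper-02.pdf is invented for illustration.
SNIPPET = (
    '<ol rel="dcterms:hasPart">'
    '<li about="paper-01.pdf">Paper 1</li>'
    '<li about="/Vol-1010/paper-02.pdf">Paper 2</li>'
    "</ol>"
)

# Slash-less base: the relative reference escapes the volume directory.
print(resolve_abouts(SNIPPET, "http://ceur-ws.org/Vol-1010"))
# -> http://ceur-ws.org/paper-01.pdf and http://ceur-ws.org/Vol-1010/paper-02.pdf

# Slashed base: both references stay inside Vol-1010.
print(resolve_abouts(SNIPPET, "http://ceur-ws.org/Vol-1010/"))
# -> http://ceur-ws.org/Vol-1010/paper-01.pdf and http://ceur-ws.org/Vol-1010/paper-02.pdf
```

The relative reference loses the `Vol-1010` segment when the base has no trailing slash, because RFC 3986 resolution discards everything after the last `/` of the base path before appending the reference. The absolute-path reference resolves to the same URI with either base, which is why that form is robust at some cost in readability.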
>
> Cheers, and many thanks in advance for your advice,
>
> Christoph
>

-- 
Christoph Lange, School of Computer Science, University of Birmingham
http://cs.bham.ac.uk/~langec/, Skype duke4701

→ Mathematics in Computer Science Special Issue on “Enabling Domain
  Experts to use Formalised Reasoning”; submission until 31 October.
  http://cs.bham.ac.uk/research/projects/formare/pubs/mcs-doform/
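The wget transcript in the message shows the server answering the slash-less URL with a 301 that points at the slashed one. The following is a sketch of how a client could honour that redirect when choosing a base URL. It rests on an assumption made for illustration, namely that a redirect-aware client takes the URL of the response it finally received as the document's base; this is not a statement about what pyRdfa, URIBurner or the RDFa specification actually do. Python's `urllib.request.urlopen` follows 3xx redirects by default and reports the final URL through `geturl()`. Both function names are hypothetical.

```python
import urllib.request
from urllib.parse import urljoin


def fetch_with_effective_base(url, timeout=10.0):
    """Hypothetical helper: fetch `url` and return (effective_url, text).

    urlopen() follows 3xx redirects by default, so `effective_url` is the
    URL of the response that was finally received. Assumption (for
    illustration): a redirect-aware client would use that URL as the base
    for relative references such as about="paper-01.pdf".
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        effective_url = resp.geturl()  # URL after any redirects
        charset = resp.headers.get_content_charset() or "utf-8"
        text = resp.read().decode(charset, errors="replace")
    return effective_url, text


def resolve_against_effective_base(url, reference):
    """Hypothetical helper: resolve `reference` against the post-redirect URL of `url`."""
    base, _ = fetch_with_effective_base(url)
    return urljoin(base, reference)
```

With a server behaving as in the transcript, `resolve_against_effective_base("http://ceur-ws.org/Vol-1010", "paper-01.pdf")` would return http://ceur-ws.org/Vol-1010/paper-01.pdf, whereas resolving against the requested URL itself yields http://ceur-ws.org/paper-01.pdf.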
Received on Thursday, 17 October 2013 14:39:47 UTC