Computer science publisher needs help with RDFa/HTTP technical issue [Re: How are RDFa clients expected to handle 301 Moved Permanently?]

Dear all,

let me try again.  This time I have phrased the subject of this email
in a catchier way.  I believe that when an open access publisher that
is a big player, at least in the field of computer science workshops,
introduces RDFa, this has the potential to become a very interesting
use case for RDFa.  (Please see also our blog at
http://ceurws.wordpress.com/ for further planned innovations.)

While I think I have very good knowledge of RDFa, we are in an
early phase of implementing RDFa in the specific setting of
CEUR-WS.org.  Therefore we would highly appreciate any input on how to
get our RDFa implementation right.  Please see below for the gory
technical details.

Cheers, and thanks in advance,

Christoph (CEUR-WS.org technical editor)

On 2013-10-10 16:54, Christoph LANGE wrote:
 > Dear RDFa community,
 >
 > I am writing in the role of technical editor of the CEUR-WS.org open
 > access publishing service (http://ceur-ws.org/), which many of you have
 > used before.
 >
 > We provide a tool that allows proceedings editors to include RDFa
 > annotations in their tables of contents
 > (https://github.com/clange/ceur-make).  FYI: roughly 1 in 6 proceedings
 > volumes has been using RDFa recently.
 >
 > We are now possibly running into a problem by having changed the
 > official URLs of our volume pages from, e.g.,
 > http://ceur-ws.org/Vol-994/ into http://ceur-ws.org/Vol-994, i.e.
 > dropping the trailing slash.  In short, RDFa requested from
 > http://ceur-ws.org/Vol-994 contains broken URIs in outgoing links, as
 > RDFa clients don't seem to follow the "HTTP 301 Moved Permanently",
 > which points from the slash-less URL to the slashed URL (which still
 > exists, as our server-side directory layout hasn't changed).  And I'm
 > wondering whether that's something we should expect an RDFa client to
 > do, or whether we need to fix our RDFa instead.
 >
 > Our rationale for dropping the trailing slash was the following:
 >
 > 1. While at the moment all papers inside our volumes are PDF files, e.g.
 > http://ceur-ws.org/Vol-994/paper-01.pdf, we are thinking about other
 > content types (see
 > http://ceurws.wordpress.com/2013/09/25/is-a-paper-just-a-pdf-file/), in
 > particular directories containing accompanying data such as original
 > research data, and the main entry point to such a paper could then be
 > another HTML page in a subdirectory.
 >
 > 2. As the user (here we mean a human using a browser) should not be
 > responsible for knowing whether a paper, or a volume, is a file or a
 > directory, we thought we'd use slash-less URLs throughout, and then let
 > the server tell the browser (and thus the user) when some resource
 > actually is a directory.
 >
 > (Do these considerations make sense?)
 >
 > This behaviour is implemented as follows (irrelevant headers stripped):
 >
 > $ wget -O /dev/null -S http://ceur-ws.org/Vol-1010
 > --2013-10-10 16:33:57--  http://ceur-ws.org/Vol-1010
 > Resolving ceur-ws.org... 137.226.34.227
 > Connecting to ceur-ws.org|137.226.34.227|:80... connected.
 > HTTP request sent, awaiting response...
 >    HTTP/1.1 301 Moved Permanently
 >    Location: http://ceur-ws.org/Vol-1010/
 > Location: http://ceur-ws.org/Vol-1010/ [following]
 > --2013-10-10 16:33:57--  http://ceur-ws.org/Vol-1010/
 > Reusing existing connection to ceur-ws.org:80.
 > HTTP request sent, awaiting response...
 >    HTTP/1.1 200 OK
 >
 > But now RDFa clients don't seem to respect this redirect.  Please try
 > for yourself with http://www.w3.org/2012/pyRdfa/ and
 > http://linkeddata.uriburner.com/.  These are two freely accessible RDFa
 > extractors I could think of, and I think they are based on different
 > implementations.  (Am I right?)
 >
 > When you enter a slashed URI, e.g. http://ceur-ws.org/Vol-1010/, you get
 > correct RDFa, in particular outgoing links to, e.g.,
 > http://ceur-ws.org/Vol-1010/paper-01.pdf.  When you enter the same URI
 > without a slash, the relative URIs that point from index.html to the
 > papers like <ol rel="dcterms:hasPart"><li about="paper-01.pdf"> resolve
 > to http://ceur-ws.org/paper-01.pdf.
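
The difference described above can be reproduced offline. The following is a minimal sketch, assuming that an RDFa extractor resolves relative references against the requested URL in the standard RFC 3986 way; it uses Python's urllib.parse.urljoin purely for illustration and is not part of ceur-make or of either extractor.

```python
from urllib.parse import urljoin

# Relative reference as it appears in the about="..." attribute of index.html.
relative_ref = "paper-01.pdf"

# Base URI with the trailing slash: the last path segment is empty,
# so the reference is appended inside the Vol-1010/ directory.
slashed = urljoin("http://ceur-ws.org/Vol-1010/", relative_ref)

# Base URI without the trailing slash: "Vol-1010" is treated as the last
# path segment and is replaced by the reference.
slashless = urljoin("http://ceur-ws.org/Vol-1010", relative_ref)

print(slashed)
print(slashless)
```

If a client keeps the slash-less URL as the base instead of the redirect target, the second result is what it would emit for every relative paper link.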
 >
 > Now I have the following questions:
 >
 > Are these RDFa clients broken?
 >
 > If they are not broken, what is broken on our side, and how can we
 > fix it?
 >
 > Is it acceptable that RDFa retrieved from a slash-less URL is broken,
 > whereas RDFa from the slashed URL works?
 >
 > Is it OK to say that the "canonical URL" of something should be
 > slash-less, whereas the "semantic identifier" of the same thing (if
 > that's what we mean by its RDFa URI) should have a slash?  Or should
 > both be the same?  (Note: I am well aware of the difference between
 > information resources and non-information resources, but IMHO this
 > difference doesn't apply here, as we publish online proceedings.
 > http://ceur-ws.org/Vol-1010 _is_ the workshop volume, which has editors
 > and contains papers; it is not just a page that describes the workshop
 > volume.)
 >
 > Is there an acceptable way of indicating in my RDFa that the slashed
 > version of the URL is to be preferred?  It would be easy for us to put
 > an explicit about="http://ceur-ws.org/Vol-1010/" into all index.html
 > files.  But this would still leave relative about="..." links broken
 > when RDFa is requested from the slash-less URL, as these are resolved
 > against the then slash-less base URI of the document.
 >
 > Or do we ultimately have to make all outgoing RDFa links more explicit,
 > e.g. by using about="/Vol-1010/paper-01.pdf"?  That wouldn't be much of
 > a problem, as the RDFa is generated by a script anyway, but it would
 > make the script's output even less readable.
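
To illustrate why the more explicit form would be robust, here is a small sketch comparing the three ways of writing the link against both base URIs. It again assumes plain RFC 3986 reference resolution, modelled with Python's urllib.parse.urljoin; the variable names are made up for this example.

```python
from urllib.parse import urljoin

bases = ["http://ceur-ws.org/Vol-1010/", "http://ceur-ws.org/Vol-1010"]

# The three candidate spellings of the about="..." value.
candidates = {
    "relative": "paper-01.pdf",
    "root-relative": "/Vol-1010/paper-01.pdf",
    "absolute": "http://ceur-ws.org/Vol-1010/paper-01.pdf",
}

# For each spelling, collect the distinct URIs obtained across both bases.
results = {
    name: {urljoin(base, ref) for base in bases}
    for name, ref in candidates.items()
}

for name, uris in results.items():
    status = "stable" if len(uris) == 1 else "depends on base"
    print(f"{name}: {status} -> {sorted(uris)}")
```

Under these assumptions only the purely relative form depends on whether the base carries the trailing slash; the root-relative and absolute forms yield the same URI either way.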
 >
 > Cheers, and many thanks in advance for your advice,
 >
 > Christoph
 >


-- 
Christoph Lange, School of Computer Science, University of Birmingham
http://cs.bham.ac.uk/~langec/, Skype duke4701

→ Mathematics in Computer Science Special Issue on “Enabling Domain
   Experts to use Formalised Reasoning”; submission until 31 October.
   http://cs.bham.ac.uk/research/projects/formare/pubs/mcs-doform/

Received on Thursday, 17 October 2013 14:39:47 UTC