Computer science publisher needs help with RDFa/HTTP technical issue [Re: How are RDFa clients expected to handle 301 Moved Permanently?]

Dear all,

It seems the RDFa mailing list is not that active any more, as I have
not received an answer to this question in two weeks.  As my question is
also related to LOD publishing, let me try asking it here.  We, the
publishers of CEUR-WS.org, are facing a technical issue involving RDFa
and URIs/URLs with vs. without a trailing slash.

I believe that when an open access publisher that is a big player, at
least in the field of computer science workshops, introduces RDFa, this
has the potential to become a very interesting use case for RDFa.
(Please see also our blog at http://ceurws.wordpress.com/ for further
planned innovations.)

While I think I have very good knowledge of RDFa, we are in an early
phase of implementing RDFa in the specific setting of CEUR-WS.org.
Therefore we would highly appreciate any input on how to get our RDFa
implementation right.  Please see below for the original message with
the gory technical details.

Cheers, and thanks in advance,

Christoph (CEUR-WS.org technical editor)

On 2013-10-10 16:54, Christoph LANGE wrote:
> Dear RDFa community,
>
> I am writing in the role of technical editor of the CEUR-WS.org open
> access publishing service (http://ceur-ws.org/), which many of you have
> used before.
>
> We provide a tool that allows proceedings editors to include RDFa
> annotations in their tables of contents
> (https://github.com/clange/ceur-make).  FYI: roughly 1 in 6 proceedings
> volumes has been using RDFa recently.
>
> We are now possibly running into a problem by having changed the
> official URLs of our volume pages from, e.g.,
> http://ceur-ws.org/Vol-994/ into http://ceur-ws.org/Vol-994, i.e.
> dropping the trailing slash.  In short, RDFa requested from
> http://ceur-ws.org/Vol-994 contains broken URIs in outgoing links, as
> RDFa clients don't seem to follow the "HTTP 301 Moved Permanently",
> which points from the slash-less URL to the slashed URL (which still
> exists, as our server-side directory layout hasn't changed).  And I'm
> wondering whether that's something we should expect an RDFa client to
> do, or whether we need to fix our RDFa instead.
>
> Our rationale for dropping the trailing slash was the following:
>
> 1. While at the moment all papers inside our volumes are PDF files, e.g.
> http://ceur-ws.org/Vol-994/paper-01.pdf, we are thinking about other
> content types (see
> http://ceurws.wordpress.com/2013/09/25/is-a-paper-just-a-pdf-file/), in
> particular directories containing accompanying data such as original
> research data, and the main entry point to such a paper could then be
> another HTML page in a subdirectory.
>
> 2. As the user (here we mean a human using a browser) should not be
> responsible for knowing whether a paper, or a volume, is a file or a
> directory, we thought we'd use slash-less URLs throughout, and then let
> the server tell the browser (and thus the user) when some resource
> actually is a directory.
>
> (Do these considerations make sense?)
>
> This behaviour is implemented as follows (irrelevant headers stripped):
>
> $ wget -O /dev/null -S http://ceur-ws.org/Vol-1010
> --2013-10-10 16:33:57--  http://ceur-ws.org/Vol-1010
> Resolving ceur-ws.org... 137.226.34.227
> Connecting to ceur-ws.org|137.226.34.227|:80... connected.
> HTTP request sent, awaiting response...
>    HTTP/1.1 301 Moved Permanently
>    Location: http://ceur-ws.org/Vol-1010/
> Location: http://ceur-ws.org/Vol-1010/ [following]
> --2013-10-10 16:33:57--  http://ceur-ws.org/Vol-1010/
> Reusing existing connection to ceur-ws.org:80.
> HTTP request sent, awaiting response...
>    HTTP/1.1 200 OK
>
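For reference, here is a minimal Python sketch of what a redirect-aware client would do with the transcript above: follow the 301 and then use the URL finally retrieved, not the one originally requested, as the base for relative references (this is how I read RFC 3986, section 5.1.3).  The `REDIRECTS` table and the helpers `final_url` and `resolve` are hypothetical stand-ins that simulate our server's responses so that the sketch runs offline; they are not taken from any of the RDFa extractors.

```python
from urllib.parse import urljoin

# Hypothetical stand-in for the server: maps a requested URL to the
# Location header of its 301 response (values from the wget transcript).
REDIRECTS = {
    "http://ceur-ws.org/Vol-1010": "http://ceur-ws.org/Vol-1010/",
}


def final_url(url, max_hops=5):
    """Follow simulated 301 redirects and return the URL finally retrieved."""
    for _ in range(max_hops):
        if url not in REDIRECTS:
            return url
        # A Location header may be relative, so resolve it against the
        # URL that produced the redirect.
        url = urljoin(url, REDIRECTS[url])
    raise RuntimeError("too many redirects")


def resolve(requested_url, relative_ref):
    """Resolve a relative reference against the post-redirect base URI."""
    return urljoin(final_url(requested_url), relative_ref)


if __name__ == "__main__":
    print(resolve("http://ceur-ws.org/Vol-1010", "paper-01.pdf"))
```

Under these assumptions, a client that takes the post-redirect URL as its base ends up with the same paper URI regardless of which form of the volume URL was entered.
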
> But now RDFa clients don't seem to respect this redirect.  Please try
> for yourself with http://www.w3.org/2012/pyRdfa/ and
> http://linkeddata.uriburner.com/.  These are two freely accessible RDFa
> extractors I could think of, and I think they are based on different
> implementations.  (Am I right?)
>
> When you enter a slashed URI, e.g. http://ceur-ws.org/Vol-1010/, you get
> correct RDFa, in particular outgoing links to, e.g.,
> http://ceur-ws.org/Vol-1010/paper-01.pdf.  When you enter the same URI
> without a slash, the relative URIs that point from index.html to the
> papers like <ol rel="dcterms:hasPart"><li about="paper-01.pdf"> resolve
> to http://ceur-ws.org/paper-01.pdf.
>
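The difference described in the paragraph above can be reproduced with standard relative-reference resolution.  The following is only a sketch using Python's `urllib.parse.urljoin`; the constant names and the helper `resolve_about` are mine and purely illustrative, and I am assuming that an extractor which does not follow the redirect simply uses the entered URL as the base.

```python
from urllib.parse import urljoin

# The two forms of the volume URL discussed in this message.
SLASHED = "http://ceur-ws.org/Vol-1010/"
SLASHLESS = "http://ceur-ws.org/Vol-1010"


def resolve_about(base, about):
    """Resolve the value of an RDFa about="..." attribute against a base URI."""
    return urljoin(base, about)


if __name__ == "__main__":
    for base in (SLASHED, SLASHLESS):
        # With the slash-less base, the last path segment ("Vol-1010") is
        # replaced by the relative reference instead of being kept.
        print(base, "->", resolve_about(base, "paper-01.pdf"))
```

The slash-less base drops its last path segment during resolution, which is exactly why the paper ends up directly under http://ceur-ws.org/.
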
> Now I have the following questions:
>
> Are these RDFa clients broken?
>
> If they are not broken, what is broken on our side, and how can we
> fix it?
>
> Is it acceptable that RDFa retrieved from a slash-less URL is broken,
> whereas RDFa from the slashed URL works?
>
> Is it OK to say that the "canonical URL" of something should be
> slash-less, whereas the "semantic identifier" of the same thing (if
> that's what we mean by its RDFa URI) should have a slash?  Or should
> both be the same?  (Note: I am well aware of the difference between
> information resources and non-information resources, but IMHO this
> difference doesn't apply here, as we publish online proceedings.
> http://ceur-ws.org/Vol-1010 _is_ the workshop volume, which has editors
> and contains papers; it is not just a page that describes the workshop
> volume.)
>
> Is there an acceptable way of indicating in my RDFa that the slashed
> version of the URL is to be preferred?  It would be easy for us to put
> an explicit about="http://ceur-ws.org/Vol-1010/" into all index.html
> files.  But this would still leave relative about="..." links broken
> when RDFa is requested from the slash-less URL, as these are resolved
> against the then slash-less base URI of the document.
>
> Or do we ultimately have to make all outgoing RDFa links more explicit,
> e.g. by using about="/Vol-1010/paper-01.pdf"?  That wouldn't be much of
> a problem, as the RDFa is generated by a script anyway, but it would
> once more make the script's output less readable.
>
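To illustrate the last option, here is a small Python sketch of how a generator script might emit root-relative about="..." values, and a check that they resolve identically against both forms of the volume URL.  The helper `root_relative_about` is hypothetical and is not the actual ceur-make code; the sketch also assumes that the volume directory sits directly under the server root, as in the URLs above.

```python
from urllib.parse import urljoin


def root_relative_about(volume, filename):
    """Build an about value such as '/Vol-1010/paper-01.pdf'."""
    return f"/{volume}/{filename}"


if __name__ == "__main__":
    about = root_relative_about("Vol-1010", "paper-01.pdf")
    for base in ("http://ceur-ws.org/Vol-1010/", "http://ceur-ws.org/Vol-1010"):
        # A reference starting with "/" replaces the whole path of the base,
        # so the trailing slash of the base no longer matters.
        print(base, "->", urljoin(base, about))
```

The price is the one mentioned above: every generated attribute repeats the volume name, which makes the output less readable.
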
> Cheers, and many thanks in advance for your advice,
>
> Christoph
>


-- 
Christoph Lange, School of Computer Science, University of Birmingham
http://cs.bham.ac.uk/~langec/, Skype duke4701

→ Mathematics in Computer Science Special Issue on “Enabling Domain
  Experts to use Formalised Reasoning”; submission until 31 October.
  http://cs.bham.ac.uk/research/projects/formare/pubs/mcs-doform/

Received on Friday, 25 October 2013 16:03:21 UTC