Re: Question about "paths as URIs" in the BBC RDF

Hello!

On Thu, Jan 28, 2010 at 8:36 PM, Dan Brickley <danbri@danbri.org> wrote:
> On Thu, Jan 28, 2010 at 7:56 PM, Ross Singer <rossfsinger@gmail.com> wrote:
>> Hi, I have a question about something I've run across when trying to
>> parse the RDF coming from the BBC.  If you take a document like:
>>
>> http://www.bbc.co.uk/music/artists/72c536dc-7137-4477-a521-567eeb840fa8.rdf
>>
>> notice how all of the URIs are paths, but there's no xml:base to
>> declare where these actual paths may reside.
>>
>> If I point rapper at that URI, it brings me back fully qualified URIs:
>> <http://www.bbc.co.uk/music/artists/72c536dc-7137-4477-a521-567eeb840fa8#artist>
>>
>> but the only way I can figure it's able to do that is for the parser
>> and the HTTP agent to be in cahoots somehow, which seems like a
>> breakdown in the separation of concerns -- this document is useless,
>> except in the context of living on www.bbc.co.uk.  The moment I cache
>> it to my local system, if I'm understanding it correctly, it's now
>> asserting these things about my filesystem (effectively).  Rapper now
>> says:
>> <file:///music/artists/72c536dc-7137-4477-a521-567eeb840fa8#artist>
>>
>> So my questions would be:
>> 1) Is this "valid"?
>> 2) If so, is there an expectation of the parser being aware of the URI
>> of retrieval? (I have written my own set of parsers, so I'd need to
>> rethink this assumption, if so)
>> 3) How do other client libraries handle this?
>
> Hi Ross,
>
> The relevant specs are
>
> http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/#section-Syntax-ID-xml-base
>
> "The XML Infoset provides a base URI attribute xml:base that sets the
> base URI for resolving relative RDF URI references, otherwise the base
> URI is that of the document. The base URI applies to all RDF/XML
> attributes that deal with RDF URI references which are rdf:about,
> rdf:resource, rdf:ID and rdf:datatype."

The full base path only comes into play for things like
rdf:resource="some/path". For rdf:resource="/some/path", only the scheme
and host of the base URI are used: the URI is resolved from the root, in
our case http://www.bbc.co.uk/ (this is the same in the corresponding
XHTML pages, btw). I never took a look at the relevant spec (yes, I know,
it's bad :-)), but all parsers seem to understand it correctly...
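
To make the difference concrete, here is a minimal Python sketch using
the standard library's urljoin. It assumes the retrieval URI of the
document is used as the base (no xml:base present); the resolve() helper
and the "some/path" reference are only illustrative, not taken from the
BBC data.

```python
from urllib.parse import urljoin

# Assumption: the document's retrieval URI serves as the base URI,
# because the document declares no xml:base.
BASE = "http://www.bbc.co.uk/music/artists/72c536dc-7137-4477-a521-567eeb840fa8.rdf"


def resolve(reference, base=BASE):
    """Resolve a (possibly relative) URI reference against a base URI."""
    return urljoin(base, reference)


# Absolute-path reference: only the scheme and host of the base are kept.
absolute_path = resolve("/music/artists/72c536dc-7137-4477-a521-567eeb840fa8#artist")

# Relative-path reference: resolved against the directory of the base.
relative_path = resolve("some/path")

# Fragment-only reference: the whole base is kept, the fragment is appended.
fragment_only = resolve("#me")

for uri in (absolute_path, relative_path, fragment_only):
    print(uri)
```

All three forms depend on the base in some way, which is why the same
document gives different URIs once it is read from a file: location.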

As far as caching is concerned, you'd need to parse the document first
and cache the result, not store the document as is. Otherwise you'd run
into the same problem even with simple relative references (e.g.
rdf:about="#me", as in most FOAF files).
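
One way to do that, sketched very roughly in Python: resolve the URI
attributes against the retrieval URI before writing the document to the
cache. This is only an illustration under stated assumptions. The
absolutize() helper and the SAMPLE fragment are made up for the example,
and the sketch handles rdf:about and rdf:resource only (not rdf:ID,
rdf:datatype, or nested xml:base). A real RDF parser would do this as
part of parsing.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Only these two attributes are rewritten in this sketch.
URI_ATTRIBUTES = ("{%s}about" % RDF_NS, "{%s}resource" % RDF_NS)


def absolutize(rdf_xml, retrieval_uri):
    """Return rdf_xml with rdf:about / rdf:resource values made absolute."""
    root = ET.fromstring(rdf_xml)
    for element in root.iter():
        for attribute in URI_ATTRIBUTES:
            value = element.get(attribute)
            if value is not None:
                element.set(attribute, urljoin(retrieval_uri, value))
    return ET.tostring(root, encoding="unicode")


# Hypothetical fragment in the style of the BBC document (not its real content).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="/music/artists/72c536dc-7137-4477-a521-567eeb840fa8#artist"/>
</rdf:RDF>"""

RETRIEVAL_URI = "http://www.bbc.co.uk/music/artists/72c536dc-7137-4477-a521-567eeb840fa8.rdf"

# The string to store in the cache instead of the raw response body.
cached = absolutize(SAMPLE, RETRIEVAL_URI)
print(cached)
```

Once the URIs are absolute, it no longer matters where the cached copy
lives, since an absolute URI is unaffected by the base.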

I hope that helps!
Cheers,
y

>
> http://www.faqs.org/rfcs/rfc2396.html which specifies relative URI
> processing given a base URI.
>
> I think most of what you need is in "5.1. Establishing a Base URI" there.
>
> cheers,
>
> Dan
>
>

Received on Thursday, 28 January 2010 20:54:58 UTC