Re: DCMI Metadata Terms - issues with the RDFa script, content negotiation, etc from Gregg Kellogg on 2012-05-18 (public-rdfa@w3.org from May 2012)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Fri, 18 May 2012 13:14:44 -0400
To: Thomas Baker <tom@tombaker.org>
CC: Danny Ayers <danny.ayers@gmail.com>, Dan Brickley <danbri@danbri.org>, public-rdfa Community <public-rdfa@w3.org>, "hugh@hubns.com Barnes" <hugh@hubns.com>, Richard Cyganiak <richard.cyganiak@deri.org>, Stuart Sutton <sasutton@dublincore.net>, Jon Phipps <jphipps@madcreek.com>
Message-ID: <4791EE9C-6985-4E1B-9A57-A4F5C2D0930B@greggkellogg.net>

The following SPARQL will output just subjects in dcterms from the RDFa document. (Note that the raw URI won't work: it is served as text/plain rather than text/html, so the processor tries to parse it as N-Triples.)

PREFIX dcterms: <http://purl.org/dc/terms/>
CONSTRUCT {?s ?p ?o}
FROM <http://rdf.greggkellogg.net/dcterms.html>
WHERE {
  ?s ?p ?o
  FILTER (regex(str(?s), str(dcterms:)))
}

You could define different queries for each subset of the data you'd like, and add boilerplate to the CONSTRUCT clause as necessary to add any other vocabulary-specific triples.
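As a rough sketch of the per-subset idea, the query above could be generated once per namespace. The helper name `subset_query` and the list of namespaces are assumptions for illustration, not part of the tool; adjust the list to whichever vocabularies the document actually contains.

```python
# Hypothetical helper: emit one CONSTRUCT query per namespace, reusing the
# shape of the dcterms query above. The namespace list is an assumption.
NAMESPACES = {
    "dcterms": "http://purl.org/dc/terms/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcmitype": "http://purl.org/dc/dcmitype/",
    "dcam": "http://purl.org/dc/dcam/",
}

SOURCE = "http://rdf.greggkellogg.net/dcterms.html"


def subset_query(prefix, namespace, source=SOURCE):
    """Return the text of a CONSTRUCT query selecting subjects in one namespace."""
    return (
        f"PREFIX {prefix}: <{namespace}>\n"
        "CONSTRUCT {?s ?p ?o}\n"
        f"FROM <{source}>\n"
        "WHERE {\n"
        "  ?s ?p ?o\n"
        f"  FILTER (regex(str(?s), str({prefix}:)))\n"
        "}\n"
    )


# One query string per vocabulary, keyed by prefix.
queries = {prefix: subset_query(prefix, ns) for prefix, ns in NAMESPACES.items()}
```

Any vocabulary-specific boilerplate triples would still have to be added to the CONSTRUCT clause of each generated query by hand.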

To get nicer prefix definitions out, you'd need to re-process the output: the tool doesn't currently pass the prefix definitions from the SPARQL query to the serializer, but I could probably get that working. Adding the prefix definitions in and re-parsing would work, though. Another SPARQL service may do this for you.
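A minimal sketch of the "add the prefix definitions in" step, assuming the CONSTRUCT result comes back as N-Triples (which is also valid Turtle) with no prefix declarations of its own. The function name `with_prefixes` and the sample triple are hypothetical; the actual re-parse and re-serialization is left to whatever Turtle-aware tool you use.

```python
def with_prefixes(ntriples_text, prefixes):
    """Prepend Turtle @prefix declarations to prefix-free N-Triples output.

    The result is a Turtle document; re-parsing and re-serializing it with a
    Turtle-aware tool is what would produce the abbreviated names.
    """
    header = "".join(
        f"@prefix {name}: <{iri}> .\n" for name, iri in prefixes.items()
    )
    return header + "\n" + ntriples_text


# Illustrative input only, not actual output from the query above.
sample = (
    "<http://purl.org/dc/terms/title> "
    "<http://www.w3.org/2000/01/rdf-schema#label> \"Title\" .\n"
)
turtle_doc = with_prefixes(sample, {"dcterms": "http://purl.org/dc/terms/"})
```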

Gregg

On May 18, 2012, at 8:19 AM, Jon Phipps wrote:

> 
> On Fri, May 18, 2012 at 2:43 AM, Gregg Kellogg <gregg@greggkellogg.net> wrote:
> > The basic question is whether we need to have four (or five!) separate
> > RDF/XML and four (or five!) separate Turtle representations at all, or
> > can instead serve up just one dc.rdf and/or one dc.ttl.  What does
> > everyone think?
> 
> You could also consider having four or five URLs all resolve to the same resource. You'd just get more triples than you would have before, but I don't see the harm in that.
> 
> This makes me very uncomfortable.
> 
> First, from a management and versioning standpoint, you're by necessity versioning all four vocabs at the same time, regardless of which one actually changed. I know that DCMI doesn't identify the explicit version in the URI, but it seems wrong to try to manage multiple versions/variations of different vocabs from a single commit.
> 
> Second, although it might not matter much to a triple store, humans like me very often parse a vocab, and having all four vocabs available only as a single resource is at the very least annoying. Certainly I can load the file into Protégé or another editor and have some of the ordering done for me, but having that as a requirement is less than desirable from my perspective.
> 
> Third, it significantly multiplies the bandwidth cost of URI resolution for both downstream users and DCMI. For instance, the current version of dc.rdf is 112,814 bytes and the current version of dcelements is 17,424 bytes -- nearly 650% of the size. This is uncompressed, but how many semweb clients support gzip encoding? When we did our study of DCMI's website several years ago, the majority of the bandwidth consumption came from schema resolution, so this is highly likely to dramatically affect DCMI's bandwidth consumption.
> 
> I don't object at all to the idea that there would be a single large dc.rdf file, but it should definitely be a matter of a specific request for it and not served by default. And it really should be built from its constituent vocabularies in order to let them be uniquely versioned, rather than the other way around.
> 
> Jon

Received on Friday, 18 May 2012 17:15:49 UTC