Re: DCMI Metadata Terms - issues with the RDFa script, content negotiation, etc from Jon Phipps on 2012-05-18 (public-rdfa@w3.org from May 2012)

From: Jon Phipps <jphipps@madcreek.com>
Date: Fri, 18 May 2012 11:19:23 -0400
To: Gregg Kellogg <gregg@greggkellogg.net>
Cc: Thomas Baker <tom@tombaker.org>, Danny Ayers <danny.ayers@gmail.com>, Dan Brickley <danbri@danbri.org>, public-rdfa <public-rdfa@w3.org>, "hugh@hubns.com" <hugh@hubns.com>, Richard Cyganiak <richard.cyganiak@deri.org>, Stuart Sutton <sasutton@dublincore.net>
Message-ID: <CAOyfVmHDxaeVAKXc699GOT56PFSN1T9iX+C_=pnHXKGzYi3WEQ@mail.gmail.com>

On Fri, May 18, 2012 at 2:43 AM, Gregg Kellogg <gregg@greggkellogg.net> wrote:

> > The basic question is whether we need to have four (or five!) separate
> > RDF/XML and four (or five!) separate Turtle representations at all, or
> > can instead serve up just one dc.rdf and/or one dc.ttl.  What does
> > everyone think?
>
> You could also consider having four or five URLs all resolve to the same
> resource, you'd just get more triples than you would have before, but I
> don't see the harm in that.

This makes me very uncomfortable.

First, from a management & versioning standpoint you're by necessity
versioning all 4 vocabs at the same time, regardless of which one actually
changed. I know that DCMI doesn't identify the explicit version in the URI,
but it seems wrong to try to manage multiple versions/variations of
different vocabs from a single commit.

Second, although it might not matter much to a triple store, humans like me
very often read through a vocab, and having all four vocabs available only
as a single resource is at the very least annoying. I can load the file
into Protégé or another editor and have some of the ordering done for me,
but from my perspective it's less than desirable to have that as a
requirement.

Third, it significantly multiplies the bandwidth cost of URI resolution
for both downstream users and DCMI. For instance, the current version of
dc.rdf is 112,814 bytes and the current version of dcelements is 17,424
bytes, so dc.rdf is roughly 6.5 times as large (an increase of about 550%).
This is uncompressed, but how many semweb clients support gzip encoding?
When we did our study of DCMI's website several years ago, the majority of
the bandwidth consumption came from schema resolution, so this is highly
likely to dramatically affect DCMI's bandwidth consumption.
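
The size comparison above can be checked with a few lines of arithmetic.
This is only a sketch: the two byte counts are the ones quoted in this
message, and the function names are made up for illustration.

```python
# Byte counts as quoted in the message (uncompressed file sizes).
DC_RDF_BYTES = 112_814      # the single combined dc.rdf
DCELEMENTS_BYTES = 17_424   # the dcelements vocabulary on its own


def size_ratio(combined: int, single: int) -> float:
    """How many times larger the combined file is than the single one."""
    return combined / single


def percent_increase(combined: int, single: int) -> float:
    """Extra bytes transferred, as a percentage of the single file's size."""
    return (combined - single) / single * 100


ratio = size_ratio(DC_RDF_BYTES, DCELEMENTS_BYTES)
increase = percent_increase(DC_RDF_BYTES, DCELEMENTS_BYTES)
print(f"{ratio:.2f}x the size, a {increase:.0f}% increase")
```

A client that only wanted dcelements would therefore transfer roughly six
and a half times the bytes on every resolution of that namespace URI.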

I don't object at all to the idea that there would be a single large dc.rdf
file, but it should definitely be a matter of a specific request for it and
not served by default. And it really should be built from its constituent
vocabularies in order to let them be uniquely versioned, rather than the
other way around.
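
One way to picture "built from its constituent vocabularies" is a small
build step that takes the separately versioned files and emits their union.
The sketch below is an assumption, not DCMI's actual process: it works on
N-Triples text (where each line is one triple, so a union is a set of
lines), the function name merge_ntriples is hypothetical, and the sample
triples are a tiny illustrative excerpt. A real build would more likely use
an RDF toolkit and serialize the result as RDF/XML or Turtle.

```python
def merge_ntriples(*documents: str) -> str:
    """Return the union of several N-Triples documents.

    Assumes the inputs share no blank node labels; blank nodes would need
    renaming before a line-level union is safe.
    """
    triples = set()
    for doc in documents:
        for line in doc.splitlines():
            line = line.strip()
            # Skip blank lines and N-Triples comments.
            if line and not line.startswith("#"):
                triples.add(line)
    # Sort so the generated file is stable from one build to the next.
    return "\n".join(sorted(triples)) + "\n"


# Illustrative stand-ins for two separately versioned constituent files.
ELEMENTS_NT = """\
<http://purl.org/dc/elements/1.1/title> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
"""
TERMS_NT = """\
<http://purl.org/dc/terms/title> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
<http://purl.org/dc/terms/title> <http://www.w3.org/2000/01/rdf-schema#subPropertyOf> <http://purl.org/dc/elements/1.1/title> .
"""

combined = merge_ntriples(ELEMENTS_NT, TERMS_NT)
print(combined, end="")
```

With this direction of dependency, a change to one vocabulary is committed
to that vocabulary's own file, and the combined file is just regenerated.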

Jon

Received on Friday, 18 May 2012 15:20:17 UTC