Data Identification section (was Re: reviewing the BP doc) from Phil Archer on 2015-08-06 (public-dwbp-wg@w3.org from August 2015)

From: Phil Archer <phila@w3.org>
Date: Thu, 6 Aug 2015 16:57:52 +0100
To: Annette Greiner <amgreiner@lbl.gov>, Data on the Web Best Practices Working Group <public-dwbp-wg@w3.org>
Message-ID: <55C38400.8010202@w3.org>

Hi Annette,

You make several comments here; I want to reply to one particular set,
hence the change in subject.

On 19/06/2015 03:03, Annette Greiner wrote:
[..]

> Data Identification
> The introductory text about URIs and URLs and IRIs is potentially confusing and not necessary for our audience to understand the BPs about identifiers.

I disagree (which is why I wrote it of course!)

The three terms *are* confusing and I was attempting to clear that up.
My reason being that we do talk about URLs and URIs and they're not
interchangeable. A few, a very few, will talk about IRIs. Anyone dipping
a toe in reading a W3C spec these days will see that rare term and
wonder what the heck it means.

Do you think it's worth me having another shot at explaining the
differences or are you opposed to including any such explanation?

> Also, URLs are for the internet, not just the web.

That's not my understanding although I guess it's not an absolute
distinction. To take an example of an Internet service that is not on
the Web, Skype doesn't use URLs except to address servers; the actual
data is not transmitted using HTTP.

> I also disagree with the representation of DOIs as something that
> cannot be looked up, though the question is not something I think we
> should make readers think about.

Hobby horse alert!

To look up doi:10.1103/PhysRevD.89.032002 you have to:

- strip the doi: scheme;

- choose a resolver service (that you have to already know about);

- append the remaining string to that base URL to get something like
http://dx.doi.org/10.1103/PhysRevD.89.032002

- use HTTP to dereference it.

If you choose a different base URI you might get something very
different (http://philarcher.org/10.1103/PhysRevD.89.032002 for example
;-) )
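
To make those steps concrete, here is a rough sketch in Python. The
function name doi_to_url and the DEFAULT_RESOLVER constant are made up
for illustration; the only things taken from the steps above are the
doi: prefix and the dx.doi.org base URL. It only builds the URL; the
final HTTP dereference is left to whatever client you use.

```python
# Hypothetical helper: turns a doi: identifier into an HTTP URL by
# following the manual steps listed above. The resolver is a choice the
# caller has to make; nothing in the DOI string itself names one.
DEFAULT_RESOLVER = "http://dx.doi.org/"


def doi_to_url(identifier: str, resolver: str = DEFAULT_RESOLVER) -> str:
    # Step 1: strip the doi: scheme, if present.
    scheme, sep, rest = identifier.partition(":")
    name = rest if sep and scheme.lower() == "doi" else identifier

    # Step 2: the resolver base URL is supplied by the caller (or defaulted).
    if not resolver.endswith("/"):
        resolver += "/"

    # Step 3: append the remaining string to that base URL.
    return resolver + name


if __name__ == "__main__":
    print(doi_to_url("doi:10.1103/PhysRevD.89.032002"))
    print(doi_to_url("doi:10.1103/PhysRevD.89.032002", "http://philarcher.org"))
```

The second call shows the point about base URIs: the same DOI string
yields a completely different URL once a different resolver is chosen.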

My intention when I included that was to point out that other identifier
schemes, DOIs being one of the best known, are not dereferenceable and
not (natively) part of the Web.

> * I would like this section to limit itself to information that applies to publishing *data*.

It's about identifiers and identifiers are dumb strings, therefore I
can't see how we can talk about identifiers that only apply to data and
not everything else.

> The BP is about assigning persistent identifiers to datasets, but the
> possible approach to implementation is about much more than that.

Yes, but that's for the reason just given.

> The list items are also not consistent. (one shows use of extensions,
> another says not to do that).

Fair enough, yes, I'd need to expand that and tie it back to the
multiple formats BP. I'd want to say something along the lines of:

Use an identifier like http://data.example.org/doc/foo/bar to link to
the resource.

Only include the file extension if it refers to a specific
representation of that resource, like
http://data.example.org/doc/foo/bar.rdf
http://data.example.org/doc/foo/bar.html

(btw, a feature of w3.org's server setup is that we don't need to
include file extensions. A URL like
http://www.w3.org/2013/share-psi/workshop/krems/report actually returns
a .php file; you can add the extension if you like.) We make a lot of
use of conneg.
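
As a rough illustration of how the extension-less identifier and the
extension-bearing ones relate under conneg, here is a small Python
sketch. It is not how w3.org's server is configured; the
REPRESENTATIONS table, the function name representation_url and the
default of text/html are all assumptions for the example, and it
ignores wildcards such as */* for brevity.

```python
# Hypothetical mapping from media type to the file extension that names
# one specific representation of a resource.
REPRESENTATIONS = {
    "application/rdf+xml": ".rdf",
    "text/html": ".html",
}


def representation_url(resource: str, accept: str, default: str = "text/html") -> str:
    """Pick the representation URL for an extension-less resource URL,
    based on a simplified reading of an HTTP Accept header value."""
    candidates = []
    for position, part in enumerate(accept.split(",")):
        fields = part.strip().split(";")
        media_type = fields[0].strip().lower()
        quality = 1.0  # q defaults to 1 when not given
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        # Sort by descending quality, then by order of appearance.
        candidates.append((-quality, position, media_type))

    for negative_quality, _, media_type in sorted(candidates):
        if negative_quality == 0:
            continue  # q=0 means "not acceptable"
        if media_type in REPRESENTATIONS:
            return resource + REPRESENTATIONS[media_type]

    # Nothing matched: fall back to the assumed default representation.
    return resource + REPRESENTATIONS[default]


if __name__ == "__main__":
    base = "http://data.example.org/doc/foo/bar"
    print(representation_url(base, "application/rdf+xml"))
    print(representation_url(base, "text/html;q=0.9, application/rdf+xml"))
```

The link people share stays http://data.example.org/doc/foo/bar; the
.rdf and .html URLs only come into play when a specific representation
is wanted.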

> I worry that this will open up a holy war about how to implement a
> REST API.

OK, that we want to avoid and it's being dealt with in another thread.
But I am prepared to defend the general principles here - it's what
marks out the Web as a data platform and not a means of transmitting
datasets that could just as easily be transported by sending a USB stick
in the post.

Phil.

For tracker: this is issue-194

Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1

Received on Thursday, 6 August 2015 15:57:52 UTC