Data Identification section (was Re: reviewing the BP doc)

Hi Annette,

You make several comments here; I want to reply to one particular set, 
hence the change in subject.

On 19/06/2015 03:03, Annette Greiner wrote:

> Data Identification
> The introductory text about URIs and URLs and IRIs is potentially confusing and not necessary for our audience to understand the BPs about identifiers.

I disagree (which is why I wrote it of course!)

The three terms *are* confusing and I was attempting to clear that up. 
My reason being that we do talk about URLs and URIs and they're not 
interchangeable. A few, a very few, will talk about IRIs. Anyone dipping 
a toe in reading a W3C spec these days will see that rare term and 
wonder what the heck it means.

Do you think it's worth me having another shot at explaining the 
differences or are you opposed to including any such explanation?

> Also, URLs are for the internet, not just the web.

That's not my understanding although I guess it's not an absolute 
distinction. To take an example of an Internet service that is not on 
the Web, Skype doesn't use URLs except to address servers; the actual 
data is not transmitted using HTTP.

> I also disagree with the representation of DOIs as something that cannot be looked up, though the question is not something I think we should make readers think about.

Hobby horse alert!

To look up doi:10.1103/PhysRevD.89.032002 you have to:

- strip the doi: scheme;

- choose a resolver service (that you have to already know about);

- append the remaining string to that base URL to get something like

- use HTTP to dereference it.
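
As a rough illustration of those four steps, here is a minimal Python sketch. The resolver base URL (https://doi.org/) and the function names are my assumptions for the example, not something the DOI string itself tells you, which is rather the point.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: one well-known resolver. Nothing in the DOI itself names it.
DEFAULT_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str, resolver: str = DEFAULT_RESOLVER) -> str:
    """Turn a doi: identifier into an HTTP URL for a chosen resolver."""
    scheme = "doi:"
    # Step 1: strip the doi: scheme if it is present.
    suffix = doi[len(scheme):] if doi.lower().startswith(scheme) else doi
    # Steps 2 and 3: append the remaining string to the resolver's base URL.
    return resolver.rstrip("/") + "/" + quote(suffix, safe="/")


def dereference(url: str) -> str:
    """Step 4: use HTTP to dereference the URL; returns the final URL reached."""
    with urlopen(url) as response:
        return response.geturl()


if __name__ == "__main__":
    print(doi_to_url("doi:10.1103/PhysRevD.89.032002"))
```

Passing a different (hypothetical) base such as http://example.org/resolve/ as the second argument gives a different URL for the same DOI, and what comes back then depends entirely on that service.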

If you choose a different base URI, you might get something very 
different ( for example 
;-) )

My intention when I included that was to point out that other identifier 
schemes, DOIs being one of the best known, are not dereferenceable and 
not (natively) part of the Web.

> * I would like this section to limit itself to information that applies to publishing *data*.

It's about identifiers and identifiers are dumb strings, therefore I 
can't see how we can talk about identifiers that only apply to data and 
not everything else.

> The BP is about assigning persistent identifiers to datasets, but the possible approach to implementation is about much more than that.

Yes, but that's for the reason just given.

> The list items are also not consistent. (one shows use of extensions, another says not to do that).

Fair enough, yes, I'd need to expand that and tie it back to the 
multiple formats BP. I'd want to say something along the lines of:

Use an identifier like to link to 
the resource.

Only include the file extension if it refers to a specific 
representation of that resource, like

(btw, a feature of's server set up is that we don't need to 
include file extensions. A URL like actually returns 
a .php file (you can add the extension if you like) ). We make a lot of 
use of conneg.
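
To make the conneg idea concrete, here is a simplified Python sketch of how a server might pick a representation for an extension-less identifier from the client's Accept header. The function name, the media types, and the file names (dataset.html, dataset.csv) are hypothetical, and this is not a description of any particular server's configuration. It handles only exact media types, q-values and */*.

```python
from typing import Dict, Optional


def choose_representation(accept: str, available: Dict[str, str]) -> Optional[str]:
    """Pick the file to serve for an extension-less URL, given an Accept header."""
    prefs = []
    for part in accept.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0]
        q = 1.0  # a media type with no q parameter defaults to 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                q = float(field[2:])
        prefs.append((q, media_type))
    # Highest q first; the sort is stable, so ties keep header order.
    prefs.sort(key=lambda p: p[0], reverse=True)
    for q, media_type in prefs:
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media_type in available:
            return available[media_type]
        if media_type == "*/*" and available:
            # Wildcard: fall back to the first representation we have.
            return next(iter(available.values()))
    return None


if __name__ == "__main__":
    files = {"text/html": "dataset.html", "text/csv": "dataset.csv"}
    print(choose_representation("text/csv, text/html;q=0.8", files))
```

The extension-less identifier names the resource; a URL ending in .csv or .html would name one specific representation of it.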

> I worry that this will open up a holy war about how to implement a

OK, that we want to avoid and it's being dealt with in another thread. 
But I am prepared to defend the general principles here - it's what 
marks out the Web as a data platform and not a means of transmitting 
datasets that could just as easily be transported by sending a USB stick 
in the post.


For tracker: this is issue-194


Phil Archer
W3C Data Activity Lead
+44 (0)7887 767755

Received on Thursday, 6 August 2015 15:57:52 UTC