RE: Data Identification section (was Re: reviewing the BP doc)

Dear all,


* Definitions according to RFC-3986

- URI
A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.

- URL
The term "Uniform Resource Locator" (URL) refers to the subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism (e.g., its network "location").


* Definition in RFC-3987
RFC-3987 defines the Internationalized Resource Identifier (IRI) by extending the syntax of URIs to a much wider repertoire of characters.


* Interpretation
What places a URI in the URL subset is that it provides a means of locating the resource, *not* the nature of the resource.

An IRI is just an extension of the repertoire of characters. I am also quite familiar with RFC-3987: look at the acknowledgements.
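
To illustrate the "wider repertoire of characters" point, here is a minimal Python sketch of mapping an IRI to a URI by percent-encoding the UTF-8 bytes of non-ASCII characters. The function name iri_to_uri and the example.org IRI are my own illustrations, not taken from either RFC, and the sketch assumes an ASCII host name: it does not attempt the host name conversion that RFC-3987 also describes.

```python
from urllib.parse import quote

# Characters passed through unchanged: the RFC 3986 reserved characters
# plus "%", so existing percent-escapes are not encoded twice.
# Unreserved characters (letters, digits, "-", ".", "_", "~") are never
# encoded by quote().
_SAFE = ":/?#[]@!$&'()*+,;=%"


def iri_to_uri(iri: str) -> str:
    """Percent-encode the UTF-8 bytes of every non-ASCII character.

    Simplification (assumption): the host is treated like any other
    component, so internationalized host names are not handled properly.
    """
    return quote(iri, safe=_SAFE, encoding="utf-8")


# Hypothetical IRI, used only for illustration.
print(iri_to_uri("http://example.org/résumé#me"))
# An identifier that is already pure ASCII comes back unchanged.
print(iri_to_uri("http://philarcher.org/foaf.rdf#me"))
```

The identifier itself is the same in both forms; only the set of characters used to write it differs.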


* Phil's example
http://philarcher.org/foaf.rdf#me

This URI is a URL because the HTTP scheme provides a means of locating the resource. Whether the resource is abstract or physical plays no role in making a URI a URL.
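
For illustration only, here is a minimal Python sketch of how a generic parser splits this URI into the components named in RFC-3986. It uses only the standard urllib.parse module; the variable names are mine. It shows the syntax and does not settle the terminology question.

```python
from urllib.parse import urlsplit, urldefrag

uri = "http://philarcher.org/foaf.rdf#me"

# Split the URI into its generic components.
parts = urlsplit(uri)
print(parts.scheme)    # the access mechanism
print(parts.netloc)    # the authority (network location)
print(parts.path)      # the path on that authority
print(parts.fragment)  # the fragment identifier

# The fragment is resolved by the client after retrieval (RFC 3986,
# section 3.5), so the part that is actually dereferenced over HTTP
# is the URI without it.
retrievable, fragment = urldefrag(uri)
print(retrievable)
```

Both the full URI and the one without the fragment carry the same scheme and authority; they differ only in the fragment.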


* Verification
This *must* be verified, perhaps by contacting the maintainer(s) of RFC-3986. TBL is one of the authors; I know Larry Masinter, another author. We should not need clarification regarding RFC-3987; I know Martin Duerst.


* More
We must follow the existing specifications: we cannot *redefine* anything in them, though we can *refine* as long as we do not break anything. If one wants to express it as a hierarchy, it has to be properly defined. The same goes for the concept of "HTTP URI", as this is just a subset of URL.


Regards
Tomas


-----Original Message-----
From: Phil Archer [mailto:phila@w3.org] 
Sent: Wednesday, August 19, 2015 4:30 PM
To: Annette Greiner; CARRASCO BENITEZ Manuel (DGT)
Cc: public-dwbp-wg@w3.org
Subject: Re: Data Identification section (was Re: reviewing the BP doc)

Sorry Annette, on this rare occasion I must disagree with you.

http://philarcher.org/foaf.rdf#me is a URI. It is not a URL as it 
identifies a resource, me, that, like any other physical object, or 
concept, cannot be obtained over the internet. I do not have a network 
location.

http://philarcher.org/foaf.rdf is a URL, it identifies a resource that 
does have a network location, i.e. it can be obtained directly over the 
internet.

So there's a hierarchy here of URIs, HTTP URIs and URLs.

As evidence, let me quote RFC 3986 (the definition of URIs, 
https://www.ietf.org/rfc/rfc3986.txt), section 1.1.3:


1.1.3. URI, URL, and URN

A URI can be further classified as a locator, a name, or both. The
term "Uniform Resource Locator" (URL) refers to the subset of URIs
that, in addition to identifying a resource, provide a means of
locating the resource by describing its primary access mechanism
(e.g., its network "location").

RFC 3987 introduces the even more general IRI which allows Unicode 
characters outside the limited ASCII set.

The WG has made it clear that it wants to avoid providing any discussion 
of the issue. That seems fine to me as it avoids unnecessary confusion, 
BUT, if we're not going to say something along the lines of "we know all 
these things are different but for simplicity we'll just use the one 
term" then we must use the correct term in the correct place.

Last week we ended up voting on a proposed resolution:

PROPOSED: In general URI should be used in the BP doc, but depending on 
the context, URL may also be used.

This didn't meet with consensus - some people were unsure, Tomas was 
opposed.

Looking at other W3C specs btw, we use IRI pretty much everywhere. See, 
for example, http://www.w3.org/TR/tabular-metadata/.

So the hierarchy is:

IRI
URI
HTTP URI
URL

Therefore, IMO, the correct course of action in this, a technical 
specification document, is to use the term IRI except where context 
dictates that another term be used.

Phil.

On 13/08/2015 19:54, Annette Greiner wrote:
> For our document, URIs and URLs are the same thing, since we are not concerned with entities that don’t have a location on the web. The document uses URI currently. I’m fine with keeping that or using URL instead. Either way, my point is that we don’t need to launch into a discussion of the differences. I’m fine with a footnote referencing RFC 3986 if people feel it’s necessary.
> -Annette
> --
> Annette Greiner
> NERSC Data and Analytics Services
> Lawrence Berkeley National Laboratory
> 510-495-2935
>
> On Aug 13, 2015, at 2:02 AM, Manuel.CARRASCO-BENITEZ@ec.europa.eu wrote:
>
>> Annette,
>>
>> We should just use URL, the subset of URI with a network location mechanism. We *cannot* redefine terms such as URL and we must just point to the source specifications: we cannot break the existing specifications.
>>
>> I agree that the document is getting too long, hence the proposal to separate out the identification section: it is easier to produce and consume.
>>
>> Regards
>> Tomas
>>
>>
>> From: Annette Greiner [amgreiner@lbl.gov]
>> Sent: 12 August 2015 20:11
>> To: Phil Archer
>> Cc: CARRASCO BENITEZ Manuel (DGT); public-dwbp-wg@w3.org
>> Subject: Re: Data Identification section (was Re: reviewing the BP doc)
>>
>> On Aug 12, 2015, at 7:56 AM, Phil Archer <phila@w3.org> wrote:
>>
>> * ?R?
>>
>> URI, URL, URN, IRI. Just use URI everywhere and add something like:
>>
>>   "In this specification, the term URI is used for the identification schemes: URI, URL, URN and IRI ..."
>>
>> This is in line with the recommendation in RFC3986
>> https://tools.ietf.org/html/rfc3986#section-1.1.3
>>
>>   " ... Future specifications and related documentation should use the general term "URI" rather than the more restrictive terms "URL" and "URN" ..."
>>
>> But we *want* to be restrictive. We're only talking about HTTP URIs, we're not talking about URNs, or even URLs. Hence I think we need to say something, no?
>>
>> Funny, I take the fact that we want to be restricted to discussing URIs as a reason *not* to add a discussion about them vs. URNs or URLs. The fact that we use a term in our document doesn’t mean that we have to define it. It is defined elsewhere in W3C space plenty. Our document is already annoyingly long; let’s help readers get to what is helpful information and leave out discussion that is not unique to publishing data on the web.
>>
>> --
>> Annette Greiner
>> NERSC Data and Analytics Services
>> Lawrence Berkeley National Laboratory
>> 510-495-2935
>>
>
>

-- 


Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1

Received on Wednesday, 19 August 2015 16:25:35 UTC