Re: dcat:accessURL issue from Richard Cyganiak on 2013-01-31 (public-gld-wg@w3.org from January 2013)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Thu, 31 Jan 2013 22:14:06 +0000
To: Makx Dekkers <mail@makxdekkers.com>
Cc: <fadi.maali@deri.org>, "Public GLD WG" <public-gld-wg@w3.org>
Message-Id: <72EC811C-AF06-42B4-A52A-1117184CBC0F@cyganiak.de>
Hi Makx,

On 31 Jan 2013, at 13:57, Makx Dekkers wrote:
> Now, if I am allowed to express my opinion here, I think an accessURL can only be a rdfs:Literal and not a rdfs:Resource, even if it looks like a URI.
>  
> The main problem that I see is that an accessURL is a string of characters that happens to be a URL. However, the URL is not the thing it points to. In a way, on the theoretical level, saying that the URL is the resource would be equivalent to saying that the range of a name is a resource if it happens to look like a URL.

Sorry to be blunt: That's the kind of angels-on-pinheads distinction that doesn't work in the real world.

> But on a practical level, while I can say:
>  
> “The foaf:name ‘Makx Dekkers’ contains 12 characters”,
>  
> if I were to say
>  
> “the dcat:accessURL ‘http://t.co/xyz’ contains 15 characters”
>  
> it would mean that this *string* has 15 characters if it is declared as an rdfs:Literal, but if it is declared as an rdfs:Resource, it would mean that the *document* at that URL is 15 characters long.

Look, I can do it:

  <http://t.co/xyz> ex:iriLength 15.

What's the problem?

You're going to answer that this doesn't say what I think it says because of semantics, and how owl:sameAs can mess things up.

But the fact is, no one *forces* you to apply owl:sameAs smushing, and as long as you refrain from doing that, if you give me the statement above, I can *in practice* do absolutely everything with it, even if I can't do it *in theory*. Most people, I think, care more about the practice.

The theoretical argument is that IRIs in quotes identify a character string, while IRIs in pointy brackets identify some entity existing on the web or even outside of the web. In practice, it's not the kind of quote that determines the referent of the IRI, but it's what API function your application calls, or what SPARQL function you use, or whether you click on the IRI or copy-paste it.

Even the theory can be fixed up. The iriLength property can be defined like this: "Asserts that the subject resource has an IRI as its identifier that is of the length in characters specified by the object." It's just harder to understand, but should be angels-on-pinheads-compatible.
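
As a rough illustration of that definition (a sketch only, not part of DCAT or of any RDF library): the helpers below take a subject IRI, count the characters of the IRI itself, and render the statement in Turtle. The function names and the ex: prefix are assumptions made up for this example.

```python
def iri_length(iri: str) -> int:
    """Length in characters of the IRI itself, not of whatever it identifies."""
    return len(iri)


def iri_length_triple(iri: str) -> str:
    """Render the hypothetical ex:iriLength statement for a subject IRI."""
    return f"<{iri}> ex:iriLength {iri_length(iri)} ."


print(iri_length_triple("http://t.co/xyz"))
# prints: <http://t.co/xyz> ex:iriLength 15 .
```

Nothing here dereferences the IRI; which meaning you get depends on the function being called, not on the kind of quotes.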

> Even worse, if it is defined as a resource, I would not be able to make a statement like:
>  
> “the dcat:accessURL ‘http://t.co/xyz’ is a valid URI” (unless of course the *document* contains text that is a valid URI).

  <http://t.co/xyz> ex:isValidIRI true.

What exactly is the thing we're unable to do with that?

(A valid point here would be that in the real world, sometimes we have data with broken URLs, and if we put those into IRIs, then we have broken RDF. If we put them into literals, then we may get ill-formed literals, but that will not prevent exchange of the data. This may be a bug or a feature, depending on your view on data quality and Postel's Law.)
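
To make that trade-off concrete, here is a minimal sketch of a publisher that writes well-formed values as IRIs and falls back to a literal for broken ones. The check is deliberately crude: it only asks for a scheme and rejects the characters that Turtle does not allow between pointy brackets, so it is not a full RFC 3987 validator. The function names are hypothetical, and the fallback policy is one possible choice, not something DCAT prescribes.

```python
import re

# Characters that the Turtle grammar does not allow inside <...>:
# U+0000 to U+0020 (controls and space) and < > " { } | ^ ` \
_FORBIDDEN = re.compile(r'[\x00-\x20<>"{}|^`\\]')
# An absolute IRI starts with a scheme such as "http:".
_SCHEME = re.compile(r'^[A-Za-z][A-Za-z0-9+.-]*:')


def looks_like_iri(value: str) -> bool:
    """Crude well-formedness check (an assumption, not full validation)."""
    return bool(_SCHEME.match(value)) and not _FORBIDDEN.search(value)


def access_url_object(value: str) -> str:
    """Return the Turtle object term to use for a dcat:accessURL value."""
    if looks_like_iri(value):
        return f"<{value}>"
    # Broken URL: keep the data exchangeable by emitting a plain literal.
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


print(access_url_object("http://t.co/xyz"))
print(access_url_object("http://example.org/my file.csv"))
```

The second value contains a space, so it comes out as a quoted literal rather than as an IRI that would break the RDF.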

> The second problem is that the definition of accessURL as resource seems to use URL and URI interchangeably.

To be precise, RDF 2004 doesn't use URIs but "RDF URI References", which are equivalent to IRIs, with a few historical corner-case exceptions originating in the fact that RDF predates the finalization of the IRI spec (RFC 3987). RDF 1.1 uses IRIs. IRIs and URLs are the same thing, except for some corner cases around character encoding that deal with legacy browser bugs.

See http://www.w3.org/TR/rdf11-concepts/#section-IRIs

> While I agree that it is true that every URL is a valid URI (by definition), the converse is not true. I read in RFC 3986:
>  
> The term "Uniform Resource Locator" (URL) refers to the subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism (e.g., its network "location").

The more relevant reference here is:
http://www.w3.org/TR/url/

The treatment in RFC 3986 is outdated and doesn't reflect what is actually deployed on the web. One key difference is that URLs, as the term is generally used these days, can contain Unicode characters, while URIs can't.
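
A small sketch of that difference, assuming the usual IRI-to-URI mapping of percent-encoding the UTF-8 bytes of non-ASCII characters. The helper name iri_to_uri and the example.org address are made up for illustration, and internationalized host names, which need separate treatment, are ignored here.

```python
from urllib.parse import quote

# Reserved characters and "%" are left alone; unreserved characters
# (letters, digits, "-", ".", "_", "~") are never escaped by quote().
_KEEP = ":/?#[]@!$&'()*+,;=%"


def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8 (path/query only)."""
    return quote(iri, safe=_KEEP)


iri = "http://example.org/caf\u00e9"
print(iri.isascii())    # False: fine in an IRI, not allowed in a URI
print(iri_to_uri(iri))  # http://example.org/caf%C3%A9
```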

> In this sense, I think the value of dcat:accessURL is not always a URI as the issue listed just above the definition of accessURL in https://dvcs.w3.org/hg/gld/raw-file/default/dcat/index.html states.

Sorry, I don't follow. The definition you quote states that URLs are a subset of URIs. (It should say that it's a subset of IRIs.) Thus, every URL is an IRI. The definition of dcat:accessURL says that the value must be a URL. Therefore, every legal value of dcat:accessURL is a legal IRI.

Best,
Richard



>  
> Makx.
>  
> [1] https://dvcs.w3.org/hg/gld/raw-file/default/dcat/index.html
> [2] http://www.w3.org/ns/dcat.ttl
>  
>  
> Makx Dekkers
> makx@makxdekkers.com
> +34 639 26 11 46
>  
>
Received on Thursday, 31 January 2013 22:14:35 UTC