Re: Role of URI and HTTP in Linked Data from Nathan on 2010-11-10 (public-lod@w3.org from November 2010)

From: Nathan <nathan@webr3.org>
Date: Wed, 10 Nov 2010 10:44:03 +0000
To: Jiří Procházka <ojirio@gmail.com>
CC: public-lod@w3.org
Message-ID: <4CDA7773.4000305@webr3.org>

Hi Jiří,

Jiří Procházka wrote:
> Hi,
> having read all of the past week and still ongoing discussion about HTTP
> status codes, URIs and most importantly their meaning from Linked Data
> perspective, I want share my thoughts on this topic.
> 
> I don't mean to downplay anyone's work but I think the role of URI and
> HTTP specifications (especially semantics) in Linked Data is
> overemphasized, which unnecessarily complicates things.

The URI is what makes Linked Data Linked Data. It's the only hook to 
the real world, and via the domain name system + domain registration 
process it gives us a hook on accountability, which is critically 
important. "#bar, as described by <http://example.com/foo>" resolves in 
two ways:
(1) <http://example.com/foo> as a name for the literal description/graph
(2) <http://example.com/foo> as a way of saying "the author of the 
description available at <http://example.com/foo>, stated X, and was 
responsible as delegated by the owners of example.com", where X is (1) 
and provable by the HTTP messages and logs. A status code of 200 vs 303 
to some other domain or URI vs 4xx or 5xx plays a big part in that chain 
of accountability / validity / trust.
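
To make the status code point concrete, here is a minimal sketch of how a 
consumer might record who answered for a URI. The function name, the 
wording of the results, and the example.org redirect target are 
assumptions made up for illustration; nothing here is mandated by HTTP or 
by any Linked Data specification.

```python
from urllib.parse import urlsplit


def interpret_dereference(requested_uri, status, location=None):
    """Hypothetical helper: say who answered for a URI, given one HTTP response."""
    authority = urlsplit(requested_uri).netloc
    if status == 200:
        return f"{authority} served a description directly"
    if status == 303:
        # A relative Location has no authority of its own, so it stays on the same one.
        target = urlsplit(location or "").netloc or authority
        if target == authority:
            return f"{authority} redirected to another of its own URIs"
        return f"{authority} deferred to {target}"
    if 400 <= status < 600:
        return f"{authority} gave no description (status {status})"
    return f"status {status} is outside this sketch"


print(interpret_dereference("http://example.com/foo", 200))
print(interpret_dereference("http://example.com/foo", 303, "http://example.org/doc"))
print(interpret_dereference("http://example.com/foo", 404))
```

The only thing the sketch tries to show is that each outcome says 
something different about who stands behind the description.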

Also never forget that Linked Data is just Links with literals. A Link, 
as in a hyperlink, is the description of a relationship between two 
things (names or literals); that is what makes a link a link. Thus each 
link is a statement, statements form descriptions, and descriptions are 
literal things. Triples are statements, Graphs are descriptions.

There's a lot more to the simple triple with http URIs than many 
realise. Sure, it makes a nice RDF data bus for us and gives us an almost 
universal data format, which we can exploit and bring to the fore via 
linked data, but that's just the tip of the iceberg, and ultimately of 
very little use without the URI and HTTP.

A few notes:

> I think we can all agree, that the core idea of Linked Data is that
> information is expressed using unique identifiers (URIs) I can simply
> use to get useful information about the thing the identifier represents
> (thus mandated relatively simple, widely supported transfer protocol HTTP).

As above, that's not the core of linked data; that's the surface.

> So lets stick with this. Lets just treat URIs as RDF does - as simple
> names. When we dereference an URI we get back some useful data and
> that's it.

So, that'll be like mailto: or pop: or tel: then..

> If we want to express, the data fetched are in fact a
> document, we use the wdrs:isDefinedBy property. The data fetched are
> just a data and any info about it should be contain in it.

Expressing that the data fetched is in fact a document is indeed 
optional, but any response is always a message, a description, a 
/literal/ thing. You can't pretend it doesn't exist; it does. To say a 
description is anything other than that is like me saying you're an 
apple and insisting everybody believe me. Literals are self-identifying, 
self-naming things.

> Why? Why no Content-Location? There is no reason to require additional
> complexity, building extra information layers. Publishing the document
> information in the data itself most probably would be simpler for both
> the publishing and the consuming party. Treating HTTP as a simple
> blackbox is what is mostly done in practice anyway.

Read only world then?

> What if someone doesn't publish the document data? Would it mean the URI
> we dereferenced refers both to the thing described and the description
> of it? Kind of.

There is no kind of. The description is a literal thing all of its own; 
it's the same thing regardless of media type or whether you write it on 
a bit of paper. It's a self-identifying literal thing.

> What I mean is the consumer side can add additional
> information to the data about the document (when and how fast it was
> fetched etc) and if the data doesn't contain info about the document
> already, it could add it:
>   <uri> wdrs:isDefinedBy [ wdsr:location "uri" ] . # or something like this
> Non-RDF data should use their equivalents.
> That is the most important things I had to say - lets keep semantics in
> the data.
> 
> I believe it is quite important that the range of wdrs:isDefinedBy is a
> document class, which should be domain of wdsr:location.

So one location / graph / description is a document, and the other isn't!?

> I am going to explain why I think so, but beware, at this point I get a
> bit philosophical :)
> 
> What is pretty awesome about RDF, which is something Linked Data could
> learn, is how it dabbled the ontological (used as philosophical term)
> issues - existence, being and reality. In order to support maximum
> expressiveness and compatibility with various world-views it says the
> least about it. Big part of that is dealing with identity - if a
> caterpillar turns into butterfly, is it still the same thing? Am I still
> I when I get older and change? RDF doesn't offer any answers to such
> questions, neither if there are only information resources and other
> resources. There are just names which identify objects or concepts,
> which we describe with names and the final description matches some
> number of objects or concepts we know, while the better the description
> is, the lower the number is.
> 
> RDFS classes are used to describe various aspects of objects or
> concepts, which allow us to express ourselves much less ambiguously,
> using properties with defined domain and range. On the other hand we can
> describe those aspects separately if we consider them a separate entity.
> For example someone can say I am averagely skilled as an English
> speaker, or that my English skill is mediocre, or that I am one of
> averagely skilled English speakers. Similarly one could say <book> is
> long 30000 characters as its content, or that <book> is long 20
> characters as its title, or that <book> is long 3000 characters as the
> description received on dereferencing. It shouldn't matter if I consider
> a book name as part of it or not, if I use as unambiguously defined
> properties as possible. However vocabularies with not very well defined
> terms (consider an example "length" property), which generally mimic
> natural language properties, are used widely, which is why we should
> have wdrs:isDefinedBy.
> The point of this philosophical exercise was to say, that shouldn't be
> saying "an URI represents one resource" or trying to define what
> resources are or what existence is, but recognizing the context of the
> original information when modifying it (especially amending).

Indeed, we should just realise that all we can do is describe things by 
making statements about them, and then provide a way to say how one 
described thing relates to another.

Best,

Nathan
Received on Wednesday, 10 November 2010 10:45:06 UTC