Role of URI and HTTP in Linked Data from Jiří Procházka on 2010-11-10 (public-lod@w3.org from November 2010)

From: Jiří Procházka <ojirio@gmail.com>
Date: Wed, 10 Nov 2010 03:39:05 +0100
To: public-lod@w3.org
Message-ID: <4CDA05C9.20708@gmail.com>
Hi,
having read the past week's (and still ongoing) discussion about HTTP
status codes, URIs and, most importantly, their meaning from the Linked
Data perspective, I want to share my thoughts on this topic.

I don't mean to downplay anyone's work, but I think the role of the URI
and HTTP specifications (especially their semantics) in Linked Data is
overemphasized, which complicates things unnecessarily.
I think we can all agree that the core idea of Linked Data is that
information is expressed using unique identifiers (URIs) which I can
simply use to get useful information about the thing the identifier
represents (hence the mandated, relatively simple and widely supported
transfer protocol, HTTP).

So let's stick with this. Let's just treat URIs as RDF does: as simple
names. When we dereference a URI we get back some useful data, and
that's it. If we want to express that the data fetched are in fact a
document, we use the wdrs:isDefinedBy property. The data fetched are
just data, and any information about them should be contained in them.
Why? Why no Content-Location? There is no reason to require additional
complexity by building extra information layers. Publishing the document
information in the data itself would most probably be simpler for both
the publishing and the consuming party. Treating HTTP as a simple
black box is what is mostly done in practice anyway.

What if someone doesn't publish the document data? Would it mean the URI
we dereferenced refers both to the thing described and to the
description of it? Kind of. What I mean is that the consumer side can
add additional information about the document to the data (when and how
fast it was fetched, etc.), and if the data don't already contain
information about the document, the consumer could add it:
  <uri> wdrs:isDefinedBy [ wdsr:location "uri" ] . # or something like this
Non-RDF data should use their equivalents.
That is the most important thing I had to say: let's keep semantics in
the data.
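
A minimal Python sketch of that consumer-side step, under stated
assumptions: triples are plain (subject, predicate, object) string
tuples rather than objects from a real RDF library; the property names
wdrs:isDefinedBy and wdsr:location are copied from the message, where
they are offered only as "something like this"; and the function name,
the "_:doc" blank-node label and the ex:fetchedAt property are
hypothetical, introduced here purely for illustration.

```python
from datetime import datetime, timezone

# Property names as written in the message; placeholders, not a checked vocabulary.
IS_DEFINED_BY = "wdrs:isDefinedBy"
LOCATION = "wdsr:location"
FETCHED_AT = "ex:fetchedAt"  # hypothetical property for consumer-side metadata


def add_document_info(uri, triples, fetched_at=None):
    """Return a copy of `triples` with consumer-side document info added.

    A document node is invented only when the fetched data do not
    already link `uri` to one via IS_DEFINED_BY.
    """
    result = set(triples)
    docs = [o for (s, p, o) in result if s == uri and p == IS_DEFINED_BY]
    if not docs:
        # Hypothetical fixed label; real code would need a fresh blank node.
        doc = "_:doc"
        result.add((uri, IS_DEFINED_BY, doc))
        result.add((doc, LOCATION, uri))
        docs = [doc]
    if fetched_at is not None:
        # Attach fetch metadata to the document node(s), not to the thing described.
        for doc in docs:
            result.add((doc, FETCHED_AT, fetched_at.isoformat()))
    return result


# Usage: data fetched from a URI that says nothing about its own document.
fetched = {("http://example.org/book", "ex:title", "Example")}
annotated = add_document_info(
    "http://example.org/book", fetched, datetime.now(timezone.utc)
)
```

Note that nothing here looks at HTTP status codes or headers: whatever
distinction exists between the thing and its description ends up in the
triples themselves.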

I believe it is quite important that the range of wdrs:isDefinedBy is a
document class, which should in turn be the domain of wdsr:location.
I am going to explain why I think so, but beware: at this point I get a
bit philosophical :)

What is pretty awesome about RDF, and something Linked Data could learn
from, is how it deals with the ontological (in the philosophical sense)
issues: existence, being and reality. In order to support maximum
expressiveness and compatibility with various world-views, it says as
little as possible about them. A big part of that is dealing with
identity: if a caterpillar turns into a butterfly, is it still the same
thing? Am I still me when I get older and change? RDF doesn't offer any
answers to such questions, nor does it say whether there are only
information resources and other resources. There are just names which
identify objects or concepts, which we describe using other names, and
the final description matches some number of objects or concepts we
know; the better the description is, the lower that number is.

RDFS classes are used to describe various aspects of objects or
concepts, which allows us to express ourselves much less ambiguously,
using properties with a defined domain and range. On the other hand, we
can describe those aspects separately if we consider them separate
entities. For example, someone can say that I am an averagely skilled
English speaker, or that my English skill is mediocre, or that I am one
of the averagely skilled English speakers. Similarly, one could say that
<book> is 30000 characters long as its content, or that <book> is 20
characters long as its title, or that <book> is 3000 characters long as
the description received on dereferencing. It shouldn't matter whether I
consider a book's name to be part of it or not, as long as I use
properties that are as unambiguously defined as possible. However,
vocabularies with poorly defined terms (consider a "length" property,
for example), which generally mimic natural-language properties, are
widely used, which is why we should have wdrs:isDefinedBy.
The point of this philosophical exercise was to say that we shouldn't be
saying "a URI represents one resource" or trying to define what
resources are or what existence is, but should instead recognize the
context of the original information when modifying it (especially when
amending it).

Best,
Jiri Prochazka

PS: It might be useful to also have wdrs:isPrimarilyDefinedBy (as an
rdfs:subPropertyOf of wdrs:isDefinedBy).
Received on Wednesday, 10 November 2010 02:39:40 UTC