Re: Status codes / IR vs. NIR -- 303 vs. 200 from Kingsley Idehen on 2010-11-12 (public-lod@w3.org from November 2010)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Fri, 12 Nov 2010 08:17:21 -0500
To: Lars Heuer <heuer@semagia.com>
CC: "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <4CDD3E61.3020002@openlinksw.com>
On 11/12/10 7:22 AM, Lars Heuer wrote:
> Hi Kingsley,
>
> [...]
>>> If I want an RDF/XML representation of the document, I can ask for
>>>
>>>        Accept: application/rdf+xml
>>>
>>> and Wikipedia would (ideally) return an RDF/XML representation of that
>>> resource which tells me that John Lennon is a person who was born at
>>> ... murdered at ... was part of a group named ... etc.
>>>
>> Yes, so you received a document stating all of the above, who is the
>> Subject? How is the Subject Identified?
> I don't understand the question. A person named "John Lennon" is the
> subject. The subject is identified by the IRI.
>
> If I issue a
>
>    GET <http://en.wikipedia.org/wiki/John_Lennon>  Accept: application/x-tm+ctm
>
> and the server responds with (using the Topic Maps syntax CTM since I
> am not that familiar with RDF syntaxes):
>
>    <http://en.wikipedia.org/wiki/John_Lennon>
>       isa ex:person;
>       - "John Lennon";
>       born-at 1940-10-09;
>       died-at 1980-12-08;
>       member-of <http://en.wikipedia.org/wiki/The_Beatles>.
>
> I'd know that the above mentioned IRI represents a NIR (a person)
> which was born at .. died at .. etc.
>
> Where is the problem with that approach?
>
> [...]
>> Have to drop the fact that your non-web-sign-processor (DNA CPU)
>> already groks "John Lennon", and does a lot of fancy processing with
>> frames en route to disambiguation and context manifestation.
> I don't understand that statement. A web agent would also know that
> the IRI represents a person which has the name "John Lennon".
>
> [...]
>>> I see, DBpedia provides different IRIs. That's fine. But it's not
>>> possible to keep <http://en.wikipedia.org/wiki/John_Lennon> (or
>>> <http://dbpedia.org/resource/John_Lennon> if that matters) and make
>>> statements about that, right? I cannot make statements which are
>>> interpreted rightly without an Internet connection. I need the status
>>> codes.
>>>
>>> [...]
>>>> Personally, it can be solved at the application level by application
>>>> developers making a decision about the source of semantic fidelity, i.e.,
>>>> HTTP or the Data itself.
> Yes, it can be solved at application level. Maybe on a per domain
> basis, but that's exactly the problem. Neither 303 nor 200 solves the
> identity problem. Unless we'd introduce a concept to distinguish
> between NIRs and IRs (like Topic Maps does with Subject Identifiers
> and Subject Locators).
>

Topic Maps isn't doing anything that isn't already being done via Linked 
Data patterns. I've never grokked this generally held position from the 
Topic Maps community.

An Identifier is an Identifier. It has a Referent.
A URI is an Identifier.
You can use an Identifier as a Name or an Address.

Trouble is that HTTP is about document location and content 
transmission. Thus, all URLs (Location Identifiers / Addresses) 
ultimately resolve to Data. URIs in the generic sense don't, and you can 
use an HTTP URI as a Name.

The 303 heuristic is how Name | Address disambiguation is handled re. 
Linked Data.
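
To make the 303 heuristic concrete, here is a minimal sketch of how a client might read the status code of the first response for an HTTP URI. The function name and the "name" / "address" labels are made up for illustration; this is not taken from any particular Linked Data client.

```python
def classify_by_status(status_code: int) -> str:
    """Hypothetical helper: interpret the first HTTP response for a URI."""
    if status_code == 303:
        # See Other: the URI is being used as a Name; the description
        # of its referent lives at a different Address.
        return "name"
    if 200 <= status_code < 300:
        # Success: by convention the URI is the Address of a document.
        return "address"
    # Anything else tells us nothing about Name vs. Address.
    return "unknown"
```

Note that the decision rests entirely on the HTTP exchange, not on the data that comes back.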

A new option has emerged, which I think is pretty much what you outline 
re. Topic Maps: based on self-describing structured content (e.g. 
RDF formatted data) transmitted from a URL, an application can treat a 
slash terminated URL as an HTTP URI based Name, overriding the 
conventional assumptions culled from HTTP responses, i.e., 200 OK 
becomes Okay.
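
A rough sketch of that application-level override, under stated assumptions: the triples are plain (subject, predicate, object) string tuples already parsed from the response, and the rule "typed as something other than foaf:Document" is only one possible heuristic chosen for this example. The function name is hypothetical.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Assumption for this sketch: foaf:Document marks "this is a document".
FOAF_DOCUMENT = "http://xmlns.com/foaf/0.1/Document"

def treat_as_name(url: str, status_code: int, triples: Iterable[Triple]) -> bool:
    """Hypothetical rule: let the data, not only HTTP, decide Name vs. Address."""
    if status_code == 303:
        return True   # the 303 heuristic already says "Name"
    if status_code != 200:
        return False  # no usable response, so make no claim
    # 200 OK: look at what the returned data says about the URL itself.
    types = {o for s, p, o in triples if s == url and p == RDF_TYPE}
    return bool(types) and FOAF_DOCUMENT not in types
```

With Lars's example, a 200 response whose data types <http://en.wikipedia.org/wiki/John_Lennon> as a person would lead this sketch to treat the URL as a Name.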

> I'd tend to agree that 200 seems to be easier to handle than 303 (even
> if it does not solve the identity problem either).

I don't see how it doesn't provide a solution to the Name | Address 
disambiguation problem.

> And fragment IRIs
> do not solve that problem either. It's just a problem shift, imo.

Maybe an imperfect solution since disambiguation isn't handled by the 
data itself.
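
For reference, a small sketch of how a fragment IRI separates Name from Address purely by syntax: a client strips the fragment before making the request, so the server only ever sees the document Address. The example.org URI and the function name are made up.

```python
from urllib.parse import urldefrag

def split_name_and_address(uri: str):
    """Return (address, fragment) for a fragment IRI used as a Name."""
    # urldefrag splits off the part after '#', which is never sent
    # to the server in an HTTP request.
    address, fragment = urldefrag(uri)
    return address, fragment

address, fragment = split_name_and_address("http://example.org/people#lennon")
# address is what gets dereferenced; the full URI remains the Name.
```

The split happens in the client by convention; nothing in the returned data states which of the two is the Name.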

> [...]
>>> Side note: Each subject/object needs a GET (assuming that predicates
>>> are always NIRs) to interpret the statement correctly... Does it
>>> scale? Let's assume you'd send me a DBpedia dump. I cannot interpret
>>> it correctly, unless I have an Internet connection?
>> What about when I send you DBpedia in the post on a USB key ? :-)
> I don't see how that statement contradicts my statement that I always
> need an Internet connection. If you send me DBpedia offline, I need an
> Internet connection if I want to import the stuff and want to
> interpret the triples correctly if a 200 / 303 status code is
> necessary to handle the IRIs right.

You can work with DBpedia offline, assuming you install Virtuoso + 
DBpedia data from a USB key to your local drive. It will work absolutely 
fine.

The green pages are just browser pages; everything you do online you can 
replicate offline, no problem at all re. DBpedia data. Of course, if you 
follow an out-bound link to a resource (descriptor document) outside the 
DBpedia data set, you will need an Internet connection if the data isn't 
local.


> Best regards,
> Lars


-- 

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Received on Friday, 12 November 2010 13:17:54 UTC