Re: A(nother) Guide to Publishing Linked Data Without Redirects

On 11/11/10 8:07 AM, David Wood wrote:
>
> On Nov 11, 2010, at 07:44, Kingsley Idehen wrote:
>
>> On 11/11/10 4:54 AM, Richard Light wrote:
>>> In message 
>>> <AANLkTikmg=+AUgjHLf-88q-6Jzd7=ZXZ2gsj-QDA1Xd+@mail.gmail.com>, 
>>> Harry Halpin <hhalpin@ibiblio.org> writes
>>>>
>>>> The question is how to build Linked Data on top of *only* HTTP 200 -
>>>> the case where the data publisher either cannot alter their server
>>>> set-up (.htaccess files) or does not care to.
>>>
>>> Might it help to look at this problem from the other end of the 
>>> telescope? So far, the discussion has all been about what is 
>>> returned. How about considering what is requested?
>
>
> Good idea.
>
>
>>>
>>> I assume that we're talking about the situation where a user (human 
>>> or machine) is faced with a URI to resolve.  The implication is that 
>>> they have acquired this URI through some Linked Data activity such 
>>> as a SPARQL query, or reading a chunk of RDF from their own triple 
>>> store. (If we're not - if we're talking about auto-magically 
>>> inferring Linked Data-ness from random URLs, then I would agree that 
>>> sticking RDFa into said random pages is a way to go, and leave the 
>>> discussion.)
>>>
>>> The Linked Data guidelines make the assumption that said user is 
>>> willing and able to indicate what sort of content they want, in this 
>>> case via the Accept header mechanism.  This makes it reasonable to 
>>> further specify that the fallback response, in the absence of a 
>>> suitable Accept header, is to deliver a human-readable resource, 
>>> i.e. an HTML web page. Thus the web of Linked Data behaves like part 
>>> of the web of documents, if users take no special action when 
>>> dereferencing URLs.
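
For illustration, a minimal sketch of such a conneg request in Python
(standard library only; the DBpedia URI is merely an example of a
conneg-capable server):

    import urllib.request

    # Ask for RDF; a conneg-capable server serves, or redirects to, an
    # RDF document. With no Accept header, the same URL falls back to
    # the human-readable HTML page.
    req = urllib.request.Request(
        "http://dbpedia.org/resource/Paris",
        headers={"Accept": "application/rdf+xml"},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.status, resp.headers.get("Content-Type"))
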
>>>
>>> If we agree that it is reasonable for user agents to take some 
>>> action to indicate what type of response they want, then one very 
>>> simple solution for the content-negotiation-challenged data 
>>> publisher would be to establish a convention that adding '.rdf' to a 
>>> URL should deliver an RDF description of the NIR (non-information 
>>> resource) signified by that URL.
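
A sketch of that convention from the consuming side, in Python (the
helper and the example URL are hypothetical):

    def rdf_description_url(nir_uri: str) -> str:
        """Derive the RDF-description URL under the proposed '.rdf'
        suffix convention."""
        return nir_uri + ".rdf"

    # A 200-only publisher just serves the description as a plain file:
    print(rdf_description_url("http://example.org/id/paris"))
    # -> http://example.org/id/paris.rdf
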
>>>
>>> Richard
>> Richard,
>>
>> Yes, we should look at this differently. We should honor the fact 
>> that the burgeoning Web of Linked Data is an evolution of the Web of 
>> Linked Documents. To do this effectively, I believe we need to fix the 
>> false dichotomy between the Document Web and the Data Web.
>>
>> There is no Linked Data to exploit without Documents at HTTP 
>> Addresses from which content is streamed.
>
>
> Kingsley, your analysis is solid except for one part:  You seem to 
> forget that the issue that brought us to this point was that the 
> address of an information resource describing something is not the 
> same as the address of the thing itself.  It is that problem that is 
> still worth solving.

David,

I do believe Ian's solution solves the matter of Name / Address 
disambiguation. Using a Document URL (Address) as a Name requires the 
aforementioned disambiguation.

The question is: who has to do the disambiguation? The user agent or the 
data server? I believe a user agent should perform Name / Address 
disambiguation via its semantic-fidelity choice. If that fidelity is 
high, then Ian's solution works, i.e., the data is self-describing and 
the user agent should interpret it accordingly. The semantic fidelity of 
HTTP stops at the Document; the problem at hand takes us into the realm 
of content interpretation. In a sense, like "beauty", this too lies in 
the eye of the beholder (the user agent).
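
As a rough sketch of what such a high-semantic-fidelity user agent might 
do, assuming Python and the rdflib library (neither of which Ian's 
proposal prescribes):

    from rdflib import Graph, URIRef

    def disambiguate(uri: str) -> str:
        """Dereference uri and decide, from the self-describing data,
        whether it names the thing described or merely addresses a
        document about something else."""
        g = Graph()
        g.parse(uri)  # HTTP GET plus RDF parsing
        if (URIRef(uri), None, None) in g:
            return "Name: the retrieved data describes this URI itself"
        return "Address: a document was located, but it describes other subjects"
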

I don't think a new status code is necessary, since HTTP is doing its job 
as a document location and content access protocol.

Thus, if we reference document URLs from browsers and follow links, 
everything will be fine. If we go even as far as taking a descriptor 
document's Subject URI (slash-terminated) and then placing that in a 
browser, we will be sorta fine too, depending on which user agent we use.

If today's small pool of Linked Data-aware user agents adopt Ian's 
option, then I'll drop "sorta" from the paragraph above :-)

Hope this helps.

Kingsley
>
> Regards,
> Dave
>
>
>
>>
>> If we put the Web aside for a second, I am hoping we can accept that 
>> in the real world we have Documents with different surface structures, 
>> e.g., Blank Paper and Graph Paper.
>>
>> We can scribble and doodle on blank paper. We can even describe 
>> things in sentences and paragraphs on blank paper, but when it comes 
>> to Observations ("Data"), Graph Paper is better, i.e., it delivers a 
>> high-fidelity expression of an Observation by letting us place the 
>> Subject Identifier, Subject Attributes, and Attribute Values into cells.
>>
>> In the real world, we've been able to make References across both 
>> types of paper (Documents):
>>
>> 1. Reference one Document from another
>> 2. Reference a cell in one Document from a cell in another.
>>
>> Enter the luxury of computers and hypermedia. These innovations allow 
>> us to replicate what I've outlined above using hyperlinks. Some examples:
>>
>> 1. Word processors -- you could reference across Microsoft Word 
>> documents on a computer, but never across Word and WordPerfect
>>
>> 2. Spreadsheets -- you could use Reference values (Names or 
>> Addresses) to connect cell content within a single spreadsheet or 
>> across several spreadsheets and workbooks, but you couldn't reference 
>> data across Excel and Lotus 1-2-3
>>
>> 3. Database Tables -- you could use Unique Keys to identify records, 
>> with Foreign Keys as the Reference mechanism, but in the case of 
>> relational databases (the majority) the tables didn't accept Reference 
>> values, i.e., content was oriented toward typed literals; you couldn't 
>> reference a table in Oracle from a table in Microsoft SQL Server, etc.
>>
>> As you can see from the above:
>>
>> #1 is still about scribbling on blank paper. References are scoped to 
>> entire documents or fragments.
>> #2-3 are about graph-paper-oriented observation (data) capture and 
>> reference that leverages the fidelity of cells.
>>
>> Enter the luxury of computers, hypermedia, and a network protocol 
>> (HTTP):
>>
>> #1 loses its operating-system- and application-specific scope. We 
>> have blank paper, so when we scribble we do so in HTML, which 
>> leverages HTTP for referencing other documents.
>>
>> #2-3 lose their operating-system- and application-specific scope. We 
>> have graph paper, so when we capture observations, leveraging the 
>> fidelity of cell-level references, we do so via an EAV/SPO graph.
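
To make that concrete, one such cell-level Observation as an EAV/SPO 
triple, sketched in Python (the identifiers are real DBpedia URIs, used 
purely as an example):

    # One Observation: (Subject, Attribute/Predicate, Attribute Value/Object),
    # with every term an HTTP-dereferenceable identifier.
    observation = (
        "http://dbpedia.org/resource/Paris",    # Subject Identifier
        "http://dbpedia.org/ontology/country",  # Attribute
        "http://dbpedia.org/resource/France",   # Attribute Value
    )
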
>>
>> As you can see, the Document hasn't gone anywhere; its structure has 
>> evolved, with reference scope becoming more granular.
>>
>> Thus, when you HTTP GET and a server responds with 200 OK, it's safe 
>> and sound to assume that a Document has been located. It is also safe 
>> and sound for a user agent to express what type of Content it would 
>> expect from a Document, and then interpret the Content retrieved at 
>> varying levels of semantic fidelity.
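
A sketch of that user-agent behaviour in Python (standard library only; 
the URL and the media-type handling are illustrative assumptions):

    import urllib.request

    req = urllib.request.Request(
        "http://dbpedia.org/page/Paris",  # a Document URL, as an example
        headers={"Accept": "application/rdf+xml, text/html;q=0.5"},
    )
    with urllib.request.urlopen(req) as resp:
        assert resp.status == 200  # a Document has been located
        ctype = resp.headers.get_content_type()

    # The agent then picks its level of semantic fidelity:
    if ctype in ("application/rdf+xml", "text/turtle"):
        plan = "parse the content as an EAV/SPO graph"  # high fidelity
    else:
        plan = "render the content as a page"           # document level
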
>>
>> Back to the point of looking at this differently re. user 
>> interaction. I've held the position for a while that the Linked Data 
>> narrative is back to front. I say this for the following reasons:
>>
>> 1. The Document vs. Data false dichotomy
>> 2. The assumption that people will anytime soon think in URIs when 
>> they are already used to URLs.
>>
>> Orderly Linked Data narrative in steps for Humans:
>>
>> 1. Users continue to enter Document URLs into Browsers e.g. 
>> <http://dbpedia.org/page/Paris> instead of 
>> <http://dbpedia.org/resource/Paris>
>> 2. Users will see a human-comprehensible document with a clearly 
>> identified subject and all its associated attributes and attribute values
>> 3. They will follow their noses wherever the links in the document 
>> take them, enjoying the power of serendipitous discovery of relevant 
>> things
>> 4. They will bookmark without confusion, i.e., no magical changes in 
>> the Browser address bar
>> 5. They will also discover human limitations as time, data volume, 
>> and data disparity intersect
>> 6. They will be happy and ultimately wiser (i.e., delegate stuff to 
>> smart agents that can exploit these links without human limitations).
>>
>> To conclude, Ian is suggesting a solution for high-semantic-fidelity 
>> user agents that doesn't break anything, and actually underscores that 
>> the Document vs. Data dichotomy is a false one. HTTP is a document 
>> location and content retrieval protocol :-)
>


-- 

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

Received on Thursday, 11 November 2010 13:30:37 UTC