Re: Comments on Data 3.0 manifesto from Kingsley Idehen on 2010-04-19 (public-lod@w3.org from April 2010)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Mon, 19 Apr 2010 13:25:03 -0400
To: Leigh Dodds <leigh.dodds@talis.com>
CC: Richard Cyganiak <richard@cyganiak.de>, public-lod <public-lod@w3.org>
Message-ID: <4BCC91EF.5080103@openlinksw.com>
Leigh Dodds wrote:
> Hi Kingsley,
>
> Thanks for the response. I wanted to clarify a couple of points.
> Response edited and comments inline:
>
> On 19 April 2010 16:28, Kingsley Idehen <kidehen@openlinksw.com> wrote:
>   
>> I can replicate every killer ODBC demo I gave circa 1993 with Linked
>> Data just by opening up a Descriptor URL. But, I can't really pull that
>> off today smoothly because my effort will ultimately get hijacked by RDF
>> issues like:
>>
>> 1. What is this thing you opened via a URL from Access or Excel?
>> 2. Why do those LINKs do the wonderful things we see (e.g., polymorphic
>> resultsets i.e., the pattern you see in snorql or isparql query results
>> tables)
>> 3. etc.
>>
>> The distraction in the scenario above will either come from a confused
>> user or a Semantic Web aficionado.  Ironically, both are equally
>> confused, all of the time, but one party doesn't know it :-(
>>     
>
> I'm not sure I really follow that.
>
> Might the confusion you're seeing just be genuine questions about
> trying to understand what your demos are doing? (You don't always
> provide much context)
>   

I think Danbri articulates the problem very well re. "URLs".

Follow the URL thread re. LINKER or leave the "Locator" alone.

De-emphasizing URLs and emphasizing URIs is a classic example. Neither
term is actually free of confusion.

Nobody hears or conveys the following with clarity:

1. A Structured (EAV model based) Descriptor Document has an 
Address/Location (URL) on an HTTP Network
2. It has a Subject
3. The Subject is Named via a Generic HTTP-based Identifier (a Hybrid,
where Generic HTTP Identifiers are used as Names)
4. The Subject's Attributes are also Named the same way
5. The Attribute values may also be References to Subjects (via their 
Names)
6. Names resolve to Structured Descriptions carried (or borne) by 
Structured Descriptor Documents (accessed via their URLs).
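The six points above form a closed loop. As a minimal sketch, assuming hash-style Names (a Subject's Name is its Descriptor Document URL plus a fragment such as "#this") and an in-memory dictionary standing in for the HTTP network, the loop can be walked like this. All example.org URLs and the names NETWORK, describe and follow are hypothetical, chosen only for illustration.

```python
from urllib.parse import urldefrag

FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical stand-in for an HTTP network: Descriptor Document URL -> EAV rows.
# Entities and Attributes are HTTP Names; Values are HTTP Names or plain literals.
NETWORK = {
    "http://example.org/doc/alice": [
        ("http://example.org/doc/alice", FOAF + "primaryTopic", "http://example.org/doc/alice#this"),
        ("http://example.org/doc/alice#this", FOAF + "name", "Alice"),
        ("http://example.org/doc/alice#this", FOAF + "knows", "http://example.org/doc/bob#this"),
    ],
    "http://example.org/doc/bob": [
        ("http://example.org/doc/bob", FOAF + "primaryTopic", "http://example.org/doc/bob#this"),
        ("http://example.org/doc/bob#this", FOAF + "name", "Bob"),
    ],
}


def is_name(value):
    """Treat any http(s) string as a Name, i.e. a reference to a Subject."""
    return isinstance(value, str) and value.startswith(("http://", "https://"))


def descriptor_url(name):
    """Assumption: hash-style Names, so dropping the fragment gives the document URL."""
    return urldefrag(name).url


def describe(name):
    """Resolve a Name: fetch its Descriptor Document, keep the rows about that Subject."""
    rows = NETWORK.get(descriptor_url(name), [])
    return {(attribute, value) for (entity, attribute, value) in rows if entity == name}


def follow(name):
    """Close the loop: Attribute values that are Names are resolved in turn."""
    return {value: describe(value) for (_, value) in describe(name) if is_name(value)}


print(follow("http://example.org/doc/alice#this"))
```

Starting from Alice's Name, the only Attribute value that is itself a Name is Bob's, so follow() lands on Bob's description, fetched from Bob's own Descriptor Document.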

Closed loop.

Here is the norm (when talking to the assumed newbie audience) or what 
the aficionado expects to hear:

1. a Resource Description has a URI
2. Give Resources Names using HTTP
3. Use RDF S-P-O Triples to Describe Resources
4. Link to other Resources using HTTP URIs.

Basically, Subject is a Resource, and a Resource has a URI. Wow!

Where do I get the Description from? Ah!  A URI.

Hmm, I go and get <http://xyz.rdf>. What's that? A Resource.

Hmm, what's inside the resource <http://xyz.rdf>? Lots of triples that
have Resource URIs in the Subject slot, Property Resource URIs in the
Property slot, and Values or Resources in the Object slot.

That's the deconstruction of the *predictable essence* of a typical
Linked Data conversation.

To compound the matter, the ".rdf" files carry no relations back to
the Subject being described, etc. Thus, you can't even explore the
Description Graph by starting at the URI of the .rdf resource, because in
a way the URI abstraction, as used in this context, deems the Descriptor
Document (or Resource) non-existent (it isn't important).
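The missing back-link is easy to detect mechanically. The sketch below assumes triples are plain Python tuples and uses foaf:primaryTopic as the document-to-Subject relation (a common choice, not the only one); it checks whether a Descriptor Document says anything about itself. The document URL reuses the http://xyz.rdf placeholder from above; the "#this" Subject Name and the primary_topics function are hypothetical.

```python
FOAF_PRIMARY_TOPIC = "http://xmlns.com/foaf/0.1/primaryTopic"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def primary_topics(doc_url, triples):
    """Subjects the document itself claims to describe (empty means no way in)."""
    return [o for (s, p, o) in triples if s == doc_url and p == FOAF_PRIMARY_TOPIC]


doc_url = "http://xyz.rdf"
subject = "http://xyz.rdf#this"  # hypothetical Subject Name

# Typical ".rdf" file: triples about the Subject, nothing about the document.
bare = [(subject, FOAF_NAME, "xyz")]

# Same file plus one triple relating the Descriptor Document to its Subject.
linked = bare + [(doc_url, FOAF_PRIMARY_TOPIC, subject)]

print(primary_topics(doc_url, bare))    # []
print(primary_topics(doc_url, linked))  # ['http://xyz.rdf#this']
```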

Please do not characterize my concerns as being about people not grokking
my demos; I haven't made a single reference to my demos. I've made
references to applications that already exist in other realms that work,
and in many cases are plenty EAV-savvy.

Note this example and specific tweaks explicitly made to DBpedia as part 
of our quest for coherence:

curl -I -H "Accept: text/n3" http://dbpedia.org/data/DBpedia.n3
HTTP/1.1 200 OK
Server: Virtuoso/06.01.3127 (Solaris) x86_64-sun-solaris2.10-64  VDB
Connection: Keep-Alive
Date: Mon, 19 Apr 2010 16:09:15 GMT
Accept-Ranges: bytes
Link: <http://dbpedia.org/data/DBpedia.xml>; rel="alternate"; title="Metadata in RDF/XML format",
 <http://dbpedia.org/data/DBpedia.json>; rel="alternate"; title="Metadata in JSON+RDF format",
 <http://dbpedia.org/page/DBpedia>; rel="alternate"; title="XHTML+RDFa",
 <http://dbpedia.org/resource/DBpedia>; rev="primarytopic",
 <http://dbpedia.org/resource/DBpedia>; rel="describedby",
 <http://mementoarchive.lanl.gov/dbpedia/timegate/http://dbpedia.org/data/DBpedia.n3>; rel="timegate"
X-SPARQL-default-graph: http://dbpedia.org
Content-Type: text/n3; charset=UTF-8
Content-Length: 5919
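Because the Descriptor Document's relations travel in the Link response header, a client can recover them without parsing the body at all. A minimal sketch using only the Python standard library and the header value from the response above; the regular expressions are a simplification that assumes every parameter is double-quoted and that no "<" appears inside a quoted value, so this is not a full Web Linking (RFC 8288) parser.

```python
import re

# One link-value: <target> followed by its ";"-separated parameters.
LINK_VALUE = re.compile(r"<([^>]*)>([^<]*)")
# One parameter such as ; rel="alternate" (assumes quoted values).
PARAM = re.compile(r';\s*([\w-]+)\s*=\s*"([^"]*)"')


def parse_link_header(value):
    """Return a list of (target URL, {param: value}) pairs, in header order."""
    return [(target, dict(PARAM.findall(rest)))
            for target, rest in LINK_VALUE.findall(value)]


# The Link header value from the DBpedia response above.
HEADER = ", ".join([
    '<http://dbpedia.org/data/DBpedia.xml>; rel="alternate"; title="Metadata in RDF/XML format"',
    '<http://dbpedia.org/data/DBpedia.json>; rel="alternate"; title="Metadata in JSON+RDF format"',
    '<http://dbpedia.org/page/DBpedia>; rel="alternate"; title="XHTML+RDFa"',
    '<http://dbpedia.org/resource/DBpedia>; rev="primarytopic"',
    '<http://dbpedia.org/resource/DBpedia>; rel="describedby"',
    '<http://mementoarchive.lanl.gov/dbpedia/timegate/http://dbpedia.org/data/DBpedia.n3>; rel="timegate"',
])

for target, params in parse_link_header(HEADER):
    print(target, params)
```

A list of pairs is used rather than a dict keyed by rel because three entries share rel="alternate". For the same reason, the requests library's Response.links, which keys its results by rel, would keep only one of those alternates.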

Here's what's in the HTML based Descriptor Document's <head/>:

<head profile="http://www.w3.org/1999/xhtml/vocab">
    <title>About: DBpedia</title>
  ...
    <link rel="alternate" type="application/rdf+xml" href="http://dbpedia.org/data/DBpedia.rdf" title="RDF/XML Representation" />
    <link rel="alternate" type="text/rdf+n3" href="http://dbpedia.org/data/DBpedia.n3" title="RDF N3/Turtle Representation" />
    <link rel="alternate" type="application/json+rdf" href="http://dbpedia.org/data/DBpedia.jrdf" title="RDF/JSON Representation" />
    <link rel="alternate" type="application/json" href="http://dbpedia.org/data/DBpedia.json" title="RDF/JSON Representation" />
    <link rel="timegate" type="text/html" href="http://mementoarchive.lanl.gov/dbpedia/timegate/http://dbpedia.org/page/DBpedia" title="Time Machine" />
    <link rel="foaf:primarytopic" href="http://dbpedia.org/resource/DBpedia"/>
    <link rev="describedby" href="http://dbpedia.org/resource/DBpedia"/>
...
</head>
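The same relations can be pulled out of the HTML Descriptor Document's <head/>. A minimal sketch with Python's standard html.parser module, fed an excerpt of the markup above; LinkCollector and subject_of are hypothetical names, and treating either rel="foaf:primarytopic" or rev="describedby" as pointing at the Subject mirrors the two links shown rather than any general rule.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the attributes of every <link> element in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> also arrives here via handle_startendtag.
        if tag == "link":
            self.links.append(dict(attrs))


def subject_of(links):
    """Name of the Subject, per the two relations used in the head above."""
    for link in links:
        if link.get("rel") == "foaf:primarytopic" or link.get("rev") == "describedby":
            return link.get("href")
    return None


# Excerpt of the DBpedia <head/> shown above.
HEAD = """<head profile="http://www.w3.org/1999/xhtml/vocab">
  <title>About: DBpedia</title>
  <link rel="alternate" type="application/rdf+xml" href="http://dbpedia.org/data/DBpedia.rdf" />
  <link rel="alternate" type="text/rdf+n3" href="http://dbpedia.org/data/DBpedia.n3" />
  <link rel="foaf:primarytopic" href="http://dbpedia.org/resource/DBpedia"/>
  <link rev="describedby" href="http://dbpedia.org/resource/DBpedia"/>
</head>"""

collector = LinkCollector()
collector.feed(HEAD)
formats = {link["type"]: link["href"] for link in collector.links
           if link.get("rel") == "alternate"}
print(formats)
print(subject_of(collector.links))
```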


Goal:

Make it clear that we have:

1. Descriptor Document
2. Subject of the Description carried by the Descriptor Document
3. Descriptor Document carries content in a variety of formats.


This I can explain in very plain language 100% of the time. Even there
we have EAV in play re. <link/> and @rel.
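The three-way distinction in the Goal can be made explicit in a data structure. This is a hypothetical Python sketch: the class and field names are illustrative, the values come from the DBpedia <head/> above, and a plain dictionary lookup stands in for real HTTP content negotiation (there is no Accept-header or q-value handling).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DescriptorDocument:
    url: str       # 1. where the Descriptor Document lives on the network
    subject: str   # 2. Name of the Subject its Description is about
    # 3. the formats it is available in: media type -> URL of that representation
    formats: Dict[str, str] = field(default_factory=dict)

    def representation(self, media_type: str) -> Optional[str]:
        """Exact-match lookup only; a stand-in for content negotiation."""
        return self.formats.get(media_type)


doc = DescriptorDocument(
    url="http://dbpedia.org/page/DBpedia",
    subject="http://dbpedia.org/resource/DBpedia",
    formats={
        "application/rdf+xml": "http://dbpedia.org/data/DBpedia.rdf",
        "text/rdf+n3": "http://dbpedia.org/data/DBpedia.n3",
        "application/json": "http://dbpedia.org/data/DBpedia.json",
    },
)

print(doc.representation("text/rdf+n3"))
```

Keeping url and subject as separate fields is the point: the document and the thing it describes never share an identifier.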
>   
>> ...
>> RDF inadvertently conflates Data Model and Data Representation Formats.
>>     
>
> Do you really mean that, or that *people* have conflated those two
> aspects of RDF?
>   

I mean that the initial "RDF/XML is RDF" conflation basically killed all
routes to Data Model appreciation. Dropping URLs from Linked Data
parlance and focusing solely on URIs (without a modicum of qualification)
compounded the matter.

Trying to convince people that there is a Data Model aspect to RDF
doesn't wash, any more than trying to convince people there is a
Hierarchical Data Model aspect to XML. Models don't get accentuated by
Markup languages; I have XML and RDF history as proof!

Thus, if people won't accept RDF's Data Model aspect, do we continue to
waste time beating that dead horse? Why not flip it around and open up a
chapter for the all-critical data model, the real and coherent basis for
meshing heterogeneously shaped data across disparate data sources?

In a nutshell, it's about Data Virtualization, and even that moniker is
taking on a life of its own without people instinctively correlating it
with RDF-based Linked Data.

>   
>> This is an old snafu from the first coming of RDF, and sadly we can't
>> fix that in 2010. Simply stating: "RDF is based on a Graph Model..."
>> isn't enough. What Graph Model are we talking about? One that dropped
>> upon us from Space? Or one that we've used since the start of time?
>>     
>
> The one in the RDF Model spec?
>
> I take your point though about context.
>   

It's all about Context.

Context is King!

>   
>> People like to claim they grok the fact that Resource Description
>> Framework is a Graph Data Model and a collection of associated Data
>> Representation Formats, but in the same breath all attention is paid to
>> the latter. Even worse, RDF/XML is still pitched as the only official
>> variant of the latter (even in 2010). Look at how long it's taken RDFa to
>> emerge, and the amount of pressure it's taken to get it this far, etc.
>>     
>
> I think once people move beyond theory they naturally enough look at
> how they're exchanging data, how to parse it, etc. This is when syntax
> issues arise; they do get in the way, but the problems aren't
> insurmountable.
>
>   

Problems aren't insurmountable when each problem is taken as a genuine 
learning opportunity.

In my experience, the repetition of old mistakes remains too prevalent
re. RDF in general, and now Linked Data.

Do remember, Linked Data's bootstrap had next to nothing to do with 
Semantic Web messaging and general mode of operation re. RDF. It 
happened on the pragmatic basis of simply deriving a Corpus of Names 
from Wikipedia that showcased the value of Generic HTTP Identifier based 
Naming above all else. What the likes of BBC, New York Times, Reuters 
etc. are doing with DBpedia is basically what happens when people grok
the value of a powerful lookup table in a DBMS (basic or federated).

Imagine if DBpedia had been left to go the traditional Semantic Web route:
our grandkids would be lucky to have the 2007 variant of DBpedia, let
alone what exists today, and I am darned serious when I say that.


> Having some more RDF syntaxes reach Recommendation status would be a
> good thing though.
>   

Even if they don't reach Recommendation status, they should be at the
forefront of education-oriented communications.

Standards are Retrospective beasts, just like the mythical "Killer 
App.".  The moment you take "Retro" out of standards, you have problems 
(many of which are playing out repeatedly re. RDF).
>   
>>> Even for technical audiences, beginning from EAV or RDF model doesn't
>>> always help.
>>>       
>> Well, if the technical audience in question doesn't make the connection
>> between the DBMS realm and Linked Data, of course not. Likewise, if they
>> don't make the connection between standards-based Data Access and Linked
>> Data, of course not. The Data 3.0 manifesto or emphasis on the EAV model
>> cannot resonate with said audience, and it's not who I am actually trying
>> to speak to either.
>>     
>
> Actually what I meant was: people have different approaches to
> learning (new technologies or otherwise). Some will warm to a theory
> first approach, others just want to get their hands dirty. Starting
> with a general introduction to modelling isn't useful for the latter
> audience.
>   

Hmm.

I don't know about "getting your hands dirty" without some basic context.


>   
>> I am much more interested in people that already work with data via
>> tools, without writing a single line of code.
>>     
>
> Yes I'm interested in how Linked Data can put powerful tools into the
> hands of non-programmers too.
>
>   
>> I am simply saying to the audience above:
>>
>> 1. We have Structured Data
>> 2. Here is how you make Structured Data (i.e. the underlying model)
>> 3. Here is how you share Structured Data (via Descriptor Documents on an
>> HTTP network).
>>
>> When people understand 1-3 (in many cases making links to what they
>> already grok), they can get on with exploiting the kind of
>> individual/enterprise Agility levels that real Open (standards
>> compliant) Data Access and Integration accords.
>>     
>
> To be clear: are you advocating a broader view of Linked Data that
> doesn't use RDF technologies at all?
>
>   

No, again I am saying: RDF (which is perceived by most as Markup) is
not the commencement point re. Linked Data comprehension and appreciation.

You will lose more people than you gain -- every time -- when the story 
starts with a Markup Language that has its Data Model grounding in a 
vague footnote. As I said, simply saying RDF is Graph Model based 
doesn't cut it. We have to expand, and more importantly, use what 
already exists in the minds of broader audiences. EAV is much more 
widely known, understood, and used than RDF.

RDF-based Linked Data builds on EAV by mandating the use of Generic HTTP
scheme Identifiers as Names for Entities and Attributes, and (optionally)
for Attribute values.
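That constraint can be stated as a check over individual EAV rows. A minimal Python sketch, assuming plain values are wrapped in a small hypothetical LiteralValue type so they are not mistaken for identifiers; check_row is likewise illustrative and not part of any library.

```python
from typing import List, NamedTuple, Union
from urllib.parse import urlparse


class LiteralValue(NamedTuple):
    """A plain Attribute value, as opposed to a reference to another Subject."""
    text: str


def is_http_name(identifier: str) -> bool:
    """True for Generic HTTP scheme Identifiers (http or https with a host)."""
    parts = urlparse(identifier)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_row(entity: str, attribute: str, value: Union[str, LiteralValue]) -> List[str]:
    """List the ways one EAV row falls short of the Linked Data naming rule."""
    problems = []
    if not is_http_name(entity):
        problems.append("entity is not named by an HTTP identifier")
    if not is_http_name(attribute):
        problems.append("attribute is not named by an HTTP identifier")
    # Values may stay literals; if they are references, they must be HTTP Names too.
    if not isinstance(value, LiteralValue) and not is_http_name(value):
        problems.append("value is neither a literal nor an HTTP identifier")
    return problems


print(check_row("http://dbpedia.org/resource/DBpedia",
                "http://xmlns.com/foaf/0.1/name",
                LiteralValue("DBpedia")))  # []
print(check_row("urn:example:1", "name", "DBpedia"))
```

The second row fails on all three counts: a non-HTTP entity identifier, a bare attribute label, and an unmarked string value.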


> If so, what are you recommending that people use for, e.g., "Descriptor
> Documents"?
>   

Whatever Data Representation works for them. Ultimately, they will come
to comprehend and appreciate RDF, just as they will ultimately come to
appreciate OWL (yes, and I absolutely mean that about OWL). There
shouldn't be an RDF Tax.

Paul Houle put it nicely in a Tweet: RDF is very easy and powerful, but
you only come to realize and appreciate it after attempting to reinvent it.



Kingsley
> Cheers,
>
> L.
>
>   


-- 

Regards,

Kingsley Idehen	      
President & CEO 
OpenLink Software     
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Received on Monday, 19 April 2010 17:25:38 UTC