Re: Facebook Linked Data from Kingsley Idehen on 2011-09-27 (semantic-web@w3.org from September 2011)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Tue, 27 Sep 2011 09:40:19 -0400
To: public-lod@w3.org, "semantic-web@w3.org" <semantic-web@w3.org>
Message-ID: <4E81D243.8030208@openlinksw.com>
On 9/27/11 7:43 AM, Sebastian Schaffert wrote:
> On 27.09.2011 at 09:44, Norman Gray wrote:
>>> I am disappointed because I asked for data about http://graph.facebook.com/561666514 and got back data about http://graph.facebook.com/561666514# - this is my main concern. Maybe I should ask for http://graph.facebook.com/561666514# in the first place and manually remove the trailing "#" like a browser does. But I prefer a predictable, well-defined, and universal (over all LD services) behaviour.
>> I think you're disappointed because your expectations may be wrong.
> My expectations are my expectations. But I accept that the world maybe does not satisfy them ;-)
>
> But from my experience in developing software together with industry partners out there I have a good guess that my expectations will more-or-less match with the expectations of other developers. Especially those who are not very deep in Semantic Web technologies.
>
> We are working together with many IT companies (with excellent software developers) and trying to convince them that Semantic Web technologies are superior for information integration. They are already overwhelmed when they have to understand that a database ID for an object is not enough.

If they understand what a Database Object Identifier is, then they 
surely understand what a Data Object Identifier is. And from there it's 
trivial for them to grok why the use of de-referenceable URIs == 
SuperKeys++. As I stated recently, RDBMS identifiers such as primary and 
foreign keys promise a lot, but in reality deliver so little. On the 
other hand, URIs deliver on endless promise. We have the World Wide Web 
as exhibit #1.

As is typically the case these days, you can take an alternative 
approach by completely reinventing terminology that comes across as 
gobbledygook to folks that have come to understand these matters in 
different realms.

>   If they have to start distinguishing between the data object and the real world entity the object might be representing, they will be lost completely.

This is all they have to do, which most in the IT realm have actually 
grokked for eons modulo use of HTTP based de-referenceable URIs:

A Data Object can represent a real-world Entity.
The Data Object must be unambiguously Named.
Its actual Representation (at expression and serialization time) is best 
served via an EAV/SPO based directed graph.
Accessing the actual Data Object (its Representation) occurs via an Address.

You can use URIs as unambiguous Object Names.
You can use URLs (a kind of URI) to unambiguously Name the location of a 
Data Object.
A Data Object Name is distinct from Data Object access Address.
Courtesy of indirection, you can access a Data Object by Name or Address.

Indirection remains a key mechanism for solving problems in computing. 
That didn't start with Linked Data and won't end with Linked Data.

HTTP URIs make resolvable (de-referenceable) Name based indirection cheap 
(albeit somewhat unintuitive) due to HTTP based WWW ubiquity.
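
To make the Name vs. Address distinction concrete, here is a minimal 
Python sketch. It assumes the hash-URI pattern discussed in this thread 
(the Address is the Name minus its fragment). The helper names 
address_of and describe, and the triples in FETCHED, are made up for 
illustration; they are not actual Facebook output.

```python
from urllib.parse import urldefrag

# Hypothetical EAV/SPO triples standing in for what a fetch of the
# Address might return. Illustrative only, not real Facebook data.
FETCHED = [
    ("http://graph.facebook.com/561666514#", "name", "Example Person"),
    ("http://graph.facebook.com/561666514", "type", "Document"),
]


def address_of(name):
    """Derive the access Address (URL) from an Object Name (URI)
    by dropping the fragment part."""
    address, _fragment = urldefrag(name)
    return address


def describe(name, triples):
    """Keep only the triples whose subject is exactly the given Name."""
    return [t for t in triples if t[0] == name]


person = "http://graph.facebook.com/561666514#"

# The Name of the person and the Address of the document differ.
print(address_of(person))

# Filtering by the person's Name finds the statement about the person.
print(describe(person, FETCHED))

# Filtering by the Address finds only statements about the document.
print(describe(address_of(person), FETCHED))
```

A client that filters on the Address instead of the Name ends up with 
statements about the document only, which appears to be the situation 
Sebastian describes.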

The answers don't lie in Semantic Web literature, far from it; you have 
to look to the broader realm of computer science for that.

Linked Data, like the Web in general, boils down to ingenious use of 
Hyperlinks to extend the scale of old concepts.

If Data Object Names and Addresses weren't distinct, we wouldn't even be 
able to send email or use any other computer program. It just so happens 
that what's hidden by the combination of operating systems and 
programming languages is being exposed to a higher level, courtesy of 
WWW ubiquity. Trouble is that in this higher level of exposure we have a 
broad spectrum of audience skills and experience levels re. computer 
science fundamentals and industry history.

Imagine if we didn't end up with "resource" as a tactical replacement 
for "object", with regards to terminology. Imagine that?

Imagine the same if we had EAV triples + power of URIs instead of SPO 
triples where the "Object" simply adds yet another chunk of confusion 
re. computer science. The Object of a literary sentence != computer 
Object, but when you bring it into a computing space, conflation occurs, 
and we end up with 12+ years trying to untangle the mess.


Links:

1. 
http://www.cs.cmu.edu/afs/cs.cmu.edu/user/clamen/OODBMS/Manifesto/htManifesto/node4.html 
- Object Identity

2. http://www.w3.org/Addressing/rfc1630.txt -- Universal Resource 
Identifiers in WWW

3. http://www.w3.org/People/Connolly/9703-web-apps-essay.html - 
Distributed objects are the very heart of the Web, and have been since 
its invention -- Dan Connolly essay from way back

4. http://goo.gl/y7Gq4 -- my G+ note that deconstructs how Facebook have 
implemented a Linked Data Space without disruption to their existing 
infrastructure or business model.


Kingsley

>
>> When you dereference the URL for a person (such as .../561666514#), you get back RDF.  Our _expectation_, of course, is that that RDF will include some remarks about that person (.../561666514#), but there can be no guarantee of this, and no guarantee that it won't include more information than you asked for.  All you can reliably expect is that _something_ will come back, which the service believes to be true and hopes will be useful.  You add this to your knowledge of the world, and move on.
> There I have my main problem. If I ask for "A", I am not really interested in "B". What our client implementation therefore does is to throw away everything that is about B and only keeps data about A. Which is - in case of the FB data - nothing. The reason why we do this is that often you will get back a large amount of irrelevant (to us) data even if you only requested information about a specific resource. I am not interested in the 999 other resources the service might also want to offer information about, I am only interested in the data I asked for. Also, you need to have some kind of "handle" on how to start working with the data you get back, like:
> 1. I ask for information about A, and the server gives me back what it knows about A (there, my expectation again ...)
> 2. From the data I get, I specifically ask for some common properties, like A foaf:name ?N and do something with the bindings of N. Now how would I know how to even formulate the query if I ask for A but get back B?
>
> Of course, instead of asking for http://graph.facebook.com/561666514, I should have asked for the person "http://graph.facebook.com/561666514#", stripped the trailing hash, and then applied my filtering on the result. My mistake, but this was also not obvious in the service description sent out by Jesse (ok, my "httpRange-14 alarm" should have signaled a potential danger ...).
>
>
> The concept of "knowledge of the world" is too abstract for practical implementations: the fact that I can only expect that "something" comes back that (in one of the many different syntaxes) somehow corresponds to the RDF model is a very weak contract. It does not really go beyond what e.g. the Facebook OpenGraph API or other services that are not using Semantic Web technologies already offer.
>
> I sometimes have the feeling that most of the Linked Data world is currently concerned with "somehow publishing all data out there" without being too clear about the "somehow" and without taking into account the people who are supposed to *use* that data. The "somehow" currently includes:
> - about 10 different syntaxes (RDF/XML, N3, Turtle, RDFa, JSON-LD, RDF/JSON, ...), many of which are not really solvable via content negotiation (e.g. JSON-LD and RDF/JSON both have content type application/json, N3 and N-Triples have content type text/plain (or sometimes text/rdf+n3; level=XY))
> - the data I get back is not about the resource I requested (discussion above), because there are competing philosophies about httpRange-14 (which is IMHO a never ending problem, unsolvable and also unnecessary in most situations), because there are several different recommendations about how to publish data on the web, or because some service somehow decides that some other data might be more useful or interesting than the one I asked for
> - the data I get back uses different, unconnected vocabularies for the same thing (try getting information about the same person from DBpedia, Freebase, Facebook, and that person's FOAF file - getting the *name* alone is a serious issue with many workarounds)
>
> I am not really complaining, I just wanted to point out issues that still need to be solved. And of course the problem is not really only the Linked Data published by Jesse and Facebook, this was just a starting point because I ran into troubles there.
>
>> How much or how little information comes back is an engineering or UI decision on the part of the service.
> ... but this obviously is a serious factor in the usefulness of the service. Which was my initial point.
>
>
>> Or, put another way:
>>
>>> But Linked Data could do better: there could be a uniform way of accessing the data and a unified contract about what comes back.
>>
>> There _is_ a uniform way of accessing the data: you dereference the non-fragment bit of a thing's name and read what comes back.  And there is a uniform contract: the RDF that comes back is something the service believes may be useful/interesting to you, and should include further places to look.
>>
>> Yes, it would be _nice_ if the contract were stronger, but this is the web, and the LD pattern's key insight is that this degree of _very_ loose coupling is practical and useful.
>
> In principle I agree. But the usefulness has yet to be proven, and I fear that the very weak contract is not enough to show the advantages over competing technologies. Maybe this is not necessary as long as the data somehow gets more easily accessible. But as a Semantic Web community we also have a certain hypothesis that the technologies *we* are coming up with are better than what is already out there.
>
> Btw, the statement of _very_ loose coupling is for me in total contradiction with the httpRange-14 discussion: for instance, someone who is interested in "elephants" would probably simply link in his FOAF file to http://dbpedia.org/resource/Elephant, which is of course NOT the proper identifier for the elephant but only the document containing the data. In the same way, I would probably link in my FOAF file to my Facebook account using foaf:holdsAccount http://graph.facebook.com/561666514 and not http://graph.facebook.com/561666514# ...
>
>
> Greetings,
>
> Sebastian


-- 

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Received on Tuesday, 27 September 2011 13:41:03 UTC