Re: Facebook Linked Data

On 27.09.2011, at 09:44, Norman Gray wrote:
>> 
>> I am disappointed because I asked for data about http://graph.facebook.com/561666514 and got back data about http://graph.facebook.com/561666514# - this is my main concern. Maybe I should ask for http://graph.facebook.com/561666514# in the first place and manually remove the trailing "#" like a browser does. But I prefer a predictable, well-defined, and universal (over all LD services) behaviour.
> 
> I think you're disappointed because your expectations may be wrong.

My expectations are my expectations. But I accept that the world may not satisfy them ;-)

But from my experience developing software together with industry partners, I have a good guess that my expectations will more or less match those of other developers, especially those who are not very deep into Semantic Web technologies.

We are working together with many IT companies (with excellent software developers) and trying to convince them that Semantic Web technologies are superior for information integration. They are already overwhelmed when they have to understand that a database ID for an object is not enough. If they also have to start distinguishing between the data object and the real-world entity the object might be representing, they will be completely lost.


> 
> When you dereference the URL for a person (such as .../561666514#), you get back RDF.  Our _expectation_, of course, is that that RDF will include some remarks about that person (.../561666514#), but there can be no guarantee of this, and no guarantee that it won't include more information than you asked for.  All you can reliably expect is that _something_ will come back, which the service believes to be true and hopes will be useful.  You add this to your knowledge of the world, and move on.

This is where my main problem lies. If I ask for "A", I am not really interested in "B". What our client implementation therefore does is throw away everything that is about B and keep only the data about A - which, in the case of the FB data, is nothing. The reason we do this is that you will often get back a large amount of (to us) irrelevant data even if you only requested information about a specific resource. I am not interested in the 999 other resources the service might also want to offer information about; I am only interested in the data I asked for. Also, you need some kind of "handle" on how to start working with the data you get back, like:
1. I ask for information about A, and the server gives me back what it knows about A (there, my expectation again ...)
2. From the data I get, I specifically ask for some common properties, like A foaf:name ?N, and do something with the bindings of ?N. Now how would I even know how to formulate the query if I ask for A but get back B? (A rough sketch of these two steps follows below.)
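
To make these two steps concrete, here is roughly the kind of client behaviour I have in mind (a simplified sketch in Python using rdflib; our actual implementation looks different, and the calls are only illustrative):

from rdflib import Graph, Namespace, URIRef

FOAF = Namespace("http://xmlns.com/foaf/0.1/")

# Step 1: ask for information about A ...
A = URIRef("http://graph.facebook.com/561666514")
fetched = Graph()
fetched.parse(str(A))   # dereference A and parse whatever RDF comes back

# ... and keep only the statements that are actually about A
about_A = Graph()
for s, p, o in fetched.triples((A, None, None)):
    about_A.add((s, p, o))

# Step 2: ask for common properties, like A foaf:name ?N
for name in about_A.objects(A, FOAF.name):
    print(name)

# With the Facebook data this prints nothing, because everything that comes
# back is about the "#" URI and not about the URI I dereferenced.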

Of course, instead of asking for http://graph.facebook.com/561666514, I should have asked for the person "http://graph.facebook.com/561666514#", stripped the trailing hash before dereferencing (like a browser does), and then applied my filtering to the result. My mistake, but this was also not obvious from the service description sent out by Jesse (OK, my "httpRange-14 alarm" should have signalled a potential danger ...).
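
In code, the corrected behaviour would look roughly like this (again only a sketch, with illustrative names; the fragment stripping is the part a browser would normally do for me):

from urllib.parse import urldefrag
from rdflib import Graph, URIRef

person = URIRef("http://graph.facebook.com/561666514#")  # the person, per the service description
document, _ = urldefrag(person)                          # strip the "#" -> the URL I actually dereference

g = Graph()
g.parse(document)
about_person = Graph()
for triple in g.triples((person, None, None)):           # but filter on the person URI
    about_person.add(triple)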


The concept of "knowledge of the world" is too abstract for practical implementations: the fact that I can only expect that "something" comes back that (in one of the many different syntaxes) somehow corresponds to the RDF model is a very weak contract. It does not really go beyond what e.g. the Facebook OpenGraph API or other services that are not using Semantic Web technologies already offer.

I sometimes have the feeling that most of the Linked Data world is currently concerned with "somehow publishing all data out there" without being too clear about the "somehow" and without taking into account the people who are supposed to *use* that data. The "somehow" currently includes:
- about ten different syntaxes (RDF/XML, N3, Turtle, RDFa, JSON-LD, RDF/JSON, ...), many of which cannot really be distinguished via content negotiation (e.g. JSON-LD and RDF/JSON both have content type application/json, and N3 and N-Triples have content type text/plain (or sometimes text/rdf+n3; level=XY)) - see the sketch after this list
- the data I get back is not about the resource I requested (see the discussion above), because there are competing philosophies about httpRange-14 (which is IMHO a never-ending problem, unsolvable and also unnecessary in most situations), because there are several different recommendations about how to publish data on the web, or because some service somehow decides that some other data might be more useful or interesting than the data I asked for
- the data I get back uses different, unconnected vocabularies for the same thing (try getting information about the same person from DBpedia, Freebase, Facebook, and that person's FOAF file - getting the *name* alone is a serious issue with many workarounds)
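
To illustrate just the first point, this is roughly what a client has to do today, and where it already breaks down (a sketch using Python with requests and rdflib; the Accept header and the content-type mapping are illustrative, and rdflib can of course do its own negotiation):

import requests
from rdflib import Graph

ACCEPT = "text/turtle, application/rdf+xml;q=0.9, application/json;q=0.8, text/plain;q=0.5"

# Content type -> parser. Already ambiguous: application/json may be JSON-LD
# or RDF/JSON, and text/plain may be N-Triples or just plain text.
FORMATS = {
    "text/turtle": "turtle",
    "application/rdf+xml": "xml",
    "text/plain": "nt",             # hope it really is N-Triples
    "application/json": "json-ld",  # or RDF/JSON? the type alone does not tell me
}

def fetch(url):
    resp = requests.get(url, headers={"Accept": ACCEPT})
    content_type = resp.headers.get("Content-Type", "").split(";")[0].strip()
    fmt = FORMATS.get(content_type)
    if fmt is None:
        raise ValueError("Don't know how to parse " + content_type)
    g = Graph()
    g.parse(data=resp.text, format=fmt, publicID=url)
    return g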

I am not really complaining; I just wanted to point out issues that still need to be solved. And of course the problem is not only with the Linked Data published by Jesse and Facebook; this was just a starting point because I ran into trouble there.

> 
> How much or how little information comes back is an engineering or UI decision on the part of the service.

... but this is obviously a serious factor in the usefulness of the service - which was my initial point.


> 
> Or, put another way:
> 
>> But Linked Data could do better: there could be a uniform way of accessing the data and a unified contract about what comes back.
> 
> 
> There _is_ a uniform way of accessing the data: you dereference the non-fragment bit of a thing's name and read what comes back.  And there is a uniform contract: the RDF that comes back is something the service believes may be useful/interesting to you, and should include further places to look.
> 
> Yes, it would be _nice_ if the contract were stronger, but this is the web, and the LD pattern's key insight is that this degree of _very_ loose coupling is practical and useful.


In principle I agree. But the usefulness has yet to be proven, and I fear that the very weak contract is not enough to show the advantages over competing technologies. Maybe this is not necessary as long as the data somehow becomes more easily accessible. But as a Semantic Web community we also have a certain hypothesis that the technologies *we* are coming up with are better than what is already out there.

Btw, the claim of _very_ loose coupling is, for me, in total contradiction with the httpRange-14 discussion: for instance, someone interested in "elephants" would probably simply link in his FOAF file to http://dbpedia.org/resource/Elephant, which is of course NOT the proper identifier for the elephant but only the document containing the data. In the same way, I would probably link in my FOAF file to my Facebook account using foaf:holdsAccount http://graph.facebook.com/561666514 and not http://graph.facebook.com/561666514# ...
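
Spelled out, that is the difference between the triple most people (me included) would naturally write and the one the httpRange-14 reading would presumably demand (again only a sketch; the WebID is made up):

from rdflib import Graph, Namespace, URIRef

FOAF = Namespace("http://xmlns.com/foaf/0.1/")
me = URIRef("http://example.org/foaf#me")   # made-up WebID, for illustration only

g = Graph()
# what I would naturally put into my FOAF file:
g.add((me, FOAF.holdsAccount, URIRef("http://graph.facebook.com/561666514")))
# what the httpRange-14 reading would presumably require instead:
# g.add((me, FOAF.holdsAccount, URIRef("http://graph.facebook.com/561666514#")))
print(g.serialize(format="turtle"))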


Greetings,

Sebastian
-- 
| Dr. Sebastian Schaffert          sebastian.schaffert@salzburgresearch.at
| Salzburg Research Forschungsgesellschaft  http://www.salzburgresearch.at
| Head of Knowledge and Media Technologies Group          +43 662 2288 423
| Jakob-Haringer Strasse 5/II
| A-5020 Salzburg

Received on Tuesday, 27 September 2011 11:44:44 UTC