Re: See Other from Jesse Weaver on 2012-03-28 (public-lod@w3.org from March 2012)

From: Jesse Weaver <weavej3@rpi.edu>
Date: Wed, 28 Mar 2012 16:25:24 -0400
To: Hugh Glaser <hg@ecs.soton.ac.uk>
Cc: public-lod community <public-lod@w3.org>
Message-Id: <2100F9BB-2FA7-4CF4-9DA5-CEAED30C8C2A@rpi.edu>
Hi Hugh.

I have avoided participating in these httpRange-14 debates, but since  
you have brought the Facebook Linked Data into the discussion, I feel  
compelled to respond.  The goal (or my goal) regarding Facebook's  
Linked Data provided through its Graph API was to allow for sensible  
Linked Data RDF to be published in a way that did not interfere with  
maintenance of existing code and in a way that would require very  
little maintenance in the future.  Please see my inline comments  
below, and also some comments at the end.

On Mar 28, 2012, at 6:44 AM, Hugh Glaser wrote:

> Executive summary:
> TAG, please don't come back with something that does not allow, or  
> even encourage, sites like Facebook to offer RDF back in return for:
> curl -L -H Accept:application/rdf+xml https://www.facebook.com/hugh.glaser
>
> Challenge: Try telling me what to put in sameAs.org for the LD URI  
> for you on Facebook.
>
> Detail:
> I support Jeni et al.'s Proposal, because it is an improvement, and  
> seems to have some chance of success.
> Actually, I am pretty sure I align with Giovanni and his ilk.
> My preference is to lose the whole thing (and these discussions!) -  
> but there is no point, I think, in proposing that because it has no  
> chance of success.
>
> When people talk about "users", they seem to mean developers.

With regard to Facebook's Graph API, it is indeed targeted toward  
developers (Linked Data or otherwise).

> The users I think of are the eyeballs that look at and manipulate  
> the stuff on their screens, usually in a browser.
> Also, when a posting on this list has:
> "Well, if I wanted to do this, " or "Imagine…"
> my own eyeballs sort of glaze over.
> Well, there have been 6 years to do it or for someone else to  
> actually feel the need to do it - if it hasn't blazed a trail in the  
> huge range of Linked Data-enabled applications (irony intended)  
> being used by users out there, then it probably isn't a very  
> important use case.
>
> My slightly shorter story (thanks Dan, that was great, and I read  
> the whole thing!) involves Facebook as a LD site.
> In fact, I think this story is complementary to Dan's, as it gives  
> some view of the experience that Bob's users will get after Alice's  
> consultation and the subsequent implementation.
> This actually happened to me last night.
> Recalling that I now have a LD ID on Facebook, I go to Facebook and  
> get my ID (well, I think of it as my ID, and it's what I give anyone  
> if they ask for a link to "me").
> https://www.facebook.com/hugh.glaser
> (I could stop there, as we all know I already have a problem, but …)
> Being a brave little chap, before putting it in my signature as one  
> of my LD IDs, I decide to check that this is OK, by pasting it into  
> something that wants a LD ID, such as the W3C validator (in this  
> case I use curl -H Accept:application/rdf+xml).
> It actually gave a 200, so it must be OK, right?
> Of course, this doesn't validate because the URI actually does
> 302 -> 200 and returns text/html in response to my curl.
> 506 would have been possibly less helpful, by the way.
> So I am done - nothing I can do now.
>
> However, being not only brave, but also intrepid, I start googling  
> for support.
> I eventually (it wasn't easy), find that I should be using graph  
> instead of www.
> With excitement, I try
> curl -i -L -H Accept:application/rdf+xml https://graph.facebook.com/hugh.glaser
> Close, but no cigar.
> I get text/javascript back.
> More digging (I'll spare you the details)...
> curl -i -L -H Accept:text/turtle https://graph.facebook.com/hugh.glaser
> I cannot contain my excitement; I have some RDF at last!
> So I can use https://graph.facebook.com/hugh.glaser as my Facebook  
> LD ID.
> Er, not quite.
> The turtle this returns is
> </720591128#>
> 	user:id "720591128" ;
> Ah yes, I knew I had a numeric ID, 720591128 - so it being late I  
> guess my LD ID is https://graph.facebook.com/720591128
> Of course, er no, not quite again.
> I suddenly notice a little # lurking in the turtle.
> So I finally decide that the URI I should put in my signature is
> https://graph.facebook.com/720591128#
> Of course, this is sufficiently ugly, compared with https://www.facebook.com/hugh.glaser
> that I don't bother, and go to bed.

I'm surprised that perceived ugliness of a URI (although it is not so  
ugly to me; beauty is in the eye of the beholder) would deter someone  
from taking advantage of the Linked Data.  The only differences --- as
you have pointed out --- are that graph should be used instead of www,
the FBID 720591128 is used instead of hugh.glaser, and the Linked Data  
URI has (what I call) an empty fragment.  Here are the reasons for  
these differences:
1.  I think (without certainty) that it is Facebook's intention that  
everything at www.facebook.com be for human eyeballs.  Admittedly,  
there could be some RDFa, and for some pages, there is RDFa containing  
Open Graph Protocol markup (do not conflate the Open Graph Protocol  
and the Graph API).  "Raw" data is made available --- targeting  
developers --- via the Graph API at http://graph.facebook.com (if you  
click that link without adding a path, it will redirect to  
documentation).
2. The FBID is used instead of the relative "vanity URL" (e.g.,
/hugh.glaser) because not every user has a vanity URL, and even if each
user did, not every *thing* has a vanity URL.  The Graph API provides
more than just data about users, and to quote Facebook's documentation
( https://developers.facebook.com/docs/reference/api/ ): "Every object
in the social graph has a unique ID."
3. The use of the empty fragment is the easiest way to take advantage  
of how the Graph API works.  Prior to serving up text/turtle, the  
Graph API served up only JSON at, e.g., http://graph.facebook.com/720591128 .
That is the place to find data about you.  With little
interference to existing code, when text/turtle is requested, the JSON  
is merely translated into text/turtle, making use of the internal  
system to provide meaningful semantics.  One of the problems is that a  
URI needs to be minted for instances (e.g., a user), and given  
httpRange-14, I have the choice of using a hash URI and returning 200  
OK or using a slash URI and 303'ing to somewhere else.  Using the  
empty fragment seemed like the most acceptable option.  (See dialogue  
at the end of this email.)
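
As an illustration of point 3, here is a minimal Python sketch (not Facebook's code) of why the empty fragment needs no new service: a client drops everything from the '#' onward before sending the request, so the hash URI is fetched from the same place that already returns 200 OK. The function name request_target is hypothetical, chosen only for this example.

```python
def request_target(uri: str) -> tuple:
    """Return (URL a client actually requests, whether the URI has a '#')."""
    # Everything from '#' onward stays on the client side.
    url, sep, _fragment = uri.partition("#")
    return url, sep == "#"


# The Linked Data URI (empty fragment) is meant to identify the user ...
thing_uri = "http://graph.facebook.com/720591128#"
# ... while the URI without '#' is the document served by the Graph API.
document_uri = "http://graph.facebook.com/720591128"

print(request_target(thing_uri))
print(request_target(document_uri))
```

Both calls yield the same request URL; only the second element of the result differs, which is the whole distinction the empty fragment carries.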

>
> Now I'm not saying that the TAG is going to solve all these issues.
> And there are lots of issues about 303 and # and RDFa …
>
> But I think this is a real Use Case for a user, which should mean  
> that the developer who provides this system (Facebook) is a Use Case  
> for the TAG.

The developer of the Linked Data would be me.  I worked on this while  
interning at Facebook during the summer of 2011.  I have since  
returned to RPI to continue working toward my Ph.D.

> I could have gone through a very similar process with almost any  
> Linked Data site, such as ePrints, myexperiment and dbpedia
> (including my own, such as RKBExplorer) - it just happened I wanted  
> Facebook last night.
> And Linked Data people go around saying how exciting it is that
> Facebook is offering Linked Data - I can't possibly use this as an  
> example to a customer, such as Dan's Bob.
>
> This whole experience is just crap.

Perhaps that experience was unpleasant.  Here's a marginally better one:
1. When you log into Facebook and go to your timeline (your own page),
the path of the URL in the browser looks like either, e.g.,
/hugh.glaser or /profile.php?id=720591128 .  In the latter case, you
have already found your FBID.
2. If you have a vanity URL, like /hugh.glaser , simply do an HTTP GET
for http://graph.facebook.com/hugh.glaser , and the response contains
your FBID.
3. The URI representing you is http://graph.facebook.com/FBID# , where
FBID is replaced by your numeric FBID.
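
The steps above can be sketched in Python, assuming only the two URL shapes described in step 1. The helper names fbid_from_profile_url and linked_data_uri are hypothetical, and the vanity-URL case is left out because it requires a live request to the Graph API.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def fbid_from_profile_url(url: str) -> Optional[str]:
    """Return the numeric FBID from a /profile.php?id=... URL, else None."""
    parts = urlsplit(url)
    if parts.path != "/profile.php":
        # Vanity URL: the FBID has to be looked up via the Graph API (step 2).
        return None
    ids = parse_qs(parts.query).get("id", [])
    return ids[0] if ids and ids[0].isdigit() else None


def linked_data_uri(fbid: str) -> str:
    """Build the URI with the empty fragment described in step 3."""
    return f"http://graph.facebook.com/{fbid}#"


fbid = fbid_from_profile_url("https://www.facebook.com/profile.php?id=720591128")
if fbid is not None:
    print(linked_data_uri(fbid))  # http://graph.facebook.com/720591128#
```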

Yes, there is the HTTPS discrepancy, and yes, this probably isn't  
ideal in terms of discovering the URI that identifies a user.

> If I had trouble with this, exactly what does Facebook expect a  
> normal user to do?
> I'm sure we can point out ways in which Facebook might have done  
> things better, but that is not the point.

Although I no longer work at Facebook, I would be interested in such
"ways in which Facebook might have done things better."  That  
discussion would be more appropriate in another thread.

> Can they actually make it easy for users using the current or  
> proposed standards?
>
> TAG, please don't come back with something that does not allow, or  
> even encourage, sites like Facebook to offer RDF back in return for:
> curl -H Accept:application/rdf+xml https://www.facebook.com/hugh.glaser
>
> Best
> Hugh
> PS
> I left the https in, because that is actually what cut and paste  
> gave me.
> I'm guessing that would have been a whole new thread.
>

http works, too, unless you're trying to access permissions-protected  
data, in which case you need to use https and provide a security  
token.  I'm not sure what the implications are regarding http/https  
URIs in Linked Data.  Indeed, that would be a whole new thread.

> PPS
> If you read through to here, or even if you just skipped to here,  
> then if you really do send me your Facebook LD URI (along with one  
> or more other ones to pair it with), I will drop everything and put
> them in sameAs.org :-)
>
> -- 
> Hugh Glaser,
>             Web and Internet Science
>             Electronics and Computer Science,
>             University of Southampton,
>             Southampton SO17 1BJ
> Work: +44 23 8059 3670, Fax: +44 23 8059 3045
> Mobile: +44 75 9533 4155 , Home: +44 23 8061 5652
> http://www.ecs.soton.ac.uk/~hg/
>
>
>

Finally, I would like to respond to an earlier comment made by Tom  
Heath (sorry for the incomplete-looking cut-and-paste): "a rigorous  
assessment of how difficult people *really* find it to understand  
distinctions such as 'things vs documents about things'. I've heard  
many people claim that they've failed to explain this (or similar)  
successfully to developers/adopters; my personal experience is that  
everyone gets it, it's no big deal (and IRs/NIRs would probably never  
enter into the discussion)."  My experience at Facebook agrees with  
Tom Heath's experience.  The distinction between
"things" and "documents about things" was easily understood.  The
main source of contention was around its pragmatism and necessity.   
One developer said to me (paraphrase): "I would conflate documents and  
things if I could."  It is a strange statement to me, but  
nevertheless, the distinction was understood.

In the fashion of Dan Brickley, I would like to present another  
_hypothetical_ dialogue, one between a proponent of Linked Data and a  
typical web developer (although perhaps not quite as clever and  
thorough as Dan's).

BEGIN DIALOGUE

Proponent: "I found a way to meaningfully publish our
already-published data as Linked Data, and I've implemented a prototype."

Developer: "Since you've already done it, let's take a look."

Proponent: "Okay, go to [link]."

Developer: "Hmmmm... [skip discussion about Turtle vs. RDF/XML].   
Everything looks okay, except I notice these URIs have #me at the  
end.  Why?  Can't we just lose the fragment?"

Proponent: "Well, URIs are used to identify things both on and off the  
web.  For example, no HTTP GET will ever squeeze you over a cable and  
pop you up in my browser."

Developer: "Sure.  So what?"

Proponent: "... so we need a way to mint URIs for both things on and  
off the web that makes sense with how the web already works."

Developer: "Okay, but why the fragment?"

Proponent: "I'm getting to that.  The current standard (which shall  
not be named) is based on the notion that any URI for which an HTTP GET
returns with 200 OK (these are URIs without fragments) represents the  
document that is retrieved, that is, something *on* the web."

Developer: "Okay... seems logical."

Proponent: "So some conventions have been made for how to identify  
things *off* the web.  One is to simply add a fragment (understatement  
meant to avoid confusion at this point), and that can identify  
something *off* the web."

Developer: "So I have to have a fragment?  It seems unnecessary and  
ugly."

Proponent: "There is an alternative.  You can use a URI without a  
fragment, but then doing an HTTP GET on the URI must return a 303  
which redirects to a document about the thing the URI represents."

Developer: "303?  What is that?"

Proponent: "See Other."

Developer: "Never heard of that.  I don't want to have to create  
another service just to 303 redirect to already-available data.  Seems  
superfluous.  Is there any other way?"

Proponent: "Well, we could actually let the URIs 404.  It's not ideal,  
but it's legal."

Developer: "No, I don't want anything to 404.  Never mind then.  What  
about this #me?  Why 'me'?"

Proponent: "Well, that's just a common convention for saying that  
[URL] returns information about [URL]#me.  #this is another common one."

Developer: "Hmmm... I don't know about that."

Proponent: "Well, if we don't want to 404, and we don't want to  
support 303, we'll need some kind of fragment to conform with the  
current standard.  We could just have an empty fragment so that the  
changes are minimal, both in terms of effort and appearance."

Developer: "Okay... I guess... let's go with that, then."

END DIALOGUE
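
The options weighed in the dialogue can be summarized in a short Python sketch. It is a simplified reading of httpRange-14 as the Proponent describes it, not a normative statement of the rule. The function name and the returned wording are assumptions made for illustration.

```python
def what_uri_may_identify(uri: str, status: int) -> str:
    """Rough reading of httpRange-14 for the outcome of an HTTP GET.

    `status` is the response code for the URI with any fragment removed,
    since the fragment is never sent to the server.
    """
    if "#" in uri:
        # Hash URI: the 200 OK belongs to the document before the '#',
        # so the full URI is free to identify a thing off the web.
        return "anything (hash URI)"
    if 200 <= status < 300:
        # Slash URI answering 200 OK: taken to be the document itself.
        return "a document on the web"
    if status == 303:
        # See Other: the redirect target describes the thing.
        return "anything (described at the redirect target)"
    # E.g. 404: legal, but the response says nothing about the thing.
    return "unknown"


for uri, status in [
    ("http://example.org/alice", 200),
    ("http://example.org/alice#me", 200),
    ("http://example.org/alice#", 200),
    ("http://example.org/alice", 303),
    ("http://example.org/alice", 404),
]:
    print(uri, status, "->", what_uri_may_identify(uri, status))
```

The example.org URIs are placeholders. The third case is the empty fragment the Developer finally accepts: it is treated exactly like #me, with the smallest visible change to the URI.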

Glean from the dialogue what you will.  How would I describe  
httpRange-14?  Minimally sufficient.

Jesse Weaver
Ph.D. Student, Patroon Fellow
Tetherless World Constellation
Rensselaer Polytechnic Institute
http://www.cs.rpi.edu/~weavej3/index.xhtml
Received on Wednesday, 28 March 2012 20:28:03 UTC