Re: semantic pingback improvement request for foaf from Nathan on 2010-04-17 (public-lod@w3.org from April 2010)

From: Nathan <nathan@webr3.org>
Date: Sun, 18 Apr 2010 00:14:25 +0100
To: Story Henry <henry.story@bblfish.net>
CC: public-lod community <public-lod@w3.org>, foaf-protocols@lists.foaf-project.org, Sören Auer <auer@informatik.uni-leipzig.de>, Philipp Frischmuth <pfrischmuth@googlemail.com>, Sebastian Tramp <tramp@informatik.uni-leipzig.de>, Kingsley Idehen <kidehen@openlinksw.com>, Melvin Carvalho <melvincarvalho@gmail.com>
Message-ID: <4BCA40D1.7020601@webr3.org>
Story Henry wrote:
> Hi,
> 
>    I often get asked how one solves the friend request problem on open social networks that use foaf in the hyperdata way. 
> 
> On the closed social networks when you want to make a friend, you send them a request which they can accept or refuse. It is easy to set up, because all the information is located in the same database, owned by the same company. In a distributed social foaf network anyone can link to you, from anywhere, and your acceptance can be expressed most clearly by linking back. The problem is: you need to find out when someone is linking to you.
> 
> 
>     So then the problem is how does one notify people that one is linking to them. Here are the solutions in order of simplicity.
> 
>    0. Search engine solution
>    -------------------------
> 
>    Wait for a search engine to index the web, then ask the search engine which people are linking to you. 
> 
>  Problems:
> 
>    - This will tend to be a bit slow, as a search engine optimised to search the whole web will need to be notified first, even if this is only of minor interest to them
>    - It makes the search engine a core part of the communication between two individuals, taking on the role of the central database in closed social networks
>    - It will not work when people deploy foaf+ssl profiles, where they access control who can see their friends. Search engines will not have access to that information, and so will not be able to index it.
> 
>    1. HTTP Referer Header
>    ----------------------
> 
>    The absolute simplest solution would be just to use the mis-spelled HTTP Referer Header, that was designed to do this job. In a normal HTTP request the location from which the requested URL was found can be placed in the header of the request.
>  
>     http://en.wikipedia.org/wiki/HTTP_referrer
> 
>    The server receiving the request and serving your foaf profile, can then find the answer to the referrer in the web server logs.
> 
> Perhaps that is all that is needed! When you make a friend request, do the following:
>   
>    1. add the friend to your foaf profile
> 
>   <http://bblfish.net/#hjs> foaf:knows <http://kingsley.idehen.name/dataspace/person/kidehen#this> .
> 
>    2. Then just do a GET on their WebID with the Referer header set to your WebID. They will then find in their Apache logs something like this:
> 
> 93.84.41.131 - - [31/Dec/2008:02:36:54 -0600] "GET /dataspace/person/kidehen HTTP/1.1" 200 19924 "http://bblfish.net/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5"
> 
>   This can then be analysed using incredibly simple scripts, such as those described in [1].
> 
>    3. The server could then just verify that information by 
>      
>   a. doing a GET on the Referer URL to find out if indeed it is linking to the user's WebId 
>   b. doing some basic trust analysis (is this WebId known by any of my friends?), in order to rank it before presenting it to the user
> 
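As a rough illustration of the log analysis in step 2, here is a minimal Python sketch that pulls Referer values out of Apache "combined" log lines. The function name `referers_for` and the regular expression are my own assumptions, not taken from [1]; the sample line is the one quoted above.

```python
import re

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def referers_for(log_lines, profile_path):
    """Return distinct Referer values seen on successful GETs of profile_path.

    Hypothetical helper: the candidates it returns still need the
    verification described in step 3 before anyone is shown them.
    """
    seen = []
    for line in log_lines:
        m = COMBINED.match(line.strip())
        if not m:
            continue  # not a combined-format line
        if m["method"] != "GET" or m["path"] != profile_path or m["status"] != "200":
            continue
        ref = m["referer"]
        if ref and ref != "-" and ref not in seen:
            seen.append(ref)
    return seen

sample = (
    '93.84.41.131 - - [31/Dec/2008:02:36:54 -0600] '
    '"GET /dataspace/person/kidehen HTTP/1.1" 200 19924 "http://bblfish.net/" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.9.0.5) '
    'Gecko/2008120122 Firefox/3.0.5"'
)
print(referers_for([sample], "/dataspace/person/kidehen"))
```

The Referer header is trivially forged, so this only produces candidates; the GET-and-check in step 3 is what makes them usable.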
>    The nice thing about the above method is that it will work even when the initial linker's server does not have a Ping service for WebIDs. If the pages linking are in html with RDFa most browsers will send the referrer field.
> 
>   There is indeed a Wikipedia entry for this: it is called Refback.
>   http://en.wikipedia.org/wiki/Refback
> 
>   Exactly why Refback is more prone to spam than the ping back or linkback solution is still a bit of a mystery to me.
> 
>   2. Referer with foaf+ssl
>   ------------------------
> 
>   In any case the SPAM problem can be reduced by using foaf+ssl [2]. If the WebId is an https WebId - which it really should be! - then the requestor will authenticate himself, at least on the protected portion of the foaf profile. So there are the following types of people who could be making the request on your WebId.
>  
>   P1. the person making the friend request
> 
>    Here their WebId and the referer field will match.
>    (this can be useful, as this should be the first request you will receive - a person making a friend request, should at least test the link!) 
> 
>   P2. A friend of the person making the friend request
> 
>    Perhaps a friend of P1 goes to his page, comes across your WebId, clicks on it to find out more, and authenticates himself on your page. If P2 is a friend of yours too, then your service would have something somewhat similar to a LinkedIn introduction!
> 
>   P3. Just someone on the web, a crawler...
> 
>     Then you know that he is making his friendship claim public. :-)
> 
>    The above seems to be just some of the interesting information one could get 
> from analysing the Referer field logs.
>    
> 
>   3. Pingback
>   -----------
> 
>   For some reason though the Referer Header solution was not enough, and so the pingback protocol was invented. 
>  
>     http://www.hixie.ch/specs/pingback/pingback
> 
> I am still not quite clear what this solution brings in addition to the refback one, other than that 
> 
>  - it declares the method of the pingback declaratively. If there is a ping back header, then it is clear that it can be used. The referer header is so much part of the web, it won't be clear to anyone if the WebId server is using it.
> 
>  - it makes it possible for the web page owner to decide who should process the
> pings, rather than leaving that to the apache server owner (though that is not true of the HTTP Header mechanism proposed)
> 
>  - it makes it easy to choose another server as the ping server
> 
>     
> Looking at the specification one has a feeling that it is pretty well thought through. Mostly. One glaringly archaic piece now is the requirement on the xmlrpc response. Essentially in order to notify someone of something that is referring to them they have to set up an xmlrpc system, where a simple HTML FORM would have done! XMLRPC I think is no longer the flavour du jour, and people have moved on. HTML FORMS remain used by everyone everywhere. They don't seem to go out of fashion. They are also really easy to use, and every developer needs to know how to use them.
> 
> 
>    4. Semantic Ping Back
>    ---------------------
> 
> The linked data movement developed an enhancement of the Ping Back service described in 3 above. Essentially it adds an ontology to the link system described in the ping back service above, and the details are described here
> 
>      http://aksw.org/Projects/SemanticPingBack
> 
> Most important perhaps is the pingback service relation
> 
>      http://purl.org/net/pingback/service
> 
> defined as
> 
>  <http://purl.org/net/pingback/service> a owl:ObjectProperty ;
>     :comment "This property is used to link the target resource with a pingpack RPC service URI. It is the RDF " ;
>     :isDefinedBy <http://purl.org/net/pingback/> ;
>     :label "pingback service" .
> 
> 
>   5. Improved Semantic Ping Back
>   ------------------------------
> 
> 
> So my guess is that being the early days of the semantic web, 4 is still new enough that it can be changed. Ie, none of the xmlrpc agents are going to be looking for that relation, and so we have a chance either to add a new relation, or to create a very similar relation to fix the bugs of pingback. Here is what I propose
> 
> @prefix ping: <http://purl.org/net/pingback/> . 
> @prefix owl: <http://www.w3.org/2002/07/owl#> .
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> 
> 
> service:ping a rdf:Property;
>    rdfs:domain foaf:Agent;  #probably a restriction to be removed, or be refined...
>    rdfs:range xxx:POSTResource;
>    rdfs:comment """
>       This relation specifies a method for services that wish to let 
>     the document owner know that they are linking to this resource.
> 
>       The relation relates a WebId to a collection (named ?coll from here
>    on). A new resource of type PingEvent can be created in that collection 
>    by POSTing a URL that mentions the given WebId.    
> 
>    The content that should be sent to the collection is what would be the result
> of POSTing the following form
> 
>     <form method="POST" action="?coll">
>         referer: <input type="text" name="referer"/><br/>
>         comment: <input type="text" name="comment"/>
>     </form>
> 
>    The representation returned by a GET on the POSTResource can even return 
>    the above html form, making it human readable.
> 
>    ( A nice improvement would be for the form to contain rdfa markup, that
>     would make it clear what the semantics of the form was, by using relations
>     described in this ontology )
> 
>    The resource created should be a named ping request, which itself
>   can be described using this ontology.
>    """ .
> 
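For what it's worth, the client side of the proposed form POST is only a few lines. This is a minimal Python sketch under stated assumptions: the collection URL `http://example.org/pings/` and the helper name `build_ping_request` are made up for illustration, and only the field names `referer` and `comment` come from the form above. It builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_ping_request(collection_url, referer, comment=""):
    """Build (but do not send) the POST a browser would make for the form.

    collection_url stands in for ?coll; it is a placeholder here.
    """
    body = urlencode({"referer": referer, "comment": comment}).encode("ascii")
    return Request(
        collection_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )

req = build_ping_request(
    "http://example.org/pings/",   # hypothetical ping collection
    "http://bblfish.net/",         # the document that links to the WebId
    "Adding you as a friend",
)
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

Passing `req` to `urllib.request.urlopen` would actually send the ping; what the server returns for the created resource is left open by the proposal.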
> 
> This it seems to me would be so transparently simple as to be self explanatory
> to any web developer, increasing uptake and reducing the need for explanation -
> especially if the resource returns a web form as described above.
> 
> 	
>    6. Improving Semantic PingBack with foaf+ssl
>    --------------------------------------------
> 
> 
>   Just as with 2, semantic ping back can be improved with foaf+ssl, helping
> the ping back service identify the user making the ping request. This can be very
> useful in linked data worlds between large databases that may be pinging each other
> very often. This would allow trusted agent's pings to be accepted more automatically
> than new ones.
> 
> 
> 	Henry Story
>   
> 
> [1] http://www.the-art-of-web.com/system/logs/
> [2] http://esw.w3.org/Foaf%2Bssl/FAQ
> 
> Social Web Architect
> http://bblfish.net/
> 

If I may add another option to the mix: this option is flawed, by the way,
and needs work, but it's food for thought (?)

In broad strokes, the only form of "ping" I've ever seen fully
implemented anywhere is the blog ping, where publishers inform a
"ping server" that a resource has been published or updated. We
already have this in the form of PTSW (Ping the Semantic Web).

If we consider a use-case of dbpedia/London, to keep the web of linked
data linked, we need a way for dbpedia/London to point to all resources
which mention dbpedia/London - this could easily be delegated to a link
server via rdfs:seeAlso, something like:

<http://dbpedia.org/resource/London> rdfs:seeAlso
<http://seealso.org/http://dbpedia.org/resource/London> .

Note: this allows the owner of dbpedia/London to decide which third
parties you could check for more information.

In order for this to work, the server at seealso.org (let's call it a
"link server") would need to know which resources mention dbpedia/London.

One way for it to do that is to check each resource on the web as it is
published; if said resource mentions dbpedia/London (or any other
resource it is watching), it stores the relation
{ ?s ?p dbpedia:London } and updates the resource at
<http://seealso.org/http://dbpedia.org/resource/London> to include it.

In order to check every resource on the web as it is published or
updated, it would need a stream of notifications about published and
updated resources to check, hence PTSW.

By nature, because the "link server" is checking every resource on the
web, it automatically knows when to start watching for dbpedia/London:
when dbpedia/London is published or updated it scans the content, and
can thus pick up the rdfs:seeAlso <a resource under its domain> and add
dbpedia/London to its list of watches.
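
To make the checking and auto-watching a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `LinkServer` class, its method names, and the idea that each ping arrives as already-parsed (subject, predicate, object) string tuples. A real service would fetch and parse the pinged documents with an RDF library.

```python
from collections import defaultdict

RDFS_SEEALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
LINK_SERVER = "http://seealso.org/"  # the example link server used above

class LinkServer:
    """Toy in-memory link server (hypothetical, not an existing service)."""

    def __init__(self, base=LINK_SERVER):
        self.base = base
        self.watched = set()              # resources that delegated to us
        self.mentions = defaultdict(set)  # watched resource -> triples

    def on_ping(self, triples):
        """Handle the triples of one newly published or updated document."""
        triples = list(triples)
        # Pick up delegations: <r> rdfs:seeAlso <base + r> starts a watch on r.
        for s, p, o in triples:
            if p == RDFS_SEEALSO and o == self.base + s:
                self.watched.add(s)
        # Record every statement whose object is a watched resource.
        for s, p, o in triples:
            if o in self.watched and s != o:
                self.mentions[o].add((s, p, o))

    def describe(self, resource):
        """Triples that would be served at base + resource."""
        return sorted(self.mentions.get(resource, ()))

london = "http://dbpedia.org/resource/London"
alice = "http://example.org/alice#me"  # made-up publisher
based_near = "http://xmlns.com/foaf/0.1/based_near"

server = LinkServer()
server.on_ping([(london, RDFS_SEEALSO, LINK_SERVER + london)])  # London delegates
server.on_ping([(alice, based_near, london)])                   # someone mentions it
print(server.describe(london))
```

One limitation this makes visible: mentions published before the watch starts are missed unless the server also keeps or replays earlier pings.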

Where this ties in is that, say, in the case of a foaf profile, because
we've delegated to the link server in the first place via rdfs:seeAlso
<http://seealso.org/http://mydomain.org/myname#me>, we already know
which resource to subscribe to in order to be notified of any new
resources mentioning "me", whether that be via sioc:follows or any other
relation; we subscribe to that which we delegated to:
<http://seealso.org/http://mydomain.org/myname#me>.

I'm sure there are flaws in the above (most noticeably the
centralization and dependency on PTSW or similar), and that this can be
improved on, but it does allow the publisher to delegate everything and
requires nothing new to be implemented (other than a normal "ping" on
publish/update).

Thoughts?

Best,

Nathan
Received on Saturday, 17 April 2010 23:15:18 UTC