- From: Michiel de Jong <michiel@unhosted.org>
- Date: Tue, 24 Jul 2012 19:07:25 +0200
- To: public-fedsocweb@w3.org
As I progress with the useraddress.net code, I've found that Content-Type headers are at least as valuable as link relations in deciding how to process a document. I divide them into the following categories:

- json
- html
- rdf
- xrd

I'm learning the formats as I go along and just make them work heuristically, without too many strict rules. Apart from that, I take into account the link relation that brought us to the document (if any), which can for instance tell us that something should be interpreted as a poco document. In many other cases, the link relation is useless for interpreting the document.

But even using these hints, you can easily reach points where the data is not unambiguously machine-readable. For instance, for Facebook and Twitter API documents we need to take into account which API they came from. I also found that quite a few documents are served with the wrong Content-Type (e.g. Diaspora serves its host-meta with an HTML Content-Type), so for these I think I'll just send pull requests to get them fixed.

Supporting StatusNet, Friendica, Diaspora and Google is relatively straightforward, and Twitter and Facebook are super-simple once you consult their custom, proprietary API documentation. But by far the most work goes into all the custom domains. I'm trying to support Melvin, Tantek and TimBL, but they each work in different ways. I hope to make some progress on that soon, and to support all of these before I publish my proof-of-concept version.

Also, some people point their sameAs relation to the human-readable profile page (like Tantek, http://www.facebook.com/tantek.celik ), and some point it to the API (like TimBL, http://graph.facebook.com/512908782 ). Both are sub-optimal, because human-readable profile pages are not always marked up in a machine-readable way, and API documents sometimes require knowledge of the proprietary API used. So a lot of the sameAs links I've seen so far are actually useless for building up search-engine data.
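For illustration, here is a minimal sketch of the Content-Type plus link-relation heuristic described above. The media-type table, the body-sniffing rules, the bucket names and the Portable Contacts relation URI (`POCO_REL`) are my own assumptions, not what useraddress.net actually does. When the body clearly contradicts the header (as with a host-meta XRD served as text/html), this sketch trusts the body.

```python
from typing import Optional, Tuple

# The four buckets from the post; the constant names are illustrative.
JSON, HTML, RDF, XRD, UNKNOWN = "json", "html", "rdf", "xrd", "unknown"

# Assumed, non-exhaustive mapping from media type to bucket.
MEDIA_TYPES = {
    "application/json": JSON,
    "application/jrd+json": JSON,
    "text/html": HTML,
    "application/xhtml+xml": HTML,
    "application/rdf+xml": RDF,
    "text/turtle": RDF,
    "application/xrd+xml": XRD,
}

# Assumed link relation for Portable Contacts ("poco") endpoints.
POCO_REL = "http://portablecontacts.net/spec/1.0"


def media_type(content_type: Optional[str]) -> str:
    """Drop parameters like '; charset=utf-8' and lower-case the rest."""
    if not content_type:
        return ""
    return content_type.split(";", 1)[0].strip().lower()


def sniff(body: str) -> Optional[str]:
    """Guess a bucket from the start of the body, or None if unsure."""
    head = body.lstrip()[:512]
    if head.startswith(("{", "[")):
        return JSON
    if "<XRD" in head:
        return XRD
    if "<rdf:RDF" in head:
        return RDF
    lowered = head.lower()
    if "<!doctype html" in lowered or "<html" in lowered:
        return HTML
    return None


def classify(content_type: Optional[str], body: str,
             rel: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Return (bucket, hint); the hint comes from the link relation, if any."""
    declared = MEDIA_TYPES.get(media_type(content_type))
    sniffed = sniff(body)
    # Prefer what the body looks like over the header, because some
    # servers send e.g. a host-meta XRD document as text/html.
    bucket = sniffed or declared or UNKNOWN
    hint = "poco" if rel == POCO_REL else None
    return bucket, hint


# A host-meta served with the wrong Content-Type still lands in "xrd".
print(classify("text/html", '<?xml version="1.0"?><XRD xmlns="http://docs.oasis-open.org/ns/xri/xrd-1.0"/>'))
# -> ('xrd', None)
```

Letting the sniffed body win is a design choice; a stricter client might only override the header when it is missing or text/html.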
What I haven't made much progress with is Buddycloud; there is an XMPP client for Node.js, but I haven't dug into how to retrieve a vCard with it. So that will probably not make it into the current version. I'll try to finish my proof-of-concept by the weekend, and then we can compare it with openfollow.net to see how the two can integrate.

Ciao!
Michiel
Received on Tuesday, 24 July 2012 17:07:53 UTC