Re: parsing documents that describe users

On 24 July 2012 19:07, Michiel de Jong <michiel@unhosted.org> wrote:

> As I progress with the useraddress.net code, I found that Content-Type
> headers are actually at least as valuable as link relationships in
> deciding how to process a document. I divide them into the following
> categories:
>
> - json
> - html
> - rdf
> - xrd
>
>
> I'm learning the formats as I go along, and just make them work
> heuristically, without too many strict rules. Apart from that I take
> into account the link relation that brought us to the document (if
> any), which can for instance tell us that something should be
> interpreted as a poco document. In many other cases, the link relation
> is useless for the document interpretation.
>
> But even using these hints, you can easily get to points where the
> data is not unambiguously machine-readable. For instance, for facebook
> and twitter API documents we need to take into account which API they
> came from.
>
> Also I found that quite a few documents are served with the wrong
> Content-Type (e.g. Diaspora serves its host-meta with an html
> Content-Type) so for these I think I'll just send pull requests to get
> them fixed.
>
> Supporting StatusNet, Friendica, Diaspora and Google is relatively
> straightforward, and twitter and facebook are super-simple once you
> consult their custom and proprietary API documentation. But by far the
> most work is all the custom domains. I'm trying to support Melvin,
> Tantek and TimBL, but they each work in different ways. I hope to make
> some progress on that soon, and try to support all of these before I
> publish my proof-of-concept version.
>
> Also, some people point their sameAs relation to the human-readable
> profile page (like Tantek, http://www.facebook.com/tantek.celik ), and
> some point it to the API (like TimBL,
> http://graph.facebook.com/512908782 ). Both are sub-optimal, because
> human-readable profile pages are not always marked up, and API
> documents sometimes require knowledge of the proprietary API used. So
> this means a lot of the sameAs links I've seen so far are actually
> useless for building up search-engine data.
>
> What I haven't made much progress with is Buddycloud; there is an xmpp
> client for nodejs, but I haven't dived into how I can retrieve a vcard
> with that. So that will probably not make it into the current version.
>
> I'll try to finish my proof-of-concept by the weekend, and then we can
> compare it with openfollow.net to see how the two can integrate.
>

There are well-established standards for this, such as content negotiation.
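
As a rough illustration of content negotiation from the client side, here
is a minimal sketch that builds a request whose Accept header ranks the
formats a crawler could parse. The preference order, the q-values and the
function names are my own assumptions for the example, not anything taken
from the useraddress.net code.

```python
from urllib.request import Request

# Hypothetical preference order; the q-values are illustrative only.
PREFERRED_TYPES = [
    ("text/turtle", 1.0),
    ("application/rdf+xml", 0.9),
    ("application/xrd+xml", 0.8),
    ("application/json", 0.7),
    ("text/html", 0.5),
]


def accept_header(preferences):
    """Serialise (media type, q) pairs into an Accept header value."""
    parts = []
    for media_type, q in preferences:
        # q=1 is the default weight, so it can be left implicit.
        parts.append(media_type if q >= 1.0 else f"{media_type};q={q:.1f}")
    return ", ".join(parts)


def build_profile_request(url):
    """Build (but do not send) a request that asks for the ranked formats."""
    return Request(url, headers={"Accept": accept_header(PREFERRED_TYPES)})
```

A server that implements negotiation picks the best representation it has
from that list; one that does not will simply ignore the header, which is
why a fallback is still needed.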

Cross-application federation will probably break wherever the wrong
content type is served, or you'll have to make common-sense decisions
to make the library more robust.
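
One possible "common sense" fallback, sketched under my own assumptions:
trust the declared Content-Type, except when it is missing or is the
generic html label, in which case peek at the body. The four categories
are the ones from your mail; the mapping table, the sniffing rules and
the function names are hypothetical and deliberately crude.

```python
# Declared media type -> category (json, html, rdf, xrd).
DECLARED = {
    "application/json": "json",
    "text/html": "html",
    "application/xhtml+xml": "html",
    "application/rdf+xml": "rdf",
    "text/turtle": "rdf",
    "application/xrd+xml": "xrd",
}


def sniff(body):
    """Guess a category from the first bytes of the body, or return None."""
    head = body.lstrip()[:512].lower()
    if "<xrd" in head:
        return "xrd"
    if "<rdf:rdf" in head:
        return "rdf"
    if head.startswith(("{", "[")):
        return "json"
    if "<html" in head or head.startswith("<!doctype html"):
        return "html"
    return None


def classify(content_type, body):
    """Return 'json', 'html', 'rdf', 'xrd', or None if nothing matches."""
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    declared = DECLARED.get(media_type)
    # Assumption: only a missing or html label is worth second-guessing.
    if declared is None or declared == "html":
        return sniff(body) or declared
    return declared
```

With this policy a host-meta document served as text/html but starting
with an XRD root element would still be treated as xrd, while a correctly
labelled Turtle or JSON document is never second-guessed.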

WRT timbl

owl:sameAs →

1. http://graph.facebook.com/512908782#
2. http://identi.ca/user/45563
3. http://www.advogato.org/person/timbl/foaf.rdf#me
4. http://www4.wiwiss.fu-berlin.de/bookmashup/persons/Tim+Berners-Lee
5. http://www4.wiwiss.fu-berlin.de/dblp/resource/person/100007

1 seems fine; please note the trailing # that you missed off.

2 is slightly broken in that he's linking to a document; I'm guessing he
was waiting for identi.ca to adopt rich profiles.  Tim uses Twitter now.

3 seems fine

4 seems fine

5 seems fine
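
To make the point about the # concrete, here is a small sketch that
splits each sameAs target into the document you would fetch and the
identifier itself. The function name and the returned tuple are my own
choices for illustration. Note that a missing # does not by itself prove
a URI names a document (4 and 5 have none); settling that needs an HTTP
lookup, which this sketch leaves out.

```python
SAME_AS_TARGETS = [
    "http://graph.facebook.com/512908782#",
    "http://identi.ca/user/45563",
    "http://www.advogato.org/person/timbl/foaf.rdf#me",
]


def split_identifier(uri):
    """Return (document URL to fetch, fragment, whether a '#' is present)."""
    document, sep, fragment = uri.partition("#")
    return document, fragment, bool(sep)


for uri in SAME_AS_TARGETS:
    document, fragment, has_hash = split_identifier(uri)
    print(f"{uri}\n  fetch: {document}\n  hash:  {has_hash}")
```

The identifier with the # and the one without it are different strings,
so a consumer that drops the # ends up matching on the document rather
than on the person it describes.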

These are the kinds of problems people were having in linked data many
years ago, hence the creation of standards.  When people start paying
attention to and following these standards, federation becomes that much
easier.


>
> Ciao!
> Michiel
>
>

Received on Tuesday, 24 July 2012 18:13:48 UTC