Re: Squaring the HTTP-range-14 circle [was Re: Schema.org in RDF ...]

re

As I was talking about "messy" data, here are some anecdotes from our work with
foaf-search.net:

-Want to see some people and groups that are typed as owl:Ontology?
 http://www.foaf-search.net/SearchRDFType?type=http%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23Ontology
 Thank god everyone using our website either knows instantly that this is
 wrong or does not have a clue what owl:Ontology is.

-Today, our website spent hours merging thousands of different people into
 one person because our Java developer made an update and left out the code
 that checks the inverse functional property foaf:mbox_sha1sum (SHA1 hash of
 the mailbox URI) for bad values like 08445a31a78661b5c746feff39a9db6e4e2cc5cf
 (SHA1 hash of "mailto:"). We need these kinds of hacks to keep everything
 running.

-foaf:homepage and foaf:weblog are inverse functional properties in the
 FOAF ontology. We excluded them from our reasoners for fear of users having
 shared pages or being sloppy about what to fill in when asked for their
 homepage or weblog. But the very popular LiveJournal blog software uses only
 foaf:weblog to identify your friends, so we had to accept at least
 foaf:weblog.

-This is something I found before our crawler found it - fortunately:
 http://data.totl.net/dave.rdf

-From the same website comes a huge database of many of the world's obscure
 industrial bands. Cool - except they are endless and made up on the fly :)
 http://data.totl.net/musicdb/music.cgi/bands?page=1

-Speaking of fakes: http://fakefriends.me/ makes up fake identities,
 including crawlable FOAF RDF data, on the fly. And almost every Elgg blog our
 FOAF crawler gets to crawl has been taken over by spammers or was installed by
 them in the first place.

-Things can have many different foaf:name values. Which is the canonical one?
 We are currently using the one asserted in the most quads, but this is surely
 not the best possible solution.
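
To make the foaf:name heuristic concrete, here is a minimal sketch in Python.
It is not our production code: the quad layout (subject, predicate, object,
graph), the function name canonical_name and the alphabetical tie-break are
assumptions made up for illustration.

```python
from collections import Counter

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def canonical_name(quads, subject):
    """Return the foaf:name literal asserted for `subject` in the most quads.

    `quads` is an iterable of (subject, predicate, object, graph) tuples.
    Returns None if the subject has no foaf:name at all.
    """
    counts = Counter(
        o for (s, p, o, _graph) in quads
        if s == subject and p == FOAF_NAME
    )
    if not counts:
        return None
    # Highest count wins; ties are broken alphabetically (an assumption)
    # so that the result is at least deterministic.
    return min(counts, key=lambda name: (-counts[name], name))
```

Counting distinct graphs instead of raw quads, or weighting by source, would
be obvious variations; the sketch only shows the simple majority rule.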

This list will probably grow much larger in the near future.
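
For the foaf:mbox_sha1sum and foaf:weblog anecdotes above, the kind of guard
we need can be sketched roughly as follows. This is an illustration in Python
under stated assumptions, not our actual Java code: the names TRUSTED_IFPS,
BAD_VALUES and merge_by_ifp are invented here, the input is a plain list of
(subject, predicate, object) tuples, and the blacklist contains only the
"mailto:" case mentioned above, computed instead of hard-coded.

```python
import hashlib

FOAF = "http://xmlns.com/foaf/0.1/"

# Inverse functional properties we are willing to merge on (assumption:
# foaf:homepage is left out on purpose, foaf:weblog is accepted).
TRUSTED_IFPS = {FOAF + "mbox_sha1sum", FOAF + "weblog"}

# Known-bad values per property: here only the hash of an empty mailbox URI.
BAD_VALUES = {
    FOAF + "mbox_sha1sum": {hashlib.sha1(b"mailto:").hexdigest()},
}


def merge_by_ifp(triples, ifps=TRUSTED_IFPS, bad=BAD_VALUES):
    """Map every subject to a representative, merging subjects that share a
    value for a trusted inverse functional property (simple union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # (property, value) -> first subject carrying it
    for s, p, o in triples:
        find(s)  # register the subject even if nothing merges
        if p not in ifps or o in bad.get(p, ()):
            continue  # untrusted property or blacklisted value: no merge
        key = (p, o)
        if key in first_seen:
            parent[find(s)] = find(first_seen[key])
        else:
            first_seen[key] = s
    return {s: find(s) for s in list(parent)}
```

Without the bad.get(p, ()) check, every profile generated from an empty
e-mail field ends up in one giant merged person, which is exactly what
happened to us.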

Regards,

Michael Brunnbauer

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail brunni@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel

Received on Tuesday, 14 June 2011 21:37:50 UTC