- From: Mischa Tuffield <mmt04r@ecs.soton.ac.uk>
- Date: Wed, 15 Jun 2011 11:24:09 +0100
- To: Richard Cyganiak <richard@cyganiak.de>
- Cc: Michael Brunnbauer <brunni@netestate.de>, public-lod@w3.org
Hello,

<snip/>

On 15 Jun 2011, at 11:11, Richard Cyganiak wrote:

> Another anecdote, I don't remember whom I heard this from: From FOAF data you can see that a lot of people say that their homepage is … "Google".

I am not sure this is on-topic anymore, but these are the values I blacklisted and flagged when used as an IFP in the FOAF validator I wrote on foaf.qdos.com (I know it is currently down, we are repurposing hardware at the mo - so sorry!).

$ifpblacklist = array(
    "<mailto:>",
    '"da39a3ee5e6b4b0d3255bfef95601890afd80709"',  // SHA1 of the empty string
    '"08445a31a78661b5c746feff39a9db6e4e2cc5cf"',  // SHA1 of "mailto:"
    '"20cb76cb42b39df43cb616fffdda22dbb5ebba32"',
    '<http://www.google.com/>',
    '<http://www.google.com>',
    '<http://www.bbc.co.uk/>',
    '<http://bbc.co.uk>',
    '"02085a0d20a5f574c1ce6cfe42bba6e85cfe07cf"'
);

Some of the hashes in the blacklist were added due to copy-and-paste errors when people were knocking together handwritten FOAF files; iirc John Domingue shared one of the foaf:mbox_sha1sum's with Tom Heath (probably from the time when they both worked at KMI).

Mischa

> Best,
> Richard
>
>
> On 14 Jun 2011, at 22:37, Michael Brunnbauer wrote:
>
>>
>> re
>>
>> as I was talking about "messy" data, some anecdotes from our work with
>> foaf-search.net:
>>
>> -Want to see some people and groups that are an owl:Ontology ?
>> http://www.foaf-search.net/SearchRDFType?type=http%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23Ontology
>> Thank god everyone using our website either knows instantly that this is
>> wrong or does not have a clue what owl:Ontology is.
>>
>> -Today, our website spent hours merging thousands of different people into
>> one person because our java developer made an update and forgot the code to
>> check the inverse functional property foaf:mbox_sha1sum (SHA1-hash of mailbox
>> URI) for bad values like 08445a31a78661b5c746feff39a9db6e4e2cc5cf (SHA1-hash
>> of "mailto:"). We need these kind of hacks to keep everything running.
>>
>> -foaf:homepage and foaf:weblog are inverse functional properties in the
>> foaf ontology. We excluded them in our reasoners in fear of users having
>> shared pages or being sloppy about what to fill in when asked for their
>> homepage or weblog. But the very popular livejournal blog software only
>> uses foaf:weblog to identify your friends so we had to accept at least
>> foaf:weblog.
>>
>> -This is something I found before our crawler found it - fortunately:
>> http://data.totl.net/dave.rdf
>>
>> -From the same website comes a huge database of many of the world's obscure
>> industrial bands. Cool - except they are endless and made up on the fly :)
>> http://data.totl.net/musicdb/music.cgi/bands?page=1
>>
>> -Speaking about fakes: http://fakefriends.me/ makes up fake identities
>> including crawlable FOAF RDF data on the fly. And almost every elgg blog our
>> FOAF crawler gets to crawl has been taken over by spammers or was installed by
>> them in the first place.
>>
>> -Things can have so many different foaf:names. What is the canonical one ?
>> We are currently using the one with the most quads but this is surely not
>> the best possible solution.
>>
>> This list will probably grow much larger in the near future.
>>
>> Regards,
>>
>> Michael Brunnbauer
>>
>> --
>> ++ Michael Brunnbauer
>> ++ netEstate GmbH
>> ++ Geisenhausener Straße 11a
>> ++ 81379 München
>> ++ Tel +49 89 32 19 77 80
>> ++ Fax +49 89 32 19 77 89
>> ++ E-Mail brunni@netestate.de
>> ++ http://www.netestate.de/
>> ++
>> ++ Sitz: München, HRB Nr.142452 (Handelsregister B München)
>> ++ USt-IdNr. DE221033342
>> ++ Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
>> ++ Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
>>
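
As a footnote to the $ifpblacklist array in this message: a minimal sketch of how such a check might look, written in Python rather than the PHP of the original validator. The function names (`mbox_sha1sum`, `is_blacklisted_ifp_value`) are hypothetical and are not taken from foaf.qdos.com or foaf-search.net; the only assumption carried over is that values are compared in the same serialised form as in the array (IRIs in angle brackets, literals in double quotes).

```python
import hashlib

# Values copied verbatim from the $ifpblacklist array in the message.
# IRIs are wrapped in <...>, plain literals in "...".
IFP_BLACKLIST = frozenset({
    "<mailto:>",
    '"da39a3ee5e6b4b0d3255bfef95601890afd80709"',
    '"08445a31a78661b5c746feff39a9db6e4e2cc5cf"',
    '"20cb76cb42b39df43cb616fffdda22dbb5ebba32"',
    "<http://www.google.com/>",
    "<http://www.google.com>",
    "<http://www.bbc.co.uk/>",
    "<http://bbc.co.uk>",
    '"02085a0d20a5f574c1ce6cfe42bba6e85cfe07cf"',
})


def mbox_sha1sum(mbox_uri: str) -> str:
    """Return the hex SHA1 digest of a mailbox URI, as used by foaf:mbox_sha1sum."""
    return hashlib.sha1(mbox_uri.encode("utf-8")).hexdigest()


def is_blacklisted_ifp_value(term: str) -> bool:
    """Return True if a serialised IFP value should not be used for merging.

    `term` is expected in the same form as the blacklist entries,
    e.g. '<http://www.google.com/>' or '"<40 hex digits>"'.
    """
    return term.strip() in IFP_BLACKLIST
```

A crawler or validator would call `is_blacklisted_ifp_value` before treating two resources that share an inverse functional property value as the same person. Hashing an empty mailbox with `mbox_sha1sum("")` reproduces the first hash in the list, which is presumably how it ends up in handwritten FOAF files.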
Received on Wednesday, 15 June 2011 10:26:04 UTC