- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Sat, 19 Jun 2010 20:53:51 +0200
- To: Toby Inkster <tai@g5n.co.uk>
- Cc: www-archive@w3.org
* Toby Inkster wrote:
>On Fri, 18 Jun 2010 22:58:46 +0200
>Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
>
>> For 98% of the articles we can predict the location with an error of
>> at around 1000km in both cases
>
>Interesting; 1000km is quite a wide margin of error though. It's about
>the distance from where I live, just outside Brighton near the English
>south coast, to Prague in the Czech Republic. There are two whole
>countries in between - one of them being Germany (not a small place by
>any means) - and a fairly big stretch of water too.

There is some inherent uncertainty in the location of some objects, yet
the German Wikipedia associates coordinates with those articles: Russia,
for instance, or the Pacific Ocean. The coordinate template does allow
specifying a "dimension", but that is usually omitted. The location in
those cases is rather arbitrary (de.wp and en.wp put the Pacific Ocean
at rather different positions, for instance), so those are not errors.
Without knowing the uncertainty of the positions, however, I cannot tell
how many of the far-off guesses are rather natural, and how many really
are bad guesses with a large error.

Besides, in 80% of the cases the guess is within 40-70km, depending on
which kind of link you use; that is very good for this rather simple
approach. A better approach would probably give links different weights
depending on where they are on the page (a lot of weight if at the
beginning, a lot less if transcluded), but to do that one would have to
parse the pages, which is rather expensive compared to simply going
through the db table dumps.
--
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
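
To make the weighting idea in the message concrete, here is a minimal sketch. The message does not say how the location estimate is actually computed, so everything below is an assumption for illustration: the estimate is taken to be a weighted mean of the coordinates of linked articles (averaged as unit vectors on the sphere), the error is measured as great-circle distance, and the names `weighted_location`, `haversine_km` and `POSITION_WEIGHTS`, along with the weight values themselves, are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Hypothetical weights: links near the beginning of a page count more,
# transcluded links (e.g. from navigation templates) count less.
POSITION_WEIGHTS = {"lead": 3.0, "body": 1.0, "transcluded": 0.25}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def weighted_location(links):
    """Estimate a (lat, lon) from an iterable of (lat, lon, weight) tuples.

    Points are averaged as weighted unit vectors so that longitudes near
    +/-180 degrees do not cancel out. Returns None if no direction can be
    determined (no links, or the weighted vectors sum to zero).
    """
    x = y = z = 0.0
    for lat, lon, weight in links:
        la, lo = math.radians(lat), math.radians(lon)
        x += weight * math.cos(la) * math.cos(lo)
        y += weight * math.cos(la) * math.sin(lo)
        z += weight * math.sin(la)
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        return None
    lat = math.degrees(math.asin(max(-1.0, min(1.0, z / norm))))
    lon = math.degrees(math.atan2(y, x))
    return (lat, lon)


if __name__ == "__main__":
    # Illustrative input only: coordinates of articles linked from some
    # page, tagged with where on the page each link occurs.
    linked = [
        (50.82, -0.14, POSITION_WEIGHTS["lead"]),         # near Brighton
        (50.08, 14.44, POSITION_WEIGHTS["transcluded"]),  # Prague
    ]
    guess = weighted_location(linked)
    print("guess:", guess)
    print("error vs. first link (km):", haversine_km(guess, linked[0][:2]))
```

With uniform weights this reduces to an unweighted average over all links, which only needs the link and coordinate tables; the per-position weights are the part that would require parsing the pages.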
Received on Saturday, 19 June 2010 18:54:28 UTC