- From: Phillip Lord <Phillip.Lord@newcastle.ac.uk>
- Date: Thu, 9 Feb 2006 17:15:35 -0000
- To: <matt@biomedcentral.com>, <public-semweb-lifesci@w3.org>
matt@biomedcentral.com wrote:
> In terms of preserving the bits:
>
> Just as, if you believe that the only way to preserve BioMed
> Central's content is to scratch it in a shrinking spiral onto 2"
> nickel disks (as recommended here
> http://www.longnow.org/projects/conferences/10klibrary/ ), then
> thanks to the Creative Commons license nothing stops you from doing
> so. Same for Wikipedia.

It's the identifiers that are the problem; Wikipedia only offers URLs,
and these don't change when the version changes. If Wikipedia offered
DOIs, for example, to the pages, then these could update
automatically.

As you know, versioning is one of the biggest problems in life
sciences. Records change constantly, things go out of date, and you
can't tell that it has happened. LSIDs made an attempt to solve this
problem; the version part of the LSID is the only part of the
identifier which carries semantics. I really don't want to try and
hack something on top of a source which doesn't support version
descriptions upfront.

> In terms of preserving the links:
>
>> Links can break, or refer to other things than they started out.
>
> Yes indeed - that is surely in the very nature of an evolving
> ontology. But a core aspect of any semantically enhanced version of
> wikipedia (or other semantic wiki project) would certainly be how to
> manage the various different possible forms of aliasing which would
> be part of managing this evolution. e.g. right now, within Wikipedia,
> the primary form of aliasing is that where there were, say, 3
> previous entries that have converted into a single entry, there is a
> redirection/rewrite in place
> e.g.
> http://en.wikipedia.org/wiki/Einstein
> actually takes you to
> http://en.wikipedia.org/wiki/Albert_Einstein
>
> This aliasing mechanism would need to get subtler and more structured
> to convey the relevant history of the concepts concerned.

Agreed. This is very blunt at the moment.

Actually, there is another common form of link.
Take this URL.

http://en.wikipedia.org/wiki/Snare

I wrote this page about 5 years ago. When it was created, it referred
to the musical usage, and it had several incoming links. When the
page was later turned into a disambiguation page, those links were
all fixed, as it happens, so the change didn't hurt. But this is only
possible because Wikipedia is a closed world, so it's possible to
identify the internal links.

> An of course, at any point in time, any organisation may choose to
> snapshot wikipedia and comb through the relevant to create a
> trustworthy version, backed by their imprimatur.

archive.org probably already does this for us.

Again, though, snapshotting by crawling just seems a bad way to
go. There are ways around this; if you created a wiki which released
all changes as reverse diffs, say using RSS (hey, I knew I could get
this back clearly to semantic web technologies!), then by storing the
RSS stream you could recreate the contents of the website at any
point in time. If you then updated a version number any time anything
changed, the problem might be solved.

So....

http://en.wikipedia.org/wiki/Snare

would point to all (or any) versions of the page, while

http://en.wikipedia.org/wiki/Snare/7674

would point specifically to version 7674 of that page.

All of this would be easy to do. But it would be much easier if the
original data source supported it.

Phil
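
The reverse-diff scheme described above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not
how Wikipedia or any existing wiki works: the names VersionedPage,
reverse_diff, apply_reverse and resolve are hypothetical, the "feed"
is a plain in-memory list standing in for a stored RSS stream, and
version numbers simply count up from 1.

```python
from difflib import SequenceMatcher


def reverse_diff(old_lines, new_lines):
    """Return ops that rebuild old_lines when applied to new_lines."""
    matcher = SequenceMatcher(a=new_lines, b=old_lines, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("keep", i2 - i1))
        else:
            # Skip (i2 - i1) lines of the newer text, emit the older lines.
            ops.append(("swap", i2 - i1, old_lines[j1:j2]))
    return ops


def apply_reverse(new_lines, ops):
    """Apply one reverse diff, stepping back exactly one version."""
    out, pos = [], 0
    for op in ops:
        if op[0] == "keep":
            out.extend(new_lines[pos:pos + op[1]])
            pos += op[1]
        else:
            pos += op[1]
            out.extend(op[2])
    return out


class VersionedPage:
    """Hypothetical page: current text plus a feed of reverse diffs."""

    def __init__(self, lines):
        self.version = 1
        self.current = list(lines)
        self.feed = []  # entries: (version the diff restores, ops)

    def edit(self, new_lines):
        # Publish the diff that leads back to the version being replaced,
        # then bump the version number.
        self.feed.append((self.version, reverse_diff(self.current, new_lines)))
        self.current = list(new_lines)
        self.version += 1

    def at(self, version):
        if not 1 <= version <= self.version:
            raise KeyError(version)
        lines = self.current
        # Walk the feed from newest to oldest until the target is reached.
        for restores, ops in reversed(self.feed):
            if restores < version:
                break
            lines = apply_reverse(lines, ops)
        return lines


def resolve(pages, path):
    """'Snare' gives the latest text; 'Snare/1' gives version 1."""
    name, _, version = path.partition("/")
    page = pages[name]
    return page.at(int(version)) if version else page.current


page = VersionedPage(["Snare: a kind of drum."])
page.edit(["Snare may refer to:", "- Snare drum", "- Snare (trap)"])
pages = {"Snare": page}
```

With this in place, resolve(pages, "Snare") returns the current
disambiguation text and resolve(pages, "Snare/1") rebuilds the
original musical entry from the stored diffs alone, with no crawling
or snapshotting involved.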
Received on Thursday, 9 February 2006 17:15:58 UTC