- From: Martin Hepp (UIBK) <martin.hepp@uibk.ac.at>
- Date: Fri, 18 Jan 2008 09:30:55 +0100
- To: Peter Ansell <ansell.peter@gmail.com>
- CC: KANZAKI Masahide <mkanzaki@gmail.com>, Danny Ayers <danny.ayers@gmail.com>, Peter F Brown <peter@pensive.eu>, Bernard Vatant <bernard.vatant@mondeca.com>, Reto Bachmann-Gmür <reto@gmuer.ch>, Leo Sauermann <leo.sauermann@dfki.de>, public-sweo-ig@w3.org, semantic-web@w3.org, Paul Roe <p.roe@qut.edu.au>, James Michael Hogan <j.hogan@qut.edu.au>
Hi Peter:

> I speak mainly because there are some editors on wikipedia who would
> prefer not to have semantic markup on pages because it makes them ugly

First - using Wikipedia URIs as identifiers for concepts on the Semantic
Web does not necessarily imply that anything is asserted about these
URIs formally. (The Wikipedia page for John Lennon describes clearly
enough which conceptual entity (i.e., John Lennon) it refers to, and the
only ambiguity that may arise in this context is whether this URI refers
to (1) the Wikipedia document as an information resource or (2) the dead
person John Lennon as a non-information resource. This, however, can be
resolved easily by opening up a new namespace reserved for the
respective non-information resources and creating a derived URI for each
Wikipedia URI in this space, e.g.

http://en.wikipedia.org/wiki/John_lennon ->
http://en.wikipedia.org/ontology/John_lennon)

DBpedia IDs may also serve this purpose.

So even without expanding Wikipedia, we can harvest the enormous number
of identifiers with a human-readable definition for weaving the Semantic
Web. It is likely the largest set of consensual identifiers for
conceptual entities in the world.

> Wikipedia also does not create concepts until there is a sufficient
> amount of "reliably published" information about them, and if they are
> of no interest to people outside of the immediate community.

I disagree, and I have data to support my claim: We have shown in the
IEEE Internet Computing paper [1] that the vast majority of Wikipedia
URIs keep on referring to the same meaning from the initial page to the
most recent one. So they are not constantly changing. Sometimes the
meaning broadens (e.g. if a page turns into a disambiguation page, which
can be understood as a superconcept of the original one).

In short, we found the following:

- More than 92 % of all 1.8 million URIs we analyzed showed a stable
  meaning, i.e., they kept on referring to the same meaning.
- About 6.7 % had a slight but not dramatic change in meaning, so that
  the current definition was broader than the original one. This would
  still not invalidate earlier annotations of Web content. Most of these
  turned into disambiguation pages. (Our paper contains more details on
  that.)

So the share of Wikipedia URIs that are not reliable as identifiers is
extremely small - the population estimate is between 0.66 and 0.89 %
(depending on whether we are using the Laplace or Wilson method). I bet
that even centrally administered vocabularies will show inconsistencies
in this order of magnitude.

> I would be inclined to keep the new and constantly changing
> identifiers within an organisations intranet-wiki and then publish
> their relationships to outside identifiers when they become
> accepted/published/interesting to outsiders.
>

Postponing the official use of new identifiers just means making our
vocabulary lack identifiers for novel concepts. Also, there is no better
way of getting identifiers "accepted" than by encouraging others to try
to use them in their communication. (We find the same pattern in human
language - new terms get established by usage, not by standardization.)

Also, it does not hurt if Wikipedia provides URIs for topics that are
relevant for a small community only. Still, it is better if there is a
single namespace and infrastructure for those (I don't see any gain in
spreading those over numerous intranet-Wikis).

Best
Martin

[1] http://www.heppnetz.de/harvesting-wikipedia/

-----------------------------------------------
martin hepp, http://www.heppnetz.de

> On 17/01/2008, KANZAKI Masahide <mkanzaki@gmail.com> wrote:
>> yep, you can think, for example, an Wikipedia page as a Subject Indicator.
>>
>> :me a foaf:Person; foaf:interest wikipedia:Semantic_Web .
>> wikipedia:Semantic_Web foaf:primaryTopic concept:Semantic_Web .
>>
>> => :me foaf:topic_interest concept:Semantic_Web .
>>
>> In a sense, foaf:interest uses the object document as *an* indicator
>> of the subject(URI of such document is a Subject Identifier). And a
>> (P)SI can indicate the subject by using an IFP such as
>> foaf:primaryTopic.
>>
>> So we can almost think that an Wikipedia page is an PSI, except it
>> doesn't satisfy the last requirement of PSI: "A Published Subject
>> Indicator must explicitly state the unique URI that is to be used as
>> its Published Subject Identifier" (3.1.3 in spec).
>
> This is a clean way to define the identifier without creating a new
> standard, other than the ontology. Of course, there is no need to
> intrude on wikipedia, as it has its own interests at heart and holds
> no claims to keep consistent URI's or to keep articles at any of their
> URI's. DBPedia seems like a better option for overlaying the knowledge
> in wikipedia with semantics.
>
> I speak mainly because there are some editors on wikipedia who would
> prefer not to have semantic markup on pages because it makes them ugly
> (equating wikipedia's infoboxes to semantic content here), and is
> possibly incorrect (philosophy of not publishing anything till it is
> perfect and correct), and there is nothing a group of outsiders can do
> to change their point of view it seems.
>
> Wikipedia also does not create concepts until there is a sufficient
> amount of "reliably published" information about them, and if they are
> of no interest to people outside of the immediate community. This
> leaves it closed to new information, so semantics can't grow within
> its vocabulary framework, and there can never be a proliferation of
> identifiers which are not going to be used outside of a small interest
> group. An equivalent wiki somewhere based on a specific interest area
> could go past the second restriction easily, but may still need to
> hold onto the first restriction otherwise it may be seen as
> unreliable.
>
> I would be inclined to keep the new and constantly changing
> identifiers within an organisations intranet-wiki and then publish
> their relationships to outside identifiers when they become
> accepted/published/interesting to outsiders.
>
> Peter Ansell
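
The derived-URI idea from the message (one concept URI per Wikipedia
page URI, in a separate namespace) could be sketched roughly as follows.
This is only an illustration under stated assumptions: the "/ontology/"
namespace is the hypothetical one from the example in the message, not
something Wikipedia provides, and the function names concept_uri and
dbpedia_uri are made up here. The DBpedia mapping simply copies the page
title into the http://dbpedia.org/resource/ namespace and does not
handle redirects or capitalisation differences.

```python
from urllib.parse import urlsplit, urlunsplit

WIKI_PREFIX = "/wiki/"
# Hypothetical namespace for non-information resources, taken from the
# example in the message; Wikipedia does not actually serve it.
CONCEPT_PREFIX = "/ontology/"
DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def page_title(page_uri: str) -> str:
    """Return the title part of a Wikipedia page URI (text after /wiki/)."""
    path = urlsplit(page_uri).path
    if not path.startswith(WIKI_PREFIX) or len(path) == len(WIKI_PREFIX):
        raise ValueError(f"not a Wikipedia page URI: {page_uri}")
    return path[len(WIKI_PREFIX):]


def concept_uri(page_uri: str) -> str:
    """Derive a URI for the concept (not the document) in the assumed namespace."""
    parts = urlsplit(page_uri)
    title = page_title(page_uri)
    return urlunsplit((parts.scheme, parts.netloc, CONCEPT_PREFIX + title, "", ""))


def dbpedia_uri(page_uri: str) -> str:
    """Naive DBpedia counterpart: same title, DBpedia resource namespace."""
    return DBPEDIA_PREFIX + page_title(page_uri)


if __name__ == "__main__":
    page = "http://en.wikipedia.org/wiki/John_lennon"
    print(concept_uri(page))
    print(dbpedia_uri(page))
```

Keeping the page URI and the derived concept URI distinct is what lets
statements about the document (last edited, license) stay separate from
statements about the person or thing it describes.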
Received on Friday, 18 January 2008 08:31:32 UTC