Re: HTTP URIs for real world objects from Peter Ansell on 2008-01-19 (public-sweo-ig@w3.org from January 2008)

From: Peter Ansell <ansell.peter@gmail.com>
Date: Sun, 20 Jan 2008 07:04:10 +1000
To: martin.hepp@uibk.ac.at
Cc: "KANZAKI Masahide" <mkanzaki@gmail.com>, "Danny Ayers" <danny.ayers@gmail.com>, "Peter F Brown" <peter@pensive.eu>, "Bernard Vatant" <bernard.vatant@mondeca.com>, "Reto Bachmann-Gmür" <reto@gmuer.ch>, "Leo Sauermann" <leo.sauermann@dfki.de>, public-sweo-ig@w3.org, semantic-web@w3.org, "Paul Roe" <p.roe@qut.edu.au>, "James Michael Hogan" <j.hogan@qut.edu.au>
Message-ID: <a1be7e0e0801191304l45fae1evbafeb5a88a3781c8@mail.gmail.com>

On 18/01/2008, Martin Hepp (UIBK) <martin.hepp@uibk.ac.at> wrote:
> Hi Peter:
>
>  > I speak mainly because there are some editors on wikipedia who would
>  > prefer not to have semantic markup on pages because it makes them ugly
>
> First - using Wikipedia URIs as identifiers for concepts on the Semantic
> Web does not necessarily imply that anything is asserted about these
> URIs formally. (The Wikipedia page for John Lennon describes clearly
> enough which conceptual entity (i.e., John Lennon) it refers to, and the
> only ambiguity that may arise in this context is whether this URI refers
> to (1) the Wikipedia documents as an information resource or (2) the
> dead person John Lennon as a non-information resource. This, however,
> can be resolved easily by opening up a new namespace reserved for the
> respective non-information resources and creating a derived URI for each
> Wikipedia URI in this space e.g.
>
> http://en.wikipedia.org/wiki/John_lennon ->
>
> http://en.wikipedia.org/ontology/John_lennon)
>
> DBpedia IDs may also serve this purpose.
>
> So even without expanding Wikipedia, we can harvest the enormous amount
> of identifiers with a human-readable definition for weaving the Semantic
> Web. It is likely the largest set of consensual identifiers for
> conceptual entities in the world.

My point was only a pragmatic one: being able to actually
use the URL to expand one's knowledge of the data. If you use the
Wikipedia URL you are at best getting an HTML representation of the
concept, with no way to get to a semantic representation as is. If you
use the semantic representation as your default identifier you can
dereference it and retrieve the HTML representation if you require
that. That is a good reason for using the URIs of RDF representations
as the default identifiers.
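
To make the dereferencing point concrete, here is a minimal Python sketch,
not a definitive implementation. It derives the two candidate identifiers
discussed above from a Wikipedia URL and builds (without sending) a request
that asks for RDF rather than HTML. The function names are hypothetical. The
/ontology/ namespace is only Martin's proposal and is not known to exist on
Wikipedia, and the DBpedia mapping assumes the page title carries over
unchanged (in practice the capitalisation may differ).

```python
from urllib.parse import urlsplit
from urllib.request import Request

WIKI_PREFIX = "/wiki/"


def wikipedia_title(url: str) -> str:
    """Return the page title part of a Wikipedia article URL."""
    path = urlsplit(url).path
    if not path.startswith(WIKI_PREFIX):
        raise ValueError(f"not a Wikipedia article URL: {url}")
    return path[len(WIKI_PREFIX):]


def derived_ontology_uri(url: str) -> str:
    """Map the document URL into the proposed (hypothetical) /ontology/ namespace."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/ontology/{wikipedia_title(url)}"


def dbpedia_uri(url: str) -> str:
    """Map the document URL to a DBpedia resource URI, copying the title as-is."""
    return "http://dbpedia.org/resource/" + wikipedia_title(url)


def rdf_request(uri: str) -> Request:
    """Build a request that asks for RDF/XML via HTTP content negotiation."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})


page = "http://en.wikipedia.org/wiki/John_lennon"
print(derived_ontology_uri(page))
print(dbpedia_uri(page))
print(rdf_request(dbpedia_uri(page)).get_header("Accept"))
```

Passing the request to urllib.request.urlopen would perform the actual
lookup. Whether RDF comes back depends entirely on the server honouring the
Accept header, which is exactly the difference between the two kinds of
identifier.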

>  > Wikipedia also does not create concepts until there is a sufficient
>  > amount of "reliably published" information about them, nor if they are
>  > of no interest to people outside of the immediate community.
>
> I disagree, and I have data to support my claim: We have shown in the
> IEEE Internet Computing paper [1] that the vast majority of Wikipedia URIs
> keeps on referring to the same meaning from the initial page to the most
> recent one. So they are not constantly changing. Sometimes the meaning
> broadens (e.g. if a page turns into a disambiguation page, which can be
> understood as a superconcept of the original one).

I have read your article in the past. In this case it suffers from not
tracing, or even being able to trace, the novel terms which get
created and deleted in Wikipedia constantly as part of their battle
against the big wide world of enthusiastic editors.

> In short, we found the following:
>
> - More than 92 % of all 1.8 million URIs we analyzed showed a stable
> meaning, i.e., they kept on referring to the same meaning.
>
> - About 6.7% had a slight but not dramatic change in meaning so that the
> current definition was broader than the original one. This would still
> not invalidate earlier annotations of Web content. Most of these turned
> into disambiguation pages. (Our paper contains more details on that)
>
> So the number of Wikipedia URIs that are not reliable as identifiers is
> extremely small - the population estimate is between 0.66 and 0.89 %
> (depending on whether we are using the Laplace or Wilson method). I bet
> that even centrally administered vocabularies will show inconsistencies
> in this order of magnitude.
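
For readers unfamiliar with the two estimators named above, the following
Python sketch shows how a proportion estimated from a sample can differ
between them. It assumes "Laplace" means the rule-of-succession estimate and
"Wilson" means the centre of the Wilson score interval. That reading, the
function names, and the sample counts are all assumptions for illustration
and are not taken from the paper.

```python
import math


def laplace_estimate(hits: int, n: int) -> float:
    """Laplace (rule of succession) point estimate: add one hit and one miss."""
    return (hits + 1) / (n + 2)


def wilson_point_estimate(hits: int, n: int, z: float = 1.96) -> float:
    """Centre of the Wilson score interval, used here as a point estimate."""
    return (hits + z * z / 2) / (n + z * z)


def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical sample: 1 unreliable URI found among 300 inspected.
hits, n = 1, 300
print(f"Laplace: {laplace_estimate(hits, n):.4%}")
print(f"Wilson:  {wilson_point_estimate(hits, n):.4%}")
print("Wilson interval:", wilson_interval(hits, n))
```

With so few observed failures the choice of estimator visibly moves the
figure, which is presumably why a range rather than a single number was
reported.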

I agree in the main with your argument, but it was not relevant to my
statement. I was asking about the level of certainty needed to
maintain or even get a Wikipedia article; you were merely stating the
level of stability for articles that were accepted.

>  > I would be inclined to keep the new and constantly changing
>  > identifiers within an organisation's intranet-wiki and then publish
>  > their relationships to outside identifiers when they become
>  > accepted/published/interesting to outsiders.
>  >
> Postponing the official use of new identifiers just means making our
> vocabulary lack identifiers for novel concepts. Also, there is no better
> way of getting identifiers "accepted" than by encouraging others to try
> to use them in their communication. (We find the same pattern in human
> language - new terms get established by usage, not by standardization.)

You don't necessarily have to postpone the use, but you should utilise
different distribution channels for your information if it doesn't
have multiple "independent refereed by top journal publications" yet.

> Also, it does not hurt if Wikipedia provides URIs for topics that are
> relevant for a small community only. Still, it is better if there is a
> single namespace and infrastructure for those (I don't see any gain
> spreading those over numerous intranet-Wikis).
>

You might think that it would not hurt Wikipedia to provide these
URLs, but there are some quite militant groups within the Wikipedia
community who actively remove novel references and those which are not
of interest outside of small communities. Don't rely on Wikipedia for
hosting semantic descriptions of things. They have their own
philosophical battles that will only harm the semantic web if you rely
directly on them to describe content for you.

Peter Ansell

Received on Saturday, 19 January 2008 21:04:24 UTC