RE: Blank nodes for concepts. from Bernard Vatant on 2004-02-23 (public-esw-thes@w3.org from February 2004)

From: Bernard Vatant <bernard.vatant@mondeca.com>
Date: Mon, 23 Feb 2004 14:52:11 +0100
To: "Charles McCathieNevile" <charles@w3.org>
Cc: <public-esw-thes@w3.org>
Message-ID: <GOEIKOOAMJONEFCANOKCCEBADGAA.bernard.vatant@mondeca.com>
Charles, and all

I've been lurking for a few months on this forum and already missed several
occasions to jump in. I would like to bring here, thinking aloud and for
what it's worth, some thoughts from a parallel world. We've been struggling
with similar issues for quite a while now in the Topic Map community,
particularly around the notion of Published Subjects in the OASIS PubSubj TC
[1], and also in Mondeca developments, where we often have to deal with
legacy thesauri.

*Charles wrote on Feb 09:
> I think that "identify by description" is probably more important than we
> realise. After all, the only way that I can tell anyone what my
> concept means is by description.

Maybe it depends on who "anyone" is, human (H) or system (S), and what you
mean by "description". The former can make sense of informal or rather
"semi-formal" descriptions, whereas the latter basically needs formal ones.
But we need an efficient identification mechanism in H2H, S2H, and S2S
transactions. In fact most Semantic Web applications will involve those
three kinds of transactions together, so the crucial question is to figure
out which, if any, identification mechanism would be convenient and
non-ambiguous across all of them.
The first cut of the Topic Maps Published Subjects approach was that S2S
needs only name-like identifiers (URIs), whereas H2H often needs
description-like identification. The Published Subjects mechanism would
address the S2H interaction by unambiguously matching system-usable "subject
identifiers" (URIs) to human-usable identification resources, the "subject
indicators". BTW, yesterday's message about WWAAC is more or less about this
kind of approach, subject indicators being symbols, graphics, sounds or any
other meaningful multimedia content.
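
To make the identifier-to-indicator matching concrete, here is a minimal
Python sketch. It is only an illustration under my own assumptions, not
anything defined by the PubSubj TC: the class name PublishedSubjects, its
methods, and the example.org URIs are all hypothetical.

```python
class PublishedSubjects:
    """Hypothetical registry: one subject identifier (URI) -> one subject indicator."""

    def __init__(self):
        self._indicators = {}

    def publish(self, identifier, indicator):
        # Refuse a second, different indicator for the same identifier,
        # so that the mapping stays unambiguous.
        existing = self._indicators.get(identifier)
        if existing is not None and existing != indicator:
            raise ValueError(f"{identifier} already has an indicator")
        self._indicators[identifier] = indicator

    def indicator_for(self, identifier):
        # S2H direction: a system holding a URI looks up what to show a human.
        return self._indicators[identifier]


registry = PublishedSubjects()
registry.publish(
    "http://example.org/subjects/acquisition",   # made-up subject identifier
    "http://example.org/docs/acquisition.html",  # made-up subject indicator
)
```

The indicator here is just another URI pointing at a human-readable
resource; it could equally be a symbol, a graphic or a sound.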

Now further reflection led some people (including myself) last year in the
TM community to question this approach, along the lines of what Charles
points at. *Both* systems and humans can use and make sense of name-like
identifiers in some contexts, but both will need disambiguating
descriptions in other contexts. Both identifiers and descriptions
(indicators) can make for subject definition, depending on the context.
OTOH, using RDF-OWL can somehow blur the notions of identification,
definition and description, leading to some open issues. For example: if I
use in a local application, as subject (concept) identifier, the URI of a
class or instance in an OWL ontology, or of a descriptor in a SKOS
thesaurus, what is the level of ontological commitment involved by that
use? Does it mean whatever assertion I make locally about this concept has
to be consistent with (all) assertions made in the source ontology or
thesaurus "defining" it? "Consistency" here can be read either from an
(informal) human user's viewpoint, or from a (formal) system's one. Thorny
issue, and IMO the most important one the SW technologies are facing.

To come back to the blank node definition of a concept, although it's clear
to me why and how it could be done inside an RDF file, I'm still a bit
unclear how it could be used by external references (from another thesaurus
or ontology):

*Charles
> We can use identical graphs for two blank concept nodes to assert that
> they are the same, for a given purpose.

I like the idea, and actually this is something we have been working out in
Mondeca to a certain extent. For example in our interface using NLP tools,
we can identify two "acquisition" events coming from different news, by
first identifying the kind of event, then comparing the identity of role
players "buyer" and "bought" in the event association.
This is also the path that the Topic Maps Reference Model folks (Steve
Newcomb et al.) have been following, through the notion of "Subject Identity
Discriminating Property".
See http://www.isotopicmaps.org/TMRM/TMRM-latest-clean.html#parid3039
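
A rough Python sketch of the comparison described above, for illustration
only. It is not the Mondeca implementation nor the TMRM definition: the
names Event, make_event and same_subject, the choice of "buyer" and "bought"
as the discriminating roles, and the company names are all assumptions made
for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str          # kind of event, e.g. "acquisition"
    roles: frozenset   # set of (role, player) pairs


def make_event(kind, **players):
    # Build an anonymous event (no identifier of its own), described
    # only by its kind and its role players.
    return Event(kind, frozenset(players.items()))


def same_subject(a, b, discriminating_roles=("buyer", "bought")):
    # Step 1: the two events must be of the same kind.
    if a.kind != b.kind:
        return False
    # Step 2: every discriminating role must be present in both events
    # and played by the same player. Other properties are ignored.
    ra, rb = dict(a.roles), dict(b.roles)
    return all(
        r in ra and r in rb and ra[r] == rb[r]
        for r in discriminating_roles
    )


# Two reports of the same acquisition from different news sources,
# and a third report of a different acquisition.
e1 = make_event("acquisition", buyer="AcmeCorp", bought="WidgetCo", source="news-1")
e2 = make_event("acquisition", buyer="AcmeCorp", bought="WidgetCo", source="news-2")
e3 = make_event("acquisition", buyer="AcmeCorp", bought="OtherCo", source="news-3")

print(same_subject(e1, e2))
print(same_subject(e1, e3))
```

Note that "source" differs between e1 and e2 without affecting the result:
only the properties declared as discriminating take part in the identity
decision, which is what makes the sameness hold "for a given purpose".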

Thanks for your attention

Bernard

[1] http://www.oasis-open.org/committees/tm-pubsubj/

Bernard Vatant
Senior Consultant
Knowledge Engineering
Mondeca - www.mondeca.com
bernard.vatant@mondeca.com
Received on Monday, 23 February 2004 08:52:20 UTC