Subject Identity Discrimination Properties - in Topic Maps and in OWL

From: Bernard Vatant <bernard.vatant@mondeca.com> · Date: Wed, 5 Nov 2003 15:43:15 +0100

Following various exchanges I had lately, please find below a copy of
comments I made today about a draft of
ISO 13250-1 : Topic Maps - Overview and Basic Concepts.
The original post and thread follow-up can be found at:
http://www.isotopicmaps.org/pipermail/sc34wg3/2003-November/001909.html

It deals with subject identification. What is proposed is hopefully also of
interest to the OWL community. The rationale is that so far both Topic Maps
and RDF (hence OWL) use a very restrictive way of establishing subject
identity : use of a single identifier (URI string), although subject
identity could be established on more general basis by identical values for
a specific subset of properties.

Porting the issue from TM land to OWL land, what currently lacks in OWL is
some property type that I would call here for illustration sake, and to use
the same language that draft Topic Map Reference Model "SIDP" (Subject
Identity Discrimination Property) a "DiscriminatingProperty". This type
could be added to any Property in a Class definition, using a specific
restriction, e.g. in the definition of a Class "Document" I would have a
restriction like

onProperty 	dc:creator	rdf:type="DiscriminatingProperty"

And the like for dc:title, dc:publisher, dc:format, dc:date.

The semantics of DiscriminatingProperty being expressed informally as:

"If Sid(X) is the set of all properties having a DiscriminatingProperty
restriction on domain X, then instances of X having the same value for all
properties in Sid(X) are identical individuals."

I don't see any way to declare equivalent semantics using current OWL
vocabulary.
What would be the impact of adding owl:DiscriminatingProperty to the OWL
model?

Thanks for your attention.

Bernard

Bernard Vatant
Senior Consultant
Knowledge Engineering
Mondeca - www.mondeca.com
bernard.vatant@mondeca.com

-----Message d'origine-----
De : sc34wg3-admin@isotopicmaps.org
[mailto:sc34wg3-admin@isotopicmaps.org]De la part de Bernard Vatant
Envoye : mercredi 5 novembre 2003 12:00
A : sc34wg3@isotopicmaps.org
Objet : Section 4.3 Subject Identity RE: [sc34wg3] Strawman draft of ISO
13250-1

Steve, and all

I guess you won't be surprised by the section on which I have comments :)

In fact I am striken by the reduction happening in this section, which
begins very very well, opening the door to a very good and generic
definition of subject identity (although I would prefer subject
identification, as explained below), and then restricts it practically to
the use of subject locator and subject indicator - actually the mechanisms
used by the Standard Data Model - although the Reference Model, through the
notion of SIDP, offers the possibility of an unbounded number of other
identification mechanisms for TM applications. What I propose below is an
alternative wording that does not close the door to those. This is a first
cut coming out of my breakfast coffee. Please fix my prose where needed.

N446 : "Subject identity is a set of properties of a topic that enable
applications (and humans) to know which subject the topic represents and,
in particular, to know when two topics represent the same subject and must
therefore be merged."

I'm uneasy with this absolute definition of subject identity given by the
first "is", because I don't think there is something like an absolute
subject identity, only an interagreement on subject identification process.
And more uneasy with the use of "know" in this context: We don't and will
never know what "know" means for humans, and AFAIK it does not mean
anything for applications :))
The last "therefore must be merged" is superfluous here IMO. This section
should focus on how subject identity is established, not on how it is used.
Applications can do what they want with this information.

Alt:

"The core requirement for semantic interoperability of topic map
applications is interagreement on subject identification mechanisms,
enabling both humans and applications to establish when and how different
topics, either from the same topic map or different ones, should be
interpreted as representing the same subject and processed accordingly.
Such identification mechanisms use specific set of topic properties,
defined by a topic map data model, and constituting the "subject identity"
for applications conformant to that model."

Note the relative and open nature of this definition. There is nothing here
as absolute subject identity, since it relies on agreed-upon mechanisms for
subject identification.

N446 : "In recognition of the distinction between addressable subjects
(i.e., information resources) and subjects in general (i.e., information
resources and non-addressable subjects), Topic Maps provides two mechanisms
for specifying subject identity, both of which use locators."

Are not those mechanisms specific of a TM data model, even if it is the
standard one (TMDM)? If so, it should be specified here, replacing "Topic
Maps" by "the standard Topic Map Data Model"

And something should be added about the possibility of other identification
mechanisms in "non-standard" data models, based on any kind of properties
specific TM applications would agree upon.

It is a frequent case that no single property can be used for
identification, but that a set of well-chosen properties provides identity.
And I think it's a crucial choice for TM standard to decide that
identification mechanisms should be based only on single property values
(like subject locators) or could use "ad hoc" identifying set of
properties. Choosing the latter opens interesting paths, that would be
forbidden by the former. Suppose for example TM applications dedicated to
document management agree upon a set of identifying properties being e.g. a
subset of Dublin Core like:

{dc:creator, dc:title, dc:publisher, dc:date, dc:format}

Default unique identifiers like ISBN or other PSI, it makes sense to use
such a property set as a basis for subject identification: two topics
represent the same document if those five properties are equal. This
specific identifying set of properties for this specific class of topics
(documents) could be formally declared in an ontology. A declaration of
commitment to this ontology for a given topic map (using any relevant
language : TMCL, OWL, whatever) would therefore provide specific ways to
applications who care to establish identity for this specific class of
topics.

They'll correct me if I am wrong, but seems to me, from a recent
conversation with Michel and Steve (Newcomb), that this kind of perspective
is what the Reference Model is about.

Bernard

> -----Message d'origine-----
> De : sc34wg3-admin@isotopicmaps.org
> [mailto:sc34wg3-admin@isotopicmaps.org]De la part de Steve Pepper
> Envoye : lundi 3 novembre 2003 20:31
> A : G. Ken Holman - ISO/IEC JTC 1/SC 34 Secretary
> Cc : sc34wg3@isotopicmaps.org
> Objet : [sc34wg3] Strawman draft of ISO 13250-1
>
>
> Attached please find N446, a first Working Draft of
> Part 1 of ISO 13250.
>
_______________________________________________
sc34wg3 mailing list
sc34wg3@isotopicmaps.org
http://www.isotopicmaps.org/mailman/listinfo/sc34wg3