RE: how to: ordered collection of a Concept from Vladimir Alexiev on 2013-11-17 (public-esw-thes@w3.org from November 2013)

From: Vladimir Alexiev <vladimir.alexiev@ontotext.com>
Date: Sun, 17 Nov 2013 23:08:40 +0200
To: "'Stella Dextre Clarke'" <stella@lukehouse.org>
Cc: <public-esw-thes@w3.org>, <L.Will@willpowerinfo.co.uk>, "'ZENG, MARCIA'" <mzeng@kent.edu>
Message-ID: <01a901cee3d9$31e56a50$95b03ef0$@alexiev@ontotext.com>

Hi Stella!

> In earlier correspondence I think you said there is a commitment to
> apply the ISO 25964 model to the AAT?

As much as possible. Let me give an example:
- Johan has agreed to use xl:prefLabel for the label of an Array (because rdfs:label is not supposed to point to xl:Label)
- AAT has a few secondary labels for a few of the Arrays. We'll use xl:altLabel to represent them, unless Getty totally gives up such labels.
   This is a very minor deviation from the standard, and only an extension (i.e. if someone doesn't want such labels, they can just ignore them)

> would not mix up guide terms with concepts,
> but would find a way of ensuring that hierarchical relationships are
> established *only* between concepts

Getty has *one* polyhierarchy that uses different "subject types": Facets, Hierarchies, Guide Terms, and Concepts.
My intention is to:
1. represent all of Getty's Facets, Hierarchies and Guide Terms as iso:ThesaurusArray.
2. represent concepts as skos:Concept
3. represent the hierarchical relations using different properties:
  skos:Concept -skos:narrower-> skos:Concept
  skos:Concept -iso:subordinateArray-> iso:ThesaurusArray
  iso:ThesaurusArray -skos:member-> skos:Concept
  iso:ThesaurusArray -skos:member-> iso:ThesaurusArray
4. "shortcut" skos:narrower properties to "go through Arrays".
  You can see an example in the attached diagram: the relation 
  containers  -skos:narrower-> vessels
  spans 2 levels in the original AAT hierarchy.
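
A minimal sketch of the "shortcut" in step 4, using the milk example from Case 2 further down rather than real AAT data. Triples are plain Python tuples; the function name `shortcut_narrower` and the use of labels as node identifiers are assumptions made for illustration only, not part of the planned Getty conversion.

```python
# Triples (subject, property, object) following the mapping in item 3.
# Node identifiers are illustrative labels, not real AAT URIs.
TRIPLES = [
    ("milk", "iso:subordinateArray", "<milk by fat content>"),
    ("<milk by fat content>", "skos:member", "skimmed milk"),
    ("<milk by fat content>", "skos:member", "semi-skimmed milk"),
    ("<milk by fat content>", "skos:member", "full fat milk"),
]
ARRAYS = {"<milk by fat content>"}  # nodes typed iso:ThesaurusArray


def shortcut_narrower(triples, arrays):
    """Derive skos:narrower links from each concept to the nearest
    concepts below it, passing through any intermediate arrays."""
    children = {}
    for s, p, o in triples:
        if p in ("iso:subordinateArray", "skos:member"):
            children.setdefault(s, []).append(o)

    def nearest_concepts(node, seen):
        found = []
        for child in children.get(node, []):
            if child in seen:  # guard against accidental cycles
                continue
            seen.add(child)
            if child in arrays:
                # Arrays are transparent: keep descending.
                found.extend(nearest_concepts(child, seen))
            else:
                found.append(child)
        return found

    return [
        (node, "skos:narrower", concept)
        for node in children
        if node not in arrays
        for concept in nearest_concepts(node, set())
    ]


derived = shortcut_narrower(TRIPLES, ARRAYS)
```

Here `derived` links "milk" directly to its three narrower concepts, and no array ever appears as the subject or object of skos:narrower.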

Would you suggest something else?

> the expression "parent"

I mean the node above a given node in the hierarchy, be that a concept or array or whatever.
(Item 3 above shows the different properties used.)
 
> > Consider these two cases that actually appear in AAT:
> > 1. C1 < C2,C3: C1 (a concept) is parent of C2,C3 which are ordered
> > 2. C1 < GT1 < C2,C3: C1 is parent of GT1 (a guide term), which in turn
> > is parent of C2,C3 which are ordered

You give good examples in your doc. Here they are; I've added the parts marked *...*

Case 1
cycles
    *inferior array with no node label*
 monocycles
 bicycles
 tricycles

Case 2
milk
     <milk by fat content> : *Guide Term represented as Array with node label*
 skimmed milk
 semi-skimmed milk
 full fat milk

> > case1: it's an *inferior* array: it does not exist
> > separately from C1, it is the *same* as C1. I agree with Leonard's
> > suggestion to use an Array without node label

*Anonymous* is the word I use to describe an Array with no node label.
*Inferior* is the word I use to describe an Array that sprang into existence only so that a Concept's children can be ordered.

> Whenever a thesaurus
> concept has more than one narrower concept at one level down, those
> narrower concepts form a subordinate array. (But I would not judge the
> subordinate array to be "the same as" its broader concept.)

In the Getty database there are 2 "subject records" for case2: "milk" and <milk by fat content>.

But there is only 1 "subject record" for case1: the concept "cycles". 
So by *same* I mean 1 record in the Getty database.

In the RDF representation I'll have 2 nodes in both cases (there's no other way since Concept and Array are disjoint),
but in case1 they'll share the ID in their URL, e.g.:
  aat:300123456    # concept "cycles"
  aat:300123456/array    # inferior/anonymous array used to order the children of "cycles"
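
The shared-ID convention can be sketched as two small helpers. The function names are hypothetical, and the `aat:` prefix and the `/array` suffix are taken from the example above as written; the final URL scheme may differ.

```python
AAT_PREFIX = "aat:"  # prefix as written in the example above


def concept_uri(subject_id: str) -> str:
    """Identifier of the skos:Concept for a Getty subject record."""
    return f"{AAT_PREFIX}{subject_id}"


def inferior_array_uri(subject_id: str) -> str:
    """Identifier of the anonymous array that orders the concept's
    children; it reuses the concept's ID and adds a fixed suffix."""
    return f"{concept_uri(subject_id)}/array"
```

For the example ID, `concept_uri("300123456")` gives `aat:300123456` and `inferior_array_uri("300123456")` gives `aat:300123456/array`, so the two RDF nodes stay visibly tied to the single Getty record.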

> Case 1 in the attachment shows an array with no node label. What's the
> problem?

The display is ok: it shows no blank line (no level) for the array with no label.

> Most of the thesauri I encounter do not have any node labels 

As long as people don't display anonymous arrays as a level in the hierarchy, I'm happy.

> AAT guide terms
> that are not really node labels (because they are intended to show
> intermediate concepts in the hierarchy that are not recommended for use
> in indexing. e.g. "<emergency vessels>" ID  300232863)

None of the AAT Facets, Hierarchies, or Guide Terms are intended for indexing 
(but Getty's converting some Guide Terms to true Concepts).

> > This will work fine for AAT, but if someone makes a whole tree
> > of Arrays without labels, what would that mean? Oh well, that's for
> > thesaurus consumers to worry about :-)
> Take a look at the  MeSH Browser and you will find very extensive trees
> of concepts without node labels. <http://www.nlm.nih.gov/cgi/mesh/2013/MB_cgi>

This URL doesn’t work.

> (a) in cases
> like the one of "emergency vessels" cited above, the concepts were
> recognised as such

If Getty decides to turn the Guide Term <emergency vessels> into a true Concept, it will get mapped to skos:Concept.
Else it'll be mapped to iso:ThesaurusArray.
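
Put as code, the mapping rule is a lookup on the Getty "subject type". This is only a sketch: the exact strings Getty uses for subject types are an assumption here, and `rdf_class_for` is a hypothetical name.

```python
# Subject types that are never used for indexing (spellings assumed).
ARRAY_SUBJECT_TYPES = {"Facet", "Hierarchy", "Guide Term"}


def rdf_class_for(subject_type: str) -> str:
    """Return the RDF class a Getty subject record is mapped to."""
    if subject_type == "Concept":
        return "skos:Concept"
    if subject_type in ARRAY_SUBJECT_TYPES:
        return "iso:ThesaurusArray"
    raise ValueError(f"unknown subject type: {subject_type!r}")
```

If Getty later changes the subject type of <emergency vessels> from Guide Term to Concept, the same rule yields skos:Concept for it with no other change.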

Would you suggest something else?

> As I see it part of your challenge arises from wanting to display guide
> terms as though they were concepts, and thus eligible for participating
> in hierarchical relationships. One workaround might be to ignore all
> those angle brackets and treat all the guide terms as true concepts.
> But if  a hierarchy like that is used for automatic
> inferencing, as in the Semantic Web, it would generate some peculiar
> inferences, such as: ' "watercraft by specific type" is a type of
> watercraft')

Exactly! Everything mapped to skos:Concept is fair game for indexing, 
and skos:narrower can be used in query expansion.
So treating Guide Terms as skos:Concepts is not an option.
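
To illustrate why this matters for query expansion, here is a small sketch that expands a search term over skos:narrower. The data reuses the milk and cycles examples from this thread; the dictionary layout and the `expand` function are assumptions for illustration, not an existing API.

```python
# skos:narrower links between concepts only (arrays already skipped).
NARROWER = {
    "milk": ["skimmed milk", "semi-skimmed milk", "full fat milk"],
    "cycles": ["monocycles", "bicycles", "tricycles"],
}


def expand(term, narrower):
    """Return the term plus all concepts reachable via skos:narrower."""
    result, seen, stack = [], set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        result.append(current)
        stack.extend(narrower.get(current, []))
    return result


expanded = expand("milk", NARROWER)
```

Because <milk by fat content> is an Array and not a Concept, it never enters `expanded`, so a search for "milk" is not widened with a guide term that nobody indexes with.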

--

> > If ISO does not pose constraints on notations, how did you judge that
> > "300106739" is not a notation?
> The first clue is that it looks typical of the sort of string commonly
> used for thesaurus identifiers. Confirmation comes from the label "ID"
> shown on the AAT online.

My point is the same as Marcia's: if ISO does not want "random" IDs to be used as notation, it should say so.
You quoted from the ISO definitions and I didn’t see such a restriction.
It's in the examples, but I think it should also be in the definitions.
But that's a very minor point: I'm happy to use dc:identifier if Marcia says so.

Cheers! Vladimir
Received on Sunday, 17 November 2013 21:09:03 UTC