RE: how to: ordered collection of a Concept from ZENG, MARCIA on 2013-11-16 (public-esw-thes@w3.org from November 2013)

From: ZENG, MARCIA <mzeng@kent.edu>
Date: Sat, 16 Nov 2013 17:25:21 +0000
To: Stella Dextre Clarke <stella@lukehouse.org>, "vladimir.alexiev@ontotext.com" <vladimir.alexiev@ontotext.com>
CC: "public-esw-thes@w3.org" <public-esw-thes@w3.org>, "L.Will@willpowerinfo.co.uk" <L.Will@willpowerinfo.co.uk>, Joan Cobb <JCobb@getty.edu>, "PHarpring@getty.edu" <PHarpring@getty.edu>, "Garcia, Gregg" <GGarcia@getty.edu>
Message-ID: <C3BA22256F96C146A48BF10DBBE208F636C5BCE4@CH1PRD0810MB360.namprd08.prod.outlook.>
Hi, Stella,
There have been two threads going on for the same questions. I am including all participants in this thread so they can see what you suggested to Vladimir.
I summarize two sorts of issues:

Issue 1. Regarding ordered siblings. 
As I indicated before, 'ordered children' is the 'ordered siblings' issue. Patricia explained clearly:
(1) In AAT, the siblings are by default alphabetical except if another order is strongly warranted (e.g., due to a time-based orientation, in cases where it would be confusing and seem wrong to expert end-users if the order were alphabetical).  
(2) The order is coded in the database, so the siblings are either a) alpha or b) forced. 
(3) Gregg did the scan of those ordered siblings. They are spread among 194 families, totaling about 2000 individuals. 

I did not follow the final decision after we indicated that skos:notation does not apply in AAT's case. 
However, I think this still needs to be addressed and applied correctly: in principle AAT, like almost all thesauri, does not employ a notation system. The identifiers used by the Getty Vocabs do not possess semantics or systematic ordering meanings. 
Re: Vladimir's reply "I think that [AAT] identifiers quite match the definition of skos:notation given in the SKOS Primer and SKOS Reference (they don't say a notation should be sortable)." November 11, 2013 12:06 PM.  Now I think the meaning of skos:notation is broader than the best practices in structured vocabularies, because we always think of a notation system (where 'system' implies the minimum characteristics). But in terms of definitions, neither the ISO 25964 definition nor the SKOS definition emphasizes the systematic part. Maybe this could be revisited?

Issue 2. Regarding the node labels (and guide terms) 
I sent some suggestions last weekend, similar to yours regarding node labels and guide terms, after the discussions in the third thread among skos-iso members, especially Leonard's suggestions. I also sent the extracted definitions/explanations from ISO 25964-1 for some of the concepts discussed. 
My suggestions were: 
(1) Treat true node labels as node labels, keep one preferred in each language, no alternative label for any language. (--That was one of the questions.)
(2) Some of the guide terms are clear concepts and the AAT team is already dealing with them.
(3) Some other guide terms represent very general concepts that AAT does not want to use in indexing. I consider them to be labels for general concepts. (This is similar to your suggestion, Stella, right? "One workaround might be to ignore all those angle brackets and treat all the guide terms as true concepts.")

Marcia
________________________________________
From: Stella Dextre Clarke [stella@lukehouse.org]
Sent: Saturday, November 16, 2013 10:00 AM
To: vladimir.alexiev@ontotext.com
Cc: public-esw-thes@w3.org; L.Will@willpowerinfo.co.uk; ZENG, MARCIA
Subject: Re: how to: ordered collection of a Concept

Dear Vladimir,
In earlier correspondence I think you said there is a commitment to
apply the ISO 25964 model to the AAT? In my opinion the AAT is a
wonderful vocabulary with many excellent features. But there are some
challenges when applying the standard because in some respects the AAT
does not follow ISO25964. I will not attempt to set out how you
could/should represent the data in RDF, but I will try to pinpoint some
of the challenges.
Mostly I'll be using ISO25964 parlance, which differs slightly from
AAT-speak. I hope we can overcome any confusion!

Addressing your points one by one:

On 15/11/2013 03:58, Vladimir Alexiev wrote:
>> I don't know how the AAT nowadays ensures the order of siblings in
>> an array
>
> There's a field sortOrder. If the values are the same, that means
> "not ordered", and AAT displays in alphabetical order of the EN
> label.
Ah yes, that sounds sensible.
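
The rule described above can be sketched in a few lines of Python. This is only an illustration: the `Sibling` class, its field names and the `display_order` function are hypothetical, not the AAT database schema, and the sample labels and sort values are made up (only "Iron Age" appears in this thread).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sibling:
    label_en: str    # English preferred label (assumed field name)
    sort_order: int  # assumed numeric field; equal values mean "not ordered"


def display_order(siblings: List[Sibling]) -> List[Sibling]:
    """Sort by the forced order first, then by English label.

    When every sibling has the same sort_order, the first key component
    is constant, so the result is plain alphabetical order.
    """
    return sorted(siblings, key=lambda s: (s.sort_order, s.label_en.casefold()))


# Forced (e.g. chronological) order: the sort values differ.
forced = [Sibling("Iron Age", 3), Sibling("Stone Age", 1), Sibling("Bronze Age", 2)]
# Unordered array: all sort values are the same.
unordered = [Sibling("sloops", 1), Sibling("cutters", 1), Sibling("yawls", 1)]

print([s.label_en for s in display_order(forced)])
print([s.label_en for s in display_order(unordered)])
```

Using a single composite sort key avoids a separate "is this array ordered?" flag, at the cost of relying on the convention that equal values mean unordered.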
>
>> Optionally, an array may have a node label. Optionally also, it may
>> have a superordinate concept.
>
> Consider these two cases that actually appear in AAT:
>
> 1. C1 < C2,C3: C1 (a concept) is parent of C2,C3 which are ordered
> 2. C1 < GT1 < C2,C3: C1 is parent of GT1 (a guide term), which in turn
> is parent of C2,C3 which are ordered
>
> Case 2 is clear: we represent GT1 as an Array that is ordered.
>
> My question is how to represent case 1, so it can be distinguished
> from case 2. In case 1 we also need to use an Array (there's nothing
> else that can be ordered, since a skos:OrderedCollection can't be put
> under anything). But it's an *inferior* array: it does not exist
> separately from C1, it is the *same* as C1. I agree with Leonard's
> suggestion to use an Array without node label (which I called
> *anonymous*, sorry if that caused any confusion). And we'll connect
> that inferior array to C1 using subordinateArray. Is that the best
> practice then?
I'm having difficulty understanding what you mean, probably because you
and I may be using different terminology to describe the same situation.
For example, take the expression "parent". For some people "parent"
means the broader concept in a BT/NT relationship; for others it just
means up one level somehow in a visual hierarchical display.

I'm also struggling to understand what is meant by an "inferior" array.
Most of the thesauri I encounter do not have any node labels (or guide
terms). When node labels are present they can help to articulate a
hierarchical display, but do not cause the associated arrays to be
superior or inferior. Maybe "inferior array" is another way of saying
"subordinate array"? In that case, no problem. Whenever a thesaurus
concept has more than one narrower concept at one level down, those
narrower concepts form a subordinate array. (But I would not judge the
subordinate array to be "the same as" its broader concept.)

Would it all be clearer if we use some specific examples? I've concocted
some in the attachment herewith, hoping they illustrate your Case 1 and
Case 2. (And I've made it an attachment to avoid the indentation getting
messed up by our email clients.)

Please note that in my parlance, a node label is not part of an array,
nor is it a parent of an array. It is simply a label associated with an
array, and is conventionally shown in the line preceding the first
term/concept in the array.

Do these examples illustrate what you mean? If not, you could point to
some real examples in the online AAT? We might need another example in
any case, to illustrate the different situation with AAT guide terms
that are not really node labels (because they are intended to show
intermediate concepts in the hierarchy that are not recommended for use
in indexing, e.g. "<emergency vessels>" ID 300232863).

Clause 11 of ISO 25964 has more examples and explanations about node
labels, which are useful if facet analysis is to be applied in a more
elaborate way.
>
>> Implementation would proceed more comfortably, I suggest, if the
>> treatment of arrays does not depend on existence of some kind of
>> parent.
>
> I'm not sure what that means. For a thesaurus consumer (e.g.
> implementer of a TMS or thesaurus visualization) it's important to
> understand when to display a level. In case 1 above, he should *not*
> display an extra level between the concepts. Which will happen if we
> institute a practice "If an Array has no label, then don't display
> it".
Case 1 in the attachment shows an array with no node label. What's the
problem?
> This will work fine for AAT, but if someone makes a whole tree
> of Arrays without labels, what would that mean? Oh well, that's for
> thesaurus consumers to worry about :-)
Take a look at the MeSH Browser and you will find very extensive trees
of concepts without node labels:
<http://www.nlm.nih.gov/cgi/mesh/2013/MB_cgi>
>
>> Array must have at least one member concept
>
This is what we can see in the ISO 25964 model (see
<http://www.niso.org/schemas/iso25964/Model_2011-06-02.jpg>)
> Conceivably, it may have only member arrays, and the concepts may
> come some levels further down?
With the AAT, which displays guide terms almost as though they were
concepts, it is possible to find arrays of guide terms only (NB a guide
term alone is not an array). But this could be avoided if (a) in cases
like the one of "emergency vessels" cited above, the concepts were
recognised as such, and (b) the ISO 25964 definition of "hierarchical
relationship" were adopted (relationship between a pair of concepts of
which one has a scope falling completely within the scope of the other).

As I see it, part of your challenge arises from wanting to display guide
terms as though they were concepts, and thus eligible for participating
in hierarchical relationships. One workaround might be to ignore all
those angle brackets and treat all the guide terms as true concepts. For
the human reader, there is no problem interpreting the resultant
display. (For example, in the hierarchical display for emergency
vessels, it is easy to work out what is happening between watercraft
and, say, fireboats. But if a hierarchy like that is used for automatic
inferencing, as in the Semantic Web, it would generate some peculiar
inferences, such as: '"watercraft by specific type" is a type of
watercraft'.)
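
A small Python sketch shows how such inferences would arise if guide terms were loaded as ordinary concepts and the broader links were followed transitively. The three-link chain below is assumed for illustration and is not copied from the AAT; `ancestors` and `inferred_statements` are hypothetical helper names.

```python
# Hypothetical child -> parent links, with guide terms (in angle brackets)
# treated as if they were true concepts.
broader = {
    "fireboats": "<emergency vessels>",
    "<emergency vessels>": "<watercraft by specific type>",
    "<watercraft by specific type>": "watercraft",
}


def ancestors(term, links):
    """Follow the broader links upward, as a transitive reasoner would."""
    chain = []
    while term in links:
        term = links[term]
        chain.append(term)
    return chain


def inferred_statements(links):
    """List every 'X is a type of Y' statement implied by the links."""
    return [
        f"{term.strip('<>')} is a type of {anc.strip('<>')}"
        for term in links
        for anc in ancestors(term, links)
    ]


for statement in inferred_statements(broader):
    print(statement)
```

Alongside the sensible "fireboats is a type of watercraft", the output contains "watercraft by specific type is a type of watercraft", which is exactly the kind of statement a human reader skips over but a machine takes literally.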

A more logical workaround would not mix up guide terms with concepts,
but would find a way of ensuring that hierarchical relationships are
established *only* between concepts (not between terms, nor between a
concept and a term, nor between guide terms, nor between a guide term
and a concept). It should still be possible to display the guide terms
"outdented" from their associated arrays (see the alternative
presentation of Case 2 in my attachment), but a bit more programming
would be needed to achieve this.
>
> ------
>
>> identifier "300106739" for "Iron Age" is not designed for use as a
>> notation... the form taken by the notation system of a particular
>> thesaurus can be highly idiosyncratic. ISO 25964 ...does not make
>> any assumptions about the way that notation will be used, either
>> for ordering or anything else.
>
> If ISO does not pose constraints on notations, how did you judge that
> "300106739" is not a notation?
The first clue is that it looks typical of the sort of string commonly
used for thesaurus identifiers. Confirmation comes from the label "ID"
shown on the AAT online.
For more detailed discussion, look at the ISO25964 definitions of
notation and identifier. Even if you don't have a copy, you can find all
the definitions freely at <https://www.iso.org/obp/ui/>.

> I've mapped it to skos:notation
> because it satisfies the description for notation given in the SKOS
> Primer and SKOS Reference. Anyway: when Marcia raised this issue,
> I've recorded it as an AAT Question, and we'll resolve it a bit
> later. If so decided, I'll turn that to dc:identifier.
A bit of confusion is understandable, since in some systems, especially
older ones, there is no ID separate from the notation. But better
practice is to keep the ID separate from the notation (and the problem
is completely removed if the thesaurus does not have any notation).

Sorry my attempts at explanation seem rather long, but I hope the
examples will help.
Stella Dextre Clarke


--
*****************************************************
Stella Dextre Clarke
Information Consultant and Project Leader, ISO NP 25964
Luke House, West Hendred, Wantage, OX12 8RR, UK
Tel: 01235-833-298
Fax: 01235-863-298
stella@lukehouse.org
*****************************************************
Received on Saturday, 16 November 2013 17:25:51 UTC