Re: how to: ordered collection of a Concept from Stella Dextre Clarke on 2013-11-16 (public-esw-thes@w3.org from November 2013)

From: Stella Dextre Clarke <stella@lukehouse.org>
Date: Sat, 16 Nov 2013 15:00:01 +0000
To: vladimir.alexiev@ontotext.com
CC: public-esw-thes@w3.org, L.Will@willpowerinfo.co.uk, "'ZENG, MARCIA'" <mzeng@kent.edu>
Message-ID: <52878871.3050904@lukehouse.org>
Dear Vladimir,
In earlier correspondence I think you said there is a commitment to 
apply the ISO 25964 model to the AAT? In my opinion the AAT is a 
wonderful vocabulary with many excellent features. But there are some 
challenges when applying the standard because in some respects the AAT 
does not follow ISO 25964. I will not attempt to set out how you 
could/should represent the data in RDF, but I will try to pinpoint some 
of the challenges.
Mostly I'll be using ISO 25964 parlance, which differs slightly from 
AAT-speak. I hope we can overcome any confusion!

Addressing your points one by one:

On 15/11/2013 03:58, Vladimir Alexiev wrote:
>> I don't know how the AAT nowadays ensures the order of siblings in
>> an array
>
> There's a field sortOrder. If the values are the same, that means
> "not ordered", and AAT displays in alphabetical order of the EN
> label.
Ah yes, that sounds sensible.
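
To illustrate the ordering rule just described, here is a minimal Python sketch. It is not AAT's actual implementation: the `Sibling` class, its field names and the `display_order` function are hypothetical, and the tie-breaking used when only some sortOrder values coincide is an assumption that the message does not address.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sibling:
    label_en: str    # English label (hypothetical field name)
    sort_order: int  # stands in for the AAT sortOrder field


def display_order(siblings: List[Sibling]) -> List[Sibling]:
    """Return the siblings of one array in display order."""
    distinct = {s.sort_order for s in siblings}
    if len(distinct) <= 1:
        # Identical sortOrder values mean "not ordered":
        # fall back to alphabetical order of the EN label.
        return sorted(siblings, key=lambda s: s.label_en.casefold())
    # Otherwise honour sortOrder; ties broken alphabetically (assumption).
    return sorted(siblings, key=lambda s: (s.sort_order, s.label_en.casefold()))


unordered = [Sibling("C3", 1), Sibling("C2", 1)]
ordered = [Sibling("C2", 2), Sibling("C3", 1)]
print([s.label_en for s in display_order(unordered)])  # ['C2', 'C3']
print([s.label_en for s in display_order(ordered)])    # ['C3', 'C2']
```
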
>
>> Optionally, an array may have a node label. Optionally also, it may
>> have a superordinate concept.
>
> Consider these two cases that actually appear in AAT:
>
> 1. C1 < C2,C3: C1 (a concept) is parent of C2,C3 which are ordered
> 2. C1 < GT1 < C2,C3: C1 is parent of GT1 (a guide term), which in turn
> is parent of C2,C3 which are ordered
>
> Case 2 is clear: we represent GT1 as an Array that is ordered.
>
> My question is how to represent case 1, so it can be distinguished
> from case 2. In case 1 we also need to use an Array (there's nothing
> else that can be ordered, since a skos:OrderedCollection can't be put
> under anything). But it's an *inferior* array: it does not exist
> separately from C1, it is the *same* as C1. I agree with Leonard's
> suggestion to use an Array without node label (which I called
> *anonymous*, sorry if that caused any confusion). And we'll connect
> that inferior array to C1 using subordinateArray. Is that the best
> practice then?
I'm having difficulty understanding what you mean, probably because you 
and I may be using different terminology to describe the same situation. 
For example, take the expression "parent". For some people "parent" 
means the broader concept in a BT/NT relationship; for others it just 
means up one level somehow in a visual hierarchical display.

I'm also struggling to understand what is meant by an "inferior" array. 
Most of the thesauri I encounter do not have any node labels (or guide 
terms). When node labels are present they can help to articulate a 
hierarchical display, but do not cause the associated arrays to be 
superior or inferior. Maybe "inferior array" is another way of saying 
"subordinate array"? In that case, no problem. Whenever a thesaurus 
concept has more than one narrower concept at one level down, those 
narrower concepts form a subordinate array. (But I would not judge the 
subordinate array to be "the same as" its broader concept.)

Would it all be clearer if we use some specific examples? I've concocted 
some in the attachment herewith, hoping they illustrate your Case 1 and 
Case 2. (And I've made it an attachment to avoid the indentation getting 
messed up by our email clients.)

Please note that in my parlance, a node label is not part of an array, 
nor is it a parent of an array. It is simply a label associated with an 
array, and is conventionally shown in the line preceding the first 
term/concept in the array.

Do these examples illustrate what you mean? If not, could you point to 
some real examples in the online AAT? We might need another example in 
any case, to illustrate the different situation with AAT guide terms 
that are not really node labels (because they are intended to show 
intermediate concepts in the hierarchy that are not recommended for use 
in indexing, e.g. "<emergency vessels>", ID 300232863).

Clause 11 of ISO 25964 has more examples and explanations about node 
labels, which are useful if facet analysis is to be applied in a more 
elaborate way.
>
>> Implementation would proceed more comfortably, I suggest, if the
>> treatment of arrays does not depend on existence of some kind of
>> parent.
>
> I'm not sure what that means. For a thesaurus consumer (e.g.
> implementer of a TMS or thesaurus visualization) it's important to
> understand when to display a level. In case 1 above, he should *not*
> display an extra level between the concepts. Which will happen if we
> institute a practice "If an Array has no label, then don't display
> it".
Case 1 in the attachment shows an array with no node label. What's the 
problem?
> This will work fine for AAT, but if someone makes a whole tree
> of Arrays without labels, what would that mean? Oh well, that's for
> thesaurus consumers to worry about :-)
Take a look at the MeSH Browser and you will find very extensive trees 
of concepts without node labels: 
<http://www.nlm.nih.gov/cgi/mesh/2013/MB_cgi>
>
>> Array must have at least one member concept
>
This is what we can see in the ISO 25964 model (see 
<http://www.niso.org/schemas/iso25964/Model_2011-06-02.jpg>).
> Conceivably, it may have only member arrays, and the concepts may
> come some levels further down?
With the AAT, which displays guide terms almost as though they were 
concepts, it is possible to find arrays of guide terms only (NB a guide 
term alone is not an array). But this could be avoided if (a) in cases 
like the one of "emergency vessels" cited above, the concepts were 
recognised as such, and (b) the ISO 25964 definition of "hierarchical 
relationship" were adopted (relationship between a pair of concepts of 
which one has a scope falling completely within the scope of the other).

As I see it, part of your challenge arises from wanting to display guide 
terms as though they were concepts, and thus eligible for participating 
in hierarchical relationships. One workaround might be to ignore all 
those angle brackets and treat all the guide terms as true concepts. For 
the human reader, there is no problem interpreting the resultant 
display. (For example, in the hierarchical display for emergency 
vessels, it is easy to work out what is happening between watercraft 
and, say, fireboats.) But if a hierarchy like that is used for automatic 
inferencing, as in the Semantic Web, it would generate some peculiar 
inferences, such as: ' "watercraft by specific type" is a type of 
watercraft'.

A more logical workaround would not mix up guide terms with concepts, 
but would find a way of ensuring that hierarchical relationships are 
established *only* between concepts (not between terms, nor between a 
concept and a term, nor between guide terms, nor between a guide term 
and a concept). It should still be possible to display the guide terms 
"outdented" from their associated arrays (see the alternative 
presentation of Case 2 in my attachment), but a bit more programming 
would be needed to achieve this.
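
The "bit more programming" mentioned above might look something like the following Python sketch. It is only an illustration under stated assumptions: the `Concept` and `Array` classes and the `narrower` and `render` functions are hypothetical names, the angle-bracket rendering and the indentation scheme are guesses at a plausible display rather than a reproduction of the attachment, and C1, C2, C3 and GT1 are the placeholders from Cases 1 and 2. The point is that hierarchical links run only from concept to concept, while a node label is just an optional attribute of an array.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Array:
    members: List["Concept"]
    node_label: Optional[str] = None  # a label on the array, never a concept


@dataclass
class Concept:
    label: str
    arrays: List[Array] = field(default_factory=list)  # subordinate arrays


def narrower(concept: Concept) -> List[Concept]:
    """Narrower concepts: node labels take no part in BT/NT links."""
    return [m for a in concept.arrays for m in a.members]


def render(concept: Concept, depth: int = 0, indent: str = "  ") -> List[str]:
    """Hierarchical display; a node label is shown outdented from its array."""
    lines = [indent * depth + concept.label]
    for array in concept.arrays:
        member_depth = depth + 1
        if array.node_label is not None:
            lines.append(indent * (depth + 1) + "<" + array.node_label + ">")
            member_depth = depth + 2
        # An array without a node label adds no extra display level.
        for member in array.members:
            lines.extend(render(member, member_depth, indent))
    return lines


case1 = Concept("C1", [Array([Concept("C2"), Concept("C3")])])
case2 = Concept("C1", [Array([Concept("C2"), Concept("C3")], node_label="GT1")])
print("\n".join(render(case1)))
print("\n".join(render(case2)))
```

In both cases `narrower` returns C2 and C3 directly under C1, so an inference engine working from these links would never be told that a guide term "is a type of" anything.
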
>
> ------
>
>> identifier "300106739" for "Iron Age" is not designed for use as a
>> notation... the form taken by the notation system of a particular
>> thesaurus can be highly idiosyncratic. ISO 25964 ...does not make
>> any assumptions about the way that notation will be used, either
>> for ordering or anything else.
>
> If ISO does not pose constraints on notations, how did you judge that
> "300106739" is not a notation?
The first clue is that it looks typical of the sort of string commonly 
used for thesaurus identifiers. Confirmation comes from the label "ID" 
shown on the AAT online.
For more detailed discussion, look at the ISO 25964 definitions of 
notation and identifier. Even if you don't have a copy, you can find all 
the definitions freely at <https://www.iso.org/obp/ui/>.

> I've mapped it to skos:notation
> because it satisfies the description for notation given in the SKOS
> Primer and SKOS Reference. Anyway: when Marsha raised this issue,
> I've recorded it as an AAT Question, and we'll resolve it a bit
> later. If so decided, I'll turn that to dc:identifier.
A bit of confusion is understandable, since in some systems, especially 
older ones, there is no ID separate from the notation. But better 
practice is to keep the ID separate from the notation (and the problem 
is completely removed if the thesaurus does not have any notation).

Sorry my attempts at explanation seem rather long, but I hope the 
examples will help.
Stella Dextre Clarke


-- 
*****************************************************
Stella Dextre Clarke
Information Consultant and Project Leader, ISO NP 25964
Luke House, West Hendred, Wantage, OX12 8RR, UK
Tel: 01235-833-298
Fax: 01235-863-298
stella@lukehouse.org
*****************************************************
Attachments

application/msword attachment: Examples_of_arrays.doc
Received on Saturday, 16 November 2013 15:00:27 UTC