RE: how to: ordered collection of a Concept from Johan De Smedt on 2013-11-18 (public-esw-thes@w3.org from November 2013)

From: Johan De Smedt <johan.de-smedt@tenforce.com>
Date: Mon, 18 Nov 2013 07:25:18 +0100
To: "'Stella Dextre Clarke'" <stella@lukehouse.org>, "'ZENG, MARCIA'" <mzeng@kent.edu>
Cc: <vladimir.alexiev@ontotext.com>, <public-esw-thes@w3.org>, <L.Will@willpowerinfo.co.uk>, "'Joan Cobb'" <JCobb@getty.edu>, <PHarpring@getty.edu>, "'Garcia, Gregg'" <GGarcia@getty.edu>
Message-ID: <043001cee426$f4101d30$dc305790$@tenforce.com>
Hi Marcia, Stella, Vladimir,

My understanding is that the current schema provides all that is needed concerning the discussed problems.
1) ordering.
- For ordering, "skos" provides OrderedCollections.
  "iso-thes" provides ThesaurusArray, which is a "skos" Collection and can also be a "skos" OrderedCollection.
  This is an explicit ordering - so has priority.
- "iso-thes" documents that identifiers and notations are represented by "dct" identifier and "skos" notation respectively.
  Any instruction on ordering or lack of order is specific to the "notation" system in use; an identifier on the other hand typically has no implied ordering.
  If the thesaurus system implies (by documentation) an ordering on "notation", this could be used in the absence of ("iso-thes" ThesaurusArrays that are) "skos" OrderedCollections.
  Such documentation cannot be in scope of either SKOS or ISO 25964. It is in scope of a particular thesaurus application.
  This would be the first fallback scenario for ordering, in case there are no ordered collections or these cannot be handled by a display system.
- Concepts have preferred terms per language.  Typically one preferred term is required per language.
  In general, for multi-lingual thesauri with a high number of supported languages, a limited number of those languages is seen as "assured for having a preferred term on all concepts - whatever the state".
  Sorting by preferred term is the ultimate fallback.
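
The fallback chain above could be sketched roughly as follows. This is only an illustration under assumptions: the Concept class, the order_siblings function, the notation_is_ordered flag and the ex: URIs are hypothetical names of mine, not part of SKOS, iso-thes or AAT.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Concept:
    uri: str
    pref_label: Dict[str, str]        # language tag -> preferred term
    notation: Optional[str] = None    # skos:notation, if the thesaurus has one


def order_siblings(siblings: List[Concept],
                   member_list: Optional[List[str]] = None,
                   notation_is_ordered: bool = False,
                   lang: str = "en") -> List[Concept]:
    """Order siblings: explicit list first, then notation, then preferred term."""
    # 1. Explicit order, e.g. taken from the memberList of an OrderedCollection.
    if member_list:
        by_uri = {c.uri: c for c in siblings}
        return [by_uri[u] for u in member_list if u in by_uri]
    # 2. Notation, but only if the thesaurus documents it as implying an order.
    if notation_is_ordered and all(c.notation is not None for c in siblings):
        return sorted(siblings, key=lambda c: c.notation)
    # 3. Ultimate fallback: preferred term in a language assumed to be assured.
    return sorted(siblings, key=lambda c: c.pref_label.get(lang, c.uri).casefold())


# Hypothetical sample data (the URIs are made up for illustration).
stone = Concept("ex:stone", {"en": "Stone Age"})
bronze = Concept("ex:bronze", {"en": "Bronze Age"})
iron = Concept("ex:iron", {"en": "Iron Age"})
siblings = [iron, stone, bronze]

forced = order_siblings(siblings, member_list=["ex:stone", "ex:bronze", "ex:iron"])
alphabetical = order_siblings(siblings)
```

Here "forced" keeps the chronological order given by the member list, while "alphabetical" falls back to sorting on the English preferred term.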
2) node label
- The iso-thes model documents that a "skos-xl" prefLabel (with range "skos-xl" Label) can be used to represent the node label of an "iso-thes" ThesaurusArray (and ConceptGroup).
- As per ISO 25964, the prefLabel is optional and there is at most one per language.
- It is possible for any specific application to use additional labels (e.g. "skos-xl" altLabel) if needed. 
  Documentation and semantics of such additional label usage is outside of ISO 25964 and outside of SKOS.
- By "skos-xl" rule S55, existence of a skos-xl:prefLabel implies a skos:prefLabel will exist for the same subject (occasionally a ThesaurusArray).
  By "skos" rule S11, existence of a skos:prefLabel implies also an rdfs:label will exist for the same subject (occasionally a ThesaurusArray).
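
As a rough illustration of these two entailments, the sketch below applies them to a handful of triples held as plain Python tuples. The infer_labels function, the ex: URIs and the sample literal are hypothetical; a real system would rely on an RDF store or reasoner rather than code like this.

```python
def infer_labels(triples):
    """Return the label triples entailed by SKOS-XL S55 and SKOS S11."""
    triples = set(triples)
    # Each skosxl:Label resource has one literal form.
    literal_form = {s: o for s, p, o in triples if p == "skosxl:literalForm"}

    inferred = set()
    # S55: the chain (skosxl:prefLabel, skosxl:literalForm) implies skos:prefLabel.
    for s, p, o in triples:
        if p == "skosxl:prefLabel" and o in literal_form:
            inferred.add((s, "skos:prefLabel", literal_form[o]))
    # S11: skos:prefLabel is a sub-property of rdfs:label.
    for s, p, o in triples | inferred:
        if p == "skos:prefLabel":
            inferred.add((s, "rdfs:label", o))
    return inferred - triples


# Hypothetical node label attached to a ThesaurusArray.
data = [
    ("ex:array1", "rdf:type", "iso-thes:ThesaurusArray"),
    ("ex:array1", "skosxl:prefLabel", "ex:label1"),
    ("ex:label1", "skosxl:literalForm", "by function@en"),
]
entailed = infer_labels(data)
```

With this input, "entailed" holds one skos:prefLabel triple and one rdfs:label triple for ex:array1, both carrying the literal form of ex:label1.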

Is my understanding that this provides sufficient tools for AAT correct?

Kind Regards,

Johan De Smedt 
> -----Original Message-----
> From: Stella Dextre Clarke [mailto:stella@lukehouse.org]
> Sent: Saturday, 16 November, 2013 21:37
> To: ZENG, MARCIA
> Cc: vladimir.alexiev@ontotext.com; public-esw-thes@w3.org; L.Will@willpowerinfo.co.uk; Joan Cobb;
> PHarpring@getty.edu; Garcia, Gregg
> Subject: Re: how to: ordered collection of a Concept
> 
> On 16/11/2013 17:25, ZENG, MARCIA wrote:
> > Hi, Stella, There have been two threats going on for the same
> > questions.
> Hopefully they are Threads rather than Threats :-)
> But yes it's hard to reply to any of it without feeling a bit lost.
> 
> > I am including all of them in this thread so they could see what you
> > suggested to Vladimir.
> > I summarize two sorts of issues:
> >
> > Issue 1. Regarding ordered siblings. As I indicated before, 'ordered
> > children' is the 'ordered siblings' issue. Patricia explained
> > clearly: (1) In AAT, the siblings are by default alphabetical except
> > if another order is strongly warranted (e.g., due to a time-based
> > orientation, in cases where it would be confusing and seem wrong to
> > expert end-users if the order were alphabetical). (2) The order is
> > coded in the database, so the siblings being either a) alpha or b)
> > forced. (3) Gregg did the scan of those ordered siblings. They are
> > spread among 194 families, total about 2000 individuals.
> I don't believe there is any issue or problem about ordering. It's a
> great feature, when you have the resources to apply it.
> >
> > I did not follow through the final decision after we indicated that
> > skos:notation does not apply in AAT's case. However I think this
> > still needs to be addressed and implied correctly: In principle, AAT
> > does not employ a notation system, like almost all thesauri. The
> > identifiers used by Getty Vocabs do not possess semantics or
> > systematic ordering meanings. Re: Vladimir's reply "I think that [AAT]
> > identifiers quite match the definition of skos:notation given in the
> > SKOS Primer and SKOS Reference (they don't say a notation should be
> > sortable)." November 11, 2013 12:06 PM.  Now I think the meaning of
> > skos:notation is broader than the best practices in structured
> > vocabularies because we always think of a notation system (where
> > 'system' implies the minimum characteristics). But in terms of
> > definitions, both ISO 25964's and SKOS definitions did not emphasize
> > on the systematic part. Maybe this could be re-visited?
> I don't believe there is much problem here either. The ISO 25964
> definition of notation is supported by examples that make it pretty
> clear. Maybe the SKOS definition could be improved (but I hope to be
> lazy and leave that to someone else!) If any work is to be done, it
> should be in the context of standardizing classification schemes rather
> than thesauri.
> >
> > Issue 2. Regarding the node labels (and guide terms) I sent some
> > suggestions last weekend, similar to yours regarding node labels and
> > guide terms, after the discussions in the third threads among
> > skos-iso members, especially Leonard's suggestions. I also sent the
> > extracted definitions/explanations from ISO 25964-1 for some of the
> > concepts discussed. My suggestions were: (1) Treat true node labels
> > as node labels, keep one preferred in each language, no alternative
> > label for any language. (--That was one of the questions.) (2) Some
> > of the guide terms are clear concepts and AAT team is already dealing
> > with them. (3) Some other guide terms are representing very general
> > concepts but AAT does not want to use in indexing. I consider they
> > are the labels for general concepts. (This is similar to your
> > suggestion, Stella, right? "One workaround might be to ignore all
> > those angle brackets and treat all the guide terms as true
> > concepts.")
> Marcia, it's plain that Getty has a project under way for dealing with
> node labels and I don't know enough about it to comment. My remarks
> about workarounds  were pretty limited, since I don't know what size of
> budget/workforce is available to overhaul the whole thesaurus. Your
> categories (1) and (2) sound straightforward enough, provided someone
> has the time/resource to sort them all out. But as for category (3) -
> general concepts - I'd prefer to look at some specific examples before
> making any suggestions. (The workaround you have quoted above is not one
> I'd really recommend, for the reason explained in my original message.)
> 
> Finally, it's great to know Patricia and her team are taking this
> project so seriously; I wish you every success in sorting it out.
> Regards to All,
> Stella
> *****************************************************
> Stella Dextre Clarke
> Information Consultant and Project Leader, ISO NP25964
> Luke House, West Hendred, Wantage, OX12 8RR, UK
> Tel: 01235-833-298
> Fax: 01235-863-298
> stella@lukehouse.org
> *****************************************************
> 
> 
> >
> > Marcia
> > ________________________________________
> > From: Stella Dextre Clarke [stella@lukehouse.org]
> > Sent: Saturday, November 16, 2013 10:00 AM
> > To: vladimir.alexiev@ontotext.com
> > Cc: public-esw-thes@w3.org; L.Will@willpowerinfo.co.uk; ZENG, MARCIA
> > Subject: Re: how to: ordered collection of a Concept
> >
> > Dear Vladimir, In earlier correspondence I think you said there is a
> > commitment to apply the ISO 25964 model to the AAT? In my opinion the
> > AAT is a wonderful vocabulary with many excellent features. But there
> > are some challenges when applying the standard because in some
> > respects the AAT does not follow ISO25964. I will not attempt to set
> > out how you could/should represent the data in RDF, but I will try to
> > pinpoint some of the challenges. Mostly I'll be using ISO25964
> > parlance, which differs slightly from AAT-speak. I hope we can
> > overcome any confusion!
> >
> > Addressing your points one by one:
> >
> > On 15/11/2013 03:58, Vladimir Alexiev wrote:
> >>> I don't know how the AAT nowadays ensures the order of siblings
> >>> in an array
> >>
> >> There's a field sortOrder. If the values are the same, that means
> >> "not ordered", and AAT displays in alphabetical order of the EN
> >> label.
> > Ah yes, that sounds sensible.
> >>
> >>> Optionally, an array may have a node label. Optionally also, it
> >>> may have a superordinate concept.
> >>
> >> Consider these two cases that actually appear in AAT:
> >>
> >> 1. C1 < C2,C3: C1 (a concept) is parent of C2,C3 which are ordered
> >> 2. C1 < GT1 < C2,C3: C1 is parent of GT1 (a guide term), which in
> >> turn is parent of C2,C3 which are ordered
> >>
> >> Case 2 is clear: we represent GT1 as an Array that is ordered.
> >>
> >> My question is how to represent case 1, so it can be distinguished
> >> from case 2. In case 1 we also need to use an Array (there's
> >> nothing else that can be ordered, since a skos:OrderedCollection
> >> can't be put under anything). But it's an *inferior* array: it does
> >> not exist separately from C1, it is the *same* as C1. I agree with
> >> Leonard's suggestion to use an Array without node label (which I
> >> called *anonymous*, sorry if that caused any confusion). And we'll
> >> connect that inferior array to C1 using subordinateArray. Is that
> >> the best practice then?
> > I'm having difficulty understanding what you mean, probably because
> > you and I may be using different terminology to describe the same
> > situation. For example, take the expression "parent". For some people
> > "parent" means the broader concept in a BT/NT relationship; for
> > others it just means  up one level somehow in a visual hierarchical
> > display.
> >
> > I'm also struggling to understand what is meant by an "inferior"
> > array. Most of the thesauri I encounter do not have any node labels
> > (or guide terms). When node labels are present they can help to
> > articulate a hierarchical display, but do not cause the associated
> > arrays to be superior or inferior. Maybe "inferior array" is another
> > way of saying "subordinate array"? In that case, no problem. Whenever
> > a thesaurus concept has more than one narrower concept at one level
> > down, those narrower concepts form a subordinate array. (But I would
> > not judge the subordinate array to be "the same as" its broader
> > concept.)
> >
> > Would it all be clearer if we use some specific examples? I've
> > concocted some in the attachment herewith, hoping they illustrate
> > your Case 1 and Case 2. (And I've made it an attachment to avoid the
> > indentation getting messed up by our email clients.)
> >
> > Please note that in my parlance, a node label is not part of an
> > array, nor is it a parent of an array. It is simply a label
> > associated with an array, and is conventionally shown in the line
> > preceding the first term/concept in the array.
> >
> > Do these examples illustrate what you mean? If not, you could point
> > to some real examples in the online AAT? We might need another
> > example in any case, to illustrate the different situation with AAT
> > guide terms that are not really node labels (because they are
> > intended to show intermediate concepts in the hierarchy that are not
> > recommended for use in indexing. e.g. "<emergency vessels>" ID
> > 300232863)
> >
> > Clause 11 of ISO 25964 has more examples and explanations about node
> > labels, which are useful if facet analysis is to be applied in a
> > more elaborate way.
> >>
> >>> Implementation would proceed more comfortably, I suggest, if the
> >>> treatment of arrays does not depend on existence of some kind of
> >>> parent.
> >>
> >> I'm not sure what that means. For a thesaurus consumer (e.g.
> >> implementer of a TMS or thesaurus visualization) it's important to
> >> understand when to display a level. In case 1 above, he should
> >> *not* display an extra level between the concepts. Which will
> >> happen if we institute a practice "If an Array has no label, then
> >> don't display it".
> > Case 1 in the attachment shows an array with no node label. What's
> > the problem?
> >> This will work fine for AAT, but if someone makes a whole tree of
> >> Arrays without labels, what would that mean? Oh well, that's for
> >> thesaurus consumers to worry about :-)
> > Take a look at the MeSH Browser and you will find very extensive
> > trees of concepts without node labels:
> > <http://www.nlm.nih.gov/cgi/mesh/2013/MB_cgi>
> >>
> >>> Array must have at least one member concept
> >>
> > This is what we can see in the ISO 25964 model (see
> > <http://www.niso.org/schemas/iso25964/Model_2011-06-02.jpg>)
> >> Conceivably, it may have only member arrays, and the concepts may
> >> come some levels further down?
> > With the AAT, which displays guide terms almost as though they were
> > concepts, it is possible to find arrays of guide terms only (NB a
> > guide term alone is not an array). But this could be avoided if (a)
> > in cases like the one of "emergency vessels" cited above, the
> > concepts were recognised as such, and (b) the ISO 25964 definition of
> > "hierarchical relationship" were adopted (relationship between a pair
> > of concepts of which one has a scope falling completely within the
> > scope of the other).
> >
> > As I see it part of your challenge arises from wanting to display
> > guide terms as though they were concepts, and thus eligible for
> > participating in hierarchical relationships. One workaround might be
> > to ignore all those angle brackets and treat all the guide terms as
> > true concepts. For the human reader, there is no problem interpreting
> > the resultant display. (For example, in the hierarchical display for
> > emergency vessels, it is easy to work out what is happening between
> > watercraft and, say, fireboats. But if  a hierarchy like that is used
> > for automatic inferencing, as in the Semantic Web, it would generate
> > some peculiar inferences, such as: ' "watercraft by specific type" is
> > a type of watercraft')
> >
> > A more logical workaround would not mix up guide terms with
> > concepts, but would find a way of ensuring that hierarchical
> > relationships are established *only* between concepts (not between
> > terms, nor between a concept and a term, nor between guide terms, nor
> > between a guide term and a concept). It should still be possible to
> > display the guide terms "outdented" from their associated arrays (see
> > the alternative presentation of Case 2 in my attachment), but a bit
> > more programming would be needed to achieve this.
> >>
> >> ------
> >>
> >>> identifier "300106739" for "Iron Age" is not designed for use as
> >>> a notation... the form taken by the notation system of a
> >>> particular thesaurus can be highly idiosyncratic. ISO 25964
> >>> ...does not make any assumptions about the way that notation will
> >>> be used, either for ordering or anything else.
> >>
> >> If ISO does not pose constraints on notations, how did you judge
> >> that "300106739" is not a notation?
> > The first clue is that it looks typical of the sort of string
> > commonly used for thesaurus identifiers. Confirmation comes from the
> > label "ID" shown on the AAT online. For more detailed discussion,
> > look at the ISO25964 definitions of notation and identifier. Even if
> > you don't have a copy, you can find all the definitions freely at
> > <https://www.iso.org/obp/ui/>.
> >
> >> I've mapped it to skos:notation
> >> because it satisfies the description for notation given in the
> >> SKOS Primer and SKOS Reference. Anyway: when Marcia raised this
> >> issue, I've recorded it as an AAT Question, and we'll resolve it a
> >> bit later. If so decided, I'll turn that to dc:identifier.
> > A bit of confusion is understandable, since in some systems,
> > especially older ones, there is no ID separate from the notation. But
> > better practice is to keep the ID separate from the notation (and the
> > problem is completely removed if the thesaurus does not have any
> > notation).
> >
> > Sorry my attempts at explanation seem rather long, but I hope the
> > examples will help. Stella Dextre Clarke
> >
> >
> > --
> > *****************************************************
> > Stella Dextre Clarke
> > Information Consultant and Project Leader, ISO NP 25964
> > Luke House, West Hendred, Wantage, OX12 8RR, UK
> > Tel: 01235-833-298
> > Fax: 01235-863-298
> > stella@lukehouse.org
> > *****************************************************
> >
> 
> 
> --
Received on Monday, 18 November 2013 06:26:36 UTC