RE: ISSUE-160: Allowing collections in semantic relationships

Hi Al

Thanks for getting back.
I agree with your points below as representing best practice.

However, we still have some concerns about SKOS collections, which I think are useful to discuss as longer-term issues.


1. Our main concern is facilitating SKOS representations for legacy vocabularies which already have electronic representations (or possibly where 'skosification' is a significant challenge for the vocabulary provider). 

I am guessing that many existing electronic representations will follow your second aeroplane thesaurus example format. Examples include the various MDA and cultural heritage thesauri we have worked with (see our report on SKOS conversion, http://hypermedia.research.glam.ac.uk/media/files/documents/2008-07-05/Additional-report-wp5.pdf). The current AAT XML sample data available online (http://www.getty.edu/research/conducting_research/vocabularies/download.html) also exhibits the second aeroplane example structure. There is a 'record type' attribute that can take one of 4 possible values: Concept, Facet, GuideTerm, HierarchyName. Something flagged as a 'Concept' can have a parent that is marked as a 'GuideTerm', and the two are linked via a 'Parent/child' relationship.

If the second aeroplane format is considered consistent with the SKOS data model, although not best practice, then this could potentially meet part of the concern. However, the second aeroplane format would not capture the semantics that [aeroplanes_by_wing_number should not be used for indexing]. As you mentioned in the first response, this could be achieved by a local extension to SKOS. If this does prove to be a common feature, then at least some (best practice) example would assist interoperability by encouraging common practice.
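
As a rough sketch of what such a local extension might look like: the property name ex:notForIndexing and the ex: namespace URI below are made up purely for illustration and are not part of SKOS. The second aeroplane format could then be annotated along these lines:

---
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/aeroplanes#> .

# ex:notForIndexing is a hypothetical local property, not defined by SKOS.
ex:notForIndexing rdf:type rdf:Property .

ex:aeroplanes rdf:type skos:Concept ;
  skos:narrower ex:aeroplanes_by_wing_number .

# The guide term stays in the hierarchy as a concept, but is flagged.
ex:aeroplanes_by_wing_number rdf:type skos:Concept ;
  ex:notForIndexing true ;
  skos:narrower ex:monoplanes, ex:biplanes, ex:triplanes .
---

An application unaware of the flag would still see a valid SKOS hierarchy, which is why an agreed example would matter for interoperability.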

It may be that this will not turn out to be a problem as practice develops. However, it was an issue in our experience with existing cultural heritage thesaurus representations, and a similar issue surfaced in BSI Part 5 discussions when converting to ZThes as a legacy term-based thesaurus example (I've appended an extract from our email on this as a PPS below).

I think it would be worth considering how to provide assistance to vocabulary owners in creating SKOS representations, including patterns for creating collection structures. This could be associated with the Primer, or as part of best practice examples.

 

2. Having just reviewed the current SKOS collection model and the relevant sections in the Reference and Primer, we're still unclear as to how SKOS collections are envisaged to be used. It's not clear to us what precisely an application could do with a SKOS collection on importing a SKOS file.

The BSI Standard has a Superordinate relationship from an Array to "A higher-level concept to which this array is subordinated". There is no such link in SKOS. In BSI, the Array is intended to represent groupings of sibling concepts (mainly for display purposes). Is that the main intention in SKOS? 
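
To make the gap concrete, here is a minimal sketch of how the BSI link might be carried alongside a SKOS collection. The property ex:superOrdinate and the ex: namespace URI are assumptions for illustration only; SKOS itself defines no such property.

---
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/aeroplanes#> .

# ex:superOrdinate is a hypothetical local property (not defined by SKOS)
# mirroring the BSI link from an array to its higher-level concept.
ex:superOrdinate rdf:type rdf:Property .

ex:aeroplanes_by_wing_number rdf:type skos:Collection ;
  skos:member ex:monoplanes, ex:biplanes, ex:triplanes ;
  ex:superOrdinate ex:aeroplanes .
---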

The Reference has a disclaimer in the text on Collections, e.g. "Furthermore, where "node labels" are used in the systematic display, it may not always be possible to fully reconstruct the systematic display from a SKOS representation alone. Fully representing all of the information represented in a systematic display of a thesaurus or other knowledge organization system, including details of layout and presentation, is beyond the scope of SKOS."

It would certainly not be easy and several assumptions would have to be made to create the link between a member of a Collection and the superordinate concept. There are no constraints on what concepts can be members of a Collection (eg that they are all siblings), nor that concepts must belong to only one Collection. 
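
A small sketch of the kind of data that makes this hard. The concepts ex:airships and ex:zeppelins and the collection ex:early_aircraft are invented for illustration, as is the ex: namespace URI. As far as I can tell, nothing in the SKOS data model rules this out:

---
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/aeroplanes#> .

ex:aeroplanes rdf:type skos:Concept ;
  skos:narrower ex:monoplanes, ex:biplanes, ex:triplanes .

# Hypothetical second branch, added only for this illustration.
ex:airships rdf:type skos:Concept ;
  skos:narrower ex:zeppelins .

ex:aeroplanes_by_wing_number rdf:type skos:Collection ;
  skos:member ex:monoplanes, ex:biplanes, ex:triplanes .

# Members are not siblings, and ex:biplanes now belongs to two collections.
ex:early_aircraft rdf:type skos:Collection ;
  skos:member ex:biplanes, ex:zeppelins .
---

Here an importing application could guess that ex:aeroplanes_by_wing_number sits under ex:aeroplanes because all its members share that broader concept, but no single superordinate concept can be derived for ex:early_aircraft.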

If you cannot recreate the original use (thesaurus node labels), what is the purpose of Collections? Are SKOS collections intended to serve a wider purpose than capturing thesaurus node labels, and if so, are there use cases? If there is a consensus on their intended purpose then perhaps the documentation could be extended to reflect this.

regards

Doug

PS  Answers to specific questions IN CAPS INLINE below

PPS Extract from one of our contributions to BSI Part 5 discussions where we (Ceri) provided a BSI-ZThes round trip conversion

Node labels

Zthes uses a flag to indicate termType - one of the possible values for this is "NL" - intended to indicate node labels / facet indicators / guide terms. Rather than being part of a separate "array" structure, these terms form part of the main hierarchy in Zthes. This practice also occurs in some published thesauri that are not based on Zthes. The BS8723 'core' format has no support for node labels and arrays, so it was unclear how to flag a guide term in the core format, apart from using some convention to modify the display term itself (e.g. angled brackets). A possible suggestion is the use of an optional concept attribute to denote guide terms, so legacy thesauri may be modelled without altering the existing hierarchy. The use of thesaurus arrays could then be an optional (preferred) alternative to the attribute in the (full) format. This would allow the standard to make a recommendation of 'best practice' while facilitating the mapping of many existing thesauri to the revised standard without requiring changes to the hierarchical structure.



-----Original Message-----
From: Alistair Miles [mailto:alistair.miles@zoo.ox.ac.uk]
Sent: 22 November 2008 11:08
To: Tudhope D S (AT)
Cc: public-swd-wg@w3.org
Subject: Re: ISSUE-160: Allowing collections in semantic relationships

Hi Doug,

Thanks for your response. Further comments inline.

On Sun, Nov 16, 2008 at 12:56:39PM -0000, Tudhope D S (AT) wrote:
> Hi Al
> 
> thanks for getting back
> I take your points about indexing and correspondence between SKOS and BSI.
> However, they don't address the main issue I wanted to raise. I see your personal note deals with it more and I think indicates that this issue is still somewhat under consideration?
> 
> I may have confused things by suggesting a work around. Let's set aside the non-indexing issue for now.
> I'll restate the issue:
> Concern: Insufficient support/guidance for legacy systems wrt guide terms / facet indicators
> 
> My concern is that SKOS collections do NOT represent common practice in most existing thesauri
> and (if this is true) there is a danger that they might constitute a significant barrier to take up of SKOS by vocabulary owners who would otherwise wish to do so, unless appropriate guidance/alternatives are available.
> 
> I think conversion of legacy thesauri to SKOS is an important application for SKOS and its wider take up.
> Do we know how many thesauri actually follow the SKOS collections for such structures?
> I don't think I know of any though I expect a few exist.
> Most that I know incorporate facet indicators as part of the hierarchy. (I'm happy to be corrected if this is not the case)

I'm not sure what you mean by "part of the hierarchy".
I MEAN YOUR SECOND AEROPLANE EXAMPLE
>>> The following choice would also be consistent with the SKOS data
>>> model, although in my opinion is not best practice:
I AGREE IT WOULD NOT BE BEST PRACTICE

Consider the following example. The systematic display for my example
aeroplane thesaurus looks like this:

---
aeroplanes
.<aeroplanes by wing number>
..monoplanes
..biplanes
..triplanes
---

The alphabetic display for my thesaurus looks like this:

---
aeroplanes
  NT biplanes
  NT monoplanes
  NT triplanes

biplanes BT aeroplanes

monoplanes BT aeroplanes

triplanes BT aeroplanes
---

Now, is "aeroplanes by wing number" part of "the hierarchy"?

My point is, for a thesaurus like this, you have an *open choice* about
how to represent the underlying data using SKOS.

The following choice would be compatible with the above displays,
would be consistent with the SKOS data model, and in my opinion
follows best practice (also consistent with BS8723-5):

---
ex:aeroplanes rdf:type skos:Concept ;
  skos:narrower ex:monoplanes, ex:biplanes, ex:triplanes .

ex:aeroplanes_by_wing_number rdf:type skos:Collection ;
  skos:member ex:monoplanes, ex:biplanes, ex:triplanes .
---

The following choice would also be consistent with the SKOS data
model, although in my opinion is not best practice:

---
ex:aeroplanes rdf:type skos:Concept ;
  skos:narrower ex:aeroplanes_by_wing_number .

ex:aeroplanes_by_wing_number rdf:type skos:Concept ;
  skos:narrower ex:monoplanes, ex:biplanes, ex:triplanes .
---

What I'm trying to say is, in my experience, for a thesaurus where
node labels have been used, *either* of the above approaches could
reasonably be taken.

> 
> What do we expect vocabulary owners who do not follow the SKOS collections semantics to do?
> If we expect them to change their vocabulary structure is that a realistic expectation?

Again, I'm not sure what you mean by "change their vocabulary structure"?
I MEAN CHANGE AN EXISTING REPRESENTATION FROM SECOND AEROPLANE EXAMPLE FORMAT TO THE FIRST

For a thesaurus such as the example above, either choice could
reasonably be made wrt the SKOS representation. In either case, no
change would be required to the systematic or alphabetic displays.

How the data is represented within whatever thesaurus management
system is used to manage the thesaurus is essentially irrelevant, and
need not be changed either. How you structure and manage your data
within your systems, and how you expose your data to the rest of the
world, need not be the same.
AGREE - MY POINT IS FACILITATING PROCESS OF 'SKOSIFICATION'

> I personally like the SKOS collections semantics but the issue is a concern because I'd like to see wide take up of SKOS by existing vocabularies. Successful standards need to strike a balance between best practice and legacy practice. Antoine's extensions [your ref 6 below] seem to go towards meeting this issue, though I'm not sure what their status is?
> 
> I think, though, that at least some guidance is needed in the primer, with some suggestions for what to do if legacy vocabulary owners do not want to completely restructure for guide terms/facet indicators. Maybe this could be considered for the final primer version?

To reiterate, I don't believe that using the SKOS collections
framework as illustrated in the first option above requires any legacy
vocabularies to restructure anything. How they structure their data
internally and how they expose their data to the world could be (and
often are) different.
AGREE IN IDEAL WORLD. MY CONCERN IS WHERE VOCABULARIES ALREADY HAVE AN ELECTRONIC REPRESENTATION OR WHERE SKOS CONVERSION MAY BE CONSIDERED DIFFICULT. FOR THESE CASES PROVIDING PATTERNS OF SKOS CONVERSION WOULD BE USEFUL.

Does this make sense?

Kind regards,

Alistair


> ________________________________
>
> From: Alistair Miles [mailto:alistair.miles@zoo.ox.ac.uk]
> Sent: Thu 06/11/2008 09:34
> To: Tudhope D S (AT)
> Cc: public-swd-wg@w3.org
> Subject: ISSUE-160: Allowing collections in semantic relationships
>
>
>
> Dear Doug,
>
> Thank you for your support and your helpful comments. In response to
> the comment below:
>
> On Sat, Oct 04, 2008 at 01:54:26PM +0000, SWD Issue Tracker wrote:
> >
> >
> > ISSUE-160: Allowing collections in semantic relationships
> >
> > http://www.w3.org/2006/07/SWD/track/issues/160
> >
> > Raised by: Antoine Isaac
> > On product: All
> >
> > Raised by Doug Tudhope in [1]
> >
> > While SKOS collections represent best practice in thesaurus construction, many
> > prominent existing thesauri (and related KOS) do not follow the SKOS collections
> > semantics. Instead, they model guide terms, facet indicators etc as part of a
> > hierarchy using standard Broader/Narrower relationships. This creates a problem
> > in converting such existing KOS into SKOS. From discussions it appears other
> > people have come to a similar judgment in converting such cases to SKOS - being
> > reluctant to change the existing structure of a KOS designed by a third party.
> > The pragmatic decision is often to create a (nonSKOS) property of a concept, to
> > say essentially, 'NOT_FOR_INDEXING'. This allows a basic distinction to be made
> > between a facet indicator (or guide term) and a concept available for indexing.
> >
> > Can we consider if something like this could be introduced into SKOS to
> > facilitate conversion of many legacy KOS? The primer can always encourage the
> > full collections approach as best practice.
>
> The requirement to indicate that some concepts are not intended for
> use in indexing was raised in the SKOS Use Cases and Requirements
> document [2]. Meeting this requirement was then discussed as
> ISSUE-46. The working group resolved to close this requirement because
> all matters related to indexing were deemed out of scope for SKOS, and
> better treated by vocabularies such as Dublin Core [3] or other third
> party vocabularies. We propose to make no change to the SKOS
> Reference, can you live with this?
>
> Kind regards,
>
> Alistair
> Sean
>
> Personal comment by Alistair: I realise that the treatment of KOS
> elements such as guide terms, facet indicators and node labels, and
> the choice of whether to use the SKOS collections framework or whether
> to model as you describe, remains a difficult issue, and requires careful
> judgment. However, on a positive note, I was pleased to learn recently
> of the very close correspondence between the modeling of node labels
> in the BS 8723-5 UML model and the modeling of collections in
> SKOS. Nicolas Cochard did an excellent job of illustrating the
> alignment between these two models at the ISKO event in July [4,5]. I
> hope that extensions to SKOS and best practices based on the new BS
> 8723-5 data model will help to clear up some of the difficulties here
> in the near future.
>
> See also Antoine's message [6] for some suggestions for the
> development of extensions to meet your requirement.
>
> [1] http://lists.w3.org/Archives/Public/public-swd-wg/2008Oct/0062.html
> [2] http://www.w3.org/TR/2007/WD-skos-ucr-20070516/#R-IndexingAndNonIndexingConcepts
> [ISSUE-46] http://www.w3.org/2006/07/SWD/track/issues/46
> [3] http://www.w3.org/2008/05/07-swd-minutes.html#item10
> [4] http://www.iskouk.org/presentations/cochard_BS8723-exchange-format.pdf
> [5] http://www.iskouk.org/SKOS_July2008.htm
> [6] http://lists.w3.org/Archives/Public/public-swd-wg/2008Oct/0286.html
>
> --
> Alistair Miles
> Senior Computing Officer
> Image Bioinformatics Research Group
> Department of Zoology
> The Tinbergen Building
> University of Oxford
> South Parks Road
> Oxford
> OX1 3PS
> United Kingdom
> Web: http://purl.org/net/aliman
> Email: alistair.miles@zoo.ox.ac.uk
> Tel: +44 (0)1865 281993
>
>
> ----- End forwarded message -----
>


Received on Wednesday, 3 December 2008 16:11:29 UTC