
Re: ISSUE-160: Allowing collections in semantic relationships

From: Antoine Isaac <aisaac@few.vu.nl>
Date: Thu, 04 Dec 2008 16:04:10 +0100
Message-ID: <4937F16A.4080308@few.vu.nl>
To: Alistair Miles <alistair.miles@zoo.ox.ac.uk>
CC: "Tudhope D S (AT)" <dstudhope@glam.ac.uk>, public-swd-wg@w3.org, "Binding C (AT)" <cbinding@glam.ac.uk>, public-esw-thes@w3.org

Hi,

I think we are almost in agreement, in fact, and are just playing on details. But if such things have to make their way into the Primer I'd prefer to be sure of a consensus first, if this can be obtained :-)

We have two possible mappings/"transformations" to SKOS:
1. the one I would call "natural": stick to the structure presented in the visualization, the electronic representation and the documentation altogether. In AAT (and here I disagree with Alistair) the only structural links you have in an *explicit* manner are the broader/narrower links between monoplanes and <aeroplanes by wing number>. Again, I call this transformation the most natural because it's really the closest to the structure explicitly given in the KOS. The alternative indeed requires more processing wrt. structure, both when creating SKOS data from the original format and when creating a display from the SKOS data.

2. the one which uses skos:Collection, which I would tend to find also "conceptually cleaner", because it relates by semantic relationships only things which have the same semantic nature.

I actually really don't know if one of these two solutions can be labelled as "best practice". They clearly focus on different points, and I think the choice should be the entire responsibility of the KOS provider, who is best informed about the primary application context of the KOS at hand, and therefore about the relative usefulness of both options.

Cheers,

Antoine


> Hi Doug,
> 
> Here are some further thoughts...
> 
> On Wed, Dec 03, 2008 at 04:10:41PM -0000, Tudhope D S (AT) wrote:
>> Hi Al
>>
>> Thanks for getting back.
>> I agree with your points below as representing best practice
>>
>> However we still have some concerns surrounding SKOS collections which I think are useful to discuss as longer term issues.
>>
>>
>> 1. Our main concern is facilitating SKOS representations for legacy vocabularies which already have electronic representations (or possibly where 'skosification' is a significant challenge for the vocabulary provider). 
>>
>> I am guessing that many existing electronic representations will follow your second aeroplane thesaurus example format. Examples include the various MDA and cultural heritage thesauri we have worked with (see our report on SKOS conversion http://hypermedia.research.glam.ac.uk/media/files/documents/2008-07-05/Additional-report-wp5.pdf ) and the current AAT XML sample data available online
>> (http://www.getty.edu/research/conducting_research/vocabularies/download.html ) also exhibits the second aeroplane example structure. There is a 'record type' attribute that can be one of 4 possible values: Concept, Facet, GuideTerm, HierarchyName. Something flagged as a 'Concept' can have a parent that is marked as a 'GuideTerm', and the two are linked via a 'Parent/child' relationship.
> 
> I still don't see how the existing electronic representation can be
> said to "follow" or "exhibit" the structure in the second aeroplane
> thesaurus in SKOS example.
> 
> Take the AAT sample data, for example.
> 
> You quite nicely describe the metamodel underlying the AAT data, where
> the data is structured as Records of one of four types (Concept,
> Facet, GuideTerm, HierarchyName), and where parent/child relationships
> can exist between Records of any type.
> 
> You are still left with an open choice about how to define a
> transformation which will map this metamodel onto the SKOS data model.
> 
> For example, your transformation could be as follows: for each AAT
> Record, generate an instance of skos:Concept, regardless of the type
> of the Record; for each parent/child relationship between AAT Records,
> generate a triple X skos:broader Y.
> 
> Alternatively, your transformation could be as follows: for each AAT
> Record of type Concept, generate an instance of skos:Concept. For each
> AAT Record of type Concept, walk up the parent/child relationships
> until you find another AAT Record of type Concept, and generate a
> triple X skos:broader Y.
> 
> These are not complete descriptions of each transformation, but I hope
> they illustrate the point that the AAT metamodel *does not constrain
> you* with respect to how you represent the same data as SKOS. Just
> because there is a "parent/child" relationship between "records" in
> the AAT data, doesn't mean you must generate a triple X skos:broader Y
> in the SKOS representation.
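
The two transformations described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual AAT format or anyone's real converter: the record layout (an id mapped to a record type and a parent id), the sample `RECORDS` data and all function names are hypothetical, and triples are written as plain tuples rather than with an RDF library.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory form of AAT-style records:
# record id -> (record type, parent record id or None).
Record = Tuple[str, Optional[str]]
Triple = Tuple[str, str, str]

RECORDS: Dict[str, Record] = {
    "aeroplanes": ("Concept", None),
    "aeroplanes_by_wing_number": ("GuideTerm", "aeroplanes"),
    "monoplanes": ("Concept", "aeroplanes_by_wing_number"),
    "biplanes": ("Concept", "aeroplanes_by_wing_number"),
    "triplanes": ("Concept", "aeroplanes_by_wing_number"),
}


def every_record_is_a_concept(records: Dict[str, Record]) -> List[Triple]:
    """First transformation: every record becomes a skos:Concept and
    every parent/child link becomes a skos:broader triple."""
    triples: List[Triple] = []
    for rec_id, (_rec_type, parent) in records.items():
        triples.append((rec_id, "rdf:type", "skos:Concept"))
        if parent is not None:
            triples.append((rec_id, "skos:broader", parent))
    return triples


def nearest_concept_ancestor(records: Dict[str, Record],
                             rec_id: str) -> Optional[str]:
    """Walk up the parent links until a record of type Concept is found."""
    parent = records[rec_id][1]
    while parent is not None and records[parent][0] != "Concept":
        parent = records[parent][1]
    return parent


def only_concept_records(records: Dict[str, Record]) -> List[Triple]:
    """Second transformation: only records of type Concept become
    skos:Concept, linked by skos:broader to the nearest Concept ancestor."""
    triples: List[Triple] = []
    for rec_id, (rec_type, _parent) in records.items():
        if rec_type != "Concept":
            continue  # guide terms etc. are skipped in this incomplete sketch
        triples.append((rec_id, "rdf:type", "skos:Concept"))
        ancestor = nearest_concept_ancestor(records, rec_id)
        if ancestor is not None:
            triples.append((rec_id, "skos:broader", ancestor))
    return triples
```

Like the descriptions above, the sketch is incomplete: the second function simply drops the guide term, whereas a fuller version would presumably also emit it as a skos:Collection with the skipped-over children as members.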
> 
> Similarly with English Heritage's cultural heritage thesauri. The
> metamodel for their data does not constrain you; you have an open
> choice with respect to the pattern of representation you
> choose. I.e. you are not forced to choose a particular pattern, there
> is enough information in the metamodel to allow you to choose. See
> e.g. the choice I made in [1].
> 
> Cheers,
> 
> Alistair
> 
> [1] http://www.w3.org/2001/sw/Europe/reports/thes/8.8/#4.2
> 
> 
> 
>> If the second aeroplane format is considered consistent with the SKOS data model, although not best practice, then this could potentially meet part of the concern. However the second aeroplane format would not capture the semantics that: [aeroplanes_by_wing_number should not be used for indexing]. As you mentioned in the first response, this could be achieved by a local extension to SKOS. If this does prove to be a common feature then at the least some (best practice) example would assist interoperability by encouraging common practice.
>>
>> It may be that this will not turn out to be a problem as practice develops. However it was an issue in our experience with existing cultural heritage thesaurus representations and a similar issue surfaced in BSI Part 5 discussions when converting to ZThes as a legacy term-based thesaurus example (I've appended an extract from our email on this as a PP below).
>>
>> I think it would be worth considering how to provide assistance to vocabulary owners in creating SKOS representations, including patterns for creating collection structures. This could be associated with the Primer, or as part of best practice examples.
>>
>>  
>>
>> 2. After just reviewing the current SKOS collection model and the relevant sections in the Reference and Primer, we're still unclear as to how SKOS collections are envisaged to be used. It's not clear to us what precisely an application could do with a SKOS collection on importing a SKOS file.
>>
>> The BSI Standard has a Superordinate relationship from an Array to "A higher-level concept to which this array is subordinated". There is no such link in SKOS. In BSI, the Array is intended to represent groupings of sibling concepts (mainly for display purposes). Is that the main intention in SKOS? 
>>
>> The reference has a disclaimer in the text on Collections: EG "Furthermore, where "node labels" are used in the systematic display, it may not always be possible to fully reconstruct the systematic display from a SKOS representation alone. Fully representing all of the information represented in a systematic display of a thesaurus or other knowledge organization system, including details of layout and presentation, is beyond the scope of SKOS. " 
>>
>> It would certainly not be easy and several assumptions would have to be made to create the link between a member of a Collection and the superordinate concept. There are no constraints on what concepts can be members of a Collection (eg that they are all siblings), nor that concepts must belong to only one Collection. 
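
To make the kind of assumption involved concrete, here is a minimal sketch of one possible heuristic for recovering the superordinate concept of a collection. It assumes, which SKOS itself does not guarantee, that the intended superordinate is the one concept that is skos:broader of every member; the function name, the dictionary representation of broader links and the sample data are all hypothetical.

```python
from typing import Dict, Optional, Set


def infer_superordinate(members: Set[str],
                        broader: Dict[str, Set[str]]) -> Optional[str]:
    """Return the single concept that is a direct broader concept of every
    member of the collection, or None when no unique such concept exists."""
    if not members:
        return None
    common: Optional[Set[str]] = None
    for member in members:
        parents = broader.get(member, set())
        # Intersect the broader concepts of all members seen so far.
        common = set(parents) if common is None else common & parents
    if common is not None and len(common) == 1:
        return next(iter(common))
    return None  # members are not siblings, or the parent is ambiguous


# Hypothetical skos:broader links, member -> set of broader concepts.
BROADER: Dict[str, Set[str]] = {
    "monoplanes": {"aeroplanes"},
    "biplanes": {"aeroplanes"},
    "triplanes": {"aeroplanes"},
    "propellers": {"aircraft components"},
}
```

For the aeroplane example the heuristic finds "aeroplanes", but it gives up as soon as the members are not siblings or share more than one broader concept, which is exactly the situation the lack of constraints allows.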
>>
>> If you cannot recreate the original use (thesaurus node labels) what is the purpose of Collections? Are SKOS collections intended to serve a wider purpose than capturing thesaurus node labels - are there use cases? If there is a consensus on their intended purpose then perhaps the documentation could be extended to reflect this.
>>
>> regards
>>
>> Doug
>>
>> PS  Answers to specific questions IN CAPS INLINE below
>>
>> PPS Extract from one of our contributions to BSI Part 5 discussions where we (Ceri) provided a BSI-ZThes round trip conversion
>>
>> Node labels
>>
>> Zthes uses a flag to indicate termType - one of the possible values for this is "NL" - intended to indicate node labels / facet indicators / guide terms. Rather than being part of a separate "array" structure these terms form part of the main hierarchy in Zthes. This practice also occurs in some published thesauri that are not based on Zthes. The BS8723 'core' format has no support for node labels and arrays, so it was unclear how to flag a guide term in the core format, apart from using some convention to modify the display term itself (e.g. angled brackets). A possible suggestion is the use of an optional concept attribute to denote guide terms, so legacy thesauri may be modelled without altering the existing hierarchy. The use of thesaurus arrays could then be an optional (preferred) alternative to the attribute in the (full) format. This would allow the standard to make a recommendation of 'best practice' while facilitating the mapping of many existing thesauri to the revised without requiring changes to the hierarchical structure.
>>
>>
>>
>> -----Original Message-----
>> From: Alistair Miles [mailto:alistair.miles@zoo.ox.ac.uk]
>> Sent: 22 November 2008 11:08
>> To: Tudhope D S (AT)
>> Cc: public-swd-wg@w3.org
>> Subject: Re: ISSUE-160: Allowing collections in semantic relationships
>>
>> Hi Doug,
>>
>> Thanks for your response. Further comments inline.
>>
>> On Sun, Nov 16, 2008 at 12:56:39PM -0000, Tudhope D S (AT) wrote:
>>> Hi Al
>>>
>>> thanks for getting back
>>> I take your points about indexing and correspondence between SKOS and BSI.
>>> However, they don't address the main issue I wanted to raise. I see your personal note deals with it more and I think indicates that this issue is still somewhat under consideration?
>>>
>>> I may have confused things by suggesting a work around. Let's set aside the non-indexing issue for now.
>>> I'll restate the issue:
>>> Concern: Insufficient support/guidance for legacy systems wrt guide terms / facet indicators
>>>
>>> My concern is that SKOS collections do NOT represent common practice in most existing thesauri
>>> and (if this is true) there is a danger that they might constitute a significant barrier to take up of SKOS by vocabulary owners who would otherwise wish to do so, unless appropriate guidance/alternatives are available.
>>>
>>> I think conversion of legacy thesauri to SKOS is an important application for SKOS and its wider take up.
>>> Do we know how many thesauri actually follow the SKOS collections for such structures?
>>> I don't think I know of any though I expect a few exist.
>>> Most that I know incorporate facet indicators as part of the hierarchy.  (I'm happy to be corrected if this is not the case)
>> I'm not sure what you mean by "part of the hierarchy".
>> I MEAN YOUR SECOND AEROPLANE EXAMPLE
>>>>> The following choice would also be consistent with the SKOS data
>>>>> model, although in my opinion is not best practice:
>> I AGREE IT WOULD NOT BE BEST PRACTICE
>>
>> Consider the following example. The systematic display for my example
>> aeroplane thesaurus looks like this:
>>
>> ---
>> aeroplanes
>> .<aeroplanes by wing number>
>> ..monoplanes
>> ..biplanes
>> ..triplanes
>> ---
>>
>> The alphabetic display for my thesaurus looks like this:
>>
>> ---
>> aeroplanes
>>   NT biplanes
>>   NT monoplanes
>>   NT triplanes
>>
>> biplanes BT aeroplanes
>>
>> monoplanes BT aeroplanes
>>
>> triplanes BT aeroplanes
>> ---
>>
>> Now, is "aeroplanes by wing number" part of "the hierarchy"?
>>
>> My point is, for a thesaurus like this, you have an *open choice* about
>> how to represent the underlying data using SKOS.
>>
>> The following choice would be compatible with the above displays,
>> would be consistent with the SKOS data model, and in my opinion
>> follows best practice (also consistent with BS8723-5):
>>
>> ---
>> ex:aeroplanes rdf:type skos:Concept ;
>>   skos:narrower ex:monoplanes, ex:biplanes, ex:triplanes .
>>
>> ex:aeroplanes_by_wing_number rdf:type skos:Collection ;
>>   skos:member ex:monoplanes, ex:biplanes, ex:triplanes .
>> ---
>>
>> The following choice would also be consistent with the SKOS data
>> model, although in my opinion is not best practice:
>>
>> ---
>> ex:aeroplanes rdf:type skos:Concept ;
>>   skos:narrower ex:aeroplanes_by_wing_number .
>>
>> ex:aeroplanes_by_wing_number rdf:type skos:Concept ;
>>   skos:narrower ex:monoplanes, ex:biplanes, ex:triplanes .
>> ---
>>
>> What I'm trying to say is, in my experience, for a thesaurus where
>> node labels have been used, *either* of the above approaches could
>> reasonably be taken.
>>
>>> What do we expect vocabulary owners who do not follow the SKOS collections semantics to do?
>>> If we expect them to change their vocabulary structure is that a realistic expectation?
>> Again, I'm not sure what you mean by "change their vocabulary structure"?
>> I MEAN CHANGE AN EXISTING REPRESENTATION FROM SECOND AEROPLANE EXAMPLE FORMAT TO THE FIRST
>>
>> For a thesaurus such as the example above, either choice could
>> reasonably be made wrt the SKOS representation. In either case, no
>> change would be required to the systematic or alphabetic displays.
>>
>> How the data is represented within whatever thesaurus management
>> system is used to manage the thesaurus is essentially irrelevant, and
>> need not be changed either. How you structure and manage your data
>> within your systems, and how you expose your data to the rest of the
>> world, need not be the same.
>> AGREE - MY POINT IS FACILITATING PROCESS OF 'SKOSIFICATION'
>>
>>> I personally like the SKOS collections semantics but the issue is a concern because I'd like to see wide take up of SKOS by existing vocabularies. Successful standards need to strike a balance between best practice and legacy practice. Antoine's extensions [your ref 6 below] seem to go towards meeting this issue though I'm not sure what their status is?
>>>
>>> I think though at least some guidance is needed in the primer with some suggestions for what to do if legacy vocabulary owners do not want to completely restructure for guide terms/facet indicators. Maybe this could be considered for final primer version?
>> To reiterate, I don't believe that using the SKOS collections
>> framework as illustrated in the first option above requires any legacy
>> vocabularies to restructure anything. How they structure their data
>> internally and how they expose their data to the world could be (and
>> often are) different.
>> AGREE IN IDEAL WORLD. MY CONCERN IS WHERE VOCABULARIES ALREADY HAVE AN ELECTRONIC REPRESENTATION OR WHERE SKOS CONVERSION MAY BE CONSIDERED DIFFICULT. FOR THESE CASES PROVIDING PATTERNS OF SKOS CONVERSION WOULD BE USEFUL.
>>
>> Does this make sense?
>>
>> Kind regards,
>>
>> Alistair
>>
>>
>>> ________________________________
>>>
>>> From: Alistair Miles [mailto:alistair.miles@zoo.ox.ac.uk]
>>> Sent: Thu 06/11/2008 09:34
>>> To: Tudhope D S (AT)
>>> Cc: public-swd-wg@w3.org
>>> Subject: ISSUE-160: Allowing collections in semantic relationships
>>>
>>>
>>>
>>> Dear Doug,
>>>
>>> Thank you for your support and your helpful comments. In response to
>>> the comment below:
>>>
>>> On Sat, Oct 04, 2008 at 01:54:26PM +0000, SWD Issue Tracker wrote:
>>>>
>>>> ISSUE-160: Allowing collections in semantic relationships
>>>>
>>>> http://www.w3.org/2006/07/SWD/track/issues/160
>>>>
>>>> Raised by: Antoine Isaac
>>>> On product: All
>>>>
>>>> Raised by Doug Tudhope in [1]
>>>>
>>>> While SKOS collections represent best practice in thesaurus construction, many
>>>> prominent existing thesauri (and related KOS) do not follow the SKOS collections
>>>> semantics. Instead, they model guide terms, facet indicators etc as part of a
>>>> hierarchy using standard Broader/Narrower relationships. This creates a problem
>>>> in converting such existing KOS into SKOS. From discussions it appears other
>>>> people have come to a similar judgment in converting such cases to SKOS - being
>>>> reluctant to change the existing structure of a KOS designed by a third party.
>>>> The pragmatic decision is often to create a (nonSKOS) property of a concept, to
>>>> say essentially, 'NOT_FOR_INDEXING'. This allows a basic distinction to be made
>>>> between a facet indicator (or guide term) and a concept available for indexing.
>>>>
>>>> Can we consider if something like this could be introduced into SKOS to
>>>> facilitate conversion of many legacy KOS? The primer can always encourage the
>>>> full collections approach as best practice.
>>> The requirement to indicate that some concepts are not intended for
>>> use in indexing was raised in the SKOS Use Cases and Requirements
>>> document [2]. Meeting this requirement was then discussed as
>>> ISSUE-46. The working group resolved to close this requirement because
>>> all matters related to indexing were deemed out of scope for SKOS, and
>>> better treated by vocabularies such as Dublin Core [3] or other third
>>> party vocabularies. We propose to make no change to the SKOS
>>> Reference, can you live with this?
>>>
>>> Kind regards,
>>>
>>> Alistair
>>> Sean
>>>
>>> Personal comment by Alistair: I realise that the treatment of KOS
>>> elements such as guide terms, facet indicators and node labels, and
>>> the choice of whether to use the SKOS collections framework or whether
>>> to model as you describe, remains a difficult issue, and requires careful
>>> judgment. However, on a positive note, I was pleased to learn recently
>>> of the very close correspondence between the modeling of node labels
>>> in the BS 8723-5 UML model and the modeling of collections in
>>> SKOS. Nicolas Cochard did an excellent job of illustrating the
>>> alignment between these two models at the ISKO event in July [4,5]. I
>>> hope that extensions to SKOS and best practices based on the new BS
>>> 8723-5 data model will help to clear up some of the difficulties here
>>> in the near future.
>>>
>>> See also Antoine's message [6] for some suggestions for the
>>> development of extensions to meet your requirement.
>>>
>>> [1] http://lists.w3.org/Archives/Public/public-swd-wg/2008Oct/0062.html
>>> [2] http://www.w3.org/TR/2007/WD-skos-ucr-20070516/#R-IndexingAndNonIndexingConcepts
>>> [ISSUE-46] http://www.w3.org/2006/07/SWD/track/issues/46
>>> [3] http://www.w3.org/2008/05/07-swd-minutes.html#item10
>>> [4] http://www.iskouk.org/presentations/cochard_BS8723-exchange-format.pdf
>>> [5] http://www.iskouk.org/SKOS_July2008.htm
>>> [6] http://lists.w3.org/Archives/Public/public-swd-wg/2008Oct/0286.html
>>>
>>> --
>>> Alistair Miles
>>> Senior Computing Officer
>>> Image Bioinformatics Research Group
>>> Department of Zoology
>>> The Tinbergen Building
>>> University of Oxford
>>> South Parks Road
>>> Oxford
>>> OX1 3PS
>>> United Kingdom
>>> Web: http://purl.org/net/aliman
>>> Email: alistair.miles@zoo.ox.ac.uk
>>> Tel: +44 (0)1865 281993
>>>
>>>
>>> ----- End forwarded message -----
>>>
> 
Received on Thursday, 4 December 2008 15:04:52 GMT
