FW: XML schemas and thesauri question

From: Miles, AJ (Alistair) <A.J.Miles@rl.ac.uk>
Date: Mon, 23 May 2005 18:34:02 +0100
Message-ID: <F5839D944C66C049BDB45F4C1E3DF89DEE9D69@exchange31.fed.cclrc.ac.uk>
To: <public-esw-thes@w3.org>
... and an initial response from me ...
 
 -----Original Message-----
From: Miles, AJ (Alistair) 
Sent: 23 May 2005 18:29
To: 'RichardN'
Subject: RE: XML schemas and thesauri question


Hi Richard,
 
SKOS Core is in fact an RDF vocabulary for building an RDF description of the content and structure of what we call 'concept schemes', which is a blanket term to include a family of things related to thesauri, classification schemes &c. that share a similar underlying model.  SKOS Core is not an XML format for thesauri, although of course any set of RDF statements can be serialised using the RDF/XML syntax.  For a better introduction, take a look at the SKOS Core Guide:
 
http://www.w3.org/TR/swbp-skos-core-guide
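
To make the idea of "a set of RDF statements" a little more concrete, here is a minimal sketch in Python that writes a fragment of a concept scheme as plain (subject, predicate, object) tuples.  skos:prefLabel and skos:broader are SKOS Core properties; the http://example.org/thes# base URI, the concept names and the helper function are assumptions made up for illustration, and no RDF library is involved.

```python
# A concept scheme fragment written as RDF-style statements (triples).
# The SKOS namespace is real; EX and the concept names are hypothetical.
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/thes#"  # assumed base URI, for illustration only

triples = [
    (EX + "animalOrgans", SKOS + "prefLabel", "animal organs"),
    (EX + "stomach", SKOS + "prefLabel", "stomach"),
    (EX + "stomach", SKOS + "broader", EX + "animalOrgans"),
    (EX + "pylorus", SKOS + "prefLabel", "pylorus"),
    (EX + "pylorus", SKOS + "broader", EX + "stomach"),
]


def broader_of(concept, statements):
    """Return the concepts that `concept` points to via skos:broader."""
    return [o for s, p, o in statements if s == concept and p == SKOS + "broader"]


print(broader_of(EX + "pylorus", triples))
```

Each tuple stands alone, so statements published in different places could in principle be merged into one list, which is the property the next paragraph relies on.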
 
Why bother with RDF?  Whereas XML is well suited to reliable point-to-point transfer of data, it does not provide a framework for managing data within a globally distributed, decentralised environment.  The paradigm shift is from sending XML files down the wire to publishing RDF statements on the (semantic) web.  'Data' published as RDF statements can become part of a distributed network of statements.  This shift in perspective is analogous to the shift in information sharing from using email to using hypertext and the World Wide Web.  Of course sending email has its place, but nobody doubts that the web is a good thing.  Imagine if every time you wanted a copy of the BBC's latest news you had to email somebody to get it!
 
Anyway, there's more.  What you want to do is not easy in XML.  It's possible, but requires some fairly ugly hacking, I think.  Specifying these kinds of constraints is a lot easier if your data model is defined in RDF/OWL.  I'll try to expand on this later, but am going to leave it there for now ... I'm away at the XTech conference until next week.
 
Cheers for now,
 
Alistair.
 
 

--- 
Alistair Miles 
Research Associate 
CCLRC - Rutherford Appleton Laboratory 
Building R1 Room 1.60 
Fermi Avenue 
Chilton 
Didcot 
Oxfordshire OX11 0QX 
United Kingdom 
Email:        a.j.miles@rl.ac.uk 
Tel: +44 (0)1235 445440 

-----Original Message-----
From: RichardN [mailto:RichardN@sfwindows.co.uk]
Sent: 23 May 2005 16:32
To: Miles, AJ (Alistair)
Subject: RE: XML schemas and thesauri question



Hello Alistair,

 

I don't have any objection in principle to discussing this via the public mailing list.  I should point out that while I made the XML schema up in my last email, the thesaurus segment was pasted in from bits of the US NAL thesaurus (http://agclass.nal.usda.gov/agt/agt.htm), which is downloadable in a flat text file format.

 

I guess I wasn't sure how closely my question is related to your work on SKOS.  I can see that SKOS is a way of expressing a thesaurus in XML (I thought that RDF by itself could do this, however).  Since a thesaurus like NAL contains over 60,000 entries, I think the master copy of it would be likely to be held in a relational database rather than in what is essentially a flat file format (XML).  But how can I relate it to my XML schema?
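
As a rough illustration of the relational option, the sketch below holds a handful of terms in an in-memory SQLite table and uses a recursive query to list everything beneath a given term.  The table name, column names and the one-broader-term-per-row layout are assumptions chosen for brevity, not a description of how the NAL thesaurus is actually stored.

```python
import sqlite3

# Hypothetical layout: one row per term, pointing at its single broader term.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE term (label TEXT PRIMARY KEY, broader TEXT)")
conn.executemany(
    "INSERT INTO term VALUES (?, ?)",
    [
        ("animal anatomy", None),
        ("animal organs", "animal anatomy"),
        ("stomach", "animal organs"),
        ("pylorus", "stomach"),
        ("animal tissues", "animal anatomy"),
        ("epithelium", "animal tissues"),
    ],
)


def narrower_terms(conn, root):
    """All terms strictly below `root`, found with a recursive query."""
    rows = conn.execute(
        """
        WITH RECURSIVE sub(label) AS (
            SELECT label FROM term WHERE broader = ?
            UNION
            SELECT term.label FROM term JOIN sub ON term.broader = sub.label
        )
        SELECT label FROM sub ORDER BY label
        """,
        (root,),
    ).fetchall()
    return [row[0] for row in rows]


print(narrower_terms(conn, "animal organs"))
```

A list produced this way could then be exported in whatever form the validation step needs, so the database can remain the master copy.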

 

Thanks for your help,

 

Richard

 

Richard Northedge

SFW Ltd.

 


  _____  


From: Miles, AJ (Alistair) [mailto:A.J.Miles@rl.ac.uk] 
Sent: 23 May 2005 15:46
To: RichardN
Subject: RE: XML schemas and thesauri question

 

Hi Richard, 

 

Thanks a lot for detailing this for me.  This kind of concrete use case is extremely useful to those of us working on the SKOS Core recommendations [1][2][3].

 

I do have some suggestions re your requirement, but before I make them I was wondering if you'd mind if we moved this discussion onto the SKOS mailing list?  We do all SKOS work on a public mailing list [4], and well-defined use cases are like gold dust to us :)  If you'd rather not, I totally understand, and I'd be happy to continue discussing this with you in private.

 

Cheers,

 

Alistair.

 

[1] SKOS Core Guide http://www.w3.org/TR/swbp-skos-core-guide/

[2] SKOS Core Vocabulary Specification http://www.w3.org/TR/swbp-skos-core-spec/

[3] Quick Guide to Publishing a Thesaurus on the Semantic Web http://www.w3.org/TR/swbp-thesaurus-pubguide/

[4] public-esw-thes@w3.org archive http://lists.w3.org/Archives/Public/public-esw-thes/


-----Original Message-----
From: RichardN [mailto:RichardN@sfwindows.co.uk]
Sent: 23 May 2005 13:24
To: Miles, AJ (Alistair)
Subject: RE: XML schemas and thesauri question

Hello Alistair,

 

Thank you for your reply.  Apologies for not responding sooner; I have been out of the office for a couple of days.  Let me see if I can give you a more specific example.  Imagine a scientific lab is taking in samples of animal tissue for analysis.  Each sample is a data entity with three data items (obviously in real life it would be more than that): an ID, a weight in grams, and a tissue type field.  You could set up a schema to define your entity like this:

 

<xs:schema version="1.0" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:annotation>
    <xs:documentation>Tissue sample schema</xs:documentation>
  </xs:annotation>
  <xs:element name="sample">
    <xs:annotation>
      <xs:documentation>Data entity for a tissue sample (simple example)</xs:documentation>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <xs:element name="weight" minOccurs="0" maxOccurs="1">
          <xs:annotation>
            <xs:documentation>Weight of the tissue sample in grams</xs:documentation>
          </xs:annotation>
          <xs:simpleType>
            <xs:restriction base="xs:integer">
              <xs:minInclusive value="1" />
              <xs:maxInclusive value="999" />
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
        <xs:element name="type" minOccurs="1" maxOccurs="1">
          <xs:annotation>
            <xs:documentation>Type of the tissue sample</xs:documentation>
          </xs:annotation>
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:minLength value="1" />
              <xs:maxLength value="80" />
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
      <!-- Attribute declarations must come after the content model (xs:sequence) -->
      <xs:attribute name="id" use="required">
        <xs:annotation>
          <xs:documentation>Identifier for this sample</xs:documentation>
        </xs:annotation>
        <xs:simpleType>
          <xs:restriction base="xs:integer">
            <xs:minInclusive value="1" />
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>
</xs:schema>

 

Now imagine there is a thesaurus with lots of terms covering the whole of veterinary science, but which contains the following hierarchy:

 

animal anatomy
  NT1: (animal secretions, body fluids, excretions and exudates)
  NT1: animal organs
    NT2: animal glands
    NT2: brain
    NT2: gall bladder
    NT2: gills
    NT2: heart
    NT2: hepatopancreas
    NT2: kidneys
    NT2: liver
    NT2: lungs
    NT2: sense organs
    NT2: shell gland
    NT2: spleen
    NT2: sting apparatus
    NT2: stomach
      NT3: gastric fundus
      NT3: gastric mucosa
      NT3: pylorus
      NT3: ruminant stomach
    NT2: tonsils
  NT1: animal tissues
    NT2: animal tissue extracts
    NT2: basement membrane
    NT2: bone marrow
    NT2: cell membranes
    NT2: connective tissues
    NT2: epithelium
    NT2: gingiva
    NT2: imaginal discs
    NT2: laminae (animals)
    NT2: muscle tissues
    NT2: nerve tissue
    NT2: serosa
  NT1: circulatory system
    NT2: cardiovascular system
    NT2: hemolymph
    NT2: lymphatic system
  NT1: digestive system

    

Etc. Each of these narrower terms has an extended hierarchy underneath it.  The kind of requirement I am talking about is being able to say (for example): tissue sample type, which has been declared in the XML schema as a string between 1 and 80 characters in length, should be constrained to one of the thesaurus terms, either a narrower term of "animal organs" or a narrower term of "animal tissues".  To do this, I guess, you would need to represent the thesaurus in some form of XML format (which is best?) and then the bit I don't see at all - declaring somewhere (in the XML schema?) that the tissue sample type data item should be constrained in the manner I explained.
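
One way to picture the check described above, outside of XML Schema itself, is a small validation step run after schema validation.  The Python sketch below is only an illustration under stated assumptions: the dictionary is an abbreviated copy of the hierarchy, and the names NARROWER, narrower_closure and type_is_valid are invented here rather than taken from any standard.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the hierarchy: term -> its direct narrower terms.
NARROWER = {
    "animal anatomy": ["animal organs", "animal tissues", "circulatory system"],
    "animal organs": ["brain", "liver", "stomach"],
    "stomach": ["gastric fundus", "gastric mucosa", "pylorus", "ruminant stomach"],
    "animal tissues": ["bone marrow", "epithelium", "nerve tissue"],
    "circulatory system": ["hemolymph"],
}


def narrower_closure(root):
    """Every term below `root` at any depth (the root itself is excluded)."""
    found = set()
    stack = list(NARROWER.get(root, []))
    while stack:
        term = stack.pop()
        if term not in found:
            found.add(term)
            stack.extend(NARROWER.get(term, []))
    return found


ALLOWED = narrower_closure("animal organs") | narrower_closure("animal tissues")


def type_is_valid(sample_xml):
    """Check the <type> element of one <sample> document against ALLOWED."""
    type_text = ET.fromstring(sample_xml).findtext("type", default="")
    return type_text.strip() in ALLOWED


print(type_is_valid('<sample id="1"><weight>12</weight><type>pylorus</type></sample>'))
print(type_is_valid('<sample id="2"><type>hemolymph</type></sample>'))
```

The schema would still enforce the 1 to 80 character string; this step only adds the vocabulary check on top of it.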

 

For added complication, you could add the rule - all narrower terms of "animal organs" or "animal tissues", EXCEPT "stomach" or any narrower term of "stomach".
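
The exclusion rule amounts to a set difference: collect the narrower terms of the included roots, then subtract each excluded term together with everything beneath it.  A minimal Python sketch, assuming a cut-down hierarchy with no cycles and using made-up function names:

```python
# Cut-down hierarchy for illustration: term -> its direct narrower terms.
NARROWER = {
    "animal organs": ["brain", "liver", "stomach"],
    "stomach": ["gastric fundus", "pylorus"],
    "animal tissues": ["epithelium", "serosa"],
}


def subtree(root):
    """`root` together with every term below it (assumes no cycles)."""
    terms = {root}
    for child in NARROWER.get(root, []):
        terms |= subtree(child)
    return terms


def allowed_terms(include_roots, exclude_roots):
    """Narrower terms of the included roots, minus the excluded subtrees."""
    included = set()
    for root in include_roots:
        included |= subtree(root) - {root}  # narrower terms only, not the root
    excluded = set()
    for root in exclude_roots:
        excluded |= subtree(root)  # the excluded term and everything below it
    return included - excluded


allowed = allowed_terms(["animal organs", "animal tissues"], ["stomach"])
print(sorted(allowed))
```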

 

Does that make any sense?

 

Thanks for any help you can provide me,

 

Richard

 

Richard Northedge

SFW Ltd. 

 

 

 

 

 


  _____  


From: Miles, AJ (Alistair) [mailto:A.J.Miles@rl.ac.uk] 
Sent: 18 May 2005 17:41
To: RichardN
Subject: RE: XML schemas and thesauri question

 

Hi Richard,

 

This is a very interesting use case!  Can you give more details, more specific examples of the kinds of constraint you want to enforce?

 

Cheers,

 

Alistair.

 


-----Original Message-----
From: RichardN [mailto:RichardN@sfwindows.co.uk]
Sent: 18 May 2005 15:00
To: Miles, AJ (Alistair)
Subject: XML schemas and thesauri question

Hello Alistair,

 

I am one of Daniel Whymark's colleagues, and he mentioned to me that you might be able to help with an XML / semantic web type question that I have.  To give you some indication of my current level of understanding: I am comfortable with XML and XML schemas, and have been reading up about semantic web concepts, but I don't have a strong grasp of any of the semantic-web-type XML languages such as RDF, XML topic maps, OWL etc.  I have come across SKOS, but that's about as far as it goes.

 

We need to define a standard format for our data entities.  The obvious way of doing this is to define the format using XML schemas.

 

We also have an ISO 2788 style thesaurus with BT (broader term), NT (narrower term) etc.  Some of the data items in the data entities should have their values restricted to a set of preferred terms in the thesaurus.  For example, a data item might need to be restricted so that the set of allowable values includes the thesaurus term "United Kingdom" and all of the narrower terms belonging to the "United Kingdom" term.  In some cases, it may be necessary to restrict the levels of narrower terms underneath the root term that are allowable.
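
The "root term plus narrower terms, optionally limited to a number of levels" rule can be sketched as a level-by-level walk down the hierarchy.  In the Python sketch below only "United Kingdom" comes from the description above; the narrower terms, the dictionary layout and the function name are hypothetical, and the hierarchy is assumed to contain no cycles.

```python
# Hypothetical geographic fragment: term -> its direct narrower terms.
NARROWER = {
    "United Kingdom": ["England", "Scotland"],
    "England": ["Oxfordshire"],
    "Oxfordshire": ["Didcot"],
}


def terms_within(root, max_levels=None):
    """`root` plus narrower terms down to `max_levels` below it (None = no limit)."""
    allowed = {root}
    frontier = [root]
    level = 0
    while frontier and (max_levels is None or level < max_levels):
        # Step one level down from every term on the current frontier.
        frontier = [child for term in frontier for child in NARROWER.get(term, [])]
        allowed.update(frontier)
        level += 1
    return allowed


print(sorted(terms_within("United Kingdom", max_levels=1)))
print(sorted(terms_within("United Kingdom")))
```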

 

My question is: what is the best way of encoding the thesaurus in standards-compliant XML in such a way that it can be linked to the XML schemas, so that we can enforce the data item restrictions I have outlined?

 

Any help you can give me would be much appreciated,

 

Regards,

Richard

 

Richard Northedge

SFW Ltd.

 
Received on Monday, 23 May 2005 17:34:12 GMT
