[Fwd: Re: XML schemas and thesauri question]

Hi Richard,

This is a rather late response, but I hope I can help. I'm not sure I
fully understand the problem, so let me try to recap what I think I understood:

- you have data in XML
- you want to be able to constrain elements in that data to (parts of)
a thesaurus
- you're wondering whether to use an XML or RDF version of a thesaurus
to do that
- Stella suggested you keep the thesaurus and the XML data separate.

Now I can imagine two situations:

1) both the data and the thesaurus are in the same format (XML). Then
you can use available XML (Schema) techniques to define and check the
constraints that you want to enforce.

2) the data is in XML and the thesaurus in another format (database,
RDF, ...), i.e. Stella's suggestion. This case is more complex. You
need some way to reference the thesaurus term in your XML data. The
reference depends on the format of the thesaurus, e.g. a URI in the
RDF case. In the data you need some custom tag to represent a
constraint on a specific URI, e.g. something like

<belowTerm>http://www.example.com/thes/term1/</belowTerm>

(Probably there's a nicer way to represent this in XML, but the basic
workings would be the same I guess.)

When you check the data, you then need to use the URI to look up the
thesaurus and check whether the term referenced in the data is indeed
below term1. In other words, you need a custom program to do this.

A third solution would be to represent everything in OWL and then
check it using a DL reasoner, but this would probably not scale well
to large datasets.

Hope I am making some sense here. Perhaps people on the list who have
worked more with XML can provide a better solution?

Regards,
Mark.


>     Thanks for your response. I think you’ve addressed the two
>     questions that I asked Alistair, but not in enough detail for me to
>     understand what you mean. I think the reason the issue is a
>     “problem” for me is down to my lack of understanding, so I hope you
>     will bear with my ignorance.
> 
>> > To do this, I guess, you would need to represent the thesaurus in
>> > some form of XML format (which is best?)
> 
>> On the one hand, set up the thesaurus as a Namespace, available at
>> all times over the Internet or intranet.
> 
>     I don’t understand what you mean by this, sorry. Would it be
>     possible for you to give an example of how you would set up a short
>     section of a thesaurus “as a namespace”?
> 
>> > and then the bit I don’t see at all – declaring somewhere (in the
>> > XML schema?) that the tissue sample type data item should be
>> > constrained in the manner I explained.
> 
>> The schema must include the full details that allow this to be
>> connected up with the Namespace where this term may be found.
> 
>     The example schema I provided had a definition for a child element
>     of “sample” called “type”, whose value I wanted to constrain to one
>     of a set of values found in my thesaurus. How would I amend the
>     definition in the schema below in order to do this?
> 
>     Thanks for any help,
> 
>     Richard
> 
>     ------------------------------------------------------------------------
> 
>     *From:* Stella Dextre Clarke [mailto:sdclarke@lukehouse.demon.co.uk]
>     *Sent:* 24 May 2005 09:02
>     *To:* 'Miles, AJ (Alistair)'; public-esw-thes@w3.org
>     *Cc:* RichardN
>     *Subject:* RE: XML schemas and thesauri question
> 
>     Not sure I've understood why this is a problem. What's wrong with
>     the following course of action?
> 
>     On the one hand, set up the thesaurus as a Namespace, available at
>     all times over the Internet or intranet.
> 
>     Secondly, develop the XML schema below a little further, to include
>     an element "tissue type" (or it could be "thesaurus term", if the
>     schema is to be used more broadly than just for tissue types).
> 
>     Thirdly, when the schema is put to use, presumably a sample comes
>     in, the appropriate values are added to each of the elements,
>     including the one that will name the tissue type. The appropriate
>     thesaurus term is filled in, say "spleen". The schema must include
>     the full details that allow this to be connected up with the
>     Namespace where this term may be found.
> 
>     Is something more complicated than this required? To me, sanity lies
>     in keeping the thesaurus (with all its internal complications)
>     completely separate from the application where terms from the
>     thesaurus are going to be used, while keeping it available for
>     reference when needed.
> 
>     Stella
> 
>     *****************************************************
>     Stella Dextre Clarke
>     Information Consultant
>     Luke House, West Hendred, Wantage, Oxon, OX12 8RR, UK
>     Tel: 01235-833-298
>     Fax: 01235-863-298
>     SDClarke@LukeHouse.demon.co.uk
>     *****************************************************
> 
>         -----Original Message-----
>         *From:* public-esw-thes-request@w3.org
>         [mailto:public-esw-thes-request@w3.org] *On Behalf Of *Miles, AJ
>         (Alistair)
>         *Sent:* 23 May 2005 18:32
>         *To:* public-esw-thes@w3.org
>         *Cc:* RichardN@sfwindows.co.uk
>         *Subject:* FW: XML schemas and thesauri question
> 
>         Further elaboration from Richard:
> 
>          -----Original Message-----
>         *From:* RichardN [mailto:RichardN@sfwindows.co.uk]
>         *Sent:* 23 May 2005 13:24
>         *To:* Miles, AJ (Alistair)
>         *Subject:* RE: XML schemas and thesauri question
> 
>         Hello Alistair,
> 
>         Thank you for your reply.  Apologies for not responding sooner;
>         I have been out of the office for a couple of days.  Let me see
>         if I can give you a more specific example.  Imagine a scientific
>         lab is taking in samples of animal tissue for analysis.  Each
>         sample is a data entity with three data items (obviously in real
>         life it would be more than that): an ID, a weight in grams, and
>         a tissue type field.  You could set up a schema to define your
>         entity like this:
> 
>         <xs:schema version="1.0"
>                    xmlns:xs="http://www.w3.org/2001/XMLSchema">
>           <xs:annotation>
>             <xs:documentation>Tissue sample schema</xs:documentation>
>           </xs:annotation>
>           <xs:element name="sample">
>             <xs:annotation>
>               <xs:documentation>
>                 Data entity for a tissue sample (simple example)
>               </xs:documentation>
>             </xs:annotation>
>             <xs:complexType>
>               <!-- XML Schema requires the content model (xs:sequence)
>                    to come before any attribute declarations. -->
>               <xs:sequence>
>                 <xs:element name="weight" minOccurs="0" maxOccurs="1">
>                   <xs:annotation>
>                     <xs:documentation>
>                       Weight of the tissue sample in grams
>                     </xs:documentation>
>                   </xs:annotation>
>                   <xs:simpleType>
>                     <xs:restriction base="xs:integer">
>                       <xs:minInclusive value="1" />
>                       <xs:maxInclusive value="999" />
>                     </xs:restriction>
>                   </xs:simpleType>
>                 </xs:element>
>                 <xs:element name="type" minOccurs="1" maxOccurs="1">
>                   <xs:annotation>
>                     <xs:documentation>
>                       Type of the tissue sample
>                     </xs:documentation>
>                   </xs:annotation>
>                   <xs:simpleType>
>                     <xs:restriction base="xs:string">
>                       <xs:minLength value="1" />
>                       <xs:maxLength value="80" />
>                     </xs:restriction>
>                   </xs:simpleType>
>                 </xs:element>
>               </xs:sequence>
>               <xs:attribute name="id" use="required">
>                 <xs:annotation>
>                   <xs:documentation>
>                     Identifier for this sample
>                   </xs:documentation>
>                 </xs:annotation>
>                 <xs:simpleType>
>                   <xs:restriction base="xs:integer">
>                     <xs:minInclusive value="1" />
>                   </xs:restriction>
>                 </xs:simpleType>
>               </xs:attribute>
>             </xs:complexType>
>           </xs:element>
>         </xs:schema>
> 
>         Now imagine there is a thesaurus with lots of terms covering the
>         whole of veterinary science, but which contains the following
>         hierarchy:
> 
>         animal anatomy
>           NT1: (animal secretions, body fluids, excretions and exudates)
>           NT1: animal organs
>             NT2: animal glands
>             NT2: brain
>             NT2: gall bladder
>             NT2: gills
>             NT2: heart
>             NT2: hepatopancreas
>             NT2: kidneys
>             NT2: liver
>             NT2: lungs
>             NT2: sense organs
>             NT2: shell gland
>             NT2: spleen
>             NT2: sting apparatus
>             NT2: stomach
>               NT3: gastric fundus
>               NT3: gastric mucosa
>               NT3: pylorus
>               NT3: ruminant stomach
>             NT2: tonsils
>           NT1: animal tissues
>             NT2: animal tissue extracts
>             NT2: basement membrane
>             NT2: bone marrow
>             NT2: cell membranes
>             NT2: connective tissues
>             NT2: epithelium
>             NT2: gingiva
>             NT2: imaginal discs
>             NT2: laminae (animals)
>             NT2: muscle tissues
>             NT2: nerve tissue
>             NT2: serosa
>           NT1: circulatory system
>             NT2: cardiovascular system
>             NT2: hemolymph
>             NT2: lymphatic system
>           NT1: digestive system
> 
>         Etc. Each of these narrower terms has an extended hierarchy
>         underneath it.  The kind of requirement I am talking about is
>         being able to say (for example): tissue sample type, which has
>         been declared in the XML schema as a string between 1 and 80
>         characters in length, should be constrained to one of the
>         thesaurus terms, either a narrower term of “animal organs” or a
>         narrower term of “animal tissues”.  To do this, I guess, you
>         would need to represent the thesaurus in some form of XML format
>         (which is best?) and then the bit I don’t see at all – declaring
>         somewhere (in the XML schema?) that the tissue sample type data
>         item should be constrained in the manner I explained.
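One way to answer Richard's question, corresponding to situation 1 in Mark's reply, is to flatten the relevant part of the thesaurus into an ordinary `xs:enumeration` restriction that any schema validator can check. The sketch below assumes a small hand-copied fragment of the hierarchy (`NARROWER`), treats "narrower term" as transitive (any depth below the root), and uses a hypothetical type name `TissueType`; it generates the XSD fragment with Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialize as xs:... rather than ns0:...

# Hypothetical fragment of the hierarchy above: term -> narrower terms (NT).
NARROWER = {
    "animal anatomy": ["animal organs", "animal tissues", "circulatory system"],
    "animal organs": ["brain", "heart", "liver", "spleen", "stomach"],
    "stomach": ["gastric fundus", "gastric mucosa", "pylorus", "ruminant stomach"],
    "animal tissues": ["bone marrow", "epithelium", "nerve tissue"],
    "circulatory system": ["cardiovascular system", "lymphatic system"],
}

def descendants(term, narrower=NARROWER):
    """Every term below `term` at any depth, not including `term` itself."""
    found, stack = set(), list(narrower.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(narrower.get(current, []))
    return found

def enumeration_type(name, terms):
    """A named xs:simpleType restricting xs:string to exactly `terms`."""
    simple = ET.Element(f"{{{XS}}}simpleType", {"name": name})
    restriction = ET.SubElement(simple, f"{{{XS}}}restriction", {"base": "xs:string"})
    for term in sorted(terms):
        ET.SubElement(restriction, f"{{{XS}}}enumeration", {"value": term})
    return simple

allowed = descendants("animal organs") | descendants("animal tissues")
xsd_fragment = ET.tostring(enumeration_type("TissueType", allowed), encoding="unicode")
print(xsd_fragment)
```

The generated named type could be placed at the top level of the schema and referenced from the `type` element as `<xs:element name="type" type="TissueType"/>`, which resolves because the schema has no target namespace. The drawback is that the list must be regenerated whenever the thesaurus changes, which is exactly the coupling Stella's suggestion avoids.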
> 
>         For added complication, you could add the rule – all narrower
>         terms of “animal organs” or “animal tissues”, EXCEPT “stomach”
>         or any narrower term of “stomach”.
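Once narrower terms can be expanded, the exception rule is plain set arithmetic: take everything below the included roots, then subtract each excluded term together with its own subtree. A minimal sketch, assuming a hand-copied fragment of the hierarchy and transitive NT links; `NARROWER`, `descendants` and `allowed_terms` are hypothetical names.

```python
# Hypothetical fragment of the hierarchy: term -> narrower terms (NT).
NARROWER = {
    "animal organs": ["brain", "heart", "liver", "spleen", "stomach"],
    "stomach": ["gastric fundus", "gastric mucosa", "pylorus", "ruminant stomach"],
    "animal tissues": ["bone marrow", "epithelium", "nerve tissue"],
}

def descendants(term, narrower=NARROWER):
    """Every term below `term` at any depth, not including `term` itself."""
    found, stack = set(), list(narrower.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(narrower.get(current, []))
    return found

def allowed_terms(include, exclude, narrower=NARROWER):
    """Terms below any `include` root, minus each `exclude` term and its subtree."""
    allowed = set()
    for root in include:
        allowed |= descendants(root, narrower)
    for root in exclude:
        allowed -= {root} | descendants(root, narrower)
    return allowed

allowed = allowed_terms(["animal organs", "animal tissues"], exclude=["stomach"])
print(sorted(allowed))
# ['bone marrow', 'brain', 'epithelium', 'heart', 'liver', 'nerve tissue', 'spleen']
```

The exclusion is evaluated when the list of allowed values is built, so a checking program (or a generated enumeration) only ever sees the final set; XML Schema itself has no notion of "below this term except that subtree".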
> 
>         Does that make any sense?
> 
>         Thanks for any help you can provide me,
> 
>         Richard
> 
>         Richard Northedge
>         SFW Ltd.
> 
>         ------------------------------------------------------------------------
> 
>         *From:* Miles, AJ (Alistair) [mailto:A.J.Miles@rl.ac.uk]
>         *Sent:* 18 May 2005 17:41
>         *To:* RichardN
>         *Subject:* RE: XML schemas and thesauri question
> 
>         Hi Richard,
> 
>         This is a very interesting use case!  Can you give more details,
>         more specific examples of the kinds of constraint you want to
>         enforce?
> 
>         Cheers,
> 
>         Alistair.
> 
>         ---
>         Alistair Miles
>         Research Associate
>         CCLRC - Rutherford Appleton Laboratory
>         Building R1 Room 1.60
>         Fermi Avenue
>         Chilton
>         Didcot
>         Oxfordshire OX11 0QX
>         United Kingdom
>         Email:        a.j.miles@rl.ac.uk
>         Tel: +44 (0)1235 445440
> 
>             -----Original Message-----
>             *From:* RichardN [mailto:RichardN@sfwindows.co.uk]
>             *Sent:* 18 May 2005 15:00
>             *To:* Miles, AJ (Alistair)
>             *Subject:* XML schemas and thesauri question
> 
>             Hello Alistair,
> 
>             I am one of Daniel Whymark’s colleagues, and he mentioned to
>             me that you might be able to help with an XML / semantic web
>             type question that I have.  To give you some indication of
>             my current level of understanding: I am comfortable with XML
>             and XML schemas, and have been reading up about semantic web
>             concepts, but I don’t have a strong grasp of any of the
>             semantic-web-type XML languages such as RDF, XML Topic
>             Maps, OWL etc.  I have come across SKOS, but that’s about as
>             far as it goes.
> 
>             We need to define a standard format for our data entities. 
>             The obvious way of doing this is to define the format using
>             XML schemas.
> 
>             We also have an ISO 2788-style thesaurus with BT (broader
>             term), NT (narrower term) etc.  Some of the data items in
>             the data entities should have their values restricted to a
>             set of preferred terms in the thesaurus.  For example, a
>             data item might need to be restricted so that the set of
>             allowable values includes the thesaurus term “United
>             Kingdom” and all of the narrower terms belonging to the
>             “United Kingdom” term.  In some cases, it may be necessary
>             to limit how many levels of narrower terms below the root
>             term are allowed.
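Limiting the number of NT levels amounts to a depth-bounded breadth-first walk from the root term, keeping the root itself in the allowed set. A minimal sketch; the geographic hierarchy and the `terms_within` helper are hypothetical, and `max_depth` counts levels below the root (0 allows only the root, `None` allows every level).

```python
from collections import deque

# Hypothetical geographic hierarchy: term -> narrower terms (NT).
NARROWER = {
    "United Kingdom": ["England", "Northern Ireland", "Scotland", "Wales"],
    "England": ["Oxfordshire"],
    "Oxfordshire": ["Didcot", "Wantage"],
}

def terms_within(root, max_depth=None, narrower=NARROWER):
    """`root` plus its narrower terms down to `max_depth` levels below it."""
    allowed = {root}
    queue = deque([(root, 0)])
    while queue:
        term, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand below the permitted level
        for nt in narrower.get(term, []):
            if nt not in allowed:
                allowed.add(nt)
                queue.append((nt, depth + 1))
    return allowed

print(sorted(terms_within("United Kingdom", max_depth=1)))
# ['England', 'Northern Ireland', 'Scotland', 'United Kingdom', 'Wales']
```

Breadth-first order matters for polyhierarchical thesauri: a term reachable by two paths is first reached, and recorded, at its shallowest depth, so it is not wrongly cut off because of the longer path.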
> 
>             My question is: what is the best way of encoding the
>             thesaurus in standards-compliant XML in such a way that it
>             can be linked to the XML schemas, so that we can enforce the
>             data item restrictions I have outlined?
> 
>             Any help you can give me would be much appreciated,
> 
>             Regards,
> 
>             Richard
> 
>             Richard Northedge
>             SFW Ltd.
> 

-- 
  Mark F.J. van Assem - Vrije Universiteit Amsterdam
        mark@cs.vu.nl - http://www.cs.vu.nl/~mark

Received on Thursday, 9 June 2005 09:00:43 UTC