Re: Best Practices - Semantic Tagging from Tim Cook on 2013-03-05 (public-openannotation@w3.org from March 2013)

From: Tim Cook <tim@mlhim.org>
Date: Tue, 5 Mar 2013 10:29:39 -0300
To: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Cc: Robert Sanderson <azaroth42@gmail.com>, public-openannotation <public-openannotation@w3.org>
Message-ID: <CA+=OU3Xstv5B+NUYkJRaV9EHhBDPn2tfeA8apLDcCg=ZyqOB3w@mail.gmail.com>

On Tue, Mar 5, 2013 at 9:36 AM, Stian Soiland-Reyes
<soiland-reyes@cs.manchester.ac.uk> wrote:

> Hm.. that sounds more like provenance than tagging or identifying.
>

I think of provenance more as ownership, or a history of ownership,
whereas here we are attempting to set a permanent reference for a
deeper explanation of the term/concept.  This is where it gets messy.
The CCD is designed to represent a clinical concept, and the metadata
section hopefully describes that along with some provenance regarding
authorship, etc.  However, due to the granularity of healthcare
vocabularies, the individual complexTypes can be defined from them as
well.  These complexTypes can actually be reused across multiple CCDs.
This is because different modellers may build CCDs that are similar
in concept to others but vary in overall structure.
This has been the center point of the semantic interoperability issue
in healthcare.  The more experts you add to the discussion of a
concept model, the more versions you will get.  CCDs are designed to
allow them to all build their own models and still be able to exchange
valid data instances with semantics.

> with the same xml:id trick. If you use the xml:id you can choose to
> have either a single rdf:RDF for the whole document (as in the CCDs
> example) which describes all complex types, or you can nest this
> inside each of the complex types - but I would still mark the
> identifiers in the types so anyone extracting this don't accidentally
> merge all description or have floating descriptions with unknown
> subjects.
>

(thinking and writing so please excuse anything disjointed)  :-)

I think the example you gave is *VERY* attractive: using an xml:id on
each complexType and then referencing all of them in one metadata
section of the CCD.  I am producing a tool to build just these
complexTypes outside of a CCD, since this is the approach to
reproducing the models from SQL databases and dictionaries (e.g.
https://wiki.nci.nih.gov/display/caDSR/CDE+Browser) for
interoperability with and between legacy systems.
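
To make the xml:id idea concrete, here is a minimal Python sketch (not
from the thread) that checks a CCD for the "floating descriptions with
unknown subjects" problem mentioned in the quoted text.  It assumes the
single rdf:RDF block sits in an xs:annotation/xs:appinfo element and
refers to each complexType with rdf:about="#<xml:id>"; that placement,
the function name floating_subjects and the "ct-missing" identifier are
all hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical CCD fragment: one complexType carrying an xml:id, and one
# rdf:RDF section describing it.  "#ct-missing" is deliberately dangling.
CCD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <xs:annotation>
    <xs:appinfo>
      <rdf:RDF>
        <rdf:Description rdf:about="#ct-f6c5ea6e-6458-4799-874d-7f3d365d260d">
          <rdfs:isDefinedBy rdf:resource="http://purl.bioontology.org/ontology/SNOMEDCT/365761000"/>
        </rdf:Description>
        <rdf:Description rdf:about="#ct-missing"/>
      </rdf:RDF>
    </xs:appinfo>
  </xs:annotation>
  <xs:complexType name="ct-f6c5ea6e-6458-4799-874d-7f3d365d260d"
                  xml:id="ct-f6c5ea6e-6458-4799-874d-7f3d365d260d"/>
</xs:schema>"""


def floating_subjects(schema_text):
    """Return rdf:about fragments that match no complexType xml:id."""
    root = ET.fromstring(schema_text)
    # Collect every xml:id declared on a complexType.
    declared = {
        ct.get(XML_ID)
        for ct in root.iter(f"{{{XS}}}complexType")
        if ct.get(XML_ID)
    }
    floating = []
    for desc in root.iter(f"{{{RDF}}}Description"):
        about = desc.get(f"{{{RDF}}}about", "")
        # Only same-document references ("#...") are checked here.
        if about.startswith("#") and about[1:] not in declared:
            floating.append(about)
    return floating


print(floating_subjects(CCD))
```

A CCD editor could run a check like this before saving, so that every
description in the metadata section is known to point at a complexType
that actually exists in the same document.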

These complexType stubs, or "Pluggable Complex Types" as we have
started to call them, are not really valid schemas.  They are just a
text file named after the complexType name
(ct-f6c5ea6e-6458-4799-874d-7f3d365d260d.pct).  I can put the
information that will go into the CCD metadata section in this file
as:

     <rdf:Description rdf:ID="ct-f6c5ea6e-6458-4799-874d-7f3d365d260d">
          <rdfs:isDefinedBy
               rdf:resource="http://purl.bioontology.org/ontology/SNOMEDCT/365761000"/>
          <!-- and any other references the modeller wants to create -->
     </rdf:Description>

I can then add the functionality to the CCD editor that will extract
these and put them into the CCD rdf:RDF metadata section.  So
everything ends up being much nicer and neater, all in one place.
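
A rough Python sketch of that extraction step, under some assumptions
of mine: each .pct stub is a fragment without an XML declaration, it
uses the rdf:, rdfs: and xs: prefixes without declaring them, and
descriptions are keyed by rdf:ID.  The function names are hypothetical
and this is not the actual CCD editor code.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Keep readable prefixes when the merged section is serialized.
ET.register_namespace("rdf", RDF)
ET.register_namespace("rdfs", RDFS)


def extract_descriptions(pct_text):
    """Return the rdf:Description elements found in one stub."""
    # The stub is not a complete document, so wrap it in a throwaway
    # root element that declares the prefixes it is assumed to use.
    wrapped = (
        f'<stub xmlns:xs="{XS}" xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}">'
        f"{pct_text}</stub>"
    )
    return list(ET.fromstring(wrapped).iter(f"{{{RDF}}}Description"))


def build_metadata(pct_texts):
    """Merge the descriptions from many stubs into one rdf:RDF element."""
    rdf_root = ET.Element(f"{{{RDF}}}RDF")
    seen = set()
    for text in pct_texts:
        for desc in extract_descriptions(text):
            ident = desc.get(f"{{{RDF}}}ID")
            # A stub reused within a CCD should contribute only once.
            if ident is not None and ident in seen:
                continue
            seen.add(ident)
            rdf_root.append(desc)
    return rdf_root


pct = (
    '<rdf:Description rdf:ID="ct-f6c5ea6e-6458-4799-874d-7f3d365d260d">'
    '<rdfs:isDefinedBy rdf:resource='
    '"http://purl.bioontology.org/ontology/SNOMEDCT/365761000"/>'
    "</rdf:Description>"
)
metadata = build_metadata([pct, pct])  # the same stub supplied twice
print(ET.tostring(metadata, encoding="unicode"))
```

Wrapping the stub in a temporary root is only a workaround for the
stubs not being standalone XML; if the .pct files declared their own
namespaces, they could be parsed directly.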

Thoughts?

> I would not say it's particularly verbose there, it's almost directly
> to the Dublin core data. In your example you would just have to
> introduce internal identifiers as you want to describe the individual
> complex type rather than the whole schema.
>
> Look how easily the Dublin Core data can be extracted as RDF from the
> above using CWM :
>
> https://gist.github.com/stain/5090021

This is very cool.  Thanks for the introduction to a new line of
design thought.

--Tim

============================================
Timothy Cook, MSc           +55 21 94711995
MLHIM http://www.mlhim.org
Like Us on FB: https://www.facebook.com/mlhim2
Circle us on G+: http://goo.gl/44EV5
Google Scholar: http://goo.gl/MMZ1o
LinkedIn Profile: http://www.linkedin.com/in/timothywaynecook

Received on Tuesday, 5 March 2013 13:30:08 UTC