Re: Best Practices - Semantic Tagging

On Tue, Mar 5, 2013 at 12:19 PM, Tim Cook <tim@mlhim.org> wrote:
> Hi All,
> Thanks for the feedback so far.
>
> Robert:
> Frankly I found the syntax that you presented quite confusing.  But
> maybe that is why I am engaging this community, to flesh out the best
> way to do this in an area where I am not familiar.  As far as
> processors not understanding the approach, I agree that there aren't
> any in existence today.  Apparently the use case hasn't been presented
> before?  However, there is nothing preventing applications from being
> able to look up the attribute to get the semantic link. I understand
> that solving this may not be within the scope of OA.
>
> Stian: The  xs:annotation/xs:appinfo approach does seem to be the
> appropriate way.  Plus, I had missed the fact that appinfo takes a
> source attribute.  My only concern with this approach is that it
> limits the modeller to one link and "source" doesn't really say
> how/what function the referenced link is to be interpreted.  This is
> why I had thought to use rdf:/rdfs: attributes.
>
> I may be repeating myself, however the goal is to indicate to users of
> the instances what the modeller was taking into consideration when
> defining the concept.  The complexType is a restriction of a
> complexType from a generic base schema.  Therefore linking to a
> specific node in a specific version of "one or more" controlled
> vocabularies is important.  This is healthcare data that shall endure
> in its meaning even as the science around it changes over time.

Hm.. that sounds more like provenance than tagging or identifying.

Then perhaps just:

<!--
common namespaces:
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:pav="http://purl.org/pav/"
-->

<xs:appinfo>
    <rdf:RDF>
        <rdf:Description rdf:ID="ct-f6c5ea6e-6458-4799-874d-7f3d365d260d">
          <pav:sourceAccessedAt
              rdf:resource="http://purl.bioontology.org/ontology/SNOMEDCT/365761000"/>
        </rdf:Description>
    </rdf:RDF>
</xs:appinfo>

with the same xml:id trick. If you use xml:id you can either have a
single rdf:RDF for the whole document (as in the CCDs example) that
describes all the complex types, or nest an rdf:RDF inside each of the
complex types. Either way I would still mark the identifiers on the
types, so that anyone extracting this doesn't accidentally merge all
the descriptions or end up with floating descriptions with unknown
subjects.
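
As a rough sketch of what such an extractor could do (only an
illustration, not an existing tool): the Python below uses just the
standard library, and the schema fragment, the type name "ExampleType",
the base URI http://example.org/schema.xsd and the function name are
all made up for the example. It keeps only descriptions whose subject
matches an xml:id on a complex type and drops the floating ones.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML = "http://www.w3.org/XML/1998/namespace"

# Hypothetical schema fragment: one described type, one floating description.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:pav="http://purl.org/pav/">
  <xs:complexType name="ExampleType"
      xml:id="ct-f6c5ea6e-6458-4799-874d-7f3d365d260d">
    <xs:annotation>
      <xs:appinfo>
        <rdf:RDF>
          <rdf:Description rdf:ID="ct-f6c5ea6e-6458-4799-874d-7f3d365d260d">
            <pav:sourceAccessedAt
                rdf:resource="http://purl.bioontology.org/ontology/SNOMEDCT/365761000"/>
          </rdf:Description>
          <rdf:Description rdf:about="#no-such-type">
            <pav:sourceAccessedAt rdf:resource="http://example.org/ignored"/>
          </rdf:Description>
        </rdf:RDF>
      </xs:appinfo>
    </xs:annotation>
  </xs:complexType>
</xs:schema>"""


def described_types(schema_xml, base):
    """Return (subject, predicate, object) triples for known complex types."""
    root = ET.fromstring(schema_xml)
    # Collect the xml:id of every complex type in the schema.
    ids = {ct.get(f"{{{XML}}}id") for ct in root.iter(f"{{{XS}}}complexType")}
    ids.discard(None)
    triples = []
    for desc in root.iter(f"{{{RDF}}}Description"):
        # Accept rdf:ID="x" or a same-document rdf:about="#x".
        local = desc.get(f"{{{RDF}}}ID")
        if local is None:
            about = desc.get(f"{{{RDF}}}about", "")
            local = about[1:] if about.startswith("#") else None
        if local not in ids:
            continue  # floating description with unknown subject
        for prop in desc:
            obj = prop.get(f"{{{RDF}}}resource")
            if obj is not None:
                # "{namespace}local" -> "namespacelocal", the predicate URI
                predicate = prop.tag[1:].replace("}", "", 1)
                triples.append((f"{base}#{local}", predicate, obj))
    return triples


for triple in described_types(SCHEMA, "http://example.org/schema.xsd"):
    print(triple)
```

A real RDF/XML parser would of course do this resolution for you; the
point is only that the xml:id on the type is what lets the subject be
checked.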


> I understand your point about going all the way with RDF and in fact,
> there is a metadata section of the schema that does just that using
> Dublin Core.  Examples: http://www.hkcr.net/CCDs  Maybe that is the
> "right way" to do it here as well?  It just seems awfully verbose.
> :-)

I would not say it's particularly verbose there; it maps almost
directly to the Dublin Core data. In your example you would just have
to introduce internal identifiers, as you want to describe the
individual complex types rather than the whole schema.

Look how easily the Dublin Core data can be extracted as RDF from the
above using CWM:

https://gist.github.com/stain/5090021
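
If you don't have CWM at hand, roughly the same idea can be sketched
with the Python standard library. This is not what CWM does
internally, and the sample schema below (its URI, title and creator)
is invented for illustration rather than copied from the CCDs; it only
assumes the Dublin Core metadata sits in an rdf:RDF block inside
xs:appinfo.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical schema with Dublin Core metadata embedded as RDF/XML.
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <xs:annotation>
    <xs:appinfo>
      <rdf:RDF>
        <rdf:Description rdf:about="http://example.org/schema/example.xsd">
          <dc:title>Example schema</dc:title>
          <dc:creator>Example Modeller</dc:creator>
        </rdf:Description>
      </rdf:RDF>
    </xs:appinfo>
  </xs:annotation>
</xs:schema>"""


def dublin_core_triples(schema_xml):
    """Return (subject, predicate, literal) for each Dublin Core property."""
    root = ET.fromstring(schema_xml)
    triples = []
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            # Keep only properties in the Dublin Core elements namespace.
            if prop.tag.startswith(f"{{{DC}}}"):
                predicate = prop.tag[1:].replace("}", "", 1)
                triples.append((subject, predicate, (prop.text or "").strip()))
    return triples


for triple in dublin_core_triples(SAMPLE):
    print(triple)
```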



-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester

Received on Tuesday, 5 March 2013 12:37:32 UTC