Notes from today's HCLS call on Value Sets in RDF

http://www.w3.org/2014/09/09-hcls-minutes.html
and below as plain text.

David Booth

-------------------------------------------------------
    [1]W3C

       [1] http://www.w3.org/

                                - DRAFT -

                                   HCLS

09 Sep 2014

    See also: [2]IRC log

       [2] http://www.w3.org/2014/09/09-hcls-irc

Attendees

    Present
           +1.978.794.aaaa, tim_w, Mehmet, DBooth, Tony, claude,
           ericP, +1.469.226.aabb, Neda, mscottm2, Ingeborg,
           [IPcaller]

    Regrets
    Chair
           DBooth (by default)

    Scribe
           dbooth

Contents

      * [3]Topics
          1. [4]Value Sets in RDF
          2. [5]Validation Working Group
      * [6]Summary of Action Items
      __________________________________________________________

    <mscottm2> +

Value Sets in RDF

    Claude: Reviewed ballot submission on standardizing value sets
    in DSTU. They mean an expression that can generate a collection
    of terms.
    ... You can say you want to intensionally define a value set,
    or extensionally, which means you enumerate the values.
    ... Think of a valueset as a knowledge document: metadata
    (author, date published, version, etc.); an expression that
    allows you to define this extension.
    ... I noticed in the HL7 proposal that the expression itself
    was defined in UML. But I would think that if you were to
    define the expression of a valueset you'd do it in an
    expression language, because valuesets are a way to define
    members of a set.
    ... How would we address this in the semantic world?
    ... Given that in OWL you use DL to define sets of concepts,
    could that be used in the definition of valuesets?

    <ericP> [7]http://www.w3.org//2013/02/ODM/

       [7] http://www.w3.org//2013/02/ODM/

    Claude: One way to define a VS is to enumerate all the terms --
    a list of terms.

    <mscottm2> Here's something that Eric wrote up about value
    sets:
    [8]https://www.w3.org/wiki/HCLS/ClinicalObservationsInteroperab
    ility/FDATherapeuticAreaOntologies/Validation#Value_set

       [8] 
https://www.w3.org/wiki/HCLS/ClinicalObservationsInteroperability/FDATherapeuticAreaOntologies/Validation#Value_set

    <ericP> [9]example of context-neutral value set in OWL

       [9] http://www.w3.org//2013/02/ODM/

    Claude: The second way is a regex that allows you to find all
    the terms you want.

    <ericP> [10]example of "indirect hierarchy"

      [10] 
https://www.w3.org/wiki/HCLS/ClinicalObservationsInteroperability/FDATherapeuticAreaOntologies#Clinical_compatibility

    Claude: Another way is "this term and all its children, but
    only the leaves"
    ... Also have a concept in a vocabulary, and you filter on that
    property ( :diabetes ) and you get all diabetes-related terms.
    ... Also by unioning, intersecting or differencing sets.
    ... Also by a query: the result set is the valueset.
    ... What the evaluation returns is the valueset
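
    A minimal sketch of the definition styles Claude lists, assuming
    a toy vocabulary held in memory. The term names, the PARENT
    table and every function name are hypothetical and are not taken
    from the HL7 proposal or from any real terminology.

```python
import re

# Hypothetical toy vocabulary: term -> parent (None for the root).
# These codes are illustrative only, not real SNOMED or LOINC content.
PARENT = {
    "disease": None,
    "diabetes": "disease",
    "diabetes-type-1": "diabetes",
    "diabetes-type-2": "diabetes",
    "asthma": "disease",
}


def enumerated(*terms):
    """Extensional definition: list the members explicitly."""
    return set(terms)


def matching(pattern):
    """Intensional definition: every term whose code matches a regex."""
    rx = re.compile(pattern)
    return {t for t in PARENT if rx.search(t)}


def descendants(root):
    """The root term plus everything below it in the hierarchy."""
    found = {root}
    changed = True
    while changed:
        changed = False
        for term, parent in PARENT.items():
            if parent in found and term not in found:
                found.add(term)
                changed = True
    return found


def leaves_only(terms):
    """Keep only the terms that are no other term's parent."""
    parents = set(PARENT.values())
    return {t for t in terms if t not in parents}


# "This term and all its children, but only the leaves."
diabetes_leaves = leaves_only(descendants("diabetes"))

# Union, intersection and difference compose existing value sets.
respiratory_or_diabetes = enumerated("asthma") | diabetes_leaves
type_specific = matching(r"type-\d$") & descendants("disease")
without_type_2 = diabetes_leaves - enumerated("diabetes-type-2")
```

    In each case the value set is whatever the expression evaluates
    to, which matches the point that "what the evaluation returns is
    the valueset".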

    Eric: In ShEx we can't talk about "only the leaves"
    ... But can express valuesets by globbing or by an HTTP GET.
    ... for creating a valueset by query.
    ... If you're trying to validate some data to ensure that
    things are in the right range, then you can plug the thing that
    you're testing against into the query.
    ... But that's beyond what you'll get out of ShEx or Resource
    Shapes normally. You'd need an extension.
    ... Expecting a SNOMED term that is a morphology, there are 50k
    morphology terms, that's more work if you do a GET to get them
    all.

    Claude: There are queries with filters -- like SPARQL -- and
    the other way is set operations.
    ... If ShEx supports this then it would seem to cover most of
    the query/filter side of things, and if there's a way to
    represent set operations then you could cover most of these
    cases.

    Eric: What are the use cases and their expressivity
    requirements? ShEx itself can't do intersection or union, but
    you could pass that to an extension function via a URI that
    captures that functionality
    ... For instance validation, you can say X must be in some VS:
    The object of this predicate is in this VS, and it checks to
    see if it is in it.
    ... How can you make it get the right VS? (Intersections,
    diffs, etc)
    ... But downside is that that looks like a big ugly URL in the
    ShEx, when it's in fact a URL-encoded SPARQL query.
    ... But I think you also want in the language, a URL-composer,
    so that you can have the URL in a human-readable form.

    Claude: But to have the URL, someone has to have already
    defined the query.
    ... Need to say: here's my intent. I want this set, and this
    expression, and this set, and the subsumption, and union them.
    Maybe DL could do this. On the LHS you could say the set of all
    terms that matches this regex, unioned with the set of all
    terms including or descendent from some other term.
    ... I can take this expression and convert to SPARQL. On one
    side I have the expression, and the other I can execute it. But
    how would I define it if the expression has not yet been built?

    Eric: Closest available: a small amount of code to get to this.
    Take ShEx or Resource Shapes, and say the valueset is this URL
    that ends in "?=", plus the URL-encoding of this triple-quoted
    string (which is SPARQL), and by that you're able to talk to
    any SPARQL endpoint.
    ... so you could do all of the set operations in SPARQL.
    ... Slightly further away is to do them in OWL DL-Query that's
    built into Protege, but you'd need deployed DL-query engines.
    And both of those approaches require embedded URLs and human
    readable descriptions.
    ... Harold, does that make sense?
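
    A rough sketch of the URL composition Eric describes, assuming
    the endpoint accepts the query in a "query" parameter as in the
    SPARQL 1.1 Protocol. The endpoint address, the ex: namespace and
    the query text are hypothetical placeholders.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Hypothetical endpoint; "query" is the parameter name used by the
# SPARQL 1.1 Protocol for queries sent via HTTP GET.
ENDPOINT = "http://example.org/sparql"

# Hypothetical query: every term that is a subclass of ex:Morphology.
VALUE_SET_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/terms#>
SELECT ?term WHERE { ?term rdfs:subClassOf* ex:Morphology }
"""


def value_set_url(endpoint, sparql):
    """Return endpoint + "?query=" + the URL-encoded SPARQL text."""
    return endpoint + "?query=" + quote(sparql.strip(), safe="")


url = value_set_url(ENDPOINT, VALUE_SET_QUERY)

# Decoding the URL recovers the human-readable query.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

    Keeping the triple-quoted query next to the composed URL is one
    way to get both the machine-usable reference and the readable
    form that the discussion asks for.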

    Harold: I'm grasping at context here.

    Eric: You have some Resource Shapes or ShEx that describes a
    valid instance, and in that it says the VS that must match is
    enumerated here, where "here" is the result of a query.

    Harold: What would be useful in that situation would be if that
    URL resolves to a set of URLs. Come up with a simple
    representation of what that URL returns. E.g., Turtle, or an
    RDF list.

    David: Or a SPARQL result set.

    Eric: You could add to your favorite validator the ability to
    add a URL to a "query=" URL.

    David: Like a URL template

    Harold: I'd like to separate the mechanism for generating the
    VS from the VS reference.
    ... We want to enable it to reference a simple flat list -- no
    sparql involved.
    ... But some valuesets are complex with more structure, and
    it's good to allow people to be clever too.
    ... One challenge is that some valuesets are large, and it's in
    our interest if we can ask "is this a valid value" rather than
    returning all 300k values.

    Claude: Suppose you have a huge VS, and you want to know if a
    given term is in it.
    ... If you define the VS a certain way, can you do that check
    without ever looking at a term in it?
    ... E.g., if the VS is defined by a rule, maybe just plug the
    given term into the rule to see if it is satisfied.

    David: Two use cases: validation (whether a term is allowed in
    a VS) versus generation of all terms in a VS (e.g. for
    displaying in a drop-down list)
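
    The two use cases can be sketched as two methods on one object,
    assuming the value set is defined by a rule. The class name, the
    rule and the code pattern are hypothetical and do not reflect
    real SNOMED identifiers.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class RuleValueSet:
    """A value set defined by a rule rather than by a stored list."""

    description: str              # human-readable statement of intent
    rule: Callable[[str], bool]   # True when a term belongs to the set

    def contains(self, term: str) -> bool:
        """Validation: test one term without touching any other term."""
        return self.rule(term)

    def expand(self, vocabulary: Iterable[str]) -> List[str]:
        """Generation: list every member, e.g. for a drop-down list."""
        return [t for t in vocabulary if self.rule(t)]


# Hypothetical rule: codes shaped like "M" followed by five digits.
morphology = RuleValueSet(
    description="Any code of the form M plus five digits",
    rule=lambda term: re.fullmatch(r"M\d{5}", term) is not None,
)

is_valid = morphology.contains("M80003")
choices = morphology.expand(["M80003", "T12345", "M80103"])
```

    Here contains() plugs the given term into the rule, so a large
    value set never has to be materialized just to validate one
    value.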

    <ericP> analyte: C-reactive peptide; source: CSF

    David: So you specify the term by giving a set of properties of
    it.

    Harold: Need to include in expressions the URI and a block of
    text that indicates what we expect from it: "This yields any
    LOINC code with the following characteristics ... "
    ... In the process of offering any data models, you get to the
    VS and we need to record it in a human readable way to document
    it.

    David: So the expressions themselves do not make it obvious to
    the human reader what will come back?

    Harold: Yes, someone must do the work to turn it into a URI and
    resolve it to possible values. e.g., countries that possess a
    particular characteristic.
    ... I can describe that in a shape, then as a secondary test
    someone can figure out the best way to determine that, and then
    they give me a URI that will give me that list.

    Eric: You don't want to put making a semantic representation of
    MedDRA in the critical path?

    Harold: Right.
    ... We had a proposal from Bodenreider for putting MedDRA into
    shapes.

    <mscottm2>
    [11]https://www.w3.org/wiki/HCLS/ClinicalObservationsInteropera
    bility/FDATherapeuticAreaOntologies/Validation#Value_set

      [11] 
https://www.w3.org/wiki/HCLS/ClinicalObservationsInteroperability/FDATherapeuticAreaOntologies/Validation#Value_set

    Scott: Re a URI for a VS and it's filled in by a procedural
    attachment, Eric wrote about CDISC code for marital status, he
    itemizes the VS in this fragment. In ShEx you could do it that
    way, but I'd like to say that a shape can take on a value from
    the set of marital codes without having to enumerate them.
    ... This shape takes a value from Observations.

    Eric: One way is to enumerate directly. Another is to do a GET
    that returns line-feed delimited list of the terms. (Where that
    URL is not too ugly.) Another is ShEx with a hideous URL that
    is triple-quoted and goes at the end of the query URL. A fourth
    way is to give it a URL and the system knows the URL but it
    isn't mechanically connected to the web, it's an internal hook
    to a database, for example.
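
    A small sketch combining two of the options Eric lists: a
    line-feed delimited list of terms, and a URL that acts as an
    internal hook rather than something fetched from the web. The
    URL, the marital status terms and all function names are
    hypothetical; the terms are not actual CDISC codes.

```python
from typing import Callable, Dict, Set

# Hypothetical value set identifier. It is used only as a lookup key
# here; nothing is dereferenced on the web.
MARITAL_STATUS_VS = "http://example.org/valueset/marital-status"


def parse_term_list(body: str) -> Set[str]:
    """Parse a line-feed-delimited list of terms, skipping blank lines."""
    return {line.strip() for line in body.split("\n") if line.strip()}


def marital_status_hook() -> Set[str]:
    """Stand-in for an internal database lookup (hypothetical terms)."""
    return parse_term_list("married\nsingle\ndivorced\n")


# Internal hooks: the validator maps each known URL to a resolver.
RESOLVERS: Dict[str, Callable[[], Set[str]]] = {
    MARITAL_STATUS_VS: marital_status_hook,
}


def in_value_set(value_set_url: str, term: str) -> bool:
    """Check a term against a value set referenced only by its URL."""
    return term in RESOLVERS[value_set_url]()
```

    With this arrangement a shape only needs to name the value set
    URL, which is close to what Scott asks for: the codes are not
    enumerated in the shape itself.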

    Claude: Another use case: terminologist wants to define a VS.
    How do we support them in doing this? SPIN is SPARQL-like. Need
    a way for that terminologist to define and share that VS.

    <hsolbrig> MeSH in RDF. Temporary endpoint.
    [12]http://mor2.nlm.nih.gov/conductor account: meshdemo
    password: demoofmeshinrdf - select iSQL button -- gives a
    SPARQL window

      [12] http://mor2.nlm.nih.gov/conductor

    Claude: Clinical person needs to be able to capture this,
    saying what the VS should be.

    David: Need a well-known notion of ValueSet, so that someone
    can say "this is a ValueSet".

    Eric: You could have something like purl.org that dereferences
    to a valueset, perhaps with a description like Harold wants,
    and a way to compose new ones from old ones.

    <hsolbrig> Agree. ValueSet should consist of a set of URL's +
    metadata about source / dates / etc.

    David: This need to compose sets from other sets is not unique
    to healthcare. How is it normally done?

    Eric: If you represent the VS list as a set of triples, rather
    than a list of values, then you could use SPARQL to do the set
    operations.

    <hsolbrig> Apologies, I need to go...

    Eric: Where have you seen these set operations in VS?

    Claude: Reaction might be caused by food or medication or latex
    gloves, etc. So you may want to say that you can union all of
    these substances as the terms in the VS.
    ... Another case is that there's an existing VS, but many are
    not relevant to the domain, so you want to take away the
    irrelevant ones.
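
    If membership is stored as triples, as Eric suggests, the union
    and the removal of irrelevant terms could be written as one
    SPARQL 1.1 query. The sketch below only builds the query text;
    the ex: namespace, the ex:hasMember predicate and the value set
    names are assumptions made for illustration.

```python
from typing import Iterable

# Hypothetical namespace and membership predicate (ex:hasMember).
PREFIX = "PREFIX ex: <http://example.org/vs#>"


def union_minus_query(include: Iterable[str], exclude: Iterable[str]) -> str:
    """Build a SPARQL 1.1 query: union of `include` minus `exclude`."""
    branches = "\n  UNION\n".join(
        "  { ex:%s ex:hasMember ?term }" % name for name in include
    )
    removals = "".join(
        "\n  MINUS { ex:%s ex:hasMember ?term }" % name for name in exclude
    )
    return "%s\nSELECT DISTINCT ?term WHERE {\n%s%s\n}" % (
        PREFIX, branches, removals)


query = union_minus_query(
    include=["Foods", "Medications", "Latex"],
    exclude=["OutOfScope"],
)
```

    The UNION branches cover the "food or medication or latex" case
    and the MINUS clause covers taking away terms that are not
    relevant to the domain.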

    Eric: Or you might want to prefer a VS.

    <Kerstin> the constrained use case just described now (narrower
    scope) is very common for us in clinical trial data!

Validation Working Group

    Eric: Call for participation has gone out. Now's the time to
    join! First f2f will be in Santa Clara, end of October.

    <ericP> [13]https://www.w3.org/mid/op.xkzxenxqsvvqwp@sith.local

      [13] https://www.w3.org/mid/op.xkzxenxqsvvqwp@sith.local

    <ericP> [14]W3C Call for Participation in RDF Shapes WG (member
    only)

      [14] https://www.w3.org/mid/op.xkzxenxqsvvqwp@sith.local

    ADJOURNED

Summary of Action Items

    [End of minutes]
      __________________________________________________________


     Minutes formatted by David Booth's [16]scribe.perl version
     1.138 ([17]CVS log)
     $Date: 2014-09-09 17:02:09 $
      __________________________________________________________

      [16] http://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm
      [17] http://dev.w3.org/cvsweb/2002/scribe/


Received on Tuesday, 9 September 2014 19:21:49 UTC