Re: DICOM RDF representation from Detlef Grittner on 2024-03-25 (public-semweb-lifesci@w3.org from March 2024)

From: Detlef Grittner <detlef.grittner@sohard.de>
Date: Mon, 25 Mar 2024 11:52:35 +0100
To: Eric Prud'hommeaux <eric@uu3.org>, david@dbooth.org, phayes@ihmc.org
Cc: public-semweb-lifesci@w3.org
Message-ID: <cb7c29f4-c9b9-4d50-8d4d-11d83c0bfbe7@sohard.de>
On 25.03.24 10:03, Eric Prud'hommeaux wrote:
> On Sun, Mar 24, 2024 at 06:57:18AM +0000, Patrick J. Hayes wrote:
>>>
>>> On Mar 22, 2024, at 12:49 PM, David Booth <david@dbooth.org> wrote:
>>>
>>> Hi Erich,
>>>
>>> As food for thought, it occurred to me that there is another way you could think about the DICOM problem of needing to assert null values (missing data), which we discussed on yesterday's FHIR RDF call.[1]
>>> Imagine that you could write something in RDF that corresponds to this, in json-ish syntax:
>>>
>>>   { x: 25,
>>>     y: 30,
>>>     p: null }
>>>
>>> That's saying that, in this particular chunk of data, there is no value asserted for p.  But it isn't saying that a value for p doesn't exist somewhere else in the data or in the universe.  In the RDF world, that sounds very much like it is not making an assertion about p's value.
>>>
>>> However, I think it *is* making an assertion about the *schema* of the data, saying that p is part of the *schema*.  So in essence, I think it is using instance data syntax to assert something about the *schema*. So I wonder if it might work to somehow attach that schema information, when converting to RDF, instead of trying to assert a null p.  For starters, here is one straw man possibility, in Turtle-ish syntax:
>>>
>>>   :something
>>>     :x 25,
>>>     :y 30,
>>>     dicom:null :p .
>>>
>>> This would have the benefit of still permitting a value for :p to be asserted, without conflict, while retaining the schema information about :something normally having a :p property.
>>>
>>> In converting back from RDF to JSON or something else, the conversion could see if there is an asserted value for :p, such as 32.  If there is, it would assert that value:
>>>
>>>     p: 32
>>>
>>> If not, it would know to assert an empty value for p, such as one of these, depending on how you are representing null in json:
>>>
>>>     p: ""
>>>     p: null
>>>     p: []
>>>
>>> BTW, I mentioned on the call that another possibility is to use a distinguished value in RDF, to represent null, such as urn:null .  But I don't think this would play very well with inference, because if you asserted something like:
>>>
>>>    :something :p urn:null .
>>>
>>> and :p is normally supposed to hold an integer (for example), then urn:null would have to be in the value space of integer.  And if :p were later asserted to be 32, then an inference engine might conclude that urn:null = 32 (if :p can only have one value) unless it were augmented to handle urn:null specially.  I imagine folks like Pat Hayes thought about this long ago and concluded that a distinguished null value like this would be a Bad Idea, because it goes against the grain of description logic.
>> Yes, such folks did exactly that and came to that very conclusion. Any logic, not just description logics; and we had the debate with database engineers long before RDF was invented.
>>
>> You note one problem with null, but the chief problem is that if 'null' is treated as a name, then it has to denote the same thing wherever it occurs, so any two entries with 'null' in them have the same value. Which I gather (I actually have no clear idea what "null" is supposed to mean, after years of trying to get people to tell me) is not what is intended. Your dicom:null property solution overcomes this objection very neatly (as long as nobody tries to create an ontology of that dicom:null property) and it has the merit, from the RDF perspective, of handing the question of what 'null' means back to dicom itself.
>>
>> Maybe 'null' is not a name but rather an existential variable, with the convention that it is a different variable every time it occurs, ie each actual token of 'null' is a distinct variable. So then 'null' means something like "something, but we don't know what (yet)". In that case, in RDF the obvious solution is to use a unique bnode. Or a skolem constant if you dislike bnodes.
>>
>> But the simplest solution is the one you mention first. If 'null' means that no data is available, why are you bothering to say it? Just leave those entries out of the RDF altogether. If this seems wrong, my response would be, what utility is served by including them? What do they make possible, that would not work if they were simply not there?
> Hey Pat, good to hear from you! Now on with the geeking...
>
> NULL: The use cases for NULL are typically convenient lies, lies in which we happily participate in order to be admitted into the medical world's stack of standards. On one side, you have þe olde standards which attempt to make square and normal a sprawling graph of medical info. DBs of course have null because their tables are tables, so if the cardinality is 0..1, you need to write SOMETHING down (or split out another table which, while simple in theory, complicates queries and is likely to push your query optimizer past its abilities).
>
> Null Flavor: [setting the stage with a digression into another clinical data model.] Stack on top of that the somewhat coarse clinical standards which say "if you have an observation, it must have a value". It's not really true; if you have an observation you must have an observed value or a good excuse (AKA "Null Flavor") for not having a value. If we were inventing this from scratch in FHIR (which is not a gothic standard but still bears a bit of their legacy), we'd have written this in ShEx:
>
> <#Observation> {
>    (  fhir:value xsd:integer OR xsd:string OR xsd:boolean OR @fhir:PhysicalQuantity OR @fhir:CodeableConcept …
>     | fhir:noValueReason @NullFlavor ); …
> }
>
> We had to make sure we were round-trippable and familiar enough to the FHIR users that we swallowed and accepted the standard FHIR encoding (specifically, break naive queries by renaming fhir:value to fhir:_value if the value can't be interpreted naively).
>
> Many standards have a list of Null Flavors which typically include specific reasons for data omission and a catch-all like "No Information". Dicom's Null Flavors: <https://dicom.nema.org/MEDICAL/Dicom/current/output/chtml/part20/sect_5.3.2.html>.
>
> Maybe we should keep NULLs and Null Flavors separate as the formats with which we round-trip do. The `dicom:null :p` predicate is a pretty cool way to prevent the match on `?s :p ?o`. OTOH, it's probably worth a little head scratching to see if there's a not-terribly-inelegant solution available. I feel like the general FHIR (XML or JSON) approach of capturing non-mon "extensions" like null flavors requires unrealistic diligence on the part of users/developers; they have to scan *every* entity in the tree for *every* modifying extension (at least the part of the tree that's allowed to pull the rug out from under you). No one's gonna write those queries so non-mon stuff is gonna wreak a bit of havoc.
>
> OTOH, I think that what we did with FHIR/RDF is reasonably elegant. We rename the property if it has any modifying extensions so no naive query/rule will match the data. If you match `fhir:_value`, then you know you are signing up for the complicated world described above, but at least the non-mon world is in its own separate bubble. Maybe this model is worth preserving in a general Dicom representation. At the very least, we'll want to define how we interact with it for the Dicom data that ends up in a FHIR Genomics Resource (Pat, that's a capitalization-of-art specific to FHIR).
>
> Erich, et al, can we see a use case that demos both NULL and Null Flavors?

The "dicom:null :p" approach unfortunately runs into a problem if you
want to represent that concept in an OWL ontology, because it forces
you into OWL 2 Full (with its undecidability). For practical purposes
I would like to stick to OWL DL or an even more restrictive profile.
But then you are not allowed to use properties in the object position
unless you expand the set of official meta properties in OWL 2 DL. The
mathematical consequences, and what an "owl:null" meta property would
mean for decidability, are currently not known to me.

Looking at the semantics of DICOM, there are three types of attributes:
type 1 is mandatory and must have a value; type 2 is mandatory but may
be empty if the value is unknown; type 3 is optional [1]. An empty
type 2 value does not really mean that the value doesn't exist, but
that it is not known to the creator of the data set. That might not be
a universal concept, but it is one that exists in the semantics of
DICOM. This concept is used for pure DICOM objects; there are different
concepts for reports that intersect with HL7, which Eric has mentioned.

Just to name an example, take the attribute "Patient's Birth Date":
logically that value must exist, but at the time the DICOM object is
created it might be unknown. The DICOM standard wants to preserve that
semantic meaning, so "Patient's Birth Date" is a type 2 attribute.
Maybe that somehow addresses Patrick's question.
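
The three attribute types can be sketched as a small check. This is
only an illustration in Python, assuming a data set is a plain dict
from attribute name to string value and that an empty string stands
for "present but unknown". The names AttrType and check_attribute and
the attribute "ExampleMandatory" are hypothetical and not taken from
DICOM or any library; only the type 2 status of the birth date comes
from the text above.

```python
from enum import Enum


class AttrType(Enum):
    TYPE_1 = "1"  # mandatory, must have a value
    TYPE_2 = "2"  # mandatory, may be empty if the value is unknown
    TYPE_3 = "3"  # optional


def check_attribute(dataset, name, attr_type):
    """Return None if the attribute satisfies its type, else an error message.

    Assumption of this sketch: dataset is a dict of strings and "" means
    "present but empty".
    """
    present = name in dataset
    empty = present and dataset[name] == ""
    if attr_type is AttrType.TYPE_1:
        if not present:
            return f"{name}: type 1 attribute is missing"
        if empty:
            return f"{name}: type 1 attribute must not be empty"
    elif attr_type is AttrType.TYPE_2:
        if not present:
            return f"{name}: type 2 attribute must be present (it may be empty)"
    # Type 3 is optional, so there is nothing to check in this sketch.
    return None


# An empty birth date is acceptable because the attribute is type 2.
print(check_attribute({"PatientBirthDate": ""}, "PatientBirthDate", AttrType.TYPE_2))
# A hypothetical type 1 attribute must not be empty.
print(check_attribute({"ExampleMandatory": ""}, "ExampleMandatory", AttrType.TYPE_1))
```

The point of the sketch is that "present but empty" and "absent" are
different states for a type 2 attribute, which is exactly the
distinction that gets lost if empty entries are simply left out of the
RDF.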

In the scope of DICOM it is more like a dicom:Unknown. But this concept
cannot be combined with simple data types such as xsd:int or
xsd:double. It looks like you either stick to the DICOM semantics and
use strings, which can be empty in RDF, or you create a more complex
representation of an attribute with a class that can have
dicom:Unknown. The latter bloats the actual data with additional
nodes, though. By the way, in JSON an empty string "" is risky to use
for this: JSON has its roots in JavaScript, which has weak typing and
converts "" to the Number 0 when a string is coerced to a number. In
RDF a string cannot be misinterpreted like that.
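
To make the two options concrete, here is a rough Python sketch that
emits triples as plain tuples for one attribute. It assumes that an
empty string marks an unknown type 2 value. The function names, the
terms dicom:value and dicom:Unknown as used here, and the blank node
labels are hypothetical and not taken from any published ontology.

```python
from itertools import count


def as_string_triples(subject, prop, value):
    """Option 1: keep the DICOM semantics, every value is a plain string.

    An empty string literal then stands for "unknown".
    No escaping of the value is done in this sketch.
    """
    return [(subject, prop, f'"{value}"')]


def as_node_triples(subject, prop, value, datatype, ids):
    """Option 2: wrap the value in an extra node.

    The node either carries a typed literal or is typed as dicom:Unknown.
    ids is an iterator that supplies fresh blank node numbers.
    """
    node = f"_:b{next(ids)}"
    triples = [(subject, prop, node)]
    if value == "":
        triples.append((node, "rdf:type", "dicom:Unknown"))
    else:
        triples.append((node, "dicom:value", f'"{value}"^^{datatype}'))
    return triples


blank_ids = count(1)
for triple in as_string_triples(":patient", ":birthDate", ""):
    print(triple)
for triple in as_node_triples(":patient", ":birthDate", "", "xsd:date", blank_ids):
    print(triple)
```

The second form needs two triples and one extra node per attribute
where the first needs a single triple, which is the bloat mentioned
above.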

Detlef

[1]
https://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_7.4.html
https://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_7.4.2.html
https://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_7.4.3.html
https://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_7.4.4.html
https://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_7.4.5.html

>
>
>> Best wishes
>>
>> Pat Hayes
>>
>>
>> 1. https://www.w3.org/2024/03/21-hcls-minutes.html
>>
>> Thanks,
>> David Booth
>>
>>
>>> On 1/10/24 13:21, Erich Bremer wrote:
>>> Hi Detlef,
>>> On the delay in your response, you are completely forgiven. :-)
>>> Thanks for the links!  I've read your paper and I have a few questions if I may ask:
>>> 1) Is your ontology generated automatically from the normative DICOM XML?
>>> 2) If so, is this process open-source?
>>> 3) You mentioned in the paper that you convert everything to strings, (not taking advantage of the value representations).  I take it that an all-string approach is a big performance hit. Have you (since then) converted the strings to something more analytically efficient?
>>> 4) Can you share any samples of your dicom2RDF conversions?
>>> I've done my own conversion of DICOM to RDF (dcm2rdf) using (like you) the dcm4che library and scaled this to handle large sets of dcm files.  The code will generate either a long form conversion keeping VR typing:
>>> <urn:md5:44c5f855d4ee27141c926b2084b461a4> dcm:00080060  [ dcm:Value "XA"; dcm:vr "CS" ] .
>>> or a compact form dropping the VR typing and converting the actual values to an optimal form.
>>> <urn:md5:44c5f855d4ee27141c926b2084b461a4> dcm:00080060 "XA" .
>>> I tend to use the latter form as it cuts down on the number of triples and makes for better query performance in the Virtuoso triple store.
>>> All of this is done without a corresponding defined ontology and I would like to rectify this.  My preference is to see an official DICOM conversion but I don't know if I am alone in this endeavour.  OWL is good, but it would also be helpful to have a SHACL equivalent for RDF data validation.
>>>     - Erich
>>> ==========================================================
>>> Erich Bremer, M.Sc.
>>> Director, Applied Informatics
>>> Department of Biomedical Informatics
>>> Stony Brook Medicine
>>> Tel. : 1-631-444-3560
>>> Fax  : 1-631-444-8873
>>> Cell : 1-631-681-6228
>>> erich.bremer@stonybrook.edu
>>> Office Location/Mailing Address
>>> HSC, L3: Room 119
>>> Stony Brook, NY 11794-8330
>>> On Mon, Jan 8, 2024 at 10:20 AM Detlef Grittner <detlef.grittner@sohard.de> wrote:
>>>     Hi Eric,
>>>     first let me apologize for the late answer due to all the holidays
>>>     at the end and beginning of the year.
>>>     Actually there has been a publication of a project where that DICOM
>>>     RDF has been used: https://pubmed.ncbi.nlm.nih.gov/25160167/
>>>     This DICOM RDF is described in an OWL ontology, but the published
>>>     version on BioPortal
>>>     (https://bioportal.bioontology.org/ontologies/SEDI) is completely
>>>     outdated. I will clarify, if I can provide the current version of
>>>     the ontology on that portal and will let you know.
>>>     At the moment there is no lobbying, but I think it would be an
>>>     interesting idea to take that DICOM ontology as a basis for such an
>>>     effort.
>>>     Detlef
>>>     Detlef Grittner
>>>     MSc ISM, M.A.
>>>     Software-Entwicklung
>>>     SOHARD Software GmbH
>>>     Würzburger Str. 197
>>>     90766 Fürth
>>>     Phone: +49 (0) 911 97341-54
>>>     Fax:   +49 (0) 911 97341-10
>>>     E-Mail: detlef.grittner@sohard.de
>>>     Geschäftsführer: Peter Feltens, Sebastian Schnitzenbaumer
>>>     Sitz der Gesellschaft: Fürth
>>>     Registergericht: Amtsgericht Fürth; HRB 11478
>>>     On 14.12.23 16:53, Erich Bremer wrote:
>>>     Hi Detlef,
>>>
>>>     Is there anywhere I can read about your DICOM RDF work?  I think
>>>     it would be helpful if there was an officially sanctioned RDF
>>>     representation of DICOM.  Is anyone lobbying them with the idea?     - Erich
>>>
>>>
>>>
>>>     On Mon, Dec 4, 2023 at 1:25 PM Detlef Grittner
>>>     <detlef.grittner@sohard.de> wrote:
>>>
>>>         Hi all,
>>>
>>>         we've been working together with Scott on projects with DICOM
>>>         to RDF conversion. But it is not sanctioned in the sense that
>>>         any organization like w3c or nema has published it as a
>>>         recommendation or standard.
>>>
>>>         Anyhow, if you're interested we could explore whether our idea
>>>         of DICOM RDF fits your purpose.
>>>
>>>         Kind Regards,
>>>
>>>
>>>         Detlef Grittner
>>>
>>>         On 30.11.23 18:29, Eric Prud'hommeaux wrote:
>>>         Hi Scott, Erich Bremer (Cc'd) is working on a use case that intersects FHIR/RDF and some detail-y bits of DICOM. I'd assumed there was a sanctioned RDF for (all of) DICOM but Erich said either A. there wasn't a sanctioned RDF representation for DICOM or B. it didn't include the parts of DICOM that cover his use case. Any leads?
>>>
Received on Monday, 25 March 2024 10:52:43 UTC