[FHIR JSON-LD] How best to handle or avoid blank nodes? from David Booth on 2015-02-26 (public-linked-json@w3.org from February 2015)

From: David Booth <david@dbooth.org>
Date: Thu, 26 Feb 2015 15:35:15 -0500
To: public-linked-json@w3.org, Manu Sporny <msporny@digitalbazaar.com>, Markus Lanthaler <markus.lanthaler@gmx.net>, Jim McCusker <mccusj@rpi.edu>
CC: Pat Hayes <phayes@ihmc.us>
Message-ID: <54EF8383.2060900@dbooth.org>
On 02/25/2015 10:11 AM, Manu Sporny wrote:
> So, count us in - send the questions to the mailing list and it looks
> like you have multiple community members that would be willing to help out.

Thanks Manu (and Markus and Jim and any others)!   Okay, my first 
question regards blank nodes.

Here is an excerpt of some FHIR JSON data:

{
   "dob": "1972-11-30",
   "_dob": {
     "id": "314159",
     "extension": [{
        "url" : "http://example.org/fhir/extensions#text",
        "valueString" : "Easter 1970"
     }]
   }
}

To turn this into JSON-LD, I've created an @context:

{
    "@context":
    {
       "@vocab": "http://example/fhir/vocab#",
       "fhir": "http://example/fhir#",
       "xsd": "http://www.w3.org/2001/XMLSchema#",
       "dob":
       {
          "@id": "fhir:dob",
          "@type": "xsd:date"
       },
       "_dob":
       {
          "@id": "fhir:_dob",
          "@type": "@id"
       }
    }
}

and I've linked it from the FHIR JSON document (thus involving a single, 
constant, one-line change to the existing format).  Here is the 
resulting JSON-LD, with three # comments added for reference later:

{                                # _:b0
   "@context": "http://dbooth.org/2015/fhir/json-ld/dob-context.jsonld",
   "dob": "1972-11-30",
   "_dob": {                      # _:b1
     "id": "314159",
     "extension": [{              # _:b2
        "url" : "http://example.org/fhir/extensions#text",
        "valueString" : "Easter 1970"
     }]
   }
}

As you can see, none of the JSON objects above has been given an @id, so 
when this is interpreted as RDF, blank nodes (_:b0, _:b1, _:b2) are 
generated to represent those unidentified objects.  Here is the RDF 
interpretation in Turtle:

   @prefix fhir:  <http://example/fhir#> .
   @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

   _:b0 fhir:_dob  _:b1 ;
        fhir:dob   "1972-11-30"^^xsd:date .

   _:b1 <http://example/fhir/vocab#extension>  _:b2 ;
        <http://example/fhir/vocab#id>  "314159" .

   _:b2 <http://example/fhir/vocab#url>
            "http://example.org/fhir/extensions#text" ;
        <http://example/fhir/vocab#valueString>
            "Easter 1970" .
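
As a rough illustration of where those three blank nodes come from, here 
is a minimal Python sketch that walks the JSON objects and gives each 
one lacking an "@id" a fresh label in document order.  It is not the 
JSON-LD to-RDF algorithm: it does not expand keys against the @context 
or apply datatypes, and the function name to_triples is made up for this 
example.

```python
from itertools import count


def to_triples(doc):
    """Emit (subject, key, value) triples, labelling each object that
    lacks an "@id" with a fresh blank node in document order."""
    labels = count()
    triples = []

    def visit(node):
        # An object without "@id" gets the next blank node label.
        subject = node.get("@id") or f"_:b{next(labels)}"
        for key, value in node.items():
            if key.startswith("@"):  # skip @context, @id and other keywords
                continue
            items = value if isinstance(value, list) else [value]
            for item in items:
                obj = visit(item) if isinstance(item, dict) else item
                triples.append((subject, key, obj))
        return subject

    visit(doc)
    return triples


# The FHIR excerpt from above (keys left unexpanded in this sketch).
patient = {
    "@context": "http://dbooth.org/2015/fhir/json-ld/dob-context.jsonld",
    "dob": "1972-11-30",
    "_dob": {
        "id": "314159",
        "extension": [{
            "url": "http://example.org/fhir/extensions#text",
            "valueString": "Easter 1970",
        }],
    },
}

triples = to_triples(patient)
for triple in triples:
    print(triple)
```

The labels depend only on traversal order here, which is exactly why a 
different serializer (or a different run) is free to choose other ones.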

One of the key requirements is for FHIR data to be round-trippable 
between the different data representations, such as XML and JSON.  This 
means that if we interpret some given FHIR JSON-LD data as RDF, we need 
to be able to serialize from RDF back to the *same* JSON-LD.  Here is 
the result after serializing the above Turtle back into JSON-LD (using 
Apache Jena's riot tool), with line numbers added for reference:

  1. {
  2.   "@context": {
  3.     "fhir": "http://example/fhir#",
  4.     "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
  5.     "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
  6.     "xsd": "http://www.w3.org/2001/XMLSchema#"
  7.   },
  8.   "@graph": [
  9.     {
10.       "@id": "_:fc7725329340449efa72b6f7f5d7182eeb3",
11.       "http://example/fhir/vocab#url": "http://example.org/fhir/extensions#text",
12.       "http://example/fhir/vocab#valueString": "Easter 1970"
13.     },
14.     {
15.       "@id": "_:fc7725329340449efa72b6f7f5d7182eeb2",
16.       "http://example/fhir/vocab#extension": {
17.         "@id": "_:fc7725329340449efa72b6f7f5d7182eeb3"
18.       },
19.       "http://example/fhir/vocab#id": "314159"
20.     },
21.     {
22.       "@id": "_:fc7725329340449efa72b6f7f5d7182eeb1",
23.       "fhir:_dob": {
24.         "@id": "_:fc7725329340449efa72b6f7f5d7182eeb2"
25.       },
26.       "fhir:dob": {
27.         "@type": "xsd:date",
28.         "@value": "1972-11-30"
29.       }
30.     }
31.   ]
32. }

As you can see, there are several differences from the original JSON-LD, 
which is not a surprise.  I already know that a generic RDF JSON-LD 
serializer will not suffice for this purpose -- a special-purpose 
FHIR-aware JSON-LD serializer will be needed -- but that's okay, because 
FHIR already requires a special-purpose serializer for its existing JSON 
format anyway.   Some of the differences could be readily handled by a 
special-purpose FHIR JSON-LD serializer (such as the outer @graph 
wrapper and the embedded @context), but not all.

In this message I want to focus specifically on the blank nodes, so for 
the moment I'll ignore other differences.  Blank node labels are 
arbitrary in RDF, so they might be serialized differently by different 
serializers or in different runs of the same serializer.  I had 
therefore been thinking that, to enable predictable round tripping, 
it may be best to generate predictable URIs instead of using blank 
nodes.  However, a 
downside of this is that AFAIK it would require the FHIR JSON-LD 
instance data -- not merely the @context -- to contain an explicit @id 
property on every FHIR JSON-LD object.  Is this correct?  If so, FHIR 
JSON users are likely to reject that option as too onerous, because most 
of them don't care about RDF.  If not, how would URIs be specified in 
the @context for JSON-LD objects that have no explicit @ids?
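
For concreteness, this is roughly what the "explicit @id on every 
object" option would amount to, written as a pre-processing step in 
Python.  It is only a sketch under my own assumptions: the function name 
mint_ids, the base URI and the path-based naming scheme are all 
hypothetical, and none of them comes from FHIR or JSON-LD.

```python
def mint_ids(node, uri):
    """Return a copy of `node` in which every JSON object that lacks an
    "@id" receives one derived from its position under `uri`."""
    out = {"@id": node.get("@id", uri)}
    for key, value in node.items():
        if key.startswith("@"):
            out.setdefault(key, value)  # keep @context and any existing @id
        elif isinstance(value, dict):
            out[key] = mint_ids(value, f"{uri}/{key}")
        elif isinstance(value, list):
            out[key] = [
                mint_ids(item, f"{uri}/{key}/{i}") if isinstance(item, dict) else item
                for i, item in enumerate(value)
            ]
        else:
            out[key] = value
    return out


patient = {
    "dob": "1972-11-30",
    "_dob": {
        "id": "314159",
        "extension": [{
            "url": "http://example.org/fhir/extensions#text",
            "valueString": "Easter 1970",
        }],
    },
}

# Hypothetical base URI for this one record.
with_ids = mint_ids(patient, "http://example.org/fhir/Patient/1")
print(with_ids["_dob"]["extension"][0]["@id"])
```

Note that URIs derived from array positions, as in this sketch, would 
change if the order of the extensions changed, so a real scheme would 
need something more stable.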

However, I am now thinking that it might be better to allow those 
unidentified JSON-LD objects to become blank nodes in the RDF 
interpretation, and instead have the FHIR-specific JSON-LD serializer 
suppress them (if they are blank nodes).   So instead of serializing a 
bit of RDF as:

   {
     "@id": "_:fc7725329340449efa72b6f7f5d7182eeb3",
     "http://example/fhir/vocab#url": "http://example.org/fhir/extensions#text",
     "http://example/fhir/vocab#valueString": "Easter 1970"
   },

it would instead omit the @id property and serialize it as:

   {
     "http://example/fhir/vocab#url": "http://example.org/fhir/extensions#text",
     "http://example/fhir/vocab#valueString": "Easter 1970"
   },
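
A minimal Python sketch of that suppression step, assuming the input is 
a flattened @graph array like the riot output above, in which every node 
carries an "@id".  Dropping a label is only safe if the node is first 
nested under the one object that refers to it, so the sketch embeds each 
blank node under its single referrer and raises an error for a blank 
node that is referenced more than once.  The name embed_and_strip is 
hypothetical, and this is not the JSON-LD framing algorithm.

```python
from collections import Counter


def is_blank_ref(item):
    """True for a bare reference such as {"@id": "_:x"}."""
    return (isinstance(item, dict) and set(item) == {"@id"}
            and item["@id"].startswith("_:"))


def as_list(value):
    return value if isinstance(value, list) else [value]


def embed_and_strip(graph):
    """Nest each blank node under its single referrer and drop its "@id"."""
    nodes = {n["@id"]: n for n in graph}
    refs = Counter(
        item["@id"]
        for node in graph
        for key, value in node.items() if key != "@id"
        for item in as_list(value) if is_blank_ref(item)
    )

    def resolve(item):
        if not is_blank_ref(item):
            return item  # literal, value object or IRI reference
        if refs[item["@id"]] != 1:
            raise ValueError("shared blank node: its label cannot be dropped")
        return build(nodes.get(item["@id"], {}))

    def build(node):
        out = {}
        for key, value in node.items():
            if key == "@id":
                if not value.startswith("_:"):
                    out[key] = value  # keep real IRIs
                continue
            built = [resolve(item) for item in as_list(value)]
            out[key] = built if isinstance(value, list) else built[0]
        return out

    # Top-level objects: named nodes, plus blank nodes nobody refers to.
    roots = [n for n in graph
             if not n["@id"].startswith("_:") or refs[n["@id"]] == 0]
    return [build(n) for n in roots]


VOCAB = "http://example/fhir/vocab#"
graph = [
    {"@id": "_:b3", VOCAB + "url": "http://example.org/fhir/extensions#text",
     VOCAB + "valueString": "Easter 1970"},
    {"@id": "_:b2", VOCAB + "extension": {"@id": "_:b3"}, VOCAB + "id": "314159"},
    {"@id": "_:b1", "fhir:_dob": {"@id": "_:b2"},
     "fhir:dob": {"@type": "xsd:date", "@value": "1972-11-30"}},
]
result = embed_and_strip(graph)
print(result)
```

The shared-node check hints at one possible hidden problem: if two 
objects ever refer to the same blank node, omitting its @id would either 
duplicate it or lose the link between the two references.
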

Does this seem like a good idea?   Any hidden problems?  Are there other 
approaches that you think would be better?

Thanks,
David
Received on Thursday, 26 February 2015 20:35:44 UTC