Re: Issue raised in health informatics use case (HL7 messages) from James McKinney on 2014-06-05 (public-csv-wg@w3.org from June 2014)

From: James McKinney <james@opennorth.ca>
Date: Thu, 5 Jun 2014 19:23:59 -0400
To: "Tandy, Jeremy" <jeremy.tandy@metoffice.gov.uk>
Cc: "public-csv-wg@w3.org" <public-csv-wg@w3.org>, Eric Prud'hommeaux <eric@w3.org>
Message-Id: <3547B652-9F4A-4512-A274-EFDCCE04A947@opennorth.ca>
Having now read a bit more about HL7, each row has its own schema (identified by the first three characters in the row), and rows may therefore have variable numbers of columns. Since the WG defines CSVs as tabular data in which all rows have the same schema, HL7 messages are not CSV.
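
To illustrate the variable column counts, here is a minimal sketch in Python. It assumes the sample message from the issue text quoted below, and it simply splits each row on "|"; it is not a real HL7 parser, and the function name field_counts is made up for this example. Note that the count for MSH is a naive split count: in HL7 the field separator itself counts as the first MSH field.

```python
# Sample rows copied from the issue text quoted below.
MESSAGE = "\n".join([
    r"MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|199912271408|CHARRIS|ADT^A04|1817457|D|2.5|",
    r"PID||0493575^^^2^ID 1|454721||DOE^JOHN^^^^|DOE^JOHN^^^^|19480203|M||B|254 MYSTREET AVE^^MYTOWN^OH^44123^USA||(216)123-4567|||M|NON|400003403~1129086|",
    r"NK1||ROE^MARIE^^^^|SPO||(216)123-4567||EC|||||||||||||||||||||||||||",
    r"PV1||O|168 ~219~C~PMA^^^^^^^^^||||277^ALLEN MYLASTNAME^BONNIE^^^^|||||||||| ||2688684|||||||||||||||||||||||||199912271408||||||002376853",
])


def field_counts(message):
    """Return (segment_id, number_of_pipe_separated_fields) for each row."""
    counts = []
    for line in message.splitlines():
        if not line.strip():
            continue  # skip blank rows
        fields = line.split("|")
        # The first three characters of the row identify its schema.
        counts.append((fields[0], len(fields)))
    return counts


if __name__ == "__main__":
    for segment_id, n in field_counts(MESSAGE):
        print(segment_id, n)
```

Each row reports a different field count, which is what rules out a single schema shared by all rows.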

On the other hand, I’m not sure that the point “one needs to refer to tables to effectively parse the content” means that HL7 messages are not CSVs. I know many CSVs that are incomprehensible without knowledge external to the CSV file itself: notably, CSVs without a header row. I think those tabular files are nonetheless CSV.
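
As a small sketch of the header-less case, assuming Python's standard csv module: the file below has no header row, so the column names have to be supplied from outside the file. The data and the names in FIELDNAMES are invented for illustration and are not taken from any HL7 table.

```python
import csv
import io

# A CSV file with no header row (invented example data).
HEADERLESS = "DOE,JOHN,19480203\nROE,MARIE,\n"

# Hypothetical column names: knowledge external to the file itself.
FIELDNAMES = ["family_name", "given_name", "birth_date"]


def read_with_external_header(text, fieldnames):
    """Parse header-less CSV text, labelling columns with the given names."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=fieldnames)
    return list(reader)


rows = read_with_external_header(HEADERLESS, FIELDNAMES)
```

The file parses as regular tabular data either way; the external names are only needed to label the columns.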

James

On Jun 5, 2014, at 7:00 PM, Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk> wrote:

> All - I have added an issue to Use Case #20 - Health Level Seven (HL7) Messages [1]; the full details of the issue are at [2] ... but the gist of it is that, from what I understand, HL7 messages aren't really CSV. Whilst HL7 is row oriented, the line lengths are irregular, and one needs to refer to tables to effectively parse the content.
> 
> Now, my understanding of this format is very rudimentary, so I might have got it wrong!
> 
> The key question is whether we still want to include this use case? 
> 
> If so, we will need assistance in completing it, including illustrative examples that fit the narrative and a good understanding of how those examples actually work so that they can be adequately described.
> 
> In particular, I wonder if James McKinney, the original contributor of this use case [3], can respond with his thoughts and, if we are going to proceed, help to complete the use case.
> 
> We are fast approaching another Public Working Draft (PWD), so I anticipate publishing in the current state with the incomplete use case and issue in place. It would be excellent to have resolved everything for the subsequent PWD.
> 
> Best regards, Jeremy
> 
> [1] http://w3c.github.io/csvw/use-cases-and-requirements/#UC-HealthLevelSevenHL7
> [2] (text below)
> [3] http://lists.w3.org/Archives/Public/public-csv-wg-comments/2014Apr/0000.html 
> 
> ISSUE: 
> 
> This use case is currently incomplete and does not (yet) follow the narrative style displayed elsewhere in this document.
> 
> A suitable narrative might be:
> 
> "John Doe is being transferred from one clinic to another to receive specialised care. The machine-readable transfer documentation includes his name, patient ID, his visit to the first clinic, and some information about his next of kin. The visit info (and many other fields) require microparsing on the '^' separator to extract further structured information about, for example, the referring physician."
> 
> However, further information from Eric Prud'hommeaux indicates that HL7 might be more than we can (or want to) cope with. HL7 messages do not appear to be regular tabular data. Granted, the "microsyntax" in each field is complicated (making the data 3- or 4-dimensional, with "^~\&" being the declared separators for the "fields within fields" in the example below), but it can be worked out. The real issue is that the rows are not uniform: they have different numbers of fields ...
> 
> MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|199912271408|CHARRIS|ADT^A04|1817457|D|2.5|
> PID||0493575^^^2^ID 1|454721||DOE^JOHN^^^^|DOE^JOHN^^^^|19480203|M||B|254 MYSTREET AVE^^MYTOWN^OH^44123^USA||(216)123-4567|||M|NON|400003403~1129086|
> NK1||ROE^MARIE^^^^|SPO||(216)123-4567||EC|||||||||||||||||||||||||||
> PV1||O|168 ~219~C~PMA^^^^^^^^^||||277^ALLEN MYLASTNAME^BONNIE^^^^|||||||||| ||2688684|||||||||||||||||||||||||199912271408||||||002376853
> 
> 
> In the example above, there are four segments defined: MSH (message header?), PID (patient identification), NK1 (next of kin?) and PV1.
> 
> The data in each segment is parsed according to a specific set of rules defined in a "table". The first line in the example above says that the message conforms to the version 2.5 tables (versions from 2.2 to 2.6 are visible in the wild). The version 2.5 table indicates how each segment should be parsed; for example, the PID segment happens to include subfields like lastname and firstname ("DOE" and "JOHN" respectively). Without that table, there's no way to know how to label the parsed attributes.
> 
> So whilst HL7 is row oriented, it does not appear to be CSV: line lengths are irregular, and one needs to refer to tables to effectively parse the content.
> 
> Do we still want to include this use case? If so, we will need assistance in completing it, including illustrative examples that fit the narrative and a good understanding of how those examples actually work so that they can be adequately described.
Received on Thursday, 5 June 2014 23:24:32 UTC