RE: Issue raised in health informatics use case (HL7 messages) from Tandy, Jeremy on 2014-06-06 (public-csv-wg@w3.org from June 2014)

From: Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
Date: Fri, 6 Jun 2014 08:43:49 +0000
To: James McKinney <james@opennorth.ca>
CC: "public-csv-wg@w3.org" <public-csv-wg@w3.org>, Eric Prud'hommeaux <eric@w3.org>
Message-ID: <2624871D9A05174691BD59F8EFD68AE20884537F@EXXCMPD1DAG3.cmpd1.metoffice.gov.uk>
Hi James - thanks for your quick reply.

"""each row [segment] has its own schema (identified by the first three characters in the row)""" 

... is a much better way to express what I was trying to say!

"""Since the WG defines CSVs as tabular data in which all rows have the same schema, HL7 messages are not CSV."""

... based on this assessment I will remove the use case from the document - although I will leave a note in place indicating that we have considered other row-oriented formats such as HL7 but that these aren't CSV.

"""I know many CSVs that are incomprehensible without knowledge external to the CSV file itself: notably, CSVs without a header row. I think those tabular files are nonetheless CSV."""

... you're right again! In fact, we deal with these by assigning a numerical index identifier to each column, and we are defining how one would use the supplementary metadata to express the semantics for each column. What I was thinking of when I wrote the email last night was that parsing the microsyntax in each field is challenging without reference to the tables. This challenge is compounded because the microsyntax in a given column may not be consistent; as you say, each row (or segment) has its own schema. Of course, we could just leave the entire field as a string literal (e.g. "|254 MYSTREET AVE^^MYTOWN^OH^44123^USA|") and leave the detailed parsing to some external agent. But the real problem with HL7 is that the formatting of each row is not consistent within a given file.
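
To make the non-uniformity concrete, here is a minimal Python sketch. It splits each segment on "|", keys it by its three-character identifier, and checks whether all rows have the same number of fields. The sample rows are taken from the quoted example below, with the wrapped lines rejoined (the exact original line breaks are an assumption). The helper names split_segments and is_uniform are hypothetical, and the naive split ignores the special role of the separator declaration in the MSH row.

```python
# Sample segments from the quoted message (wrapped lines rejoined; assumption).
MESSAGE = "\n".join([
    r"MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|199912271408|CHARRIS|ADT^A04|1817457|D|2.5|",
    "PID||0493575^^^2^ID 1|454721||DOE^JOHN^^^^|DOE^JOHN^^^^|19480203|M||B|"
    "254 MYSTREET AVE^^MYTOWN^OH^44123^USA||(216)123-4567|||M|NON|400003403~1129086|",
    "NK1||ROE^MARIE^^^^|SPO||(216)123-4567||EC|||||||||||||||||||||||||||",
])


def split_segments(message):
    """Split each row on '|' and pair its segment ID with the remaining fields."""
    rows = []
    for line in message.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("|")
        rows.append((fields[0], fields[1:]))
    return rows


def is_uniform(rows):
    """True when every row has the same number of fields."""
    return len({len(fields) for _, fields in rows}) <= 1


rows = split_segments(MESSAGE)
for segment_id, fields in rows:
    # Each field stays an opaque string; '^' components are not parsed here.
    print(segment_id, len(fields))
print("uniform:", is_uniform(rows))
```

Keeping each field as an unparsed string corresponds to the "string literal" option above; the uniformity check fails regardless, which is the real obstacle.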

Many thanks, Jeremy

> -----Original Message-----
> From: James McKinney [mailto:james@opennorth.ca]
> Sent: 06 June 2014 00:24
> To: Tandy, Jeremy
> Cc: public-csv-wg@w3.org; Eric Prud'hommeaux
> Subject: Re: Issue raised in health informatics use case (HL7 messages)
> 
> Having now read a bit more about HL7, each row has its own schema
> (identified by the first three characters in the row), and rows may
> therefore have variable numbers of columns. Since the WG defines CSVs
> as tabular data in which all rows have the same schema, HL7 messages
> are not CSV.
> 
> On the other hand, I'm not sure that the point "one needs to refer to
> tables to effectively parse the content" means that HL7 messages are
> not CSVs. I know many CSVs that are incomprehensible without knowledge
> external to the CSV file itself: notably, CSVs without a header row. I
> think those tabular files are nonetheless CSV.
> 
> James
> 
> On Jun 5, 2014, at 7:00 PM, Tandy, Jeremy
> <jeremy.tandy@metoffice.gov.uk> wrote:
> 
> > All - I have added an issue to Use Case #20 - Health Level Seven
> (HL7) Messages [1]; the full details of the issue are at [2] ... but
> the gist of it is that, from what I understand, HL7 messages aren't
> really CSV. Whilst HL7 is row oriented, the line lengths are irregular,
> and one needs to refer to tables to effectively parse the content.
> >
> > Now, my understanding of this format is very rudimentary, so I might
> have got it wrong!
> >
> > The key question is whether we still want to include this use case?
> >
> > If so, we will need assistance in completing it, including
> illustrative examples that fit the narrative and a good understanding
> of how those examples actually work so that they can be adequately
> described.
> >
> > In particular, I wonder if James McKinney, the original contributor
> of this use case [3], can respond with his thoughts and, if we are
> going to proceed, help to complete the use case.
> >
> > We are fast approaching another Public Working Draft (PWD), so I
> anticipate publishing in the current state with the incomplete use case
> and issue in place. It would be excellent to have resolved everything
> for the subsequent PWD.
> >
> > Best regards, Jeremy
> >
> > [1] http://w3c.github.io/csvw/use-cases-and-requirements/#UC-HealthLevelSevenHL7
> > [2] (text below)
> > [3] http://lists.w3.org/Archives/Public/public-csv-wg-comments/2014Apr/0000.html
> >
> > ISSUE:
> >
> > This use case is currently incomplete and does not (yet) follow the
> narrative style displayed elsewhere in this document.
> >
> > A suitable narrative might be:
> >
> > "John Doe is being transferred from one clinic to another to
> receive specialised care. The machine-readable transfer documentation
> includes his name, patient ID, his visit to the first clinic, and some
> information about his next of kin. The visit info (and many other
> fields) require microparsing on the '^' separator to extract further
> structured information about, for example, the referring physician."
> >
> > However, further information from Eric Prud'hommeaux indicates that
> HL7 might be more than we can (or want to) cope with. HL7 messages do
> not appear to be regular tabular data. The "microsyntax" in each field
> is complicated (making the data 3- or 4-dimensional, with "^~\&" being
> the declared separators for the "fields within fields" in the example
> below), but it can be worked out. The real issue is that the rows are
> not uniform - they have different numbers of fields ...
> >
> >
> > MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|199912271408|CHARRIS|ADT^A04|1817457|D|2.5|
> > PID||0493575^^^2^ID 1|454721||DOE^JOHN^^^^|DOE^JOHN^^^^|19480203|M||B|254 MYSTREET AVE^^MYTOWN^OH^44123^USA||(216)123-4567|||M|NON|400003403~1129086|
> > NK1||ROE^MARIE^^^^|SPO||(216)123-4567||EC|||||||||||||||||||||||||||
> > PV1||O|168 ~219~C~PMA^^^^^^^^^||||277^ALLEN MYLASTNAME^BONNIE^^^^|||||||||| ||2688684|||||||||||||||||||||||||199912271408||||||002376853
> >
> >
> > In the example above, there are four segments defined: MSH (message
> header?), PID (patient identification), NK1 (next of kin?) and PV1.
> >
> > The data in each segment is parsed according to a specific set of
> rules defined in a "table" and without this table there's no way to
> label the parsed attributes. The first line in the example above says
> that the message conforms to version 2.5 tables (versions from 2.2 to
> 2.6 are visible in the wild). The version 2.5 table indicates how the
> message should be parsed, e.g. the PID segment, which happens to
> include subfields like lastname and firstname ("DOE" and "JOHN"
> respectively). Without that table, there's no way to know how to label
> the parsed attributes.
> >
> > So whilst HL7 is row oriented, it does not appear to be CSV: line
> lengths are irregular, and one needs to refer to tables to effectively
> parse the content.
> >
> > Do we still want to include this use case? If so, we will need
> assistance in completing it, including illustrative examples that fit
> the narrative and a good understanding of how those examples actually
> work so that they can be adequately described.
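
The point in the quoted issue about needing a version-specific "table" to label parsed components can be sketched in Python as follows. The COMPONENT_LABELS mapping, its key layout, and the parse_field helper are hypothetical illustrations; the labels "lastname" and "firstname" follow the wording in the issue text, not the HL7 specification, and the field position used is an assumption.

```python
# Hypothetical excerpt of a "table": component labels keyed by
# (version, segment ID, 1-based field position). Illustrative only.
COMPONENT_LABELS = {
    ("2.5", "PID", 5): ["lastname", "firstname"],
}


def parse_field(raw, version, segment_id, position, labels=COMPONENT_LABELS):
    """Split a field on '^' and label its non-empty components.

    Falls back to 1-based positional keys when no table entry is known.
    """
    components = raw.split("^")
    names = labels.get((version, segment_id, position), [])
    parsed = {}
    for index, value in enumerate(components):
        if not value:
            continue  # skip empty components
        key = names[index] if index < len(names) else str(index + 1)
        parsed[key] = value
    return parsed


# With a table entry the components get meaningful labels ...
print(parse_field("DOE^JOHN^^^^", "2.5", "PID", 5))
# ... without one, only positions are available.
print(parse_field("DOE^JOHN^^^^", "2.2", "PID", 5))
```

Splitting on "^" is mechanical; it is the labelling step that depends on knowledge held outside the file.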
Received on Friday, 6 June 2014 08:44:21 UTC