RE: Issue raised in health informatics use case (HL7 messages) from Tandy, Jeremy on 2014-06-06 (public-csv-wg@w3.org from June 2014)

From: Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
Date: Fri, 6 Jun 2014 11:19:30 +0000
To: James McKinney <james@opennorth.ca>, "public-csv-wg@w3.org" <public-csv-wg@w3.org>
Message-ID: <2624871D9A05174691BD59F8EFD68AE208845613@EXXCMPD1DAG3.cmpd1.metoffice.gov.uk>
James et al, I've updated the use case document to remove the HL7 use case and to add a general note to the use case section indicating that we're focusing only on examples that are tabular data, not merely row-oriented data. The HL7 format is explicitly mentioned there. See below:

"""
The use cases below describe many applications of tabular data. Whilst there are many different variations of tabular data, all the examples conform to the definition of tabular data defined in the Model for Tabular Data and Metadata on the Web:

Tabular data is data that is structured into rows, each of which contains information about some thing. Each row contains the same number of fields (although some of these fields may be empty), which provide values of properties of the thing described by the row. In tabular data, fields within the same column provide values for the same property of the thing described by the particular row.

In selecting the use cases we have reviewed a number of row-oriented data formats that, at first glance, appear to be tabular data. However, closer inspection indicates that one or other of the characteristics of tabular data was not present. For example, the HL7 format, from the health informatics domain, defines a separate schema for each row (known as a "segment" in that format), which means that HL7 messages do not have a regular number of columns for each row.
"""

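As a rough illustration of the regularity test described in the note above, the following Python sketch counts the fields in each row of a pipe-separated text and checks whether the counts agree. The names field_counts and is_tabular and both sample strings are invented for this illustration; the HL7-style sample is a shortened, made-up message, and naive splitting on "|" ignores HL7's escaping and header rules.

```python
def field_counts(text, sep="|"):
    """Return (first field, number of fields) for each non-empty line."""
    counts = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(sep)
        counts.append((fields[0], len(fields)))
    return counts


def is_tabular(text, sep="|"):
    """True when every non-empty row has the same number of fields."""
    return len({n for _, n in field_counts(text, sep)}) <= 1


# Shortened, made-up segments in the style of an HL7 message (not real data).
hl7_like = "MSH|^~\\&|EPIC|EPICADT|2.5\nPID||0493575|DOE^JOHN\nNK1||ROE^MARIE|SPO||EC"

# A small pipe-separated table with a regular shape, for comparison.
regular = "id|name|sex\n1|DOE^JOHN|M\n2|ROE^MARIE|F"

if __name__ == "__main__":
    print(field_counts(hl7_like), is_tabular(hl7_like))
    print(field_counts(regular), is_tabular(regular))
```

The HL7-style sample fails the check because its three segments split into different numbers of fields, whereas the comparison table passes.
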
Best regards, Jeremy

> -----Original Message-----
> From: Tandy, Jeremy [mailto:jeremy.tandy@metoffice.gov.uk]
> Sent: 06 June 2014 09:44
> To: James McKinney
> Cc: public-csv-wg@w3.org; Eric Prud'hommeaux
> Subject: RE: Issue raised in health informatics use case (HL7 messages)
> 
> Hi James - thanks for your quick reply.
> 
> """each row [segment] has its own schema (identified by the first three
> characters in the row)"""
> 
> ... is a much better way to express what I was trying to say!
> 
> """Since the WG defines CSVs as tabular data in which all rows have the
> same schema, HL7 messages are not CSV."""
> 
> ... based on this assessment I will remove the use case from the
> document - although I will leave a note in place indicating that we
> have considered other row-oriented formats such as HL7 but that these
> aren't CSV.
> 
> """I know many CSVs that are incomprehensible without knowledge
> external to the CSV file itself: notably, CSVs without a header row. I
> think those tabular files are nonetheless CSV."""
> 
> ... you're right again! In fact, we deal with these by assigning a
> numerical index identifier to each column and are defining how one
> would use the supplementary metadata to express the semantics for each
> column. What I was thinking of when I wrote the email last night was
> that parsing the microsyntax in each field is challenging without
> reference to the tables. This challenge is compounded because the
> microsyntax in a given column may not be consistent; as you say, each
> row (or segment) has its own schema. Of course, we could just leave the
> entire field as a string literal (e.g. "|254 MYSTREET
> AVE^^MYTOWN^OH^44123^USA|") and leave the detail parsing to some
> external agent. But the real problem with HL7 is that the formatting of
> each row is not consistent within a given file.
> 
> Many thanks, Jeremy
> 
> > -----Original Message-----
> > From: James McKinney [mailto:james@opennorth.ca]
> > Sent: 06 June 2014 00:24
> > To: Tandy, Jeremy
> > Cc: public-csv-wg@w3.org; Eric Prud'hommeaux
> > Subject: Re: Issue raised in health informatics use case (HL7
> > messages)
> >
> > Having now read a bit more about HL7, each row has its own schema
> > (identified by the first three characters in the row), and rows may
> > therefore have variable numbers of columns. Since the WG defines CSVs
> > as tabular data in which all rows have the same schema, HL7 messages
> > are not CSV.
> >
> > On the other hand, I'm not sure that the point "one needs to refer to
> > tables to effectively parse the content" means that HL7 messages are
> > not CSVs. I know many CSVs that are incomprehensible without knowledge
> > external to the CSV file itself: notably, CSVs without a header row. I
> > think those tabular files are nonetheless CSV.
> >
> > James
> >
> > On Jun 5, 2014, at 7:00 PM, Tandy, Jeremy
> > <jeremy.tandy@metoffice.gov.uk> wrote:
> >
> > > All - I have added an issue to Use Case #20 - Health Level Seven
> > (HL7) Messages [1]; the full details of the issue are at [2] ... but
> > the gist of it is that, from what I understand, HL7 messages aren't
> > really CSV. Whilst HL7 is row oriented, the line lengths are
> > irregular, and one needs to refer to tables to effectively parse the content.
> > >
> > > Now, my understanding of this format is very rudimentary, so I might
> > have got it wrong!
> > >
> > > The key question is whether we still want to include this use case?
> > >
> > > If so, we will need assistance in completing it, including
> > illustrative examples that fit the narrative and a good understanding
> > of how those examples actually work so that they can be adequately
> > described.
> > >
> > > In particular, I wonder if James McKinney, the original contributor
> > of this use case [3], can respond with his thoughts and, if we are
> > going to proceed, help to complete the use case.
> > >
> > > We are fast approaching another Public Working Draft (PWD), so I
> > anticipate publishing in the current state with the incomplete use
> > case and issue in place. It would be excellent to have resolved
> > everything for the subsequent PWD.
> > >
> > > Best regards, Jeremy
> > >
> > > [1] http://w3c.github.io/csvw/use-cases-and-requirements/#UC-HealthLevelSevenHL7
> > > [2] (text below)
> > > [3] http://lists.w3.org/Archives/Public/public-csv-wg-comments/2014Apr/0000.html
> > >
> > > ISSUE:
> > >
> > > This use case is currently incomplete and does not (yet) follow the
> > narrative style displayed elsewhere in this document.
> > >
> > > A suitable narrative might be:
> > >
> > > "John Doe is being transferred from one clinic to another to
> > receive specialised care. The machine-readable transfer documentation
> > includes his name, patient ID, his visit to the first clinic, and some
> > information about his next of kin. The visit info (and many other
> > fields) require microparsing on the '^' separator to extract further
> > structured information about, for example, the referring physician."
> > >
> > > However, further information from Eric Prud'hommeaux indicates that
> > HL7 might be more than we can (or want to) cope with. HL7 messages do
> > not appear to be regular tabular data. OK, so the "microsyntax" in
> > each field is complicated (making the data 3- or 4-dimensional, with "^~\&"
> > being the declared separators for the "fields within fields" in the
> > example below) but it can be worked out, but the real issue is that
> > the rows are not uniform - they have different numbers of fields ...
> > >
> > >
> > > MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|199912271408|CHARRIS|ADT^A04|1817457|MSH|D|2.5|
> > > PID||0493575^^^2^ID 1|454721||DOE^JOHN^^^^|DOE^JOHN^^^^|19480203|M||B|254 MYSTREET AVE^^MYTOWN^OH^44123^USA||(216)123-4567|||M|NON|400003403~1129086|
> > > NK1||ROE^MARIE^^^^|SPO||(216)123-4567||EC|||||||||||||||||||||||||||
> > > PV1||O|168 ~219~C~PMA^^^^^^^^^||||277^ALLEN MYLASTNAME^BONNIE^^^^||||||||||||2688684|||||||||||||||||||||||||199912271408||||||002376853
> > >
> > >
> > > In the example above, there are four segments defined: MSH (message
> > header?), PID (patient identification), NK1 (next of kin?) and PV1.
> > >
> > > The data in each segment is parsed according to a specific set of
> > rules defined in a "table" and without this table there's no way to
> > label the parsed attributes. The first line in the example above says
> > that the message conforms to version 2.5 tables (versions from 2.2 to
> > 2.6 are visible in the wild). The version 2.5 table indicates how the
> > message should be parsed, e.g. the PID segment, which happens to
> > include subfields like lastname and firstname ("DOE" and "JOHN"
> > respectively). Without that table, there's no way to know how to label
> > the parsed attributes.
> > >
> > > So whilst HL7 is row oriented, it does not appear to be CSV: line
> > lengths are irregular, and one needs to refer to tables to effectively
> > parse the content.
> > >
> > > Do we still want to include this use case? If so, we will need
> > assistance in completing it, including illustrative examples that fit
> > the narrative and a good understanding of how those examples actually
> > work so that they can be adequately described.
>
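
The table-driven parsing described in the quoted issue text can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the LABELS table, its field index and its component names are hypothetical stand-ins for a real HL7 version table, parse_segment is an invented helper, and splitting on "|" and "^" ignores the repetition, subcomponent and escape characters that the message header declares.

```python
# Hypothetical, heavily abridged label table (an assumption for illustration,
# not the official HL7 v2.5 definitions):
# segment ID -> {field index after splitting on "|": (field label, component labels)}
LABELS = {
    "PID": {5: ("patient_name", ["last_name", "first_name"])},
}


def parse_segment(line, labels=LABELS, field_sep="|", comp_sep="^"):
    """Split one segment into fields and label only what the table describes."""
    fields = line.split(field_sep)
    segment_id = fields[0]  # the first three characters identify the schema
    parsed = {"segment": segment_id}
    for index, (name, components) in labels.get(segment_id, {}).items():
        if index < len(fields):
            parts = fields[index].split(comp_sep)  # microparse on "^"
            # zip stops at the shorter list, so unlabelled trailing
            # components are simply dropped in this sketch.
            parsed[name] = dict(zip(components, parts))
    return parsed


if __name__ == "__main__":
    # Leading part of the PID segment quoted above.
    pid = "PID||0493575^^^2^ID 1|454721||DOE^JOHN^^^^|DOE^JOHN^^^^|19480203|M"
    print(parse_segment(pid))
    # No table entry for NK1 here, so nothing beyond the segment ID is labelled.
    print(parse_segment("NK1||ROE^MARIE^^^^|SPO"))
```

With no table entry for a segment, the sketch can still split the row but has no names to attach to the pieces, which is the difficulty the quoted text describes.
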
Received on Friday, 6 June 2014 11:20:01 UTC