RE: Issue raised in health informatics use case (HL7 messages) from Tandy, Jeremy on 2014-06-06 (public-csv-wg@w3.org from June 2014)

From: Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
Date: Fri, 6 Jun 2014 14:47:06 +0000
To: James McKinney <james@opennorth.ca>
CC: "public-csv-wg@w3.org" <public-csv-wg@w3.org>
Message-ID: <2624871D9A05174691BD59F8EFD68AE20884576D@EXXCMPD1DAG3.cmpd1.metoffice.gov.uk>
Many thanks for the feedback. Jeremy

From: James McKinney [mailto:james@opennorth.ca]
Sent: 06 June 2014 14:19
To: Tandy, Jeremy
Cc: public-csv-wg@w3.org
Subject: Re: Issue raised in health informatics use case (HL7 messages)

Sounds good, thanks Jeremy!

On Jun 6, 2014, at 7:19 AM, Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk> wrote:


James et al., I've updated the use case document to remove the HL7 use case and to add a general note to the use case section indicating that we're only focusing on examples that are tabular data, not merely row-oriented data. The HL7 format is explicitly mentioned there. See below:

"""
The use cases below describe many applications of tabular data. Whilst there are many different variations of tabular data, all the examples conform to the definition of tabular data given in the Model for Tabular Data and Metadata on the Web:

Tabular data is data that is structured into rows, each of which contains information about some thing. Each row contains the same number of fields (although some of these fields may be empty), which provide values of properties of the thing described by the row. In tabular data, fields within the same column provide values for the same property of the thing described by the particular row.

In selecting the use cases we have reviewed a number of row-oriented data formats that, at first glance, appear to be tabular data. However, closer inspection indicates that one or other of the characteristics of tabular data were not present. For example, the HL7 format, from the health informatics domain, defines a separate schema for each row (known as a "segment" in that format), which means that HL7 messages do not have a regular number of columns for each row.
"""

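The field-count part of this definition can be checked mechanically. The following is a minimal Python sketch, not anything defined by the WG: the function names `field_counts` and `is_tabular` and the sample data are hypothetical. It only tests that every row has the same number of fields (empty fields still count); it cannot tell whether the fields in a column really describe the same property.

```python
import csv
import io


def field_counts(text: str, delimiter: str = ",") -> list[int]:
    """Return the number of fields in each non-empty row of delimited text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [len(row) for row in reader if row]


def is_tabular(text: str, delimiter: str = ",") -> bool:
    """True if every row has the same number of fields (empty fields count)."""
    return len(set(field_counts(text, delimiter))) <= 1


# Hypothetical samples: the first has an empty field but a regular shape.
regular = "name,city,phone\nDOE,MYTOWN,\nROE,,(216)123-4567\n"
ragged = "a,b,c\nd,e\nf,g,h,i\n"

print(is_tabular(regular))  # True
print(is_tabular(ragged))   # False
```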
Best regards, Jeremy


-----Original Message-----
From: Tandy, Jeremy [mailto:jeremy.tandy@metoffice.gov.uk]
Sent: 06 June 2014 09:44
To: James McKinney
Cc: public-csv-wg@w3.org; Eric Prud'hommeaux
Subject: RE: Issue raised in health informatics use case (HL7 messages)

Hi James - thanks for your quick reply.

"""each row [segment] has its own schema (identified by the first three
characters in the row)"""

... is a much better way to express what I was trying to say!

"""Since the WG defines CSVs as tabular data in which all rows have the
same schema, HL7 messages are not CSV."""

... based on this assessment I will remove the use case from the
document - although I will leave a note in place indicating that we
have considered other row-oriented formats such as HL7 but that these
aren't CSV.

"""I know many CSVs that are incomprehensible without knowledge
external to the CSV file itself: notably, CSVs without a header row. I
think those tabular files are nonetheless CSV."""

... you're right again! In fact, we deal with these by assigning a
numerical index identifier to each column, and we are defining how one
would use the supplementary metadata to express the semantics of each
column. What I was thinking of when I wrote the email last night was
that parsing the microsyntax in each field is challenging without
reference to the tables. This challenge is compounded because the
microsyntax in a given column may not be consistent; as you say, each
row (or segment) has its own schema. Of course, we could just leave the
entire field as a string literal
(e.g. "|254 MYSTREET AVE^^MYTOWN^OH^44123^USA|") and leave the detailed
parsing to some external agent. But the real problem with HL7 is that
the formatting of each row is not consistent within a given file.
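A minimal Python sketch of both fallbacks mentioned here, assuming a hypothetical headerless CSV file: columns are identified only by a numerical index (starting from 1 is an assumption of this sketch), and a field containing HL7-style '^' microsyntax is kept as an opaque string literal for some external agent to parse. The function name `label_by_index` is illustrative only.

```python
import csv
import io

# Hypothetical headerless CSV: there is no header row, so columns can only be
# identified by position. The third column holds HL7-style '^' microsyntax.
DATA = "DOE,JOHN,254 MYSTREET AVE^^MYTOWN^OH^44123^USA\nROE,MARIE,\n"


def label_by_index(text: str) -> list[dict[int, str]]:
    """Map each non-empty row to {column_index: raw_value}, indices from 1.

    Values are kept as string literals; microsyntax such as '^'-separated
    components is deliberately not parsed here.
    """
    reader = csv.reader(io.StringIO(text))
    return [dict(enumerate(row, start=1)) for row in reader if row]


for record in label_by_index(DATA):
    print(record)
```

Supplementary metadata could then attach semantics to the indices 1, 2 and 3 without the CSV itself changing.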

Many thanks, Jeremy


-----Original Message-----
From: James McKinney [mailto:james@opennorth.ca]
Sent: 06 June 2014 00:24
To: Tandy, Jeremy
Cc: public-csv-wg@w3.org; Eric Prud'hommeaux
Subject: Re: Issue raised in health informatics use case (HL7 messages)

Having now read a bit more about HL7, each row has its own schema
(identified by the first three characters in the row), and rows may
therefore have variable numbers of columns. Since the WG defines CSVs
as tabular data in which all rows have the same schema, HL7 messages
are not CSV.
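To illustrate the "schema per row" point, here is a minimal Python sketch that keys three segments from the example message (quoted in the issue text later in this thread) by their first three characters and counts their fields with a naive split on '|'. The function name is hypothetical, and the sketch assumes each segment ID occurs only once in the message.

```python
# Three segments from the HL7 v2 example message, copied from the issue text.
SEGMENTS = [
    r"MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|199912271408|CHARRIS|ADT^A04|1817457|D|2.5|",
    "PID||0493575^^^2^ID 1|454721||DOE^JOHN^^^^|DOE^JOHN^^^^|19480203|M||B|"
    "254 MYSTREET AVE^^MYTOWN^OH^44123^USA||(216)123-4567|||M|NON|400003403~1129086|",
    "NK1||ROE^MARIE^^^^|SPO||(216)123-4567||EC|||||||||||||||||||||||||||",
]


def fields_per_segment_type(segments: list[str]) -> dict[str, int]:
    """Map each segment ID (its first three characters) to its field count.

    A naive split on '|' is enough to show that row length depends on the
    segment type. If an ID occurred twice, the later segment would win.
    """
    return {segment[:3]: len(segment.split("|")) for segment in segments}


counts = fields_per_segment_type(SEGMENTS)
print(counts)
print("same number of fields in every row?", len(set(counts.values())) == 1)
```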

On the other hand, I'm not sure that the point "one needs to refer to
tables to effectively parse the content" means that HL7 messages are
not CSVs. I know many CSVs that are incomprehensible without
knowledge external to the CSV file itself: notably, CSVs without a
header row. I think those tabular files are nonetheless CSV.

James

On Jun 5, 2014, at 7:00 PM, Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk> wrote:


All - I have added an issue to Use Case #20 - Health Level Seven
(HL7) Messages [1]; the full details of the issue are at [2] ... but
the gist of it is that, from what I understand, HL7 messages aren't
really CSV. Whilst HL7 is row oriented, the line lengths are
irregular, and one needs to refer to tables to effectively parse the
content.


Now, my understanding of this format is very rudimentary, so I might
have got it wrong!


The key question is whether we still want to include this use case.

If so, we will need assistance in completing it, including
illustrative examples that fit the narrative and a good understanding
of how those examples actually work so that they can be adequately
described.


In particular, I wonder if James McKinney, the original contributor
of this use case [3], can respond with his thoughts and, if we are
going to proceed, help to complete the use case.


We are fast approaching another Public Working Draft (PWD), so I
anticipate publishing in the current state with the incomplete use
case and issue in place. It would be excellent to have resolved
everything for the subsequent PWD.


Best regards, Jeremy

[1] http://w3c.github.io/csvw/use-cases-and-requirements/#UC-HealthLevelSevenHL7

[2] (text below)
[3] http://lists.w3.org/Archives/Public/public-csv-wg-comments/2014Apr/0000.html


ISSUE:

This use case is currently incomplete and does not (yet) follow the
narrative style displayed elsewhere in this document.


A suitable narrative might be:

"John Doe is being transferred from one clinic to another to
receive specialised care. The machine-readable transfer documentation
includes his name, patient ID, his visit to the first clinic, and some
information about his next of kin. The visit info (and many other
fields) requires microparsing on the '^' separator to extract further
structured information about, for example, the referring physician."


However, further information from Eric Prud'hommeaux indicates that
HL7 might be more than we can (or want to) cope with. HL7 messages do
not appear to be regular tabular data. OK, so the "microsyntax" in
each field is complicated (making the data 3- or 4-dimensional, with
"^~\&" being the declared encoding characters, i.e. the component,
repetition, escape and subcomponent characters, for the "fields within
fields" in the example below), but it can be worked out. The real issue
is that the rows are not uniform: they have different numbers of
fields ...



MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|199912271408|CHARRIS|ADT^A04|1817457|D|2.5|
PID||0493575^^^2^ID 1|454721||DOE^JOHN^^^^|DOE^JOHN^^^^|19480203|M||B|254 MYSTREET AVE^^MYTOWN^OH^44123^USA||(216)123-4567|||M|NON|400003403~1129086|
NK1||ROE^MARIE^^^^|SPO||(216)123-4567||EC|||||||||||||||||||||||||||
PV1||O|168 ~219~C~PMA^^^^^^^^^||||277^ALLEN MYLASTNAME^BONNIE^^^^|||||||||| ||2688684|||||||||||||||||||||||||199912271408||||||002376853
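The layering in this example can be sketched in Python. In HL7 v2 the character immediately after "MSH" is the field separator, and the next four characters (MSH-2, here "^~\&") are the component, repetition, escape and subcomponent characters, so a parser can read them from the message itself before splitting. This is a minimal sketch, not a conforming HL7 parser: the function names are hypothetical, escape sequences are ignored, segments are assumed to be newline-separated (real messages use carriage returns; `str.splitlines` accepts either), and each segment ID is assumed to occur once.

```python
# The four segments of the HL7 v2 example message, one segment per line.
MESSAGE = "\n".join([
    r"MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|199912271408|CHARRIS|ADT^A04|1817457|D|2.5|",
    "PID||0493575^^^2^ID 1|454721||DOE^JOHN^^^^|DOE^JOHN^^^^|19480203|M||B|"
    "254 MYSTREET AVE^^MYTOWN^OH^44123^USA||(216)123-4567|||M|NON|400003403~1129086|",
    "NK1||ROE^MARIE^^^^|SPO||(216)123-4567||EC|||||||||||||||||||||||||||",
    "PV1||O|168 ~219~C~PMA^^^^^^^^^||||277^ALLEN MYLASTNAME^BONNIE^^^^|||||||||| "
    "||2688684|||||||||||||||||||||||||199912271408||||||002376853",
])


def delimiters(message: str) -> tuple[str, str, str, str]:
    """Return the (field, component, repetition, subcomponent) separators.

    They are read from the message itself: the character right after "MSH"
    is the field separator, and MSH-2 lists the component, repetition,
    escape and subcomponent characters, in that order.
    """
    header = message.splitlines()[0]
    if not header.startswith("MSH"):
        raise ValueError("an HL7 v2 message should start with an MSH segment")
    field_sep = header[3]
    component, repetition, _escape, subcomponent = header[4:8]
    return field_sep, component, repetition, subcomponent


def split_field(value: str, component: str, repetition: str,
                subcomponent: str) -> list[list[list[str]]]:
    """Split one field into repetitions -> components -> subcomponents.

    Escape sequences are not interpreted in this sketch.
    """
    return [[c.split(subcomponent) for c in r.split(component)]
            for r in value.split(repetition)]


field_sep, comp, rep, sub = delimiters(MESSAGE)
# Segment ID -> list of raw fields (assumes each segment ID occurs once).
segments = {line[:3]: line.split(field_sep) for line in MESSAGE.splitlines()}

# For non-MSH segments the list index equals the HL7 field number.
print(split_field(segments["PID"][5], comp, rep, sub))   # name components
print(split_field(segments["PID"][18], comp, rep, sub))  # two repetitions
```

For the MSH segment the index is off by one (index i is MSH-(i+1)), because MSH-1 is the field separator itself and so never appears as a split value.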



In the example above, there are four segments defined: MSH (message
header), PID (patient identification), NK1 (next of kin) and PV1
(patient visit).


The data in each segment is parsed according to a specific set of
rules defined in a "table"; without this table there's no way to
label the parsed attributes. The first line in the example above says
that the message conforms to version 2.5 tables (versions from 2.2 to
2.6 are visible in the wild). The version 2.5 table indicates how the
message should be parsed; for example, it defines the PID segment,
which happens to include subfields such as last name and first name
("DOE" and "JOHN" respectively).
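A minimal Python sketch of this table dependency: labels come from a lookup keyed on (version, segment ID), and without a matching entry a segment can only be split, not labelled. The `TABLES` contents are a hypothetical, hand-written excerpt covering only three PID fields (PID-5 patient name, PID-7 date of birth, PID-8 administrative sex), believed to match the v2.5 definitions; they are not the real HL7 tables and should be checked against the published standard. The sketch also hard-codes the default '|' and '^' separators rather than reading them from MSH.

```python
# The PID segment from the example message.
PID_SEGMENT = (
    "PID||0493575^^^2^ID 1|454721||DOE^JOHN^^^^|DOE^JOHN^^^^|19480203|M||B|"
    "254 MYSTREET AVE^^MYTOWN^OH^44123^USA||(216)123-4567|||M|NON|400003403~1129086|"
)

# HYPOTHETICAL, partial stand-in for a version-specific segment table:
# field number -> (field label, labels for the leading components).
TABLES = {
    ("2.5", "PID"): {
        5: ("patient_name", ["family_name", "given_name"]),
        7: ("date_of_birth", []),
        8: ("administrative_sex", []),
    },
}


def label_segment(segment: str, version: str) -> dict[str, object]:
    """Attach labels to fields of one non-MSH segment using a lookup table.

    Raises KeyError when no table exists for (version, segment ID): the raw
    values can still be split, but there is nothing to name them with.
    """
    fields = segment.split("|")          # assumes the default field separator
    table = TABLES[(version, fields[0])]
    labelled: dict[str, object] = {}
    for number, (name, component_names) in table.items():
        value = fields[number]           # list index == field number here
        if component_names:
            # zip stops at the shorter list, so unnamed components are dropped.
            labelled[name] = dict(zip(component_names, value.split("^")))
        else:
            labelled[name] = value
    return labelled


print(label_segment(PID_SEGMENT, "2.5"))
```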


So whilst HL7 is row oriented, it does not appear to be CSV: line
lengths are irregular, and one needs to refer to tables to
effectively parse the content.


Do we still want to include this use case? If so, we will need
assistance in completing it, including illustrative examples that fit
the narrative and a good understanding of how those examples actually
work so that they can be adequately described.
Received on Friday, 6 June 2014 14:47:37 UTC