- From: Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
- Date: Fri, 6 Jun 2014 08:43:49 +0000
- To: James McKinney <james@opennorth.ca>
- CC: "public-csv-wg@w3.org" <public-csv-wg@w3.org>, Eric Prud'hommeaux <eric@w3.org>
Hi James - thanks for your quick reply.

"""each row [segment] has its own schema (identified by the first three characters in the row)"""

... is a much better way to express what I was trying to say!

"""Since the WG defines CSVs as tabular data in which all rows have the same schema, HL7 messages are not CSV."""

... based on this assessment I will remove the use case from the document - although I will leave a note in place indicating that we have considered other row-oriented formats such as HL7 but that these aren't CSV.

"""I know many CSVs that are incomprehensible without knowledge external to the CSV file itself: notably, CSVs without a header row. I think those tabular files are nonetheless CSV."""

... you're right again! In fact, we deal with these by assigning a numerical index identifier to each column and are defining how one would use the supplementary metadata to express the semantics for each column.

What I was thinking of when I wrote the email last night was that parsing the microsyntax in each field is challenging without reference to the tables. This challenge is compounded because the microsyntax in a given column may not be consistent; as you say, each row (or segment) has its own schema.

Of course, we could just leave the entire field as a string literal (e.g. "|254 MYSTREET AVE^^MYTOWN^OH^44123^USA|") and leave the detailed parsing to some external agent. But the real problem with HL7 is that the formatting of each row is not consistent within a given file.

Many thanks, Jeremy

> -----Original Message-----
> From: James McKinney [mailto:james@opennorth.ca]
> Sent: 06 June 2014 00:24
> To: Tandy, Jeremy
> Cc: public-csv-wg@w3.org; Eric Prud'hommeaux
> Subject: Re: Issue raised in health informatics use case (HL7 messages)
>
> Having now read a bit more about HL7, each row has its own schema (identified by the first three characters in the row), and rows may therefore have variable numbers of columns.
> Since the WG defines CSVs as tabular data in which all rows have the same schema, HL7 messages are not CSV.
>
> On the other hand, I'm not sure that the point "one needs to refer to tables to effectively parse the content" means that HL7 messages are not CSVs. I know many CSVs that are incomprehensible without knowledge external to the CSV file itself: notably, CSVs without a header row. I think those tabular files are nonetheless CSV.
>
> James
>
> On Jun 5, 2014, at 7:00 PM, Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk> wrote:
>
> > All - I have added an issue to Use Case #20 - Health Level Seven (HL7) Messages [1]; the full details of the issue are at [2] ... but the gist of it is that, from what I understand, HL7 messages aren't really CSV. Whilst HL7 is row oriented, the line lengths are irregular, and one needs to refer to tables to effectively parse the content.
> >
> > Now, my understanding of this format is very rudimentary, so I might have got it wrong!
> >
> > The key question is whether we still want to include this use case?
> >
> > If so, we will need assistance in completing it, including illustrative examples that fit the narrative and a good understanding of how those examples actually work so that they can be adequately described.
> >
> > In particular, I wonder if James McKinney, the original contributor of this use case [3], can respond with his thoughts and, if we are going to proceed, help to complete the use case.
> >
> > We are fast approaching another Public Working Draft (PWD), so I anticipate publishing in the current state with the incomplete use case and issue in place. It would be excellent to have resolved everything for the subsequent PWD.
> >
> > Best regards, Jeremy
> >
> > [1] http://w3c.github.io/csvw/use-cases-and-requirements/#UC-HealthLevelSevenHL7
> > [2] (text below)
> > [3] http://lists.w3.org/Archives/Public/public-csv-wg-comments/2014Apr/0000.html
> >
> > ISSUE:
> >
> > This use case is currently incomplete and does not (yet) follow the narrative style displayed elsewhere in this document.
> >
> > A suitable narrative might be:
> >
> > "John Doe is being transferred from one clinic to another to receive specialised care. The machine-readable transfer documentation includes his name, patient ID, his visit to the first clinic, and some information about his next of kin. The visit info (and many other fields) require microparsing on the '^' separator to extract further structured information about, for example, the referring physician."
> >
> > However, further information from Eric Prud'hommeaux indicates that HL7 might be more than we can (or want to) cope with. HL7 messages do not appear to be regular tabular data. OK, so the "microsyntax" in each field is complicated (making the data 3- or 4-dimensional, with "^~\&" being the declared separators for the "fields within fields" in the example below) but it can be worked out; the real issue is that the rows are not uniform - they have different numbers of fields ...
> >
> > MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|199912271408|CHARRIS|ADT^A04|1817457|D|2.5|
> > PID||0493575^^^2^ID 1|454721||DOE^JOHN^^^^|DOE^JOHN^^^^|19480203|M||B|254 MYSTREET AVE^^MYTOWN^OH^44123^USA||(216)123-4567|||M|NON|400003403~1129086|
> > NK1||ROE^MARIE^^^^|SPO||(216)123-4567||EC|||||||||||||||||||||||||||
> > PV1||O|168 ~219~C~PMA^^^^^^^^^||||277^ALLEN MYLASTNAME^BONNIE^^^^|||||||||| ||2688684|||||||||||||||||||||||||199912271408||||||002376853
> >
> > In the example above, there are four segments defined: MSH (message header?), PID (patient identification), NK1 (next of kin?) and PV1.
> >
> > The data in each segment is parsed according to a specific set of rules defined in a "table" and without this table there's no way to label the parsed attributes. The first line in the example above says that the message conforms to version 2.5 tables (versions from 2.2 to 2.6 are visible in the wild). The version 2.5 table indicates how the message should be parsed, e.g. the PID segment, which happens to include subfields like lastname and firstname ("DOE" and "JOHN" respectively). Without that table, there's no way to know how to label the parsed attributes.
> >
> > So whilst HL7 is row oriented, it does not appear to be CSV: line lengths are irregular, and one needs to refer to tables to effectively parse the content.
> >
> > Do we still want to include this use case? If so, we will need assistance in completing it, including illustrative examples that fit the narrative and a good understanding of how those examples actually work so that they can be adequately described.
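
For readers who want to see the irregularity discussed in this thread, here is a rough Python sketch. It is a naive splitter, not a conformant HL7 parser: it takes the first three segments of the quoted example (rejoined from the wrapped email lines, which is itself an assumption), splits each on "|" and then microparses selected fields on "^". The names MESSAGE, split_segments and field_counts are made up for illustration, and the field positions are plain 0-based list indexes rather than HL7 table numbering.

```python
# Illustrative only: a naive splitter, not a conformant HL7 v2 parser.
# The segments are copied from the example quoted in this thread, with the
# email line wraps rejoined (an assumption about the original layout).
MESSAGE = r"""MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|199912271408|CHARRIS|ADT^A04|1817457|D|2.5|
PID||0493575^^^2^ID 1|454721||DOE^JOHN^^^^|DOE^JOHN^^^^|19480203|M||B|254 MYSTREET AVE^^MYTOWN^OH^44123^USA||(216)123-4567|||M|NON|400003403~1129086|
NK1||ROE^MARIE^^^^|SPO||(216)123-4567||EC|||||||||||||||||||||||||||"""


def split_segments(message):
    """Map each segment ID (the first three characters) to its '|'-separated fields."""
    segments = {}
    for line in message.splitlines():
        fields = line.split("|")
        segments[fields[0]] = fields
    return segments


segments = split_segments(MESSAGE)

# Each row has its own schema, so the number of fields differs per segment.
field_counts = {seg_id: len(fields) for seg_id, fields in segments.items()}
print(field_counts)

# Microparsing on '^': without the version 2.5 table we only get positions,
# not labels such as "lastname" or "firstname".
name_parts = segments["PID"][5].split("^")
address_parts = segments["PID"][11].split("^")
print(name_parts[:2])    # ['DOE', 'JOHN']
print(address_parts[0])  # 254 MYSTREET AVE

# The version string sits at naive index 11 of the MSH row.
version = segments["MSH"][11]
print(version)           # 2.5
```

The differing values in field_counts are the point James makes: a single column schema cannot describe every row, which is why these messages fall outside the WG's definition of CSV.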
Received on Friday, 6 June 2014 08:44:21 UTC