Updates to the use-case document from Tandy, Jeremy on 2014-05-26 (public-csv-wg@w3.org from May 2014)

From: Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
Date: Mon, 26 May 2014 13:49:22 +0000
To: "public-csv-wg@w3.org" <public-csv-wg@w3.org>, Eric Stephan <ericphb@gmail.com>, Davide Ceolin <davide.ceolin@gmail.com>
Message-ID: <2624871D9A05174691BD59F8EFD68AE2088420FB@EXXCMPD1DAG3.cmpd1.metoffice.gov.uk>
Yesterday I updated the biodiversity / GBIF / Darwin Core Archive use case <http://w3c.github.io/csvw/use-cases-and-requirements/#UC-PublicationOfBiodiversityInformation> and am awaiting comments.

Today I have updated the RTL use case <http://w3c.github.io/csvw/use-cases-and-requirements/#UC-SupportingRightToLeftDirectionality>, cleaning up the text and the example data files / images for the Arabic example. I decided to remove the Hebrew example because the referenced web page provided different content from the CSV file, so it was impossible to compare the two. I had a hunt around on the Israeli Gov web site for relevant resources, but my lack of Hebrew meant that I drew a blank. That said, I think the Arabic example provides sufficient illustration. Comments please - especially from Yakov, who was the original contributor.

... and apologies to Eric for deleting some of your work in getting rid of the Hebrew example :-(

Regarding the health informatics use case (HL7) <http://w3c.github.io/csvw/use-cases-and-requirements/#UC-HealthLevelSevenHL7>, further information from Eric Prud'hommeaux indicates that HL7 might be more than we can (or want to) cope with. See the excerpt from his email [1], which includes an example. From what I can see, this is _NOT_ regular tabular data. The "microsyntax" in each field is complicated but can be worked out; the real issue to me is that the rows are not uniform - they have different numbers of fields. Furthermore, it appears that the data is parsed according to a specific set of rules defined in a "table", and without this table there's no way to label the parsed attributes.

I propose that we review this in more detail to decide whether to keep this use case. Personally, I don't think it adds anything - except to illustrate that row-oriented data can be more complicated than our tabular data model! My own proposal is to drop this use case.

Finally, I note that JeniT suggested (during our teleconf, 14-May) that she would add an additional use case based around ONS data to help underpin the data model. Is there any progress on this?

Other than that, there's still work to do on the Requirements and I feel like we should review the email lists since FPWD to make sure nothing relating to use cases has fallen through the net.

Jeremy 

---

[1] Email from Eric Prud'hommeaux, 21-May

[a potential narrative for the use case ...]
John Doe is being transferred from one clinic to another to receive specialized care. The machine-readable transfer documentation includes his name, patient ID, his visit to the first clinic, and some information about his next of kin. The visit info (and many other fields) requires microparsing on the '^' separator to extract further structured information about, for example, the referring physician.

[on the HL7 data format ...]
> I think you want to give up on this one because the message format is 
> hilariously complex and requires a ton of extra info to parse. For 
> instance, the header in the first line of
> 
> MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|199912271408|CHARRIS|ADT^A04|1817457|
> MSH|D|2.5|
> PID||0493575^^^2^ID 
> PID||1|454721||DOE^JOHN^^^^|DOE^JOHN^^^^|19480203|M||B|254 MYSTREET 
> PID||AVE^^MYTOWN^OH^44123^USA||(216)123-4567|||M|NON|||
> NK1||ROE^MARIE^^^^|SPO||(216)123-4567||EC|||||||||||||||||||||||||||
> PV1||O|168 ~219~C~PMA^^^^^^^^^||||277^ALLEN 
> PV1||O|MYLASTNAME^BONNIE^^^^|||||||||| 
> PV1||O|||2688684|||||||||||||||||||||||||199912271408||||||002376853
> 
> says that the rest must be parsed with V2.5 tables (I think you'll see 2.2 to 2.6 in the wild). The data is oriented in rows, so I'm not sure how applicable CSV techniques would be. It's also 3 or maybe 4 dimensional ("^~\&" being the declared separators for the fields within fields in this particular document).
> 
> The V2.5 table tells you how to parse the rest of the fields, e.g. the PID field, which happens to include subfields like lastname and firstname ("DOE" and "JOHN" respectively). Without that table, there's no way to know how to label the parsed attributes.
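
To make the points about separators and non-uniform rows concrete, here is a minimal Python sketch. It is not taken from Eric's email and is not a real HL7 parser: the helper names (read_separators, split_line) are hypothetical, the input is three lines copied from the example above exactly as they were wrapped in the email, and only the '|' field separator and the '^' separator are handled. The other declared characters ("~", "\" and "&") are ignored here.

```python
# Lines copied from the example message quoted above (as wrapped in the email).
LINES = [
    r"MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|199912271408|CHARRIS|ADT^A04|1817457|",
    r"PID||1|454721||DOE^JOHN^^^^|DOE^JOHN^^^^|19480203|M||B|254 MYSTREET",
    r"NK1||ROE^MARIE^^^^|SPO||(216)123-4567||EC|||||||||||||||||||||||||||",
]


def read_separators(header):
    """Return (field_sep, component_sep) as declared at the start of an MSH line."""
    if not header.startswith("MSH"):
        raise ValueError("expected an MSH header line")
    # The character right after "MSH" separates fields; the next one
    # separates the "fields within fields".
    return header[3], header[4]


def split_line(line, field_sep, component_sep):
    """Split one line into fields, then each field into its components."""
    return [field.split(component_sep) for field in line.split(field_sep)]


field_sep, component_sep = read_separators(LINES[0])

# Parse only the non-header lines; the header's second field holds the
# separator characters themselves and must not be split on them.
rows = [split_line(line, field_sep, component_sep) for line in LINES[1:]]
field_counts = [len(row) for row in rows]

for line, count in zip(LINES[1:], field_counts):
    print(line[:3], count, "fields")

# Components are positional only: nothing in the data says which is the
# last name and which is the first name.
print(rows[0][5])
```

The two parsed lines come out with different numbers of fields, which is the non-uniformity described above, and the components of "DOE^JOHN^^^^" are only a positional list. Attaching labels such as lastname and firstname would need the external V2.5 table that Eric mentions.
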
Received on Monday, 26 May 2014 13:49:52 UTC