- From: Ivan Herman <ivan@w3.org>
- Date: Mon, 26 May 2014 17:12:47 +0200
- To: "Tandy, Jeremy" <jeremy.tandy@metoffice.gov.uk>
- Cc: W3C CSV on the Web Working Group <public-csv-wg@w3.org>, Eric Stephan <ericphb@gmail.com>, Davide Ceolin <davide.ceolin@gmail.com>, Jeni Tennison <jeni@jenitennison.com>, Yakov Shafranovich <yakov-ietf@shaftek.org>
- Message-Id: <FBFC4F0E-E165-4C7C-AF6B-B95801B314DE@w3.org>
On 26 May 2014, at 16:37, Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk> wrote:

[skip]

>>> Today I have updated the RTL use case
>>> <http://w3c.github.io/csvw/use-cases-and-requirements/#UC-SupportingRightToLeftDirectionality>;
>>> cleaning up the text and example data files / images for the Arabic
>>> example. I decided to remove the Hebrew example, as the web page which
>>> was referenced provided different content from the CSV file, so it was
>>> impossible to make a comparison between the two. I had a hunt around on
>>> the Israeli Gov web site for relevant resources, but my lack of Hebrew
>>> meant that I drew a blank. That said, I think the Arabic example
>>> provides sufficient illustration. Comments please - especially Yakov,
>>> who was the original contributor.
>>>
>>> ... and apologies to Eric for deleting some of your work in getting
>>> rid of the Hebrew example :-(
>>
>> I am not sure the following remark is correct: "In contrast, over the
>> wire and in non-Unicode-aware text editors" (right after the example
>> picture for the Egyptian election result). If the text editor was not
>> Unicode-aware, then the Arabic characters would not be displayed
>> correctly...
>>
>> The text editor will reflect what comes on the wire. In this case the
>> wire seems to be unintuitive for an RTL person, because it comes in the
>> 'wrong' order, so to say, i.e., it does not come in a 'logical' order.
>
> I've amended the use case to try to follow your suggestion. Hopefully
> I've got the right idea.
>
> FWIW, I copied the "over the wire and in non-Unicode-aware text
> editors" comment from Jeni's model document :-)

Oops, I missed that one in that document!

>> That being said, this is good news. In contrast to what I was worried
>> about before, the example shows that, in the _logical_ sense, the
>> left-to-right internal representation is fine, i.e., the '0'-th field
>> in a row is the (row) header, the '1'-st field is the next field, etc.
>> E.g., for JSON generation, the logical way of generating a row would
>> be simply to follow the cells in left-to-right order, i.e., there may
>> not be any necessity to take care of some sort of inverse ordering of
>> the fields.
>>
>> Are we sure that all CSV files for Arabic and Hebrew will indeed be
>> encoded this way? Is it possible that some CSV files will do it the
>> other way round, i.e., the '0'-th field is the 'last' field in the
>> row, etc.? I do not have the pointer to the Hebrew CSV files that you
>> had in a previous version; it may be worth checking. We do have a
>> problem if CSV files do not follow the same order every time!
>
> Sadly I have no idea ... I'm not an I18N expert. Given that I couldn't
> find an official web rendering of the Hebrew example, I had nothing to
> compare the serialisation order against.

The problem is that this is not even an I18N expertise issue (i.e., our
I18N people cannot answer this) but rather a matter of what the practice
is out there... Yakov, do you think you can help?

>>> Regarding the health informatics use case (HL7)
>>> <http://w3c.github.io/csvw/use-cases-and-requirements/#UC-HealthLevelSevenHL7>,
>>> further information from Eric Prud'hommeaux indicates that HL7 might
>>> be more than we can (or want to) cope with. See an excerpt from his
>>> email [1] where you'll see an example included. From what I can see,
>>> this is _NOT_ regular tabular data. OK, so the "microsyntax" in each
>>> field is complicated but it can be worked out; the real issue to me
>>> is that the rows are not uniform - they have different numbers of
>>> fields. Furthermore, it appears that the data is parsed according to
>>> a specific set of rules defined in a "table", and without this table
>>> there's no way to label the parsed attributes.
>>>
>>> I propose that we review this in more detail to see if we should
>>> include this use case.
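[An aside on the JSON-generation point above: a minimal Python sketch of walking the cells of an RTL-language row in plain left-to-right (logical) order. The Arabic header/value strings are hypothetical stand-ins, not the actual Egyptian election CSV from the use case.]

```python
# Sketch only: hypothetical data, not the use case's CSV file.
import csv
import io
import json

# The characters on the wire store each cell in logical order:
# cell 0 is the row header, cell 1 the next field, and so on --
# even though a display engine renders the Arabic right to left.
data = "المحافظة,عدد الأصوات\nالقاهرة,6580478\n"

header, row = list(csv.reader(io.StringIO(data)))

# JSON generation simply follows the cells left to right;
# no inverse ordering of the fields is needed.
record = dict(zip(header, row))
print(json.dumps(record, ensure_ascii=False))
```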
>>> Personally, I don't think it adds anything - except to illustrate
>>> that row-oriented data can be more complicated than our tabular data
>>> model! I propose to drop this use case.
>>
>> ... or keep it to illustrate exactly what you just said: a warning to
>> the reader that row-oriented data does not necessarily mean CSV!
>> (Either way is fine with me; I would go with the flow.)
>>
>> Ivan
>
> Personally, I think the effort to establish a "proper" action-oriented,
> narrative-style use case is not inconsiderable ... we don't have any
> examples at the moment &, taking my experience with DwC-A, to construct
> the use case properly means understanding the data format to a
> reasonable level. At this time, I simply don't have the capacity to
> follow through on this. Seems like nugatory work to me; best avoided.

I do not have a problem with that. Let us nuke it! :-)

Ivan

> Jeremy
>
>>> Finally, I note that JeniT suggested (during our teleconf, 14 May)
>>> that she would add an additional use case based around ONS data to
>>> help underpin the data model. Is there any progress on this?
>>>
>>> Other than that, there's still work to do on the Requirements, and I
>>> feel like we should review the email lists since FPWD to make sure
>>> nothing relating to use cases has fallen through the net.
>>>
>>> Jeremy
>>>
>>> ---
>>>
>>> [1] Email from Eric Prud'hommeaux, 21-May
>>>
>>> [a potential narrative for the use case ...] John Doe is being
>>> transferred from one clinic to another to receive specialised care.
>>> The machine-readable transfer documentation includes his name,
>>> patient ID, his visit to the first clinic, and some information about
>>> his next of kin. The visit info (and many other fields) require
>>> microparsing on the '^' separator to extract further structured
>>> information about, for example, the referring physician.
>>>
>>> [on the HL7 data format ...]
>>>> I think you want to give up on this one because the message format
>>>> is hilariously complex and requires a ton of extra info to parse.
>>>> For instance, the header in the first line of
>>>>
>>>> MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|199912271408|CHARRIS|ADT^A04|1817457|D|2.5|
>>>> PID||0493575^^^2^ID 1|454721||DOE^JOHN^^^^|DOE^JOHN^^^^|19480203|M||B|254 MYSTREET AVE^^MYTOWN^OH^44123^USA||(216)123-4567|||M|NON|||
>>>> NK1||ROE^MARIE^^^^|SPO||(216)123-4567||EC|||||||||||||||||||||||||||
>>>> PV1||O|168 ~219~C~PMA^^^^^^^^^||||277^ALLEN MYLASTNAME^BONNIE^^^^|||||||||| ||2688684|||||||||||||||||||||||||199912271408||||||002376853
>>>>
>>>> says that the rest must be parsed with V2.5 tables (I think you'll
>>>> see 2.2 to 2.6 in the wild). The data is oriented in rows, so I'm
>>>> not sure how applicable CSV techniques would be. It's also 3 or
>>>> maybe 4 dimensional ("^~\&" being the declared separators for the
>>>> fields within fields in this particular document).
>>>>
>>>> The V2.5 table tells you how to parse the rest of the fields, e.g.
>>>> the PID field, which happens to include subfields like lastname and
>>>> firstname ("DOE" and "JOHN" respectively). Without that table,
>>>> there's no way to know how to label the parsed attributes.

----
Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
WebID: http://www.ivan-herman.net/foaf#me
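[A postscript on Eric's excerpt: a minimal Python sketch of the '^' microparsing he describes. The PID segment is abridged from his sample; the indices are purely positional, which is exactly the problem - the lastname/firstname labels come from the V2.x table, not from the data.]

```python
# Sketch of the HL7 v2 "fields within fields" parsing: split a
# segment on '|' into fields, then each field on '^' into components.
# (A real parser must also honour the '~' repetition and '\' escape
# separators declared in the MSH header; both are ignored here.)
segment = "PID||0493575^^^2^ID 1|454721||DOE^JOHN^^^^"  # abridged sample

fields = segment.split("|")                   # first dimension
components = [f.split("^") for f in fields]   # second dimension

# Positions only: nothing in the data itself says that
# components[5][0] is a lastname and components[5][1] a firstname.
print(components[5][:2])  # ['DOE', 'JOHN']
```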
Received on Monday, 26 May 2014 15:13:29 UTC