RE: Updates to the use-case document from Tandy, Jeremy on 2014-05-26 (public-csv-wg@w3.org from May 2014)

From: Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
Date: Mon, 26 May 2014 16:45:46 +0000
To: Ivan Herman <ivan@w3.org>
CC: W3C CSV on the Web Working Group <public-csv-wg@w3.org>, Eric Stephan <ericphb@gmail.com>, Davide Ceolin <davide.ceolin@gmail.com>, Jeni Tennison <jeni@jenitennison.com>, Yakov Shafranovich <yakov-ietf@shaftek.org>
Message-ID: <2624871D9A05174691BD59F8EFD68AE208842216@EXXCMPD1DAG3.cmpd1.metoffice.gov.uk>
> -----Original Message-----
> From: Ivan Herman [mailto:ivan@w3.org]
> Sent: 26 May 2014 16:13
> To: Tandy, Jeremy
> Cc: W3C CSV on the Web Working Group; Eric Stephan; Davide Ceolin; Jeni
> Tennison; Yakov Shafranovich
> Subject: Re: Updates to the use-case document
> 
> 
> On 26 May 2014, at 16:37 , Tandy, Jeremy
> <jeremy.tandy@metoffice.gov.uk> wrote:
> 
> [skip]
> 
> >>>
> >>> Today I have updated the RTL use case
> >>> <http://w3c.github.io/csvw/use-cases-and-requirements/#UC-SupportingRightToLeftDirectionality>;
> >> cleaning up the text and example data files / images for the Arabic
> >> example. I decided to remove the Hebrew example as the web-page
> which
> >> was referenced provided different content to the CSV file, so it was
> >> impossible to make a comparison between the two. I had a hunt around
> >> on the Israeli Gov web site for relevant resources, but my lack of
> >> Hebrew meant that I drew a blank. That said, I think the Arabic
> >> example provides sufficient illustration. Comments please -
> >> especially Yakov who was the original contributor.
> >>>
> >>> ... and apologies to Eric for deleting some of your work in getting
> >>> rid of the Hebrew example :-(
> >>>
> >>
> >> I am not sure the following remark is correct: "In contrast, over
> the
> >> wire and in non-Unicode-aware text editors" (right after the example
> >> picture for the Egyptian election result). If the text editor was
> not
> >> Unicode aware, then the Arabic characters would not be displayed
> >> correctly...
> >>
> >> The text editor will reflect what comes on the wire. In this case
> the
> >> wire seems to be unintuitive for a RTL person, because it comes in
> >> the 'wrong' order, so to say, ie, it does not come in a 'logical'
> order.
> >
> > I've amended the use case to try to follow your suggestion. Hopefully
> I've got the right idea.
> >
> > FWIW, I copied the "over the wire and in non-Unicode-aware text
> > editors" comment from Jeni's model document :-)
> 
> Oops, I missed that one in that document!
> 
> >
> >>
> >> That being said, this is good news. Contrary to what I was
> >> worried about before, the example shows that, in the _logical_ sense, the
> >> left-to- right internal representation is fine, ie, the '0'-th field
> >> in a row is the (row) header, the '1'-st field is the next field,
> >> etc. Eg., for a JSON generation, the logical way of generating a row
> >> would be to simply follow the cells from the left-to-right order,
> ie,
> >> there may not be a necessity to take care of some sort of an inverse
> >> ordering of the field.
> >>
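A minimal sketch of the point above, assuming Python's standard csv module and a made-up two-line file (the Arabic strings and the helper name rows_as_records are illustrative only, not taken from the use case data). It suggests that a parser hands back cells in the order they are stored, so field 0 is the row header whatever direction the text is displayed in, and a JSON record can be built by walking the cells in that order.

```python
import csv
import io
import json

# Hypothetical sample, written with escapes so the stored (logical) order
# is unambiguous however an editor chooses to render right-to-left text.
SAMPLE = (
    "\u0627\u0644\u0627\u0633\u0645,\u0627\u0644\u0639\u062f\u062f\n"
    "\u0627\u0644\u0642\u0627\u0647\u0631\u0629,42\n"
)


def rows_as_records(text):
    """Return (header, records), pairing cells with headers by position."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # field 0, field 1, ... in stored order
    records = [dict(zip(header, row)) for row in reader]
    return header, records


header, records = rows_as_records(SAMPLE)
# ensure_ascii=False keeps the Arabic text readable in the JSON output.
print(json.dumps(records, ensure_ascii=False))
```

This only shows what happens for a file stored in logical order; it says nothing about whether every publisher stores files that way.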
> >> Are we sure that all CSV files for Arabic and Hebrew will indeed be
> >> encoded this way? Is it possible that some of the CSV files will do
> >> it the other way round, ie, the '0'-th field is the 'last' field in
> >> the row, etc? I do not have the pointer to the Hebrew CSV files that
> >> you had in a previous version, may be worth checking. We do have a
> >> problem if CSV files do not follow the same order every time!
> >>
> >
> > Sadly I have no idea ... I'm not an I18N expert. Given that I couldn't
> find an official web-rendering of the Hebrew example, I had nothing to
> compare the serialisation order against.
> 
> The problem is that this is not even an I18N expertise issue (ie, our
> I18N persons cannot answer this) but rather a matter of what the
> practice is out there...
> 
> Yakov, do you think you can help?
> 
> >
> >>
> >>> Regarding the health informatics use case (HL7)
> >> <http://w3c.github.io/csvw/use-cases-and-requirements/#UC-HealthLevelSevenHL7>,
> >> further information from Eric Prud'hommeaux
> >> indicates that HL7 might be more than we can (or want to) cope with.
> >> See an excerpt from his email [1] where you'll see an example
> included.
> >> From what I can see, this is _NOT_ regular tabular data. OK, so the
> >> "microsyntax" in each field is complicated but it can be worked out,
> >> but the real issue to me is that the rows are not uniform - they
> have
> >> different numbers of fields. Furthermore, it appears that the data
> is
> >> parsed according to a specific set of rules defined in a "table" and
> >> without this table there's no way to label the parsed attributes.
> >>>
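The non-uniformity concern can be checked mechanically. A rough sketch, assuming pipe-delimited rows and using invented placeholder lines rather than the real HL7 message (the function names field_counts and is_uniform are hypothetical):

```python
def field_counts(lines, delimiter="|"):
    """Count the delimiter-separated fields on each line."""
    return [len(line.split(delimiter)) for line in lines]


def is_uniform(lines, delimiter="|"):
    """True when every line has the same number of fields."""
    return len(set(field_counts(lines, delimiter))) <= 1


# Placeholder rows, made up for illustration: each segment type
# carries a different number of fields.
ROWS = ["MSH|a|b|c", "PID|a|b", "NK1|a|b|c|d"]

print(field_counts(ROWS))
print(is_uniform(ROWS))
```

A regular table would give one distinct count; rows like these give several, which is the mismatch with the tabular data model described above.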
> >>> I propose that we review this in more detail to see if we should
> >> include this use case. Personally, I don't think it adds anything -
> >> except to illustrate that row-oriented data can be more complicated
> >> than our tabular data model! I propose to drop this use case.
> >>>
> >>
> >> ... or keep it to illustrate exactly what you just said: a warning
> >> to the reader that row-oriented data does not necessarily mean CSV!
> >> (Either way is fine with me, I would go with the flow.)
> >>
> >> Ivan
> >
> > Personally, I think the effort to establish a "proper" action
> oriented, narrative style use case is not inconsiderable ... we don't
> have any examples at the moment &, taking my experience with DwC-A, to
> construct the use case properly means understanding the data format to
> a reasonable level. At this time, I simply don't have the capacity to
> follow through on this. Seems like nugatory work to me; best avoided.
> >
> 
> I do not have a problem with that. Let us nuke it! :-)

Done! Or at least agreed between you and me. Given that I will not make the call this week, I'll ask JeniT/DanBri to add a "do we all agree to remove the HL7 UC from the doc" item to the agenda. Jeremy

> 
> Ivan
> 
> > Jeremy
> >
> >>
> >>
> >>> Finally, I note that JeniT suggested (during our teleconf, 14-May)
> >> that she would add an additional use case based around ONS data to
> >> help underpin the data model. Is there any progress on this?
> >>>
> >>> Other than that, there's still work to do on the Requirements and I
> >> feel like we should review the email lists since FPWD to make sure
> >> nothing relating to use cases has fallen through the net.
> >>>
> >>> Jeremy
> >>>
> >>> ---
> >>>
> >>> [1] Email from Eric Prud'hommeaux, 21-May
> >>>
> >>> [a potential narrative for the use case ...] John Doe is being
> >>> transferred from one clinic to another to receive specialized care.
> >> The machine-readable transfer documentation includes his name,
> >> patient ID, his visit to the first clinic, and some information
> about
> >> his next of kin. The visit info (and many other fields) require
> >> microparsing on the '^' separator to extract further structured
> >> information about, for example, the referring physician.
> >>>
> >>> [on the HL7 data format ...]
> >>>> I think you want to give up on this one because the message format
> >> is
> >>>> hilariously complex and requires a ton of extra info to parse. For
> >>>> instance, the header in the first line of
> >>>>
> >>>>
> >>
> MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|199912271408|CHARRIS|ADT^A04|1817457
> >>>> MSH||
> >>>> MSH|D|2.5|
> >>>> PID||0493575^^^2^ID
> >>>> PID||1|454721||DOE^JOHN^^^^|DOE^JOHN^^^^|19480203|M||B|254
> MYSTREET
> >>>> PID||AVE^^MYTOWN^OH^44123^USA||(216)123-4567|||M|NON|||
> >>>> NK1||ROE^MARIE^^^^|SPO||(216)123-
> 4567||EC||||||||||||||||||||||||||
> >>>> NK1|||
> >>>> PV1||O|168 ~219~C~PMA^^^^^^^^^||||277^ALLEN
> >>>> PV1||O|MYLASTNAME^BONNIE^^^^||||||||||
> >>>>
> PV1||O|||2688684|||||||||||||||||||||||||199912271408||||||00237685
> >>>> PV1||O|||2688684|||||||||||||||||||||||||199912271408||||||3
> >>>>
> >>>> says that the rest must be parsed with V2.5 tables (I think you'll
> >> see 2.2 to 2.6 in the wild).  The data is oriented in rows, so I'm
> >> not sure how applicable CSV techniques would be. It's also 3 or maybe
> >> 4 dimensional ("^~\&" being the declared separators for the fields
> >> within fields in this particular document).
> >>>>
> >>>> The V2.5 table tells you how to parse the rest of the fields, e.g.
> >> the PID field, which happens to include subfields like lastname and
> >> firstname ("DOE" and "JOHN" respectively). Without that table,
> >> there's no way to know how to label the parsed attributes.
> >>>
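To make the "microparsing" concrete, here is a hedged sketch that reads the declared separators from the MSH line quoted above and splits one field on the component separator. It assumes the usual HL7 v2 layout, where the character after "MSH" is the field separator and the next four are the component, repetition, escape and subcomponent characters; the function names are hypothetical, and nothing here consults the V2.5 tables.

```python
# First line of the quoted message; a raw string keeps the backslash literal.
MSH = r"MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|199912271408|CHARRIS|ADT^A04|1817457"


def declared_separators(msh_line):
    """Read the separator characters declared at the start of an MSH line."""
    field = msh_line[3]
    component, repetition, escape, subcomponent = msh_line[4:8]
    return {
        "field": field,
        "component": component,
        "repetition": repetition,
        "escape": escape,
        "subcomponent": subcomponent,
    }


def split_components(value, seps):
    """Split one field into its components (positions only, no labels)."""
    return value.split(seps["component"])


seps = declared_separators(MSH)
name_parts = split_components("DOE^JOHN^^^^", seps)
print(seps)
print(name_parts)
```

The split yields positions, not names: knowing that position 0 is a last name and position 1 a first name still requires the version-specific table, which is the difficulty raised in the quoted email.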
> >>
> >>
> >> ----
> >> Ivan Herman, W3C
> >> Digital Publishing Activity Lead
> >> Home: http://www.w3.org/People/Ivan/
> >> mobile: +31-641044153
> >> GPG: 0x343F1A3D
> >> WebID: http://www.ivan-herman.net/foaf#me
> 
Received on Monday, 26 May 2014 16:46:18 UTC