- From: Ivan Herman <ivan@w3.org>
- Date: Wed, 26 Mar 2014 15:16:56 +0100
- To: Dan Brickley <danbri@google.com>
- Cc: "Tandy, Jeremy" <jeremy.tandy@metoffice.gov.uk>, Davide Ceolin <d.ceolin@vu.nl>, Eric Stephan <ericphb@gmail.com>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>
- Message-Id: <051FE970-871B-4D71-9248-C2F8C4B6755B@w3.org>
On 26 Mar 2014, at 14:02, Dan Brickley <danbri@google.com> wrote:

> On 26 March 2014 12:31, Ivan Herman <ivan@w3.org> wrote:
>> Hi guys,
>>
>> now that the draft UCR document is almost published, I took some time this morning to read through it more thoroughly. I found a number of minor issues, 99% editorial; I list them below. None of these are really serious (i.e., no reason to bother for tomorrow's publication), but you may want to take care of them for the next release.
>>
>> As an overall comment, though, I am a little bit bothered by one thing: the very anglo-saxon oriented nature of the use cases.
>
> (aside: I find this a peculiar use of the phrase 'anglo-saxon', though
> it seems to be popular in France as a euphemism for UK+USA)
>
>> I would like to be sure that these use cases do not hide additional issues that may come up when using CSV in other cultures. I could not put my finger on this, but I did ask myself questions like: we are talking about column headers, but we refer to that as the first column from the left; what happens with CSV files produced in Arabic, Hebrew, and other right-to-left writing systems? How do they do this in practice? Another issue is whether, for writing systems that use vertical writing, the roles of the rows and the columns are naturally transposed. We should remember that there are languages (e.g., Mongolian) where vertical writing is THE writing mode; it is not an option, as it is in CJK languages such as Chinese or Japanese. I realize that these languages are in a strong minority, but nevertheless... Also, the "," character is not part of Arabic or CJK languages; the character that looks like a comma is actually a different code point. Do they use "," nevertheless?
>
> This is a good point.
>
> I've just talked to an Arabic native speaker colleague, and we started
> looking for CSV files on the Web.
>
> He says that for a CSV file that was predominantly written in Arabic,
> his intuition would be that you'd count columns with the 1st on the
> right.
>

That would be my intuition, too.

> Also noted that some web sites use the Arabic letter for 'r', instead
> of comma sometimes:
>
> ر
>
> Here's a multi-script, multilingual CSV (utf-8),
> https://raw.githubusercontent.com/flangofas/generic-find-replace/master/translations.csv
> ... let's try to find some more.
>

That actually looks harmless in terms of structure; if UTF is used for CSV (which, I believe, is something that will be the case in the RFC, if I understood Yakov correctly) then it is fairly standard...

Ivan

> Dan

----
Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
FOAF: http://www.ivan-herman.net/foaf
Received on Wednesday, 26 March 2014 14:17:32 UTC
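
The remark in the message that the comma-like characters of Arabic and CJK scripts are different code points from "," can be checked with a short sketch. This is only an illustration, not something taken from the thread: the helper names `describe` and `parse` and the sample data are hypothetical, and it assumes Python 3 with its standard `csv` and `unicodedata` modules.

```python
import csv
import io
import unicodedata

# Comma-like characters from several scripts; each is a distinct code point.
COMMA_LIKE = [",", "\u060C", "\u3001", "\uFF0C"]


def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def parse(text, delimiter=","):
    """Parse CSV text into a list of rows using the given one-character delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


for ch in COMMA_LIKE:
    print(describe(ch))

# Hypothetical sample: Latin placeholder values separated by the Arabic comma (U+060C).
sample = "name\u060Ccity\nAli\u060CCairo\n"

print(parse(sample))            # default ",": every row stays a single field
print(parse(sample, "\u060C"))  # Arabic comma as delimiter: two fields per row
```

A parser that only looks for U+002C would treat each line of such a file as one field, which is one concrete way the issue raised above could surface in practice.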