Re: Some comments on the UCR document

On 26 March 2014 12:31, Ivan Herman <ivan@w3.org> wrote:
> Hi guys,
>
> now that the draft UCR document is almost published, I took some time this morning to read it more thoroughly. I found a number of minor issues, 99% of them editorial; I list them below. None of them are really serious (i.e., no reason to worry about them for tomorrow's publication), but you may want to take care of them for the next release.
>
> As an overall comment, though, I am a little bit bothered by one thing: the very anglo-saxon oriented nature of the use cases.

(aside: I find this a peculiar use of the phrase 'anglo-saxon', though
it seems to be popular in France as a euphemism for UK+USA)

> I would like to be sure that these use cases do not hide additional issues that may come up when CSV is used in other cultures. I could not put my finger on this, but I did ask myself questions like: we are talking about column headers, but we refer to them in terms of the first column from the left; what happens with CSV files produced in Arabic, Hebrew, and other right-to-left writing systems? How do they handle this in practice? Another question is whether, for writing systems that use vertical writing, the roles of rows and columns are naturally transposed. We should remember that there are languages (e.g., Mongolian) where vertical writing is THE writing mode, not an option as it is in CJK languages such as Chinese or Japanese. I realize that these languages are a small minority, but nevertheless...
>
> Also, the "," character is not part of Arabic or CJK writing; the character that looks like a comma is actually a different code point. Do they use "," nevertheless?

This is a good point.

I've just talked to a colleague who is a native Arabic speaker, and we
started looking for CSV files on the Web.

He says that for a CSV file that was predominantly written in Arabic,
his intuition would be that you'd count columns with the 1st on the
right.
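To make the distinction concrete, here is a minimal sketch, assuming the file stores fields in logical (reading) order as Unicode text normally does: "column 1" is then simply the first field on each line, and whether it appears at the left or the right edge is only a display decision. The helper `display_order` and the sample data are hypothetical, made up for illustration.

```python
import csv
import io

def display_order(row, rtl=False):
    """Return the row's fields in the order they would appear on screen,
    reading from left to right (hypothetical helper)."""
    # In a right-to-left layout the first logical field is drawn at the
    # far right, so the left-to-right screen order is the reverse.
    return list(reversed(row)) if rtl else list(row)

# Made-up sample: "name,city" / "Ahmed,Cairo" in Arabic, comma-separated.
data = "الاسم,المدينة\nأحمد,القاهرة\n"
rows = list(csv.reader(io.StringIO(data)))

first_logical_column = rows[0][0]              # the first field in the file
on_screen_rtl = display_order(rows[0], rtl=True)
```

Under this assumption, counting "the 1st column on the right" and "the 1st field in the file" name the same column; only the visual numbering differs.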

He also noted that some web sites use the Arabic letter for 'r'
instead of a comma:

ر
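If a consumer has to cope with such files, one rough approach is to try several candidate separators and keep the one that splits every line into the same number of fields. The sketch below assumes a hypothetical candidate list: ASCII comma, ARABIC COMMA (U+060C), IDEOGRAPHIC COMMA (U+3001), and ARABIC LETTER REH (U+0631, the letter shown above). It relies on Python's `csv` module accepting any single character as `delimiter`. Because REH is a letter, the guess is inherently ambiguous when the data itself contains it.

```python
import csv

# Hypothetical candidate separators, for illustration only.
CANDIDATES = [",", "\u060c", "\u3001", "\u0631"]

def guess_delimiter(text, candidates=CANDIDATES):
    """Pick the candidate that splits every non-empty line into the same
    number (> 1) of fields, preferring the one that yields the most fields.
    Returns None if no candidate qualifies."""
    lines = [line for line in text.splitlines() if line.strip()]
    best, best_width = None, 1
    for delim in candidates:
        # Field count of each line when parsed with this delimiter.
        widths = {len(next(csv.reader([line], delimiter=delim))) for line in lines}
        if len(widths) == 1:
            width = widths.pop()
            if width > best_width:
                best, best_width = delim, width
    return best
```

For example, `guess_delimiter("a\u060cb\u060cc\n1\u060c2\u060c3\n")` picks the Arabic comma, while plain comma-separated input still picks `","`.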

Here's a multi-script, multilingual CSV file (UTF-8):
https://raw.githubusercontent.com/flangofas/generic-find-replace/master/translations.csv
... let's try to find some more.

Dan

Received on Wednesday, 26 March 2014 13:02:39 UTC