Re: Some comments on the UCR document

On 26 Mar 2014, at 14:02 , Dan Brickley <danbri@google.com> wrote:

> On 26 March 2014 12:31, Ivan Herman <ivan@w3.org> wrote:
>> Hi guys,
>> 
>> now that the draft UCR document is almost published, I took some time this morning to read through it more thoroughly. I found a number of minor issues, 99% editorial; I list them below. None of these are really serious (i.e., no reason to hold up tomorrow's publication), but you may want to take care of them for the next release.
>> 
>> As an overall comment, though, I am a little bit bothered by one thing: the very anglo-saxon oriented nature of the use cases.
> 
> (aside: I find this a peculiar use of the phrase 'anglo-saxon', though
> it seems to be popular in France as a euphemism for UK+USA)
> 
>> I would like to be sure that these use cases do not hide additional issues that may come up when CSV is used in other cultures. I could not put my finger on it, but I did ask myself questions like: we are talking about column headers, but we refer to that as the first column from the left; what happens with CSV files produced in Arabic, Hebrew, and other right-to-left writing systems? How do they do this in practice? Another question is whether, for writing systems that use vertical writing, the roles of the rows and the columns are naturally transposed. We should remember that there are languages (e.g., Mongolian) where vertical writing is THE writing mode, not an option as it is in CJK languages such as Chinese or Japanese. I realize that these languages are in a strong minority, but nevertheless... Also, the "," character is not part of Arabic or CJK languages; the character that looks like a comma is actually a different code point. Do they use "," nevertheless?
> 
> This is a good point.
> 
> I've just talked to an Arabic native speaker colleague, and we started
> looking for CSV files on the Web.
> 
> He says that for a CSV file that was predominantly written in Arabic,
> his intuition would be that you'd count columns with the 1st on the
> right.
> 

That would be my intuition, too.


> He also noted that some web sites sometimes use the Arabic letter for
> 'r' instead of a comma:
> 
> ر
> 
> Here's a multi-script, multilingual CSV (utf-8),
> https://raw.githubusercontent.com/flangofas/generic-find-replace/master/translations.csv
> ... let's try to find some more.
> 

That actually looks harmless in terms of structure; if UTF-8 is used for CSV (which, I believe, will be the case in the RFC, if I understood Yakov correctly), then it is fairly standard...
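
To make the code point question concrete, here is a minimal sketch in Python; it is an illustration only, not something taken from the UCR document or the RFC. It prints the Unicode names of the comma-like characters mentioned in this thread and shows that a parser told to split on "," does not split on the Arabic comma. The helper names `describe` and `parse` and the sample row are made up for the example.

```python
import csv
import io
import unicodedata

# Separator candidates discussed in the thread, written as escapes
# so that bidirectional rendering does not scramble the source.
CANDIDATES = [
    "\u002c",  # ASCII comma
    "\u060c",  # comma used in Arabic text
    "\u3001",  # comma used in CJK text
    "\uff0c",  # fullwidth comma used in CJK text
    "\u0631",  # the Arabic letter for 'r'
]


def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def parse(text, delimiter):
    """Parse CSV text using an explicit one-character delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


for ch in CANDIDATES:
    print(describe(ch))

# Hypothetical one-row file that uses the Arabic comma as separator.
sample = "name\u060ccity\n"

print(parse(sample, "\u060c"))  # splits into two fields
print(parse(sample, ","))       # stays a single field
```

So a consumer that assumes "," would read such a file as a single column; whether files like this are common in practice is exactly the open question.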

Ivan

> Dan


----
Ivan Herman, W3C 
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
FOAF: http://www.ivan-herman.net/foaf

Received on Wednesday, 26 March 2014 14:17:32 UTC