Re: (should be added to the use case document...) Re: Example of CSV in Hebrew from Ivan Herman on 2014-03-31 (public-csv-wg@w3.org from March 2014)

From: Ivan Herman <ivan@w3.org>
Date: Mon, 31 Mar 2014 11:48:27 +0200
To: Andy Seaborne <andy@apache.org>
Cc: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-Id: <664915A1-F1B5-49B3-8B5A-7972FF46CF80@w3.org>

On 31 Mar 2014, at 11:39 , Andy Seaborne <andy@apache.org> wrote:

> On 31/03/14 08:33, Ivan Herman wrote:
>> 
>> On 31 Mar 2014, at 01:24 , Yakov Shafranovich <yakov-ietf@shaftek.org> wrote:
>> 
>>> see:
>>> 
>>> http://data.gov.il/data?title=&category=All&type=All&ministry=All&file_type=csv
>>> 
>>> Looks like the columns are going in reverse order
>>> 
>> 
>> This may be an interesting use case to investigate a bit further (beyond this), because:
>> 
>> - I am not sure what encoding is used. If I download a file (I tried [1]) and read it into iWork Numbers, or simply look at it in a text editor, I get gibberish, although programs on Macs usually handle UTF-8 natively, afaik. The question, then, is how one finds out which encoding is used. Note that "curl --head" on [1] does not reveal any more information. Yakov, I presume you succeeded in getting it in Hebrew; how did you get the right results?
> 
> Content-type is application/octet-stream.
> 
> When I set the character set to ISO-8859-8 (Hebrew), it displays in Firefox. LibreOffice can read it if I tell it that it's ISO-8859-8.
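A minimal Python sketch of the trial-and-error approach described here: attempt a strict UTF-8 decode first, then fall back to ISO-8859-8. The function name and the candidate list are assumptions for illustration. A successful decode only shows that the bytes are valid in that encoding, not that the guess is right.

```python
def decode_csv_bytes(data, candidates=("utf-8", "iso-8859-8")):
    """Try each candidate encoding in order and return (text, encoding).

    Hypothetical helper: decoding is strict, so a wrong guess raises
    UnicodeDecodeError and the next candidate is tried.
    """
    for enc in candidates:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next one
    raise ValueError("none of the candidate encodings could decode the data")
```

With a file fetched as raw bytes, `text, enc = decode_csv_bytes(raw)` reports which candidate was accepted, which is about as much as a client can do when the server sends no charset.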

Ah, indeed. So the question is where this encoding should be specified. I guess it is something to be pushed, at the minimum, into the metadata of a file... (and also into the HTTP response header, but that does not help once the file is downloaded...)
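When the encoding is declared in the HTTP header, a client can read it from the charset parameter of Content-Type (the text/csv registration in RFC 4180 defines an optional charset parameter). A small Python sketch using the standard library's email.message.Message to parse the header value; the function name is hypothetical. For an application/octet-stream response like the one Andy saw, it returns None, so the client is back to guessing.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value (lower-cased), or None.

    Hypothetical helper built on email.message.Message, whose
    get_content_charset() parses MIME-style header parameters.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()
```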

> 
> Can the locale of the client affect the displayed column order?

In my screen editor it does not.
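One possible explanation for the "reverse order" impression, offered only as a guess and not checked against these files: bidi-aware viewers pick a line's base direction from its first strong character (Unicode UAX #9, rules P2 and P3), not from the client locale. A CSV line that starts with Hebrew text is then laid out right to left, so its fields appear reversed even though the bytes are in normal order. A simplified Python sketch of that rule, ignoring isolate and embedding controls; the function name is hypothetical.

```python
import unicodedata


def paragraph_direction(line):
    """Simplified UAX #9 P2/P3: base direction from the first strong character.

    Isolate and embedding controls are not handled (a simplification).
    """
    for ch in line:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):  # Hebrew letters are "R", Arabic letters "AL"
            return "rtl"
    return "ltr"  # no strong character found: default to left-to-right
```

Digits and commas are weak or neutral, so a line such as `2006,...` still takes its direction from the first Hebrew or Latin letter that follows.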

(Interestingly, I did not find a way to specify the encoding in iWork Numbers.) :-(

Ivan

> 
> 	Andy
> 
>> - The JSON file is also published alongside the CSV files ([2]). Some notes on that one:
>>   - the structure is very much what one would expect (each row a separate object)
>>   - the Hebrew text is now correctly in Hebrew
>>   - the column names are in English (that may be the case in the CSV file, but I could not read it)
>>   - all records are collected into one big array labeled "Mishmorah" (I do not know what that means), but the individual row objects carry no row number. I presume an array is a more natural way of preserving the order of the rows... Is it something we should take into account for our JSON conversion?
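A minimal Python sketch of the JSON shape described in these notes: one top-level array of row objects, where array position already preserves row order, plus an optional explicit row number for comparison. The "Mishmorah" key comes from the published file; the "_row" key, the function name and the sample columns are hypothetical.

```python
import csv
import io
import json


def csv_to_json(text, array_key="Mishmorah", add_row_numbers=False):
    """Convert CSV text with a header row into {array_key: [row objects]}.

    The list keeps the rows in file order; if add_row_numbers is True,
    each object also gets a hypothetical "_row" key (1-based).
    """
    rows = []
    reader = csv.DictReader(io.StringIO(text))
    for n, row in enumerate(reader, start=1):
        if add_row_numbers:
            row = {"_row": n, **row}  # explicit position, redundant with list order
        rows.append(row)
    # ensure_ascii=False keeps Hebrew text readable in the output
    return json.dumps({array_key: rows}, ensure_ascii=False)
```

The array alone is enough to round-trip row order; an explicit number mainly helps when individual objects are later pulled out of the array.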
>> 
>> Definitely something to be added to the use case list I believe. Thanks Yakov!
>> 
>> Ivan
>> 
>>> Yakov
>>> 
>> 
>> [1] http://www.justice.gov.il/MojHeb/DataGov/Custody/Custody_Court_Decisions_2006-2010.csv
>> [2] http://www.justice.gov.il/MojHeb/DataGov/Custody/Custody_Court_Decisions_2006-2010.json
>> 
>> ----
>> Ivan Herman, W3C
>> Digital Publishing Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> GPG: 0x343F1A3D
>> FOAF: http://www.ivan-herman.net/foaf



Received on Monday, 31 March 2014 09:48:58 UTC