Re: i18n-ISSUE-473: Can RDF data be generated from non UTF-8 encoded CSV data

From: <cowan@ccil.org>
Date: Wed, 10 Jun 2015 10:14:52 -0400
Message-ID: <c0a358d3bef7d2b2ccf134f7d25c42fe.squirrel@www.ccil.org>
To: "Anne van Kesteren" <annevk@annevk.nl>
Cc: "Jeni Tennison" <jeni@jenitennison.com>, "www-international@w3.org" <www-international@w3.org>, public-csv-wg@w3.org, "Steven Atkin" <atkin@us.ibm.com>
Anne van Kesteren scripsit:

> I'm just wondering what the expected benefit of this normalization is.
> I'm not aware of any legacy encoding producing non-NFC code points.

Transcoding Windows-1258 and other legacy Vietnamese encodings code point
by code point may not produce properly normalized results.  These encodings
typically express vowels using a base character (which may include a
circumflex, breve, or horn) followed by a combining character representing
the tone.  To get NFC, each such pair must be composed into a single
precomposed character.
I don't know whether commonly available transcoders do this, but the
question should be explored.
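
A minimal sketch of the concern, assuming Python's built-in "cp1258" codec
stands in for a typical code-point-by-code-point transcoder (the helper names
below are hypothetical, and other transcoders may behave differently):

```python
import unicodedata


def transcode_cp1258(raw: bytes) -> str:
    """Decode Windows-1258 bytes one code point at a time, with no normalization."""
    return raw.decode("cp1258")


def transcode_cp1258_nfc(raw: bytes) -> str:
    """Decode Windows-1258 bytes, then compose the result into NFC."""
    return unicodedata.normalize("NFC", raw.decode("cp1258"))


# Vietnamese "e with circumflex and acute" as Windows-1258 stores it:
# the base letter U+00EA followed by the combining acute accent U+0301.
legacy_bytes = "\u00ea\u0301".encode("cp1258")

naive = transcode_cp1258(legacy_bytes)
fixed = transcode_cp1258_nfc(legacy_bytes)

print([hex(ord(c)) for c in naive])  # ['0xea', '0x301'], two code points, not NFC
print([hex(ord(c)) for c in fixed])  # ['0x1ebf'], one precomposed code point
print(naive == unicodedata.normalize("NFC", naive))  # False
```

With this particular codec the plain decode leaves the base letter and the
tone mark as separate code points, so an explicit NFC pass is needed to get
the precomposed form.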

-- 
John Cowan          http://www.ccil.org/~cowan        cowan@ccil.org
Schlingt dreifach einen Kreis vom dies!
Schliesst euer Aug vor heiliger Schau,
Denn er genoss vom Honig-Tau,
Und trank die Milch vom Paradies.
Received on Wednesday, 10 June 2015 14:15:29 UTC
