Re: i18n-ISSUE-473: Can RDF data be generated from non UTF-8 encoded CSV data

Anne van Kesteren scripsit:

> I'm just wondering what the expected benefit of this normalization is.
> I'm not aware of any legacy encoding producing non-NFC code points.

Transcoding Windows-1258 and other legacy Vietnamese encodings code point
by code point may not produce NFC-normalized results.  These encodings
typically express a toned vowel as a base character (which may already
carry a circumflex, breve, or horn) followed by a combining character
representing the tone.  To get NFC, each such pair must be composed into
a single character.
I don't know whether commonly available transcoders do this, but the
question should be explored.
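
As a rough illustration only, here is a minimal Python sketch, assuming
the "cp1258" codec in the Python standard library is representative of
a plain code-point-by-code-point transcoder.  The function names are
mine, and this says nothing about what other transcoders do.  It decodes
the two bytes 0xEA 0xEC (e with circumflex, then combining acute) with
and without an explicit NFC pass:

```python
import unicodedata


def transcode_cp1258(raw: bytes) -> str:
    """Decode Windows-1258 bytes one code point at a time, with no normalization."""
    return raw.decode("cp1258")


def transcode_cp1258_nfc(raw: bytes) -> str:
    """Decode Windows-1258 bytes, then compose the result to NFC."""
    return unicodedata.normalize("NFC", raw.decode("cp1258"))


# 0xEA is e with circumflex; 0xEC is the combining acute accent (a tone mark).
raw = b"\xea\xec"

plain = transcode_cp1258(raw)
nfc = transcode_cp1258_nfc(raw)

# Show the code points each approach produces.
print([hex(ord(c)) for c in plain])  # ['0xea', '0x301']
print([hex(ord(c)) for c in nfc])    # ['0x1ebf']
```

With this codec, at least, the plain decode leaves a two-code-point
sequence that is not in NFC, and the explicit normalization step
composes it into U+1EBF.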

-- 
John Cowan          http://www.ccil.org/~cowan        cowan@ccil.org
Schlingt dreifach einen Kreis vom dies!
Schliesst euer Aug vor heiliger Schau,
Denn er genoss vom Honig-Tau,
Und trank die Milch vom Paradies.

Received on Wednesday, 10 June 2015 14:15:29 UTC