Re: i18n-ISSUE-473: Can RDF data be generated from non UTF-8 encoded CSV data from Jeni Tennison on 2015-06-10 (public-csv-wg@w3.org from June 2015)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Wed, 10 Jun 2015 10:24:38 +0100
To: Anne van Kesteren <annevk@annevk.nl>
Cc: "www-international@w3.org" <www-international@w3.org>, public-csv-wg@w3.org, Steven Atkin <atkin@us.ibm.com>
Message-ID: <etPan.55780256.5cc2a7b5.ed@jenit.local>

Hi Anne,

On 10 June 2015 at 09:52:19, Anne van Kesteren (annevk@annevk.nl) wrote:
> On Wed, Jun 10, 2015 at 10:45 AM, Jeni Tennison wrote:
> > 5. Read the file using the encoding, as specified in [encoding], using the replacement  
> > error mode.
>  
> So far so good. Though have you defined how to obtain the encoding
> from a string?

As currently defined, the encoding is specified through a flag (see http://w3c.github.io/csvw/syntax/#dfn-encoding) and must be one of the values specified in the encoding spec. It is either set explicitly in the JSON metadata document supplied for the CSV file or through the charset in the Content-Type header. Otherwise it defaults to utf-8.

Would you recommend an alternative approach?
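
For illustration, the lookup described above might be sketched as follows. This is only a rough sketch under stated assumptions: the function name `resolve_encoding`, the `dialect`/`encoding` layout of the metadata dictionary, and the choice to let the metadata take precedence over the Content-Type header are all assumptions made for the example, not wording from the spec.

```python
from email.message import Message


def resolve_encoding(metadata=None, content_type=None):
    """Hypothetical helper: choose the encoding label for a CSV file.

    Assumed order: explicit metadata, then the Content-Type charset,
    then the utf-8 default.
    """
    # 1. Encoding set explicitly in the JSON metadata document
    #    (the "dialect"/"encoding" layout is an assumption here).
    if metadata:
        encoding = metadata.get("dialect", {}).get("encoding")
        if encoding:
            return encoding.strip().lower()

    # 2. charset parameter of the Content-Type header, if one was sent.
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_param("charset")
        if isinstance(charset, str) and charset:
            return charset.strip().lower()

    # 3. Otherwise fall back to the default.
    return "utf-8"
```

A caller would then pass the returned label to whatever decoder it uses, e.g. `resolve_encoding(None, "text/csv; charset=ISO-8859-1")`.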

> > If the encoding is not a Unicode encoding, use a normalizing transcoder
> > to normalize into Unicode Normal Form C as defined in [UAX15].
>  
> 1) What is a Unicode encoding?

What would you recommend that we say? The comments from Steven on behalf of the I18N WG simply said “not in Unicode”. Would that be a better way of framing it than “not a Unicode encoding”?

> 2) What encodings would be affected by this?

Are you asking us to list, in the spec, the encodings that aren’t Unicode encodings?
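
To make the step under discussion concrete, here is a minimal sketch of "decode in replacement mode, then normalize to NFC if the source was not a Unicode encoding". It rests on assumptions: the set `UNICODE_ENCODINGS` is a placeholder for exactly the definition being asked about, the function name `read_csv_text` is hypothetical, and Python's built-in codecs only approximate the decoders defined in [encoding].

```python
import unicodedata

# Assumption: which labels count as "Unicode encodings" is precisely the
# open question in this thread; this set is a placeholder, not spec text.
UNICODE_ENCODINGS = {"utf-8", "utf-16le", "utf-16be"}


def read_csv_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode CSV bytes, normalizing legacy input."""
    # Replacement error mode: malformed byte sequences become U+FFFD
    # instead of raising an error.
    text = raw.decode(encoding, errors="replace")

    # Only text transcoded from a non-Unicode encoding is normalized to
    # Unicode Normalization Form C; Unicode input is passed through as-is.
    if encoding.strip().lower() not in UNICODE_ENCODINGS:
        text = unicodedata.normalize("NFC", text)
    return text
```

Under this sketch, UTF-8 input that happens to contain decomposed sequences is left untouched, which is one of the consequences the wording would need to make explicit.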

Please suggest improved wording where it’s not right; you’re the experts. We’re also very happy to receive pull requests.

Thanks,

Jeni
-- 
Jeni Tennison
http://www.jenitennison.com/

Received on Wednesday, 10 June 2015 09:25:06 UTC