Re: Charsets, encodings, http-request, unescape-markup, and convenience, oh my! from Norman Walsh on 2011-10-10 (public-xml-processing-model-comments@w3.org from October 2011)

From: Norman Walsh <ndw@nwalsh.com>
Date: Mon, 10 Oct 2011 08:57:27 -0400
To: public-xml-processing-model-comments@w3.org
Message-ID: <m2listovgo.fsf@nwalsh.com>

"vojtech.toman@emc.com" <vojtech.toman@emc.com> writes:
>> 2. If the charset parameter isn't known/specified, we default
>>    to...ISO Latin 1, or whatever the Internet tells us the default
>>    is for text/* documents that don't specify a charset.
>
> I think so. You already get this behavior when you read text data,
> except that the applied default charset is not available anywhere in
> the constructed c:body.

I was doing some "totally off the reservation" hacking this morning
and I think we have to be a little more careful about the wording.
Consider application/json, for example: even if the charset isn't
specified, the charset is always UTF-8.

I still think we should allow implementations to guess/know/infer the
encoding if it isn't specified, but we have to be a little careful.
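
As a rough illustration of the inference discussed above (not proposed
spec wording, and not how any particular XProc implementation does it),
the lookup order might be sketched like this. The function name
effective_charset and the IMPLIED_CHARSETS table are hypothetical, and
the ISO-8859-1 fallback for text/* is an assumption taken from the
HTTP/1.1 default mentioned in the quoted text.

```python
from email.message import Message
from typing import Optional

# Hypothetical table: media types whose encoding is implied by the type
# itself, per the application/json example in this message.
IMPLIED_CHARSETS = {"application/json": "utf-8"}

# Assumption: ISO Latin 1 is the fallback for text/* with no charset.
TEXT_DEFAULT = "iso-8859-1"


def effective_charset(content_type: str) -> Optional[str]:
    """Return the charset to use for a Content-Type header value.

    Order: explicit charset parameter, then a charset implied by the
    media type, then the text/* default. Returns None when nothing
    can be inferred (for example, opaque binary types).
    """
    msg = Message()
    msg["Content-Type"] = content_type

    # 1. An explicit charset parameter always wins.
    declared = msg.get_param("charset")
    if isinstance(declared, str) and declared:
        return declared.lower()

    # 2. Some media types fix the encoding regardless of parameters.
    media_type = msg.get_content_type()
    if media_type in IMPLIED_CHARSETS:
        return IMPLIED_CHARSETS[media_type]

    # 3. Otherwise fall back to the text/* default, if applicable.
    if msg.get_content_maintype() == "text":
        return TEXT_DEFAULT

    return None
```

The point of checking the implied table before the text/* default is
that a blanket "default to Latin 1" rule would otherwise give the wrong
answer for types like application/json.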

                                        Be seeing you,
                                          norm

-- 
Norman Walsh
Lead Engineer
MarkLogic Corporation
Phone: +1 413 624 6676
www.marklogic.com

Received on Monday, 10 October 2011 12:58:04 UTC