Re: allow US-ASCII as well as UTF-8 for mobileOK documents? from Sean Owen on 2006-08-17 (public-bpwg-comments@w3.org from July to September 2006)

From: Sean Owen <srowen@google.com>
Date: Thu, 17 Aug 2006 10:21:39 -0400
To: "Dan Connolly" <connolly@w3.org>
Cc: public-bpwg-comments@w3.org
Message-ID: <e920a71c0608170721r15eb5eccj9da23029aa20f3d5@mail.google.com>

Dan, we discussed character encoding and US-ASCII on our weekly call.
One concern was that several Japanese phones would not recognize the
encoding when it is labelled "US-ASCII" and would default to
Shift_JIS instead of UTF-8. The same may be true of other handsets.

I think the group's position would be that a document encoded as
US-ASCII is also encoded as UTF-8, and so should be declared as "UTF-8"
instead. We would not want to tell people to assume that phones
recognize US-ASCII; there seems to be enough lack of support to
puncture that assumption.
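US-ASCII uses only byte values 0x00-0x7F, and UTF-8 encodes those same code points as the same single bytes, so a US-ASCII document can be relabelled "UTF-8" without changing a single byte. A minimal Python sketch of that reasoning follows. The function names are hypothetical, and it assumes the server only ever emits UTF-8 (or its US-ASCII subset), as in the capability profile discussed here.

```python
def ascii_is_utf8(body: bytes) -> bool:
    """True if `body` is pure US-ASCII and decodes to the same text as UTF-8."""
    try:
        as_ascii = body.decode("ascii")  # fails on any byte >= 0x80
    except UnicodeDecodeError:
        return False
    return as_ascii == body.decode("utf-8")


def charset_to_declare(body: bytes) -> str:
    """Hypothetical helper: the Content-Type charset label to send.

    Assumes the body is meant to be UTF-8; raises UnicodeDecodeError if not.
    Even a pure US-ASCII body is labelled "UTF-8", because some handsets
    do not recognize the "US-ASCII" label.
    """
    body.decode("utf-8")  # validate only; the bytes are sent unchanged
    return "UTF-8"


page = b"<p>Hello, mobile web</p>"
print(ascii_is_utf8(page))                          # True
print(charset_to_declare(page))                     # UTF-8
print(ascii_is_utf8("caf\u00e9".encode("utf-8")))   # False: 0xC3 0xA9 are not ASCII
```

The converse does not hold: UTF-8 text containing characters outside US-ASCII uses bytes at or above 0x80, so it cannot be labelled US-ASCII.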

Regards,
Sean

On 7/28/06, Sean Owen <srowen@google.com> wrote:
> Fair point, I'll bring it up to the group. I personally am not sure
> how common US-ASCII-encoded pages are, and haven't seen one in recent
> memory. We assume a capability profile that includes UTF-8 support,
> and as you say, a valid US-ASCII encoding of text is also a valid UTF-8
> encoding of the same text, so one could label US-ASCII documents as
> UTF-8.
>
> We haven't assumed UTF-16 support in the Default Device Context, and
> the tests generally assume that capability profile, so the test is
> intended to verify that content can be received in UTF-8.
>
> Agreed about the fragment IDs. This is an artifact of how the document
> is generated, but it can probably be fixed in an upcoming draft.
>
> Thanks for this valuable input,
> Sean
>
> On 7/27/06, Dan Connolly <connolly@w3.org> wrote:
> > I see:
> >
> > "If the request response does specify a character encoding but it is not
> > "UTF-8", FAIL"
> >  -- http://www.w3.org/TR/mobileOK/#id4485785
> >
> > How about US-ASCII? Especially since you can treat US-ASCII
> > as UTF-8 and preserve the meaning of the bytes.
> >
> > It's perhaps not worthwhile to complicate things, if very
> > few documents are labelled US-ASCII.
> >
> > p.s. I wonder if it's acceptable to limit encodings to UTF-8
> > and exclude UTF-16; it wasn't when XML was ratified.
> > But I'll leave it to those who have 1st-hand experience
> > with the need for UTF-16 to comment on that.
> >
> > p.p.s. The fragid #id4485785 seems fragile. If you're
> > going to break it, break it only once, for the next draft.
> > At that point, change it to something like #char-encoding-support
> > and keep it that way for future revisions.
> >
> >
> > --
> > Dan Connolly, W3C http://www.w3.org/People/Connolly/
> > D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E
>
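The mobileOK rule Dan quotes can be sketched as a check on the charset parameter of the HTTP Content-Type header. This is a minimal illustration, not the mobileOK test itself: it assumes the charset is taken only from the header (not from a meta element or XML declaration), uses a deliberately simple parameter parser, and covers only the quoted clause. The function names and the `allow_us_ascii` flag are hypothetical; the flag models Dan's suggestion, which the reply above argues against.

```python
from typing import Optional


def declared_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, upper-cased.

    Simplified parsing for illustration: splits on ';' and ignores quoting
    rules beyond stripping surrounding double quotes.
    """
    for param in content_type.split(";")[1:]:
        name, sep, value = param.partition("=")
        if sep and name.strip().lower() == "charset":
            return value.strip().strip('"').upper()
    return None


def charset_clause(content_type: str, allow_us_ascii: bool = False) -> Optional[str]:
    """Apply only the quoted clause: a declared charset other than UTF-8 FAILs.

    Returns "FAIL", or None when this clause alone does not fail the response
    (including when no charset is declared; other clauses are not modelled).
    """
    accepted = {"UTF-8"}
    if allow_us_ascii:  # hypothetical: Dan's proposed relaxation
        accepted.add("US-ASCII")
    charset = declared_charset(content_type)
    if charset is not None and charset not in accepted:
        return "FAIL"
    return None


print(charset_clause("text/html; charset=utf-8"))                          # None
print(charset_clause("text/html; charset=US-ASCII"))                       # FAIL
print(charset_clause("text/html; charset=US-ASCII", allow_us_ascii=True))  # None
```

Under the group's position, a US-ASCII document avoids the FAIL simply by being served with `charset=UTF-8`, so the extra flag is unnecessary.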

Received on Thursday, 17 August 2006 14:22:02 UTC