Re: allow UTF-16 not just UTF-8 (PR#6774) from don@lexmark.com on 2003-10-16 (www-html@w3.org from October 2003)

From: <don@lexmark.com>
Date: Thu, 16 Oct 2003 16:40:54 -0400
To: "Steven Pemberton" <steven.pemberton@cwi.nl>
Cc: <don@lexmark.com>, "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>, <w3c-html-wg@w3.org>, <voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>, <www-html@w3.org>
Message-ID: <OF68127FE0.858E6178-ON85256DC1.007151EB@lexmark.com>
Steven:

Of course I knew this was just the external representation.

I'm trying to reduce conversions and reduce the sizes of buffers, etc.
necessary to do this work.  I have no doubt it can be done; I'm just trying
to do things with smaller, less powerful processors and with less available
memory than what programmers normally expect to be available in today's
environment.

*******************************************
Don Wright                 don@lexmark.com

Chair,  IEEE SA Standards Board
Member, IEEE-ISTO Board of Directors
f.wright@ieee.org / f.wright@computer.org

Director, Alliances and Standards
Lexmark International
740 New Circle Rd C14/082-3
Lexington, Ky 40550
859-825-4808 (phone) 603-963-8352 (fax)
*******************************************





"Steven Pemberton" <steven.pemberton@cwi.nl> on 10/16/2003 09:10:59 AM

To:    <don@lexmark.com>
cc:    <don@lexmark.com>, "BIGELOW,JIM (HP-Boise,ex1)"
       <jim.bigelow@hp.com>, <w3c-html-wg@w3.org>,
       <voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>,
       <www-html@w3.org>
Subject:    Re: allow UTF-16 not just UTF-8 (PR#6774)


Don,

I've been wondering for a long time if that was the misunderstanding, but I
was assured it wasn't.

UTF-16 and UTF-8 are *external* representations. The amount of internal
storage needed is the same for both, and how you store the text internally
is completely up to you.

The only extra memory needed is the couple of dozen extra bytes of code to
convert UTF-16 into whatever internal representation you use.
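
To make the conversion step concrete, here is a minimal Python sketch of a
UTF-16 decoder that turns a byte stream into Unicode code points. The function
name `decode_utf16` and the choice of a list of integers as the "internal
representation" are assumptions for illustration only; an embedded
implementation would differ, and its code size would depend on the target.

```python
def decode_utf16(data, big_endian=True):
    """Decode UTF-16 bytes (without a byte order mark) into code points."""
    if len(data) % 2:
        raise ValueError("UTF-16 input must have an even number of bytes")
    order = "big" if big_endian else "little"
    code_points = []
    i = 0
    while i < len(data):
        unit = int.from_bytes(data[i:i + 2], order)
        i += 2
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: must be followed by a low surrogate.
            if i >= len(data):
                raise ValueError("truncated surrogate pair")
            low = int.from_bytes(data[i:i + 2], order)
            i += 2
            if not 0xDC00 <= low <= 0xDFFF:
                raise ValueError("invalid low surrogate")
            unit = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
        elif 0xDC00 <= unit <= 0xDFFF:
            raise ValueError("unexpected low surrogate")
        code_points.append(unit)
    return code_points
```

Most characters are a single 16-bit unit; only characters outside the Basic
Multilingual Plane need the surrogate-pair branch.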

Best wishes,

Steven



----- Original Message -----
From: <don@lexmark.com>
To: "Steven Pemberton" <steven.pemberton@cwi.nl>
Cc: <don@lexmark.com>; "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>;
<w3c-html-wg@w3.org>; <voyager-issues@mn.aptest.com>;
<elliott.bradshaw@zoran.com>; <www-html@w3.org>
Sent: Thursday, October 16, 2003 2:51 PM
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)


>
>
> Steven:
>
> I think your answer proves my point that the XML community did not and
> does not consider the limitations of low-cost, constrained embedded
> environments when developing XML.
>
> You make the assertion that no extra memory is required yet the reality
> is quite the opposite.
>
> Please tell me if I'm wrong, but my understanding of UTF-8 and UTF-16 is
> that:
>
> 1) Every XHTML tag will require twice as many bytes when represented in
> UTF-16 versus UTF-8
> 2) Every English XHTML-Print print job will be twice as big encoded with
> UTF-16 versus UTF-8
> 3) Every "Latin 1" print job will be larger approaching 2X in size.
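
The three size claims above can be checked with a short Python sketch. The
helper name `encoded_sizes` and the sample strings are assumptions chosen for
illustration; they are not taken from any real print job.

```python
def encoded_sizes(text):
    """Return (UTF-8 byte count, UTF-16 byte count) for text, excluding any BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Hypothetical samples: markup, English text, Latin-1 text, and CJK text.
samples = {
    "markup": '<p class="note">',
    "english": "Hello, printer",
    "latin1": "Gr\u00fc\u00dfe aus M\u00fcnchen",
    "cjk": "\u5370\u5237",
}

for name, text in samples.items():
    utf8, utf16 = encoded_sizes(text)
    print(f"{name}: UTF-8 {utf8} bytes, UTF-16 {utf16} bytes")
```

ASCII-only content doubles in size under UTF-16. Latin-1 text grows by
somewhat less than 2X, because each accented character already takes two bytes
in UTF-8. Characters that need three bytes in UTF-8, such as most CJK
characters, take only two in UTF-16.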
>
> When you double the data's size, buffers have to double to be able to hold
> and manipulate an equivalent amount of print stream content.  There are
> real cost and performance penalties to be paid to deal with UTF-16 encoding,
> especially when dealing with western character sets.  When a device is
> designed to deal with the far east "characters" there are other penalties
> to be paid in things like the size of the font load that mitigate the
> UTF-16 versus UTF-8 encoding issue.
>
> "Steven Pemberton" <steven.pemberton@cwi.nl> on 10/15/2003 07:26:24 PM
>
> To:    <don@lexmark.com>
> cc:    "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>,
>        <w3c-html-wg@w3.org>, <don@lexmark.com>,
>        <voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>,
>        <www-html@w3.org>
> Subject:    Re: allow UTF-16 not just UTF-8 (PR#6774)
>
>
> But support for UTF 16 adds a few dozen bytes of code, and no extra memory
> requirements. It is simpler than UTF 8! What's the problem?
>
> Steven
>
> ----- Original Message -----
> From: <don@lexmark.com>
> To: "Steven Pemberton" <Steven.Pemberton@cwi.nl>
> Cc: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>;
> <w3c-html-wg@w3.org>;
> <don@lexmark.com>; <voyager-issues@mn.aptest.com>;
> <elliott.bradshaw@zoran.com>; <www-html@w3.org>
> Sent: Thursday, October 16, 2003 12:20 AM
> Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
>
>
> >
> > Steven, et al:
> >
> > The real problem is that the entire XML architecture was designed assuming
> > high end boxes like the 3 GHz Pentium with 512 megabytes of memory.  We
> > have already seen push back in other standards groups that consumer
> > electronic devices and other smaller, lighter devices cannot afford all the
> > luxuries demanded by an obese XML architecture.  Unless the XML community
> > accepts subsetting, we can't expect the broadest support for XML to happen
> > at the low end until the price/performance ratios experience another order
> > of magnitude or two of improvement.  As recently reported in several of the
> > trade magazines focused on IT professionals, the deployment of XML and Web
> > Services is having significant negative impacts on the IT infrastructure,
> > especially in the area of bandwidth utilization.  This is just another
> > symptom of the same problem.
> >
> > I know I will lose this argument in the W3C, but the reality is that
> > XHTML-Print implementations will blow off UTF-16 as more fat with no
> > benefit and simply not support it, "interoperable" or not.
> >
> > Sorry I'm not pure but practical.
> >
> > "Steven Pemberton" <Steven.Pemberton@cwi.nl> on 10/15/2003 09:18:15 AM
> >
> > To:    "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>,
> >        <w3c-html-wg@w3.org>, <don@lexmark.com>
> > cc:    <voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>,
> >        <www-html@w3.org>
> > Subject:    Re: allow UTF-16 not just UTF-8 (PR#6774)
> >
> >
> > > From: don@lexmark.com [mailto:don@lexmark.com]
> >
> > > So let me understand this....
> > >
> > > Because people have poorly designed and written XML applications running
> > > on 3 GHz Pentium 4s with 512 megabytes of real memory that do not allow
> > > the control over whether UTF-8 or UTF-16 are emitted, we are expecting to
> > > burden $49 printers with code to be able to detect and interpret both.
> >
> > No Don. It is about interoperability and conforming to standards. XML
> > allows documents to be encoded in either UTF-8 or UTF-16: consumers must
> > accept both, producers may produce either. An XHTML-Print printer will be
> > just a consumer of an XML byte-stream at some IP address; we don't want to
> > burden every program in the world that can produce XML with a switch that
> > says "this output is going to a poor lowly XHTML-Print processor that can't
> > deal with UTF-16, so please produce UTF-8", especially since UTF-16 is the
> > easy one to implement, and can only cost a few dozen bytes at most.
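
As a rough illustration of what "detect and interpret both" involves, the
sketch below guesses the encoding from the first bytes of a document, loosely
following the autodetection approach described in Appendix F of the XML 1.0
specification. The function name `sniff_xml_encoding` is hypothetical, and the
sketch is deliberately simplified: it ignores UCS-4 and the encoding
declaration, and falls back to UTF-8 for anything it does not recognize.

```python
def sniff_xml_encoding(head):
    """Guess 'utf-8', 'utf-16-be' or 'utf-16-le' from a document's first bytes."""
    # Byte order marks take priority.
    if head.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if head.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if head.startswith(b"\xff\xfe"):
        return "utf-16-le"
    # No BOM: look at how the leading "<" is laid out in memory.
    if head.startswith(b"\x00<"):
        return "utf-16-be"
    if head.startswith(b"<\x00"):
        return "utf-16-le"
    # Simplifying assumption: everything else is treated as UTF-8.
    return "utf-8"
```

A consumer would call this on the first few bytes of the stream and then pick
the matching decoder before parsing any markup.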
> >
> > If we changed this, XHTML Print would have to go back to last call, and
> > you can bet your boots that the XML community would rise up against us, as
> > it has in the past, and I can tell you we don't want to go there, and we
> > would have a hundred people registering objections.
> >
> > Conforming to XML requirements comes with the territory of being XHTML.
> > The XML community will not take kindly to us messing with their standards.
> >
> > Best wishes,
> >
> > Steven Pemberton
> >
Received on Thursday, 16 October 2003 16:43:54 UTC