Re: allow UTF-16 not just UTF-8 (PR#6774) from Henri Sivonen on 2003-10-16 (www-html@w3.org from October 2003)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Thu, 16 Oct 2003 13:23:36 +0300
To: www-html@w3.org
Cc: voyager-issues@mn.aptest.com, w3c-html-wg@w3.org
Message-Id: <CD402B8F-FFC2-11D7-BE86-003065B8CF0E@iki.fi>

On Thursday, Oct 16, 2003, at 01:20 Europe/Helsinki, don@lexmark.com  
wrote:

> The real problem is that the entire XML architecture was designed
> assuming high end boxes like the 3 GHz Pentium with 512 megabytes of
> memory.

Lesser devices can host expat. However, if a device can't host expat,  
perhaps it would be better to use something other than XML to  
communicate with the device.

> We have already seen push back in other standards groups that consumer
> electronic devices and other smaller, lighter devices cannot afford
> all the luxuries demand by an obese XML architecture.  Unless the XML
> community accepts subsetting, we can't expect the broadest support for
> XML to happen at the low end until the price/performance ratios
> experience another order or two magnitude improvement.

If you subset XML, is support for the subset support for XML?

What's the point of building a language on application-specific  
almost-XML? A language built on such almost-XML breaks expectations  
(either in software or in the minds of people who need to deal with the  
language). If you can't use tools that are based on the assumption that  
the data they process is *exactly* XML and the programmers' knowledge  
about XML isn't guaranteed to apply, wouldn't it be less confusing to  
invent another grammar entirely and not call it XML?

A well-defined extended subset of XML (for example: UTF-8 only,  
normalization form C only, no doctype, no PIs, no CDATA sections, no  
epilog, all HTML character entities predefined, namespace processing  
mandatory) would be more useful than having specs layered on top of XML  
1.0 trying to readjust what XML 1.0 is.
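
To make the idea of a well-defined profile concrete, here is a rough
sketch of a checker for part of the example list above, written against
Python's expat binding. The names check_profile and SubsetError are made
up for illustration. The sketch covers only UTF-8, normalization form C,
no doctype, no PIs, no CDATA sections and namespace processing. It does
not check for an epilog and does not predefine the HTML character
entities.

```python
import unicodedata
import xml.parsers.expat


class SubsetError(Exception):
    """Raised when a document uses a feature outside the profile (hypothetical name)."""


def _reject(what):
    # Build an expat handler that refuses the construct it is attached to.
    def handler(*args):
        raise SubsetError(what + " is not allowed in this profile")
    return handler


def _check_xml_decl(version, encoding, standalone):
    # An encoding declaration, if present, must name UTF-8.
    if encoding is not None and encoding.lower() != "utf-8":
        raise SubsetError("declared encoding is not UTF-8")


def check_profile(data: bytes) -> None:
    """Raise SubsetError (or ExpatError for ill-formed input); return None if OK."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        raise SubsetError("document is not UTF-8") from None
    if unicodedata.normalize("NFC", text) != text:
        raise SubsetError("document is not in normalization form C")

    # Passing a namespace separator turns on namespace processing in expat.
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    parser.XmlDeclHandler = _check_xml_decl
    parser.StartDoctypeDeclHandler = _reject("doctype")
    parser.ProcessingInstructionHandler = _reject("processing instruction")
    parser.StartCdataSectionHandler = _reject("CDATA section")
    parser.Parse(data, True)
```

A document that passes is still ordinary XML 1.0, so generic XML tools
keep working on it. The profile only narrows what a producer may send.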

XHTML-Print printers get data over HTTP which is over TCP. It would be  
ludicrous to tweak the TCP header format in the XHTML-Print spec.

> I know I will lose this argument in the W3C but the realities of the
> XHTML-Print implementations will blow off UTF-16 as more fat with no
> benefit and simply not support it, "interoperable" or not.

Converting UTF-16 to UTF-8 really isn't a big deal. It's basically a  
matter of shifting bits.
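
As a rough illustration of the bit shifting involved, here is a minimal
sketch in Python. The function name utf16_units_to_utf8 is made up for
this example, and the input is assumed to be a sequence of 16-bit code
units already read in the right byte order. A real device would do the
same arithmetic in C on a byte stream.

```python
def utf16_units_to_utf8(units):
    """Convert an iterable of 16-bit UTF-16 code units to UTF-8 bytes."""
    out = bytearray()
    it = iter(units)
    for u in it:
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: combine with the following low surrogate.
            lo = next(it, None)
            if lo is None or not (0xDC00 <= lo <= 0xDFFF):
                raise ValueError("unpaired high surrogate")
            cp = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00)
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError("unpaired low surrogate")
        else:
            cp = u

        if cp < 0x80:
            # 1 byte: 0xxxxxxx
            out.append(cp)
        elif cp < 0x800:
            # 2 bytes: 110xxxxx 10xxxxxx
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp < 0x10000:
            # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        else:
            # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out.append(0xF0 | (cp >> 18))
            out.append(0x80 | ((cp >> 12) & 0x3F))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
    return bytes(out)
```

The converter needs no tables and no state beyond one pending surrogate,
which is why the cost on a small device is low.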

Considering eliminating fat, I'd much rather eliminate character  
entities[1] and references to the external DTD subset[2]. Character  
entities are a burden in any case. They require either processing the  
external DTD subset (bad for execution speed and memory requirements)  
or implementing an extra feature which doesn't belong in an XML  
processor (bad for conformance and yet redundant since there are  
conforming ways of representing characters).
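
To illustrate one of the conforming ways of representing characters, a
producer could rewrite named character entities as numeric character
references before sending, so the receiver needs neither the DTD nor a
built-in entity table. The sketch below is in Python and the function
name to_numeric_refs is made up. It is a naive text substitution that
assumes the markup contains no CDATA sections or comments with
entity-like text in them.

```python
import re
from html.entities import name2codepoint

# The five entities every XML processor already knows; leave them alone.
XML_PREDEFINED = {"lt", "gt", "amp", "apos", "quot"}

_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_refs(markup: str) -> str:
    """Replace HTML named entities with hexadecimal character references."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Unknown or XML-predefined name: keep the reference unchanged.
            return match.group(0)
        return "&#x%X;" % name2codepoint[name]
    return _ENTITY.sub(repl, markup)
```

For example, to_numeric_refs("caf&eacute;") gives "caf&#xE9;", which any
XML processor can resolve without reading the external DTD subset.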

[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6776;user=guest
[2] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6773;user=guest

-- 
Henri Sivonen
hsivonen@iki.fi
http://www.iki.fi/hsivonen/

Received on Thursday, 16 October 2003 06:23:40 UTC