- From: David Brownell <david-b@pacbell.net>
- Date: Mon, 18 Feb 2002 13:52:43 -0800
- To: "Allen, Michael B (RSCH)" <Michael_B_Allen@ml.com>
- Cc: www-dom@w3.org
> Sure, but I don't think you have to define what the DOMString *character
> encoding* is. DOMString could just be the standard string type for that
> language. In C this would be a pointer to 'char'. (The encoding of the
> string object this pointer points to is the locale-dependent character
> encoding, such as ISO-8859-5 or UTF-8, but my point is this shouldn't
> matter.)

But it _does_ matter whether the representation supports all XML characters. ァ is not representable in 8859-5, but it is representable in XML -- and in UTF-8 or UTF-16.

If you're saying that different environments have different ways to handle such variability (like <wchar.h> etc. in C), sure; but if you're saying that it's OK to assume a single restrictive locale and encoding, you've got a problem on your hands. In XML processing you can't make the simplifying assumption that "only strings in this system's locale will ever appear".

> Making it UTF-16 (big endian, little endian, w/wo BOM?) unnecessarily
> constrains the implementation. I know first hand it creates a significant
> barrier for C. It requires that the implementation provide all the usual
> string manipulation functions.

Well, yes. Not that I do much C++ hacking anymore, but in what sense could an API be portable if no code using it could be portable? And if it couldn't actually represent ALL the data that has to go through the API?

Seems to me the barrier you're talking about is a widely recognized gap in older C/C++ environments: poor I18N support. That gap is one reason that Java caught on so well for XML.

- Dave
Received on Monday, 18 February 2002 16:54:27 UTC