- From: Allen, Michael B (RSCH) <Michael_B_Allen@ml.com>
- Date: Sun, 17 Feb 2002 19:15:22 -0500
- To: "'Joseph Kesselman'" <keshlam@us.ibm.com>
- cc: www-dom@w3c.org
> -----Original Message----- > From: Joseph Kesselman [SMTP:keshlam@us.ibm.com] > > > Why does the DOMString type needs to be defined at all? > > The nature of a language binding is that it brings everything down to > language-specific interfaces and types, so that code written against that > binding will compile and run against all instances of that binding. If you > don't nail down DOMString to _something_, you don't have a binding. > Sure, but I don't think you have to define what the DOMString *character encoding* is. DOMString could just be the standard string type for that language. In C this would be a pointer to 'char' (The encoding of the string object this pointer points to is the locale dependent character enocoding such as ISO-8859-5 or UTF-8 but my point is this shouldn't matter). In Java this would be the 'String' object (encoding is UTF-16 non-endian dependent with a BOM but again this doesn't matter). The reason it doesn't matter is because you will never be transferring strings between language bindings at runtime. That's what XML is for. So only the DOM serialization routines (probably to XML) need to worry about details like character encoding. > The solution we've used in the past is for the binding to state what > specific type DOMString is mapped to. > Specifying the type is one thing, but specifying the encoding is another. Making it UTF-16 (big endian, little endian, w/wo BOM?) unnecessarily constrains the implementation. I know first hand it creates a significant barrier for C. It requires that the implementation provide all the usual string manipulation functions. Consider what would happen if the DOMString type were defined as a long and specified the encoding as UCS-4BE? What would the Java language binding look like? > Another solution would be to say that DOMString will itself be a specific > interface in this binding, derived from or wrapped around whatever the > implementation wants to use. > If you mean: interface DOMString { }; then great. And maybe aknowledge that this might also be represented as a simple typedef in some language bindings (just like it would be silly to represent the EventListener interface with anything but a function pointer in C: typedef void (*DOM_EventListener)(DOM_Event *evt);). DOMString could just an object to be used by the standard string manipulation functions of the language. You cannot define the string manipulation functions too... > There's nothing inherently wrong with providing a DOM-like API that > supports only 8-bit characters in a specific encoding, if you know that's > all your customers are going to want... but if you do so, be sure to > clearly document that divergence and don't claim to be a fully compliant > DOM implementation. You'll benefit from folks not having to relearn the API > architecture, but you'll lose interoperability with other systems... and > interoperability is part of the point of the DOM. > Then I think we'll have a lot of "DOM-like" implementations of the DOM laying round (DOMC being one of them). I don't see how interoperabilty would be lost if the actual character encoding of the DOMString type were defined in the language binding. And in most languages there is an obvious choice for strings if you have a choice at all. Mike
Received on Sunday, 17 February 2002 19:15:39 UTC