- From: Francois Yergeau <FYergeau@alis.com>
- Date: Wed, 12 Mar 2003 09:00:46 -0500
- To: www-dom@w3.org
- Cc: w3c-i18n-ig@w3.org
Hello DOM WG

The i18n WG has reviewed the recently published DOM 3 Core and Load&Save working drafts:

  http://www.w3.org/TR/2003/WD-DOM-Level-3-Core-20030226/
  http://www.w3.org/TR/2003/WD-DOM-Level-3-LS-20030226/

We understand that this is still very much work in progress, but thought that submitting comments now could be productive. In particular, many of our comments can serve as hints as to where the specs need clarification, while others raise potential issues that do not appear to be in your issues lists and therefore may not be on your radar.

We plan to re-review the specs when they reach Last Call, but it is probably better to raise issues we know about now instead of later when the spec is considered done.

--
François Yergeau
for the i18n WG

DOM 3 Core
==========

C1) Document interface, "actualEncoding" and "encoding" attributes: it is not quite clear what these two are, especially how they differ. Also, the effect, if any, of setting them (they are not readonly) is not clear. Same issues with interface Entity.

C2) Document interface, "standalone" attribute: this is said to match the [standalone] property of the infoset, but is boolean whereas the infoset property can have 3 values: "yes", "no" and "no value". Either the datatype should be changed or (my preference) it should be specified that this is true when the infoset says "yes", false otherwise.

C3) Document interface, "strictErrorChecking" attribute: shouldn't this be a parameter of DOMConfiguration?

C4) Document interface, "version" attribute: presumably, this controls the error checking that is done on names in e.g. createAttribute(), which raises INVALID_CHARACTER_ERR if the specified name contains an illegal character. Which rules (1.0 or 1.1) apply if "version" is null? Shouldn't "version" default to "1.0" if not specified?

C5) Document interface, "version" attribute: what happens if version is set from "1.1" to "1.0" and the document already contains names that are not legal in 1.0?
Is this controlled by "strictErrorChecking"?

C6) Document interface, "adoptNode()" method: what happens if a 1.0 document adopts a node containing names not legal in 1.0 (e.g. from a 1.1 document)? By analogy with createAttribute() and friends, this should throw an INVALID_CHARACTER_ERR exception. Same comment for importNode(). Ah! I see that importNode() does throw INVALID_CHARACTER_ERR in that case.

C7) Document interface, "createAttribute()" method and several others: should specify that the rules that decide whether an INVALID_CHARACTER_ERR exception is thrown depend on the "version" attribute. Same comment for Element.setAttribute() and Element.setAttributeNS().

C8) Document interface, "createCDATASection()" method: what happens if the "data" argument contains the string "]]>"? Is this controlled by "strictErrorChecking"? Impacts Load&Save.

C9) Document interface, "createComment()" method: what happens if the "data" argument contains the string "--"? Is this controlled by "strictErrorChecking"? Impacts Load&Save.

C10) Document interface, "createTextNode()" method: what happens if the "data" argument contains characters not allowed by the XML Char production? Is this controlled by "strictErrorChecking"? Impacts Load&Save. Same question for setting Node.textContent.

C11) Document interface, "normalizeDocument()" method: doesn't mention character normalization.

C12) Node interface, "normalize()" method: this should also perform character normalization, perhaps conditional on the config of the containing Document.

C13) CharacterData interface: are the various methods supposed to maintain normalization? Under the control of the config of the containing Document? Of "strictErrorChecking"?

C14) Attr interface, last paragraph before the Note that precedes the IDL definition: it contains the term "character entity reference", which is not defined anywhere. This whole para is pretty unclear; one comes out not knowing what the value of an attribute is supposed to be or not to be.

C15) Attr interface, "value" attribute: what happens if the attribute contains a reference to an entity for which no definition is available? Same question for getAttribute() and getAttributeNS() in the Element interface.

C16) DOMLocator interface, "offset" attribute: there should be two attributes, one for byte offset and the other for character offset (or alternatively another attribute that says whether "offset" is byte or character), since the application may not be able to determine if the source was bytes or characters.

C17) Substitute IRI for URI throughout.

C18) DOMConfiguration interface, "cdata-sections" and "entities" parameters: it doesn't make sense to default to keeping CDATA sections but not entity references. The former are mere syntactic sugar with no structural role (hint: they do not exist in the infoset) while the latter are part of the physical structure of XML documents. At least change "cdata-sections" default to false.

C19) DOMConfiguration interface, "normalize-characters" parameter: it is not quite clear what exactly this setting does and when. Change

  "Perform the W3C Text Normalization of the characters [CharModel] in the document."

to

  "The characters in the document are fully-normalized according to the rules defined in [CharModel] supplemented by the definitions of relevant constructs from Section 2.13 of [XML1.1]."

This reflects both a change of terminology in CharModel and the necessity of taking into account the relevant constructs defined in XML 1.1 (as per the provisions of CharModel).

Since CharModel says that text SHOULD be normalized, the default for this should be true, the user having the chance to set it to false after careful consideration of the consequences (see definition of SHOULD in RFC2119).

C20) Entity interface: the 4th paragraph starts "XML does not mandate that a non-validating XML processor read and process entity declarations made in the external subset or declared in external parameter entities." The last occurrence of "external" is superfluous and somewhat misleading, since non-validating processors are not obligated to read even *internal* parameter entities.

C21) The references to Unicode 2.0 and ISO/IEC 10646 need to be updated. Both are obsolete and unavailable. There is no apparent reason not to use current versions or, better, version-less references (see Charmod section 9).

DOM 3 Load&Save
===============

LS1) The "schemaType" arg of DOMImplementationLS.createDOMBuilder() specifies an "absolute URI representing the type of the schema language used during the load of a Document". That URI is used solely for matching (à la XML namespace), not for resolving, and should be an absolute *IRI*. The identity matching rules (e.g. character for character, %e9 == %E9 or not, etc.) should be specified. This also applies to the "schema-type" parameter of DOMConfiguration in DOM 3 Core.

LS2) The effect of the "certified" parameter of DOMConfiguration in DOMBuilder is not clearly defined. Its interaction with the "normalize-characters" parameter defined in Core should be clarified.

Actually, "certified" should be a property of DOMInputSource, not of DOMBuilder. It is in fact a source that can be certified (or not), not a parser. And certification may be different for a main document and for the external entities it pulls in during parse.

LS3) It should be specified clearly somewhere what "normalize-characters=true" in the config of a DOMBuilder means: non-certified input will be verified for full-normalization and the load will fail with an error if it is not.

The default value of "normalize-characters" must be true in DOMBuilder, at least when loading XML 1.1 documents, in order to satisfy the prescriptions of [XML1.1] and [CharModel]. In particular, the DocumentLS.load() and loadXML() methods automatically do the wrong thing and have no way to do the right thing if the default is false.

LS4) The "unknown-characters" parameter of DOMConfiguration in DOMBuilder is correctly designed but poorly named. Suggest "ignore-unknown-denormalizations". Same remark for same-named parameter in DOMWriter.

LS5) Substitute IRI for URI throughout.

LS6) In interface DOMInputSource, the role of the "publicId" attribute is not clear at all. It is not mentioned in the paragraph above the IDL definition that describes how the source of input is determined (nor is the "stringData" attribute mentioned there). The role of the "encoding" attribute is mentioned in too many places.

LS7) In the discussion of interface DOMWriter (above the IDL definition), it would be nice if character references were specified to be hexadecimal (preferred) or decimal. One way or the other, determined by the spec, not implementation-dependent.

Similarly (still within DOMWriter), it would be better to specify serialization of attribute values to be always in quotes (or apostrophes, you choose), with escaping as necessary. Requiring serializers to examine the value and choose quotes or apostrophes based on content seems like useless work.

LS8) In the paragraph (still within DOMWriter) discussing the effect of "normalize-characters", change

  "...is W3C Text normalized according to the rules defined in [CharModel]."

to

  "...is fully-normalized according to the rules defined in [CharModel] supplemented by the definitions of relevant constructs from Section 2.13 of [XML1.1]."

This reflects both a change of terminology in CharModel and the necessity of taking into account the relevant constructs defined in XML 1.1 (as per the provisions of CharModel).

LS9) In the description of "encoding" in DOMWriter, it is said that encoding info can be gleaned from e.g. "actualEncoding" from the Document. What about "encoding" from the Document? What if both are set, which wins?

LS10) It would be nice to be more specific about what happens when "encoding" is either "UTF-16" or "UTF-32".
The implementation has to choose between big-endian and little-endian; the DOM spec could say which or say "implementation-dependent" explicitly. Then, for UTF-16, the implementation can choose to output a BOM and no encoding declaration (or a declaration that says "UTF-16"), or to output no BOM but an encoding declaration that says either "UTF-16BE" or "UTF-16LE". We have no specific recommendation to make at this point, but think that the spec should specify more precisely what is supposed to happen.

LS11) In DOMWriter, there should be a way to specify the version of XML under which serialization is performed. While it seems possible to set the Document.version attribute, this has the side effect of changing the DOM in memory and, more seriously, is not practical when serializing anything other than the whole document.

LS12) Unless the Core guarantees this never happens (cf. C5 above), it needs to be specified what happens when a node containing names (e.g. element names) legal only in XML 1.1 is serialized using 1.0 rules: DOMException of type INVALID_CHARACTER_ERR? Error event sent to errorHandler? If the latter, details?

LS13) The asymmetry between DOMBuilder and DOMWriter is bothersome. Why isn't there a DOMOutputSink to parallel DOMInputSource? Why isn't there a DOMWriter.writeURI() to parallel DOMBuilder.parseURI()? Saving to an HTTP (with PUT), FTP or mailto URI appears to make a lot of sense.

LS14) The references to Unicode 2.0 and ISO/IEC 10646 need to be updated. Both are obsolete and unavailable. There is no apparent reason not to use current versions or, better, version-less references (see Charmod section 9).

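Illustration for C8, C9 and C10 (not part of either draft): a minimal Python sketch of the well-formedness checks that the "data" argument would have to pass before a CDATA section, comment or text node can be serialized as XML 1.0. The function names are hypothetical. The drafts do not say whether such checks belong at node creation, under "strictErrorChecking", or only at save time, so the sketch just reports problems and leaves that policy open.

```python
import re

# XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# The negated class below matches any single character outside it.
_NOT_XML10_CHAR = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def text_problems(data):
    """C10: list the characters in `data` that XML 1.0 does not allow."""
    return [
        "U+%04X is outside the XML 1.0 Char production" % ord(ch)
        for ch in _NOT_XML10_CHAR.findall(data)
    ]


def cdata_problems(data):
    """C8: a CDATA section cannot contain its own end marker."""
    problems = text_problems(data)
    if "]]>" in data:
        problems.append("']]>' cannot appear inside a CDATA section")
    return problems


def comment_problems(data):
    """C9: a comment cannot contain '--' and cannot end with '-'."""
    problems = text_problems(data)
    if "--" in data:
        problems.append("'--' cannot appear inside a comment")
    if data.endswith("-"):
        problems.append("a comment cannot end with '-'")
    return problems


if __name__ == "__main__":
    print(text_problems("form feed: \x0c"))
    print(cdata_problems("if (a[b[0]]>c) ..."))
    print(comment_problems("see section 2 -- draft"))
```

An implementation with strict checking enabled could raise an exception whenever one of these lists is non-empty; a lenient one could defer the same checks to the writer, which is why these cases also affect Load&Save.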
Received on Wednesday, 12 March 2003 09:00:56 UTC