- From: Steven Pemberton <steven.pemberton@cwi.nl>
- Date: Wed, 22 Aug 2007 14:20:39 +0200
- To: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>, www-html-editor@w3.org
- Cc: "Schema IG" <w3c-xml-schema-ig@w3.org>
Dear Michael, and other colleagues,

Thank you for your belated last call comments on XHTML Modularization 1.1.

http://lists.w3.org/Archives/Public/www-html-editor/2007JanMar/0035

To return the favour, here is our belated reply :-)/2 (largely caused by our rechartering, which happened after your comments arrived).

> 2.1. Charset type
>
> Charset is defined as a vacuous restriction of xsd:string. That may be the right thing to do, but it seems likely that a better definition can be formulated. [...] A more ambitious definition might mention all of the values in the IANA type registry, but the result, when examined, is rather long and not really very informative (rather like the registry itself) and it is not included here.

While we agree on the principle of validating as much as possible, we are wary of duplicating someone else's list in a specification: we run the risk of making the schema brittle and in need of regular updates.

> 2.2. Color type
>
> Two things seem puzzling in the current definition of Color: (1) it allows any NMTOKEN, rather than just the sixteen well known color names. And (2) while six-digit hexadecimal values are allowed, three-digit values are not allowed. (The description of Color in HTML 4.01 (<URL:http://www.w3.org/TR/html401/types.html#h-6.5>) doesn't actually specify how many digits are to be used for hex color values.)

Three-digit hex colour values were introduced in CSS, and are not actually a part of HTML; in fact we agree that the HTML definition is a little unclear, and only seems to suggest what the correct values are through examples. The problem is that, with legacy content now on the web, it is difficult to say whether colour "#FAB" should be interpreted as "#FFAABB", as it is in CSS, or as "#000FAB", as would be suggested if you interpret the value as "a hexadecimal number", which is what the specification says it is. Since the six-digit version is the only likely interoperable one, we prefer to keep it at that.
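As an illustration of the six-digit-only position, here is a minimal sketch of how such a restriction might be written in XML Schema. The type name HexColor6 is hypothetical and is not taken from the Modularization schema, whose Color type also admits NMTOKEN values.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type name, for illustration only. -->
  <xs:simpleType name="HexColor6">
    <xs:restriction base="xs:token">
      <!-- XML Schema patterns match the whole value, so exactly
           one '#' followed by six hexadecimal digits is accepted. -->
      <xs:pattern value="#[0-9A-Fa-f]{6}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

Under this pattern "#FFAABB" is valid and "#FAB" is rejected, so a validator never has to choose between the CSS reading and the zero-padded one.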
As for the sixteen well-known values, while these are defined in the HTML4 specification, many other values are now extant and interoperable on the web (and remember that Modularization is for a whole family of languages, not just HTML4 derivatives).

> 2.3. ContentType
>
> Like Charset, this could be defined as a union whose first member(s) recognize well-known values defined by the RFCs or in the IANA registry and whose final type (here xsd:string) takes care of extensibility. It's not clear to me whether the values are in fact limited by the RFC to ASCII characters; if so, xsd:string is a bit too broad.

We are considering this change for a future revision.

> 2.4. Coords type
>
> Since the possible Coords values are so clearly specified in the spec, it seems a shame not to define the type a little more tightly.

This seems like a reasonable suggestion.

> 2.5. FPI type
>
> [...] The pattern is then quite simple:
>
>   <xsd:simpleType name="FPI">
>     <xsd:restriction base="xsd:normalizedString">
>       <xsd:pattern value="&fpi;"/>
>     </xsd:restriction>
>   </xsd:simpleType>

Looks good.

> 2.6. FrameTarget type
>
> The HTML spec (<URL:http://www.w3.org/TR/html401/types.html#h-6.16>) seems to want a slightly tighter definition of frame target names. Perhaps something like the following should be used.

Good idea.

> 2.7. LinkTypes type
>
> LinkTypes is a good example of a type with what is sometimes called a ‘semi-open’ list of values. Some set of well-known values is defined, which software is encouraged to recognize and which authors are encouraged to use when appropriate, but for strict validity, a much larger set of values is allowed. In such cases, it's good practice to document the recognized types in the type definition. Since the well known values here are case insensitive, that's best done with a list of patterns rather than with an enumeration:

Frankly this looks rather like overkill to us.
These values are intended only to be an initial set, with many more to be used, so we don't really see the value-add of including these few in the schema (especially since it is not really readable).

> 2.8. Tightening other types

In general we agree that closed sets of values should be more tightly defined; we are not so enamoured of defining values of open sets, since there is no validation win.

> 2.9. Named model groups vs. substitution groups
>
> We reiterate our advice of four years ago: the definition of the XHTML vocabulary would be easier to follow, and it would be easier to extend it, if the schema documents used substitution groups wherever feasible. If you have had specific problems applying substitution groups to XHTML, we would very much like to know what they were; we can speculate, but would prefer to hear from you.

The people who produced the schema felt the approach used here to be the most consistent with Modularization in general, and the one most likely to work. Nevertheless, we take your advice seriously, and would like to adopt it. However, in order to allow modularization to proceed without too much more delay, we will not adopt this (rather drastic) change in this version, but save it for the planned version 2.

> 2.10. Adding attributes
>
> It's not clear that the way modules add attributes works. For example, the client side image map module adds attributes to the img element. All well and good, but looking at the schema I see an attribute group defined:
>
>   <!-- modify img attribute definition list -->
>   <xs:attributeGroup name="xhtml.img.csim.attlist">
>     <xs:attribute name="usemap" type="xs:IDREF"/>
>   </xs:attributeGroup>
>
> I can't see where this actually is used anywhere in the schema. I think what the module should be doing is a redefine of the groups.

The extension mechanisms get used in the 'drivers' which define a language on the basis of the modules.
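To make the idea concrete, here is a rough sketch of how a driver schema could pull a module's attribute group into an element's attribute list using xs:redefine. It is an assumption-laden illustration, not the mechanism confirmed by this message or copied from any published driver: the file names xhtml-image-1.xsd and xhtml-csismap-1.xsd and the group name xhtml.img.attlist are assumed for the example; only xhtml.img.csim.attlist comes from the comment quoted above.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://www.w3.org/1999/xhtml"
           targetNamespace="http://www.w3.org/1999/xhtml">

  <!-- Assumed file name: the module that declares
       xhtml.img.csim.attlist (the usemap attribute). -->
  <xs:include schemaLocation="xhtml-csismap-1.xsd"/>

  <!-- Assumed file and group names: redefine the img attribute
       list so that it keeps its original content and also gains
       the attributes contributed by the image map module. -->
  <xs:redefine schemaLocation="xhtml-image-1.xsd">
    <xs:attributeGroup name="xhtml.img.attlist">
      <!-- The self-reference stands for the original definition. -->
      <xs:attributeGroup ref="xhtml.img.attlist"/>
      <xs:attributeGroup ref="xhtml.img.csim.attlist"/>
    </xs:attributeGroup>
  </xs:redefine>

</xs:schema>
```

The point of the sketch is only that the module can declare the group while the driver decides where it is applied, which is why the group looks unused when a module is read in isolation.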
There is no driver supplied with modularization; you need to look at a particular language's use of Modularization to see these in use.

> 2.11. A missing scenario
>
> One important scenario that seems to be missing is just plonking bits of the XHTML namespace into specific places in some other namespace. Maybe it's too obvious/easy, but it is actually the most common scenario. e.g. MyOwnLanguage has its own things, and I'll just put some XHTML inline elements here. Introducing XHTML elements into the xsd:documentation elements in a schema document is another instance of the scenario.

We have a concept of 'integration sets' which allow this usage. What we will do is add an example to the spec to show how to do this, to make it clearer.

> 3.1. Make the introduction less DTD-specific

This should be much better now.

> 3.2. The term PCDATA

Fixed.

> 3.5. Shape type
>
> Shouldn't the overview in section 4.3 say that Shape has just the four values rect, circle, poly, and default?

Yes, it should, and will.

> 3.6. White space in the document source

Thanks. We will do a clean up prior to publication.

> 4.1. Testing the schema documents
>
> [...] [Later information from Shane McCarron is that this spec doesn't provide a driver, but that <URL:http://www.w3.org/MarkUp/SCHEMA/xhtml11.xsd> might be consulted as an example. To be followed up ...]

Indeed, the Modularization spec doesn't include any drivers. We have added an informative link to one.

> 4.2. Where is the html element?
>
> Where is the html element defined?

It is in the structure module.

> (And, for the instruction of those seeking to understand how to use these modules, a pointer to the XHTML 1.1 driver modules would be very useful.

Done.

> But the issue appears to at least some readers as at least partly substantive: that is, it seems to us that a specification describing a modular definition of the XHTML 1.1 vocabulary ought, in the nature of things, to include a top-level driver module which calls in all the others.
Coming from a group that didn't include a mechanism to specify what the root element is, I am shocked! But seriously, this is modularization 1.1, not the modularization of XHTML 1.1. Modularization 1.1 is and will be used by many different languages. (See for instance http://www.w3.org/MarkUp/Group/2007/xhtml-modularization-11-implementation )

> 4.3. Case insensitivity and XML Schema patterns or enumerations
>
> [...] Given that many regex libraries already have such flags, such an addition wouldn't seem to be difficult for implementors. Should the XML Schema Working Group consider such a change? It would make certain declarations easier to write, and make them actually readable. And if so, what is to be done about Unicode characters for which the upper/lowercase mapping is not 1:1? And what should be done about title case?

Ha! You're asking the wrong people...

Thanks for the comments.

Best wishes,

Steven Pemberton
For the XHTML2 Working Group
Received on Wednesday, 22 August 2007 12:20:46 UTC