Re: comments on XHTML Modularization 1.1 from XML Schema WG

XHTML2 WG, here is a proposed reply to the XML Schema WG's comments on  
Modularization 1.1 (for newcomers, known colloquially in the group as  
M12N).

In particular, see the two places marked @@ where I am not sure the answer  
is complete.

Their original comments are at  
http://www.w3.org/XML/Group/2007/02/m12n-of-xhtml.xsd-comments.html

Steven

To: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>, www-html-editor@w3.org
Cc: "Schema IG" <w3c-xml-schema-ig@w3.org>

Dear Michael, and other colleagues,

Thank you for your belated last call comments on XHTML Modularization 1.1.

To return the favour, here is our belated reply :-)/2 (largely caused by  
our rechartering, which happened after your comments arrived).

> 2.1. Charset type
>
>     Charset is defined as a vacuous restriction of xsd:string. That may
>     be the right thing to do, but it seems likely that a better
>     definition can be formulated.
[...]
>     A more ambitious definition might mention all of the values in the
>     IANA type registry, but the result, when examined, is rather long
>     and not really very informative — rather like the registry itself
>     — and it is not included here.

While we agree on the principle of validating as much as possible, we are
wary of duplicating someone else's list in a specification: we run the
risk of making the schema brittle and in need of regular updates.

> 2.2. Color type
>
>     Two things seem puzzling in the current definition of Color: (1) it
>     allows any NMTOKEN, rather than just the sixteen well known color
>     names. And (2) while six-digit hexadecimal values are allowed,
>     three-digit values are not allowed. (The description of Color in
>     HTML 4.01 (<URL:http://www.w3.org/TR/html401/types.html#h-6.5>)
>     doesn't actually specify how many digits are to be used for hex
>     color values.)

Three-digit hex colour values were introduced in CSS, and are not actually
a part of HTML; in fact we agree that the HTML definition is a little
unclear, and only seems to suggest what the correct values are through
examples. The problem is that, with legacy content now on the web, it is
difficult to say whether the colour "#FAB" should be interpreted as
"#FFAABB", as it is in CSS, or as "#000FAB", as would be suggested if you
interpret the value as "a hexadecimal number", which is what the
specification says it is. Since the six-digit version is the only one
likely to be interoperable, we prefer to keep it at that. As for the
sixteen well-known values, while these are defined in the HTML4
specification, many other values are now extant and interoperable on the
web (and remember that Modularization is for a whole family of languages,
not just HTML4 derivatives).
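
To make the ambiguity concrete, here is a rough Python sketch of the two
readings of "#FAB". The function names and the six-digit regular
expression are our own illustrative assumptions; they are not taken from
the schema or from either specification.

```python
import re

# Assumed pattern for the six-digit form only (illustrative, not the
# schema's actual definition).
SIX_DIGIT_HEX = re.compile(r"#[0-9A-Fa-f]{6}")


def expand_css_shorthand(value: str) -> str:
    """CSS reading: each hex digit is doubled."""
    digits = value.lstrip("#")
    return "#" + "".join(d * 2 for d in digits).upper()


def pad_as_hex_number(value: str) -> str:
    """'Hexadecimal number' reading: left-pad with zeros to six digits."""
    digits = value.lstrip("#")
    return "#" + format(int(digits, 16), "06X")


def is_six_digit_hex(value: str) -> bool:
    """True only for the unambiguous six-digit form."""
    return SIX_DIGIT_HEX.fullmatch(value) is not None


print(expand_css_shorthand("#FAB"))  # #FFAABB
print(pad_as_hex_number("#FAB"))     # #000FAB
print(is_six_digit_hex("#FAB"))      # False
print(is_six_digit_hex("#FFAABB"))   # True
```

The same three characters give two different colours depending on the
reading, whereas a six-digit value leaves nothing to interpret.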

> 2.3. ContentType
>
>     Like Charset, this could be defined as a union whose first member(s)
>     recognize well-known values defined by the RFCs or in the IANA
>     registry and whose final type (here xsd:string) takes care of
>     extensibility. It's not clear to me whether the values are in fact
>     limited by the RFC to ASCII characters; if so, xsd:string is a bit
>     too broad.

We are considering this change for a future revision.

> 2.4. Coords type
>
>     Since the possible values of Coords values are so clearly specified
>     in the spec, it seems a shame not to define the type a little more
>     tightly.

This seems like a reasonable suggestion.

> 2.5. FPI type
[...]
>     The pattern is then quite simple:
>
>   <xsd:simpleType name="FPI">
>    <xsd:restriction base="xsd:normalizedString">
>     <xsd:pattern value="&fpi;"/>
>    </xsd:restriction>
>   </xsd:simpleType>

Looks good.

> 2.6. FrameTarget type
>
>     The HTML spec
>     (<URL:http://www.w3.org/TR/html401/types.html#h-6.16>) seems to
>     want a slightly tighter definition of frame target names. Perhaps
>     something like the following should be used.

Good idea.

> 2.7. LinkTypes type
>
>     LinkTypes is a good example of a type with what is sometimes called
>     a ‘semi-open’ list of values. Some set of well-known values is
>     defined, which software is encouraged to recognize and which authors
>     are encouraged to use when appropriate, but for strict validity, a
>     much larger set of values is allowed.
>
>     In such cases, it's good practice to document the recognized types
>     in the type definition. Since the well known values here are case
>     insensitive, that's best done with a list of patterns rather than
>     with an enumeration:

Frankly, this looks rather like overkill to us. These values are intended
only as an initial set, with many more expected to come into use, so we
don't really see the added value of including these few in the schema
(especially since the result is not really readable).
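
As a rough illustration of the readability point: XML Schema patterns have
no case-insensitivity flag, so a case-insensitive keyword has to be
spelled out one character class per letter. The Python sketch below (the
helper name is hypothetical, and Python's re module merely stands in for a
schema validator) generates such a pattern for one link type.

```python
import re


def case_insensitive_pattern(word: str) -> str:
    """Build a pattern matching `word` in any mix of upper and lower case.

    Each letter becomes a two-character class; anything else is escaped.
    """
    parts = []
    for ch in word:
        if ch.isalpha():
            parts.append(f"[{ch.upper()}{ch.lower()}]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


pattern = case_insensitive_pattern("stylesheet")
print(pattern)
# [Ss][Tt][Yy][Ll][Ee][Ss][Hh][Ee][Ee][Tt]

# Schema patterns match the whole value, hence fullmatch here.
print(bool(re.fullmatch(pattern, "StyleSheet")))   # True
print(bool(re.fullmatch(pattern, "stylesheets")))  # False
```

Repeating this for every well-known link type gives a long run of
bracketed pairs, which is what we mean by not really readable.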

> 2.8. Tightening other types

In general we agree that closed sets of values should be more tightly  
defined; we are not so enamoured of defining values of open sets, since  
there is no validation win.

> 2.9. Named model groups vs. substitution groups
>
>     We reiterate our advice of four years ago: the definition of the
>     XHTML vocabulary would be easier to follow, and it would be easier
>     to extend it, if the schema documents used substitution groups
>     wherever feasible.
>
>     If you have had specific problems applying substitution groups to
>     XHTML, we would very much like to know what they were; we can
>     speculate, but would prefer to hear from you.

The people who produced the schema felt the approach used here to be the
most consistent with Modularization in general, and the one most likely to
work. In particular, we believe that a given element can only appear in a
single substitution group, and if that is true, then it doesn't work for
us (though we are happy to be educated on this if we have got it wrong).

@@ Anything more to say here?

> 2.10. Adding attributes
>
>     It's not clear that the way modules add attributes works. For
>     example, the client side image map module adds attributes to the img
>     element. All well and good, but looking at the schema I see an
>     attribute group defined:
>
>    <!-- modify img attribute definition list -->
>       <xs:attributeGroup name="xhtml.img.csim.attlist">
>           <xs:attribute name="usemap" type="xs:IDREF"/>
>       </xs:attributeGroup>
>
>     I can't see where this actually is used anywhere in the schema. I
>     think what the module should be doing is a redefine of the groups.

The extension mechanisms get used in the 'drivers' which define a language  
on the basis of the modules. There is no driver supplied with  
modularization; you need to look at a particular language's use of  
Modularization to see these in use.

> 2.11. A missing scenario
>
>     One important scenario that seems to be missing is just plonking
>     bits of the XHTML namespace into specific places in some other
>     namespace. Maybe it's too obvious/easy, but it is actually the most
>     common scenario. e.g. MyOwnLanguage has its own things, and I'll
>     just put some XHTML inline elements here.
>
>     Introducing XHTML elements into the xsd:documentation elements in a
>     schema document is another instance of the scenario.

@@ Is this missing? What about integration sets?

> 3.1. Make the introduction less DTD-specific

This should be much better now.

> 3.2. The term PCDATA

Fixed.

> 3.5. Shape type
>
>     Shouldn't the overview in section 4.3 say that Shape has just the
>     four values rect, circle, poly, and default?

Yes, it should.

> 3.6. White space in the document source

Thanks. We will do a clean up prior to publication.

> 4.1. Testing the schema documents
[...]
>     [Later information from Shane McCarron is that this spec doesn't
>     provide a driver, but that
>     <URL:http://www.w3.org/MarkUp/SCHEMA/xhtml11.xsd> might be
>     consulted as an example. To be followed up ...]

Indeed, the Modularization spec doesn't include any drivers. We have added  
an informative link to one.

> 4.2. Where is the html element?

>     Where is the html element defined?

It is in the structure module.

>    (And, for the instruction of those seeking to understand
>     how to use these modules, a pointer to the XHTML 1.1 driver modules
>     would be very useful.

Done.

>     But the issue appears to at least some readers as at least partly
>     substantive: that is, it seems to us that a specification describing
>     a modular definition of the XHTML 1.1 vocabulary ought, in the
>     nature of things, to include a top-level driver module which calls
>     in all the others.

Coming from a group that didn't include a mechanism to specify what the  
root element is, I am shocked!

> 4.3. Case insensitivity and XML Schema patterns or enumerations
[...]
>     Given that many regex libraries already have such flags, such an
>     addition wouldn't seem to be difficult for implementors.
>     Should the XML Schema Working Group consider such a change?

It would make certain declarations easier to write, and make them actually  
readable. :-)

>     And if so, what is to be done about Unicode characters for which the
>     upper/lowercase mapping is not 1:1? And what should be done about
>     title case?

Ha! You're asking the wrong people...

Best wishes,

Steven Pemberton

Received on Wednesday, 11 July 2007 13:30:19 UTC