- From: Steven Pemberton <steven.pemberton@cwi.nl>
- Date: Wed, 22 Aug 2007 14:20:39 +0200
- To: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>, www-html-editor@w3.org
- Cc: "Schema IG" <w3c-xml-schema-ig@w3.org>
Dear Michael, and other colleagues,
Thank you for your belated last call comments on XHTML Modularization 1.1.
http://lists.w3.org/Archives/Public/www-html-editor/2007JanMar/0035
To return the favour, here is our belated reply :-)/2 (largely caused by
our rechartering, which happened after your comments arrived).
2.1. Charset type
Charset is defined as a vacuous restriction of xsd:string. That may
be the right thing to do, but it seems likely that a better
definition can be formulated.
[...]
A more ambitious definition might mention all of the values in the
IANA type registry, but the result, when examined, is rather long
and not really very informative — rather like the registry itself
— and it is not included here.
While we agree on the principle of validating as much as possible, we are
wary of duplicating someone else's list in a specification: we run the
risk of making the schema brittle, and needing to be regularly updated.
2.2. Color type
Two things seem puzzling in the current definition of Color: (1) it
allows any NMTOKEN, rather than just the sixteen well known color
names. And (2) while six-digit hexadecimal values are allowed,
three-digit values are not allowed. (The description of Color in
HTML 4.01 (<URL:http://www.w3.org/TR/html401/types.html#h-6.5>)
doesn't actually specify how many digits are to be used for hex
color values.)
Three-digit hex colour values were introduced in CSS, and are not actually
a part of HTML; in fact we agree that the HTML definition is a little
unclear, and only seems to suggest what the correct values are through
examples. The problem is, with legacy content now on the web, it is
difficult to say whether colour "#FAB" should be interpreted as "#FFAABB"
as it is in CSS, or "#000FAB" as would be suggested if you interpret the
value as "a hexadecimal number" which is what the specification says it
is. Since the six-digit version is the only likely interoperable one, we
prefer to keep it at that. As for the sixteen well-known values, while
these are defined in the HTML4 specification, many other values are now
extant and interoperable on the web (and remember that Modularization is
for a whole family of languages, not just HTML4 derivatives).
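A minimal sketch of what "any name, or exactly six hex digits" looks like as a schema type. The type name ColorSketch is hypothetical and this is not the definition from the specification; it only illustrates that a pattern of this kind rejects "#FAB" while accepting "#FFAABB".

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type name, for illustration only. -->
  <xsd:simpleType name="ColorSketch">
    <!-- First member: any name token, so colour names stay open-ended. -->
    <xsd:union memberTypes="xsd:NMTOKEN">
      <!-- Second member: "#" followed by exactly six hex digits.
           XSD patterns match the whole value, so "#FAB" is not accepted. -->
      <xsd:simpleType>
        <xsd:restriction base="xsd:token">
          <xsd:pattern value="#[0-9A-Fa-f]{6}"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:union>
  </xsd:simpleType>
</xsd:schema>
```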
2.3. ContentType
Like Charset, this could be defined as a union whose first member(s)
recognize well-known values defined by the RFCs or in the IANA
registry and whose final type (here xsd:string) takes care of
extensibility. It's not clear to me whether the values are in fact
limited by the RFC to ASCII characters; if so, xsd:string is a bit
too broad.
We are considering this change for a future revision.
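A rough sketch of the union shape suggested above, assuming a hypothetical type name ContentTypeSketch and a deliberately tiny, illustrative list of media types (not a proposal for which values to include). The first member documents well-known values; the final member keeps the type open.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type name; the enumerated values are examples only. -->
  <xsd:simpleType name="ContentTypeSketch">
    <xsd:union>
      <!-- First member: a few well-known media types. -->
      <xsd:simpleType>
        <xsd:restriction base="xsd:token">
          <xsd:enumeration value="text/html"/>
          <xsd:enumeration value="text/css"/>
          <xsd:enumeration value="image/png"/>
        </xsd:restriction>
      </xsd:simpleType>
      <!-- Final member: plain xsd:string, so any other value still validates. -->
      <xsd:simpleType>
        <xsd:restriction base="xsd:string"/>
      </xsd:simpleType>
    </xsd:union>
  </xsd:simpleType>
</xsd:schema>
```

Because the final member accepts every string, the enumeration serves as documentation rather than as a validity constraint.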
2.4. Coords type
Since the possible values of Coords values are so clearly specified
in the spec, it seems a shame not to define the type a little more
tightly.
This seems like a reasonable suggestion.
2.5. FPI type
[...]
The pattern is then quite simple:
<xsd:simpleType name="FPI">
<xsd:restriction base="xsd:normalizedString">
<xsd:pattern value="&fpi;"/>
</xsd:restriction>
</xsd:simpleType>
Looks good.
2.6. FrameTarget type
The HTML spec
(<URL:http://www.w3.org/TR/html401/types.html#h-6.16>) seems to
want a slightly tighter definition of frame target names. Perhaps
something like the following should be used.
Good idea.
2.7. LinkTypes type
LinkTypes is a good example of a type with what is sometimes called
a ‘semi-open’ list of values. Some set of well-known values is
defined, which software is encouraged to recognize and which authors
are encouraged to use when appropriate, but for strict validity, a
much larger set of values is allowed.
In such cases, it's good practice to document the recognized types
in the type definition. Since the well known values here are case
insensitive, that's best done with a list of patterns rather than
with an enumeration:
Frankly this looks rather like overkill to us. These values are intended
only to be an initial set, with many more expected to be used, so we don't really
see the value-add of including these few in the schema (especially since
it is not really readable).
2.8. Tightening other types
In general we agree that closed sets of values should be more tightly
defined; we are not so enamoured of defining values of open sets, since
there is no validation win.
2.9. Named model groups vs. substitution groups
We reiterate our advice of four years ago: the definition of the
XHTML vocabulary would be easier to follow, and it would be easier
to extend it, if the schema documents used substitution groups
wherever feasible.
If you have had specific problems applying substitution groups to
XHTML, we would very much like to know what they were; we can
speculate, but would prefer to hear from you.
The people who produced the schema felt the approach used here to be
the most consistent with Modularization in general, and the one most
likely to work. We take your advice seriously, though, and would like to
adopt it. However, in order to allow modularization to proceed without
too much more delay, we will not make this (rather drastic) change in
this version, but will save it for the planned version 2.
2.10. Adding attributes
It's not clear that the way modules add attributes works. For
example, the client side image map module adds attributes to the img
element. All well and good, but looking at the schema I see an
attribute group defined:
<!-- modify img attribute definition list -->
<xs:attributeGroup name="xhtml.img.csim.attlist">
<xs:attribute name="usemap" type="xs:IDREF"/>
</xs:attributeGroup>
I can't see where this actually is used anywhere in the schema. I
think what the module should be doing is a redefine of the groups.
The extension mechanisms get used in the 'drivers' which define a language
on the basis of the modules. There is no driver supplied with
modularization; you need to look at a particular language's use of
Modularization to see these in use.
2.11. A missing scenario
One important scenario that seems to be missing is just plonking
bits of the XHTML namespace into specific places in some other
namespace. Maybe it's too obvious/easy, but it is actually the most
common scenario. e.g. MyOwnLanguage has its own things, and I'll
just put some XHTML inline elements here.
Introducing XHTML elements into the xsd:documentation elements in a
schema document is another instance of the scenario.
We have a concept of 'integration sets' which allow this usage. What we
will do is add an example to the spec to show how to do this, to make it
clearer.
3.1. Make the introduction less DTD-specific
This should be much better now.
3.2. The term PCDATA
Fixed.
3.5. Shape type
Shouldn't the overview in section 4.3 say that Shape has just the
four values rect, circle, poly, and default?
Yes, it should, and will.
3.6. White space in the document source
Thanks. We will do a clean up prior to publication.
4.1. Testing the schema documents
[...]
[Later information from Shane McCarron is that this spec doesn't
provide a driver, but that
<URL:http://www.w3.org/MarkUp/SCHEMA/xhtml11.xsd> might be
consulted as an example. To be followed up ...]
Indeed, the Modularization spec doesn't include any drivers. We have added
an informative link to one.
4.2. Where is the html element?
Where is the html element defined?
It is in the structure module.
(And, for the instruction of those seeking to understand
how to use these modules, a pointer to the XHTML 1.1 driver modules
would be very useful.)
Done.
But the issue appears to at least some readers as at least partly
substantive: that is, it seems to us that a specification describing
a modular definition of the XHTML 1.1 vocabulary ought, in the
nature of things, to include a top-level driver module which calls
in all the others.
Coming from a group that didn't include a mechanism to specify what the
root element is, I am shocked!
But seriously, this is modularization 1.1, not the modularization of XHTML
1.1. Modularization 1.1 is and will be used by many different languages.
(See for instance
http://www.w3.org/MarkUp/Group/2007/xhtml-modularization-11-implementation
)
4.3. Case insensitivity and XML Schema patterns or enumerations
[...]
Given that many regex libraries already have such flags, such an
addition wouldn't seem to be difficult for implementors.
Should the XML Schema Working Group consider such a change?
It would make certain declarations easier to write, and make them actually
readable.
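To illustrate the readability point: without a case-insensitivity flag, a case-insensitive list of values has to be spelled out with character classes. The sketch below uses the Shape values purely as an example; the type name ShapeAnyCaseSketch is hypothetical and is not taken from the specification.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type name, for illustration only. -->
  <xsd:simpleType name="ShapeAnyCaseSketch">
    <xsd:restriction base="xsd:token">
      <!-- Several patterns in one restriction step act as alternatives. -->
      <xsd:pattern value="[Rr][Ee][Cc][Tt]"/>
      <xsd:pattern value="[Cc][Ii][Rr][Cc][Ll][Ee]"/>
      <xsd:pattern value="[Pp][Oo][Ll][Yy]"/>
      <xsd:pattern value="[Dd][Ee][Ff][Aa][Uu][Ll][Tt]"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```

With a case-insensitivity flag, the same type could presumably be written as four plain enumeration values.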
And if so, what is to be done about Unicode characters for which the
upper/lowercase mapping is not 1:1? And what should be done about
title case?
Ha! You're asking the wrong people...
Thanks for the comments.
Best wishes,
Steven Pemberton
For the XHTML2 Working Group
Received on Wednesday, 22 August 2007 12:20:46 UTC