- From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
- Date: Tue, 27 Feb 2007 15:44:45 -0700
- To: www-html-editor@w3.org
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>, Schema IG <w3c-xml-schema-ig@w3.org>
Dear colleagues:

On behalf of the XML Schema Working Group, I congratulate the HTML
Working Group on your progress with XHTML Modularization.

As described in the comments below, owing to a snafu the XML Schema WG
did not review the Last Call WD of XHTML Modularization 1.1 last
summer. In the hopes that the maxim "better late than never" is true
in this case, we transmit to you now our comments on the document. My
apologies for the snafu.

Our comments are available at any of the URIs

    http://www.w3.org/XML/Group/2007/02/m12n-of-xhtml.xsd-comments
    http://www.w3.org/XML/Group/2007/02/m12n-of-xhtml.xsd-comments.xml
    http://www.w3.org/XML/Group/2007/02/m12n-of-xhtml.xsd-comments.html

A text version is provided below for those who find it more
convenient.

--C. M. Sperberg-McQueen
  on behalf of the W3C XML Schema WG

Notes on XHTML Modularization 1.1

Ed. by C. M. Sperberg-McQueen

Submitted to the HTML Working Group on behalf of the XML Schema
Working Group

27 February 2007

$Id: m12n-of-xhtml.xsd-comments.html,v 1.1 2007/02/27 22:36:18 cmsmcq Exp $
_________________________________________________________

  * 1. Background
  * 2. Substantive comments
       + 2.1. Charset type
       + 2.2. Color type
       + 2.3. ContentType
       + 2.4. Coords type
       + 2.5. FPI type
       + 2.6. FrameTarget type
       + 2.7. LinkTypes type
       + 2.8. Tightening other types
       + 2.9. Named model groups vs. substitution groups
       + 2.10. Adding attributes
       + 2.11. A missing scenario
  * 3. Editorial comments
       + 3.1. Make the introduction less DTD-specific
       + 3.2. The term PCDATA
       + 3.3. Section 4.3 Attribute Types
       + 3.4. Length type: well done
       + 3.5. Shape type
       + 3.6. White space in the document source
  * 4. Comments half substantive and half editorial
       + 4.1. Testing the schema documents
       + 4.2. Where is the html element?
       + 4.3. Case insensitivity and XML Schema patterns or
         enumerations
_________________________________________________________

NOTE: This document contains comments on the [31]Last Call Working
Draft of XHTML™ Modularization 1.1. Several different readers
formulated the comments; the editor has not attempted to unify and
organize them strictly. The comments are forwarded to the XHTML
Working Group on behalf of the XML Schema Working Group, but it should
be noted that the XML Schema Working Group has not had the leisure to
consider them in detail.

The Last Call comment period on this draft ended 4 August 2006, so
these comments are very late. They are being forwarded nonetheless in
the hopes that even at this late date they may prove useful to those
responsible for the XHTML Modularization spec. To minimize wasted
effort, the copy actually consulted is the [32]editor's copy of 19
February 2007.

  [31] http://www.w3.org/TR/2006/WD-xhtml-modularization-20060705
  [32] http://www.w3.org/MarkUp/Group/2007/WD-xhtml-modularization-20070219/introduction.html

1. Background

Owing apparently to human error, the XML Schema Working Group failed
to attend to the publication of the Last Call draft of [33]XHTML
Modularization 1.1, and consequently failed to review the spec during
the scheduled last-call comment period. We apologize for this
oversight; our chair has administered severe counseling to our staff
contact, and our staff contact has promised he will endeavor not to
make similar mistakes in future.

Since HTML and XHTML constitute by far the most widely used
vocabularies published by any W3C Working Group, the Schema Working
Group has a deep interest in making sure the formulations of XHTML
using XML Schema are as useful as possible. The following comments
have been prepared in haste, in an attempt to perform as useful a
review as possible.

The Schema Working Group's previous comments (apparently on the
[34]Last Call draft of 9 December 2002) are at
<URL:[35]http://www.w3.org/XML/Group/2003/01/xmlschema-notes-on-xhtml-modularization.html>
and were transmitted to the HTML WG in
<URL:[36]http://lists.w3.org/Archives/Public/www-html-editor/2003JanMar/0043.html>
and
<URL:[37]http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2003Jan/0099.html>.

A quick summary of the earlier comments:

   1. Please use the appropriate simple types.
   2. Exploit substitution groups.
   3. Explain what to do about multiple schemas for same namespace.
   4. Don't declare everything blocked and final!
   5. Sec 2.2.6 is opaque.
   6. Point to external documentation.
   7. Provide internal documentation.
   8. Clarify conformance.
   9. More concrete extension scenarios.
  10. Exhibit structure of schema better.

  [33] http://www.w3.org/TR/2006/WD-xhtml-modularization-20060705
  [34] http://www.w3.org/TR/2002/WD-xhtml-m12n-schema-20021209/
  [35] http://www.w3.org/XML/Group/2003/01/xmlschema-notes-on-xhtml-modularization.html
  [36] http://lists.w3.org/Archives/Public/www-html-editor/2003JanMar/0043.html
  [37] http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2003Jan/0099.html

It appears that the current document addresses a number of these
comments very directly; others less so or not at all.

The XML Schema Working Group appears not to have reviewed or sent
comments on the later working drafts of [38]3 October 2003 or [39]13
February 2006.

  [38] http://www.w3.org/TR/2003/WD-xhtml-m12n-schema-20031003/
  [39] http://www.w3.org/TR/2006/PR-xhtml-modularization-20060213/

2. Substantive comments

The following comments are substantive in the sense that they propose
changes which would affect the validity of some documents in the XHTML
family. Whether they are substantive in the sense that they would
invalidate existing reviews of the Modularization document, we leave
to others to decide.

2.1. Charset type

Charset is defined as a vacuous restriction of xsd:string. That may be
the right thing to do, but it seems likely that a better definition
can be formulated.

First, RFC 2045 defines charset values as either tokens or
quoted-strings; it defines token as containing only ASCII characters
and it seems to take over the definition of quoted-string from RFC
822, which defines quoted-string as containing only ASCII characters.
So a better definition of Charset might be

    <xsd:simpleType name="Other-Charset-identifier">
      <xsd:annotation>
        <xsd:documentation>
          <div xmlns="http://www.w3.org/1999/xhtml">
            <p>Charset values other than those predefined by RFC 2046.
            The RFC restricts these values to ASCII characters, i.e.
            those in the Unicode BasicLatin block.</p>
          </div>
        </xsd:documentation>
      </xsd:annotation>
      <xsd:restriction base="xsd:string">
        <xsd:pattern value="\p{IsBasicLatin}+"/>
      </xsd:restriction>
    </xsd:simpleType>

The IANA registry seems to say that in fact charset identifiers are
limited to 40 characters, but it's not clear whether that rule is
intended by the XHTML spec to be binding on Charset values in HTML
documents.

Another point is that it might be more helpful for readers (and
possibly implementors) to define the type in such a way as to identify
at least some of the well-known identifiers which user agents should
recognize (e.g. those mentioned in RFC 2046) as well as others. One
way to do this would be to define a type listing the charset values
identified in RFC 2046, and then define a union of that type with
xsd:string.

The well-known charset values can be enumerated:

    <xsd:simpleType name="RFC2046-Predefined-charsets">
      <xsd:annotation>
        <xsd:documentation>
          <div xmlns="http://www.w3.org/1999/xhtml">
            <p>Charset values predefined by RFC 2046.
            Other values are also accepted as charset values.</p>
          </div>
        </xsd:documentation>
      </xsd:annotation>
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="US-ASCII">
          <xsd:annotation>
            <xsd:documentation>As defined in ANSI X3.4-1986.</xsd:documentation>
          </xsd:annotation>
        </xsd:enumeration>
        <xsd:enumeration value="ISO-8859-1"/>
        <xsd:enumeration value="ISO-8859-2"/>
        <xsd:enumeration value="ISO-8859-3"/>
        <xsd:enumeration value="ISO-8859-4"/>
        <xsd:enumeration value="ISO-8859-5"/>
        <xsd:enumeration value="ISO-8859-6"/>
        <xsd:enumeration value="ISO-8859-7"/>
        <xsd:enumeration value="ISO-8859-8"/>
        <xsd:enumeration value="ISO-8859-9"/>
        <xsd:enumeration value="ISO-8859-10"/>
      </xsd:restriction>
    </xsd:simpleType>

The problem with this is that the RFCs define charset values as
case-insensitive. So probably a better way to define the well known
charset values would be with patterns:

    <xsd:simpleType name="RFC2046-Predefined-charsets">
      <xsd:annotation>
        <xsd:documentation>
          <div xmlns="http://www.w3.org/1999/xhtml">
            <p>Charset values predefined by RFC 2046.
            Other values are also accepted.</p>
          </div>
        </xsd:documentation>
      </xsd:annotation>
      <xsd:restriction base="xsd:string">
        <xsd:whiteSpace value="collapse"/>
        <xsd:pattern value="[Uu][Ss]-[Aa][Ss][Cc][Ii][Ii]">
          <xsd:annotation>
            <xsd:documentation>As defined in ANSI X3.4-1986.</xsd:documentation>
          </xsd:annotation>
        </xsd:pattern>
        <xsd:pattern value="[Ii][Ss][Oo]-8859-(10|[1-9])">
          <xsd:annotation>
            <xsd:documentation>ISO-8859 parts 1-10.</xsd:documentation>
          </xsd:annotation>
        </xsd:pattern>
      </xsd:restriction>
    </xsd:simpleType>

The actual definition of Charset could usefully be a union of these
two:

    <xsd:simpleType name="Charset">
      <xsd:annotation>
        <xsd:documentation>
          <div xmlns="http://www.w3.org/1999/xhtml">
            <p>Charset values.
            Accept values predefined by RFC 2046, and also other
            values.</p>
          </div>
        </xsd:documentation>
      </xsd:annotation>
      <xsd:union memberTypes="
        xh11d:RFC2046-Predefined-charsets
        xh11d:Other-Charset-identifier
        "/>
    </xsd:simpleType>

A more ambitious definition might mention all of the values in the
IANA type registry, but the result, when examined, is rather long and
not really very informative (rather like the registry itself), and it
is not included here.

2.2. Color type

Two things seem puzzling in the current definition of Color: (1) it
allows any NMTOKEN, rather than just the sixteen well known color
names. And (2) while six-digit hexadecimal values are allowed,
three-digit values are not allowed. (The description of Color in HTML
4.01 (<URL:[40]http://www.w3.org/TR/html401/types.html#h-6.5>) doesn't
actually specify how many digits are to be used for hex color values.)

If these properties are unintentional, a type that identifies the
well-known names and allows three-digit hex values may be better:

    <!-- sixteen color names or RGB color expression-->
    <xsd:simpleType name="Color">
      <xsd:union>
        <xsd:simpleType>
          <!--* Known color names are case-insensitive *-->
          <xsd:restriction base="xsd:NMTOKEN">
            <xsd:pattern value="[Bb][Ll][Aa][Cc][Kk]"/>
            <xsd:pattern value="[Gg][Rr][Ee][Ee][Nn]"/>
            <xsd:pattern value="[Ss][Ii][Ll][Vv][Ee][Rr]"/>
            <xsd:pattern value="[Ll][Ii][Mm][Ee]"/>
            <xsd:pattern value="[Gg][Rr][Aa][Yy]"/>
            <xsd:pattern value="[Oo][Ll][Ii][Vv][Ee]"/>
            <xsd:pattern value="[Ww][Hh][Ii][Tt][Ee]"/>
            <xsd:pattern value="[Yy][Ee][Ll][Ll][Oo][Ww]"/>
            <xsd:pattern value="[Mm][Aa][Rr][Oo][Oo][Nn]"/>
            <xsd:pattern value="[Nn][Aa][Vv][Yy]"/>
            <xsd:pattern value="[Rr][Ee][Dd]"/>
            <xsd:pattern value="[Bb][Ll][Uu][Ee]"/>
            <xsd:pattern value="[Pp][Uu][Rr][Pp][Ll][Ee]"/>
            <xsd:pattern value="[Tt][Ee][Aa][Ll]"/>
            <xsd:pattern value="[Ff][Uu][Cc][Hh][Ss][Ii][Aa]"/>
            <xsd:pattern value="[Aa][Qq][Uu][Aa]"/>
          </xsd:restriction>
        </xsd:simpleType>
        <xsd:simpleType>
          <!--* Other numbers are expressed using a hash mark plus a
              * three- or six-digit hexadecimal number *-->
          <xsd:restriction base="xsd:token">
            <xsd:pattern value="#[0-9a-fA-F]{3}([0-9a-fA-F]{3})?"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:union>
    </xsd:simpleType>

  [40] http://www.w3.org/TR/html401/types.html#h-6.5

If it's desired to allow other NMTOKEN values to count as valid, as
well as the sixteen named by HTML 4.01 (e.g. for the system colors
allowed by CSS2
<URL:[41]http://www.w3.org/TR/REC-CSS2/syndata.html#value-def-color>),
then inserting

    <xsd:simpleType>
      <xsd:restriction base="xsd:NMTOKEN"/>
    </xsd:simpleType>

  [41] http://www.w3.org/TR/REC-CSS2/syndata.html#value-def-color

as a final union member would do that. (Since the system colors of
CSS2 appear to be a finite enumerated list, they could be defined in
the same way as the sixteen names in HTML 4.01, although for clarity
they should probably go into a different member type. That's left as
an exercise for the reader.)

2.3. ContentType

Like Charset, this could be defined as a union whose first member(s)
recognize well-known values defined by the RFCs or in the IANA
registry and whose final type (here xsd:string) takes care of
extensibility. It's not clear to me whether the values are in fact
limited by the RFC to ASCII characters; if so, xsd:string is a bit too
broad.

2.4. Coords type

Since the possible values of Coords are so clearly specified in the
spec, it seems a shame not to define the type a little more tightly.
The absence of macros in XML Schema regular expressions makes life a
little harder, but one reason XML Schema doesn't need macros in
regexes is that we can use general entities.
If we write the following entity declarations into the internal subset
of the schema document, we have general entities which correspond to
the important bits of coordinate strings, as defined in HTML
(<URL:[42]http://www.w3.org/TR/html401/struct/objects.html#adef-coords>):

    <!-- "&#37;" stands for the percent sign, which may not appear
         literally in an entity value -->
    <!ENTITY Pixel "\d+">
    <!ENTITY Percent "(\d+[&#37;]|\d*\.\d+[&#37;])">
    <!ENTITY Length "(&Pixel;|&Percent;)">
    <!ENTITY Comma "\s*,\s*">
    <!ENTITY Pair "&Length;&Comma;&Length;">

  [42] http://www.w3.org/TR/html401/struct/objects.html#adef-coords

That allows the declarations to be fairly clear about their structure:

    <xsd:simpleType name="Coords.rect">
      <xsd:restriction base="xsd:token">
        <xsd:pattern value="(&Length;&Comma;){3}(&Length;)"/>
      </xsd:restriction>
    </xsd:simpleType>
    <xsd:simpleType name="Coords.circle">
      <xsd:restriction base="xsd:token">
        <xsd:pattern value="(&Length;&Comma;){2}(&Length;)"/>
      </xsd:restriction>
    </xsd:simpleType>
    <xsd:simpleType name="Coords.poly">
      <xsd:restriction base="xsd:token">
        <xsd:pattern value="(&Pair;&Comma;){2,}(&Pair;)"/>
      </xsd:restriction>
    </xsd:simpleType>

If they prove to cause trouble for any schema processors, of course,
the entity references can be expanded.

And the Coords type can be clear that what is expected is either the
coordinates for a rectangle, or those for a circle, or those for a
polygon. (Type-aware systems can use the information about which
member type in the union actually accepted the value to perform a
sanity check: if the coords attribute has type Coords.rect, then the
value of the shape attribute had better be 'rect', and vice versa.)

    <xsd:simpleType name="Coords">
      <xsd:union memberTypes="
        xh11d:Coords.rect
        xh11d:Coords.circle
        xh11d:Coords.poly"/>
    </xsd:simpleType>

2.5. FPI type

ISO 8879 appears to define the formal public identifier using a
regular language, which means it's not necessary to allow any
xsd:normalizedString value.
(The formalization below assumes that only unregistered owner
identifiers are to be used, since section 3.6 of this spec says the
value must begin with '-'.) Building it up gradually using entities,
one can write:

    <!ENTITY minimum-data "[ a-zA-Z()+,\-./:/?]*">
    <!ENTITY owner-id "&minimum-data;">
    <!ENTITY textclass1 "(DTD|ELEMENTS|ENTITIES|NOTATION|TEXT)">
    <!ENTITY textclass2
      "(CAPACITY|CHARSET|DOCUMENT|LPD|NONSGML|SHORTREF|SUBDOC|SYNTAX)">
    <!ENTITY textclass "(&textclass1;|&textclass2;)">

It's not clear that any of the names in textclass2 make any sense
whatever for modules intended for use in the XHTML family, so one
might choose to omit them.

    <!ENTITY langname "(\i\c*)">
    <!ENTITY designator "&minimum-data;">
    <!ENTITY lang-or-des "(&langname;|&designator;)">
    <!ENTITY display "&minimum-data;">
    <!-- textdesc was referenced but not declared in the original
         comments; minimum data is assumed here -->
    <!ENTITY textdesc "&minimum-data;">
    <!ENTITY textid
      "&textclass; (-//)?&textdesc;//&lang-or-des;(//&display;)?">
    <!ENTITY fpi "-//&owner-id;//&textid;">

The pattern is then quite simple:

    <xsd:simpleType name="FPI">
      <xsd:restriction base="xsd:normalizedString">
        <xsd:pattern value="&fpi;"/>
      </xsd:restriction>
    </xsd:simpleType>

2.6. FrameTarget type

The HTML spec (<URL:[43]http://www.w3.org/TR/html401/types.html#h-6.16>)
seems to want a slightly tighter definition of frame target names.
Perhaps something like the following should be used.

    <xsd:simpleType name="FrameTarget">
      <xsd:union>
        <xsd:simpleType>
          <xsd:restriction base="xsd:NMTOKEN">
            <xsd:enumeration value="_blank"/>
            <xsd:enumeration value="_self"/>
            <xsd:enumeration value="_parent"/>
            <xsd:enumeration value="_top"/>
          </xsd:restriction>
        </xsd:simpleType>
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:pattern value="[a-zA-Z].*"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:union>
    </xsd:simpleType>

  [43] http://www.w3.org/TR/html401/types.html#h-6.16

2.7. LinkTypes type

LinkTypes is a good example of a type with what is sometimes called a
‘semi-open’ list of values.
Some set of well-known values is defined, which software is encouraged
to recognize and which authors are encouraged to use when appropriate,
but for strict validity, a much larger set of values is allowed. In
such cases, it's good practice to document the recognized types in the
type definition. Since the well known values here are case
insensitive, that's best done with a list of patterns rather than with
an enumeration:

    <xsd:simpleType name="KnownLinkTypes">
      <xsd:restriction base="xsd:NMTOKEN">
        <xsd:pattern value="[Aa][Ll][Tt][Ee][Rr][Nn][Aa][Tt][Ee]"/>
        <xsd:pattern value="[Ss][Tt][Yy][Ll][Ee][Ss][Hh][Ee][Ee][Tt]"/>
        <xsd:pattern value="[Ss][Tt][Aa][Rr][Tt]"/>
        <xsd:pattern value="[Nn][Ee][Xx][Tt]"/>
        <xsd:pattern value="[Pp][Rr][Ee][Vv]"/>
        <xsd:pattern value="[Cc][Oo][Nn][Tt][Ee][Nn][Tt][Ss]"/>
        <xsd:pattern value="[Ii][Nn][Dd][Ee][Xx]"/>
        <xsd:pattern value="[Gg][Ll][Oo][Ss][Ss][Aa][Rr][Yy]"/>
        <xsd:pattern value="[Cc][Oo][Pp][Yy][Rr][Ii][Gg][Hh][Tt]"/>
        <xsd:pattern value="[Cc][Hh][Aa][Pp][Tt][Ee][Rr]"/>
        <xsd:pattern value="[Ss][Ee][Cc][Tt][Ii][Oo][Nn]"/>
        <xsd:pattern value="[Ss][Uu][Bb][Ss][Ee][Cc][Tt][Ii][Oo][Nn]"/>
        <xsd:pattern value="[Aa][Pp][Pp][Ee][Nn][Dd][Ii][Xx]"/>
        <xsd:pattern value="[Hh][Ee][Ll][Pp]"/>
        <xsd:pattern value="[Bb][Oo][Oo][Kk][Mm][Aa][Rr][Kk]"/>
      </xsd:restriction>
    </xsd:simpleType>
    <xsd:simpleType name="LinkTypes">
      <xsd:union memberTypes="xh11d:KnownLinkTypes xsd:NMTOKEN"/>
    </xsd:simpleType>

2.8. Tightening other types

If we continue in the same way, we risk belaboring our point past
reason.
So instead of commenting in detail on individual types which could, it
seems to us, usefully be made more restrictive, or more informative,
or both, by means of enumerations or patterns to recognize well known
values or unions to combine subtypes (including more and less
restrictive definitions of a datatype), we will merely say that we
believe other types should also be given definitions closer to the
requirements of the prose. (MultiLength, for example, is not really
that hard to capture with a pattern.)

2.9. Named model groups vs. substitution groups

We reiterate our advice of four years ago: the definition of the XHTML
vocabulary would be easier to follow, and it would be easier to extend
it, if the schema documents used substitution groups wherever
feasible. If you have had specific problems applying substitution
groups to XHTML, we would very much like to know what they were; we
can speculate, but would prefer to hear from you.

Using named model groups for extensibility has a number of unfortunate
side effects. For example, the schema includes this definition:

    <xs:group name="xhtml.title.content">
      <xs:sequence/>
    </xs:group>

What's the point of that, exactly? Presumably the idea is to play a
similar trick to what you did when this was a DTD and splice your own
stuff in there from your own namespace. But how does using a group get
you there? It's not impossible, but it is harder than necessary and
you could just as easily redefine the element in question directly. So
defining all these content groups just gums up the schema and makes it
harder to read. (Those accustomed to DTD-based extension of
vocabularies may have little trouble following the logic here, but
that group may no longer be as large as it once was.)

If a user wants to use XHTML and just add one little inline element or
allow some new content in, say, the title element, the user has to
jump through a few unnecessary hoops.
This scenario could be better enabled even within the existing
architecture just by adding an abstract substitution group head as a
choice to all the named model groups. So even if you don't restructure
the schema documents to use substitution groups wherever possible, you
could simplify extensibility for users of the spec a great deal by
just adding an abstract element to each group, or each content model
where extensibility is an obvious requirement, to provide hooks for
later schema authors.

2.10. Adding attributes

It's not clear that the way modules add attributes works. For example,
the client side image map module adds attributes to the img element.
All well and good, but looking at the schema I see an attribute group
defined:

    <!-- modify img attribute definition list -->
    <xs:attributeGroup name="xhtml.img.csim.attlist">
      <xs:attribute name="usemap" type="xs:IDREF"/>
    </xs:attributeGroup>

I can't see where this actually is used anywhere in the schema. I
think what the module should be doing is a redefine of the groups.

2.11. A missing scenario

One important scenario that seems to be missing is just plonking bits
of the XHTML namespace into specific places in some other namespace.
Maybe it's too obvious/easy, but it is actually the most common
scenario. E.g. MyOwnLanguage has its own things, and I'll just put
some XHTML inline elements here. Introducing XHTML elements into the
xsd:documentation elements in a schema document is another instance of
the scenario.

3. Editorial comments

The following comments are editorial; we hope that they can be made
without invalidating any existing reviews of the specification.

3.1. Make the introduction less DTD-specific

Section 1 Introduction
<URL:[44]http://www.w3.org/TR/xhtml-modularization/introduction.html>
also
<URL:[45]http://www.w3.org/MarkUp/Group/2007/WD-xhtml-modularization-20070219/introduction.html>

sec 1.2 para 1: "These abstract modules are implemented in this
specification using the XML Document Type Definition language, but an
implementation using XML Schemas is expected." Read "These abstract
modules are implemented in this specification using both the XML
Document Type Definition language and XML Schema 1.0."?

sec 1.3.4 para 2:

  [44] http://www.w3.org/TR/xhtml-modularization/introduction.html
  [45] http://www.w3.org/MarkUp/Group/2007/WD-xhtml-modularization-20070219/introduction.html

    A document is an instance of one particular document type defined
    by the DTD identified in the document's prologue. Validating the
    document is the process of checking that the document complies
    with the rules in the document type definition.

Here (as elsewhere) there are traces of DTD-only terminology.

Some SGML experts maintain that the term "document type definition" of
ISO 8879 and XML is defined broadly enough to include schemas defined
with XSD or with any other language currently known to information
technology; on that reading, the only problem with the paragraph just
quoted is the assumption that the document and its DTD are associated
in the document's prologue.

Normal usage, however, uses the term "document type definition" with
narrower scope nowadays, to mean only those schemas written using the
bracket-bang keyword syntax of ISO 8879 and the XML spec. On that
reading, there are several things in this paragraph that apply only to
conventional XML DTDs, not to schemas in general:

  * In fact, any document is an instance of an infinite number of
    document types and schemas (or document type definitions), just as
    any object is contained by an infinite number of sets.
    This fact does not conflict with the equally important fact that
    an author may wish to advertise conformance to a particular schema
    or affiliation with a particular document type, either for the
    sake of tool support or for other reasons.
  * Documents may be associated with a schema by their prolog, or by
    xsi:schemaLocation hints in the document instance, or by
    out-of-band associations between document and schema (e.g. by
    parameters passed to the validator at invocation time).
  * Validation is the process of checking whether, not the process of
    ensuring that, a document complies with the rules in the document
    type definition.

To make this paragraph cover the current situation (where you're
providing normative XSD schema documents as well as normative DTDs),
you might consider saying something like the following. If you're
willing to adopt the term "schema" as the general term for a formal
machine-readable expression of the rules for a document type, then:

    A document may be associated with a particular document type
    defined by a schema. The document's prolog may identify a DTD, or
    xsi:schemaLocation attributes may be used to associate the
    document with a schema written in XML Schema 1.0, or the document
    may be associated with a schema by other means (e.g.
    validation-time identification of the schema by means of a
    parameter passed to a validator). Validating the document is the
    process of testing whether the document complies with the rules in
    the schema.

Or if you'd prefer to stay with "document type definition", you could
write:

    A document may be associated with a particular document type. The
    document's prolog may identify a DTD, or xsi:schemaLocation
    attributes may be used to associate the document with a document
    type definition written in XML Schema 1.0, or the document may be
    associated with a document type definition by other means (e.g. a
    parameter passed to a validator).
    Validating the document is the process of testing whether the
    document complies with the rules in the document type definition.

If you stick with "document type definition", you might want to add
something to the definition of "document type definition" in the
glossary, e.g. by changing the sentence:

    The same markup model may be expressed by a variety of DTDs.

to something like

    The same markup model may be expressed by a variety of document
    type definitions, written in a variety of languages, such as the
    DTD notation of XML or XML Schema 1.0.

just to make explicit somewhere that you're using "document type
definition" to cover rules written in a variety of languages. You
could mention Relax NG and/or Schematron, too, if you wish.

3.2. The term PCDATA

Section 4.2
<URL:[46]http://www.w3.org/MarkUp/Group/2007/WD-xhtml-modularization-20070219/abstraction.html>
4.2 para 1 reads in part

  [46] http://www.w3.org/MarkUp/Group/2007/WD-xhtml-modularization-20070219/abstraction.html

    ... In these cases, the symbol used for text is PCDATA (processed
    characted data). This is a term, defined in the XML 1.0
    Recommendation, that refers to processed character data. ...

Strictly speaking, XML 1.0 doesn't define the term; it only says

    The keyword #PCDATA derives historically from the term "parsed
    character data."

(Note also the typo 'characted' for 'character'.)

We'd suggest rewording to say something like

    ... In these cases, the symbol used for text is PCDATA; this is
    short for "parsed character data", denoting sequences of
    characters which are to be parsed for markup by an XML processor.
    ...

3.3. Section 4.3 Attribute Types

Congratulations to the editors; this section is much easier to read
and follow than is sometimes the case when specs define (or fail to
define) fundamental types used throughout them.
Some comments on the definitions of some of the datatypes, as found in
<URL:[47]http://www.w3.org/TR/xhtml-modularization/SCHEMA/xhtml-datatypes-1.xsd>
and other schema documents, may be found elsewhere.

  [47] http://www.w3.org/TR/xhtml-modularization/SCHEMA/xhtml-datatypes-1.xsd

3.4. Length type: well done

The definition for Length seems well done. Good work!

3.5. Shape type

Shouldn't the overview in section 4.3 say that Shape has just the four
values rect, circle, poly, and default?

3.6. White space in the document source

Minor but extremely irritating:
<URL:[48]http://www.w3.org/MarkUp/Group/2007/WD-xhtml-modularization-20070219/schema_module_defs.html#a_smodule_Text>
<URL:[49]http://www.w3.org/MarkUp/Group/2007/WD-xhtml-modularization-20070219/schema_module_defs.html#a_smodule_Presentation>
(and presumably others) have the tabbing alignment in the schema
messed up, making it harder to read.

  [48] http://www.w3.org/MarkUp/Group/2007/WD-xhtml-modularization-20070219/schema_module_defs.html#a_smodule_Text
  [49] http://www.w3.org/MarkUp/Group/2007/WD-xhtml-modularization-20070219/schema_module_defs.html#a_smodule_Presentation

4. Comments half substantive and half editorial

The following comments may be regarded as purely editorial, or they
may be regarded as substantive; we leave that judgment to you.

4.1. Testing the schema documents

We endeavored to test the schema documents for syntax errors or other
problems, but encountered some difficulty knowing where to start.
Which file(s) should be used as the top-level driver file(s)?

One test reported:

    I'm using files extracted from
    <URL:[50]http://www.w3.org/TR/xhtml-modularization/xhtml-modularization.zip>.

  [50] http://www.w3.org/TR/xhtml-modularization/xhtml-modularization.zip

    xhtml-framework-1.xsd seems to be the root (the first one
    mentioned in Appendix C). But it won't compile (missing many
    att-groups like "xhtml.Core.extra.attrib" and
    "xhtml.I18n.extra.attrib").
    I can't tell whether this is an error or users of these schemas
    must provide definitions of those att-groups. (Looks like the
    latter, because one of the examples, myml-model-1.xsd, defines
    those missing groups.)

    I was hoping testing.xml can be a little more helpful, but
    unfortunately it refers to
    <URL:[51]file:/C:/cygwin/home/ahby/htmlwg/xhtml-modularization/SCHEMA/xhtml11.xsd>.
    I really hope I can't access someone else's "file:/C:/".
    xhtml11.xsd doesn't exist anywhere.

  [51] file://localhost/C:/cygwin/home/ahby/htmlwg/xhtml-modularization/SCHEMA/xhtml11.xsd

    So I gave up on that. Then I looked in the examples directory.

    "simpleml-1_0.xsd" doesn't refer to anything like "../". It
    redefines "xhtml.Misc.class" in
    http://www.w3.org/MarkUp/SCHEMA/xhtml-basic10.xsd. But Xerces-J
    fails to locate that group in the schema being redefined. (I found
    a Misc.class, but nothing starts with "xhtml.".) I then got many
    more errors about missing components. Similar to the ones I got
    from xhtml-framework-1.xsd, but different. (Note that these errors
    are from schema files in http://www.w3.org/MarkUp/SCHEMA/.)

    My last hope was those .html files in examples. Unfortunately all
    they gave me was more errors, both in the schema and the instance.

    In summary, I don't know how these files should be used, so I
    can't claim that they are broken. No useful input from me ...

[Later information from Shane McCarron is that this spec doesn't
provide a driver, but that
<URL:[52]http://www.w3.org/MarkUp/SCHEMA/xhtml11.xsd> might be
consulted as an example. To be followed up ...]

  [52] http://www.w3.org/MarkUp/SCHEMA/xhtml11.xsd

4.2. Where is the html element?

(Possibly related to the preceding.) Where is the html element
defined?

After some searching, starting not from this document but from
<URL:[53]http://www.w3.org/MarkUp/SCHEMA/xhtml11.xsd>, we found a
definition in
<URL:[54]http://www.w3.org/MarkUp/SCHEMA/xhtml11-model-1.xsd>.
This may be solely an editorial issue: the abstract says

  [53] http://www.w3.org/MarkUp/SCHEMA/xhtml11.xsd
  [54] http://www.w3.org/MarkUp/SCHEMA/xhtml11-model-1.xsd

    This modularization provides a means for subsetting and extending
    XHTML, a feature needed for extending XHTML's reach onto emerging
    platforms. This specification is intended for use by language
    designers as they construct new XHTML Family Markup Languages.

and this has led at least some readers to infer that the modules
defined here would include everything needed for a definition of XHTML
1.1, including the top-level driver files.

If the problem is editorial, the solution is also editorial: the spec
needs to make clear(er) that no top-level driver for XHTML is
provided. (And, for the instruction of those seeking to understand how
to use these modules, a pointer to the XHTML 1.1 driver modules would
be very useful. If such a pointer is already present, then let this
note serve as a record that at least some readers didn't see the
pointer when they needed to.)

But the issue appears to at least some readers as at least partly
substantive: that is, it seems to us that a specification describing a
modular definition of the XHTML 1.1 vocabulary ought, in the nature of
things, to include a top-level driver module which calls in all the
others.

4.3. Case insensitivity and XML Schema patterns or enumerations

Several of the alternative type definitions offered elsewhere in these
comments propose to use patterns (rather than enumerations, as one
might expect) to handle the well known values for types which have
well known values. In the numerous cases in which the values are
defined as case insensitive, the pattern for a (case-insensitive)
value like “black” is written “<xsd:pattern
value="[Bb][Ll][Aa][Cc][Kk]"/>”.

The regularity with which this technique must be used suggests that
perhaps XML Schema should add a caseInsensitive flag to patterns.
This would allow writing the pattern “<xsd:pattern value="black"
caseInsensitive="true"/>” instead. Given that many regex libraries
already have such flags, such an addition wouldn't seem to be
difficult for implementors.

Should the XML Schema Working Group consider such a change? And if so,
what is to be done about Unicode characters for which the
upper/lowercase mapping is not 1:1? And what should be done about
title case?
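
Until such a flag exists, the bracketed patterns used throughout these
comments can be generated mechanically rather than typed by hand. The
following is a minimal sketch in Python, offered only as an
illustration; the function name case_insensitive_pattern is
hypothetical and is not part of XML Schema or of any library. It
assumes the simple per-character expansion used above, adds a
title-case variant where one exists, and refuses characters whose case
mapping is not one-to-one, which is where the open questions just
raised begin.

```python
# Characters that must be escaped in an XML Schema regular expression.
XSD_METACHARS = "\\|.?*+{}()[]"


def case_insensitive_pattern(word: str) -> str:
    """Expand a keyword into an XSD pattern matching it in any case.

    Hypothetical helper for illustration only.
    """
    parts = []
    for ch in word:
        variants = []
        # Collect the upper-, lower- and title-case forms of the character.
        for form in (ch.upper(), ch.lower(), ch.title()):
            if len(form) != 1:
                # e.g. U+00DF, whose uppercase form is two characters
                raise ValueError(
                    f"case mapping of {ch!r} is not one-to-one"
                )
            if form not in variants:
                variants.append(form)
        if len(variants) == 1:
            # Caseless character (digit, hyphen, ...): emit it literally.
            parts.append("\\" + ch if ch in XSD_METACHARS else ch)
        else:
            parts.append("[" + "".join(variants) + "]")
    return "".join(parts)


if __name__ == "__main__":
    print(case_insensitive_pattern("black"))     # [Bb][Ll][Aa][Cc][Kk]
    print(case_insensitive_pattern("US-ASCII"))  # [Uu][Ss]-[Aa][Ss][Cc][Ii][Ii]
```

A keyword containing a character such as U+00DF raises ValueError in
this sketch instead of guessing at an expansion; a real
caseInsensitive facet would need a defined answer for such cases.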
Received on Tuesday, 27 February 2007 22:45:37 UTC