Re: Modules, Modularization, and the XHTML Family from Roland Merrick on 2009-02-10 (public-xhtml2@w3.org from February 2009)

From: Roland Merrick <roland_merrick@uk.ibm.com>
Date: Tue, 10 Feb 2009 16:11:37 +0000
To: Markus Gylling <markus.gylling@gmail.com>
Cc: XHTML WG <public-xhtml2@w3.org>, public-xhtml2-request@w3.org
Message-ID: <OF902F7906.28942533-ON80257559.0057DE80-80257559.0058F541@uk.ibm.com>
Greetings, the idea of subsetting modules is fine with me. 

As mentioned, there are two constituencies that would be affected. One is 
the language designers who are assembling a language from modules and/or 
parts of modules. These are the constituents directly affected by the 
modularisation rules. I do not think that the <tr> example is too arduous 
a problem: we should be able to assert that it is a mandatory descendant 
of <table>, or that <td> and <th> must have a parent of <tr>. We must adopt 
the principle that a document written to be valid according to the 
"subsetted" module must also be valid according to the "complete" module.
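
The principle in the paragraph above can be sketched in Python. This is only an illustration under strong assumptions: the two content models are hypothetical, they record just which child elements are allowed (no order, cardinality, attributes or namespaces), and the names COMPLETE, SUBSETTED, is_restriction and is_valid are invented for this sketch rather than taken from any M12N specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified content models: element name -> allowed child names.
COMPLETE = {
    "table": {"caption", "tr"},
    "tr": {"td", "th"},
    "td": {"p", "ul", "table"},
    "th": {"p"},
    "caption": set(),
    "p": set(),
    "ul": set(),
}

# A "subsetted" variant: no caption, and td may only contain p.
SUBSETTED = {
    "table": {"tr"},
    "tr": {"td", "th"},
    "td": {"p"},
    "th": {"p"},
    "p": set(),
}


def is_restriction(subset, complete):
    """True if every element and allowed child in `subset` also exists in `complete`."""
    return all(
        name in complete and children <= complete[name]
        for name, children in subset.items()
    )


def is_valid(element, model):
    """True if `element` and all of its descendants obey `model`."""
    if element.tag not in model:
        return False
    allowed = model[element.tag]
    return all(child.tag in allowed and is_valid(child, model) for child in element)


doc = ET.fromstring("<table><tr><th><p/></th><td><p/></td></tr></table>")
```

In this simplified model, whenever is_restriction(SUBSETTED, COMPLETE) holds, any document accepted by is_valid against SUBSETTED is also accepted against COMPLETE, which is the property asked for above.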

The other constituency is the document authors, which brings me to a subject 
that has bothered me before: how can an author, particularly of a 
fragment, assert what kind of document fragment it is? We already have the 
situation that I can write a document (or fragment) that conforms to more 
than one language specification, but the author is unable to assert 
that fact.

Regards, Roland




From: Markus Gylling <markus.gylling@gmail.com>
To: XHTML WG <public-xhtml2@w3.org>
Date: 09/02/2009 11:37
Subject: Re: Modules, Modularization, and the XHTML Family




One way to approach this issue is to ask what the fundamental
functional requirements for a next generation X(HT)ML
modularization framework are. XHTML M12N has always allowed
extensions. The option to define subsetting restrictions on imported
modules is so far absent. From my point of view as a language
designer, this appears (for lack of a better phrasing) mildly
asymmetric, and in my humble opinion, this is a feature worthy of
consideration for inclusion in XHTML M12N 2.0.

As Shane points out, we are not talking about allowing arbitrary
subsetting here: a mechanism would need to be put in place that allows
the module provider to express where subsetting is allowed (or the
inverse; that's a spec design choice). The table example is a good one
- in this scenario, the provider of the table module would be able to
express that while a user of the module is not allowed to remove
table, tr and td, it is possible to have an impact on the content
model of td.
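
As a rough illustration of the table scenario just described, the following Python sketch lets a module provider state which elements must be kept and which content models may be narrowed. Everything here is hypothetical: ModulePolicy, check_subsetting and the example data are invented names, and content models are reduced to sets of allowed child names. It does not reflect any mechanism actually defined by XHTML M12N.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModulePolicy:
    """Hypothetical provider-side statement of where subsetting is allowed."""
    required: frozenset      # elements a module user may not remove
    restrictable: frozenset  # elements whose content model may be narrowed


def check_subsetting(policy, complete, subset):
    """Return a list of problems; an empty list means the subsetting is acceptable."""
    problems = []
    for name in sorted(policy.required):
        if name not in subset:
            problems.append(f"{name}: required element removed")
    for name, children in subset.items():
        if name not in complete:
            problems.append(f"{name}: not part of the module")
        elif not children <= complete[name]:
            problems.append(f"{name}: content model extended, not restricted")
        elif children != complete[name] and name not in policy.restrictable:
            problems.append(f"{name}: content model may not be restricted")
    return problems


# Example data (assumed): table, tr and td are locked in, only td may be narrowed.
TABLE_POLICY = ModulePolicy(
    required=frozenset({"table", "tr", "td"}),
    restrictable=frozenset({"td"}),
)
TABLE_MODULE = {
    "table": {"tr"},
    "tr": {"td", "th"},
    "td": {"p", "ul", "table"},
    "th": {"p"},
}
NARROW_TD = {
    "table": {"tr"},
    "tr": {"td", "th"},
    "td": {"p"},
    "th": {"p"},
}
```

Calling check_subsetting(TABLE_POLICY, TABLE_MODULE, NARROW_TD) reports no problems, whereas a variant that drops td altogether would be flagged. The inverse design (the provider lists what may not be touched) would only change how the policy is expressed.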

To make the use case for this feature more concrete, let me give a few
real-world examples of subsetting desires that have surfaced within
the ANSI/NISO Z39.86 standard context whilst evaluating the viability
of adopting XHTML M12N 2.0. Some of these are examples from profiles
that pertain to quite specialized document type domains, but some of
them I would say are quite generic too, and can as such be said to
demonstrate the subsetting use case for language designers that strive
to produce highly structured/predictable grammars based on XHTML M12N
2.0.

(Further, these examples incidentally relate to the XHTML2 modules,
but of course a generic subsetting mechanism would be equally relevant
to any module created under the aegis of M12N 2.0, regardless of
whether it is using the XHTML namespace.)

- Allowing only one h element within section
- Allowing only one h element within section, and requiring it to be
the first child of section
- Allowing ul, ol, and dl but not nl [1]
- Allowing h but not h1-h6 (or vice versa)
- Disallowing recursive inlines (abbr inside abbr for example)
- Disallowing Structure class and Text class members to be mixed in a
sibling list (see: current Flow model)
- ... and many examples from attribute collections (such as: allowing
only @xml:id, not @id document-wide, or vice versa)
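
To make the first two items in the list above concrete, here is a small Python sketch that checks them after parsing. It is an assumption-laden illustration, not a schema mechanism: the function name section_heading_problems and its require_first flag are invented, element names are matched without namespaces, and "only one h" is read as "at most one h among the direct children of section".

```python
import xml.etree.ElementTree as ET


def section_heading_problems(root, require_first=False):
    """Check each section for at most one h child, optionally as the first child."""
    problems = []
    for section in root.iter("section"):
        children = list(section)
        headings = [child for child in children if child.tag == "h"]
        if len(headings) > 1:
            problems.append("section has more than one h")
        elif require_first and headings and children[0] is not headings[0]:
            problems.append("h is not the first child of section")
    return problems


ok = ET.fromstring("<body><section><h>Title</h><p/></section></body>")
late = ET.fromstring("<section><p/><h>Title</h></section>")
twice = ET.fromstring("<section><h>One</h><h>Two</h></section>")
```

With these inputs, ok passes both variants, late fails only when require_first is set, and twice fails in either case. None of the three documents would look unusual to a consumer that only knows the unrestricted module.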

As far as I can see, these examples favorably match Shane's rule of
thumb "will a document that conforms to this language work correctly
in an xhtml family user agent". Another way to put it is: an XHTML
Family compliant UA would not be able to tell from the document
infoset alone that it was authored against a subsetting schema.

(For clarity, it should also be noted that from a language designer's
point of view, the expression of additional restrictions isn't always
about subsetting, but sometimes about plain dis-optionalization
(ouch). Examples: requiring xml:lang on root, requiring di in dl.
Perhaps M12N-current already allows this, in which case (and unless I
have missed the obvious spec fragment where this is spelled out) I
would suggest a light editorial pass to clarify for the average reader
that this is so.)
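
The two "dis-optionalization" examples in the note above can be sketched the same way in Python. The assumptions are stated loudly: tightening_problems is a hypothetical helper, element names are matched without namespaces, and "requiring di in dl" is read here as "every direct child of dl must be a di", which may be stricter than what a real profile would want.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tightening_problems(root):
    """Check two made-mandatory rules: xml:lang on the root, and di inside dl."""
    problems = []
    if XML_LANG not in root.attrib:
        problems.append("root element lacks xml:lang")
    for dl in root.iter("dl"):
        for child in dl:
            if child.tag != "di":
                problems.append(f"dl child <{child.tag}> is not wrapped in di")
    return problems


good = ET.fromstring(
    '<html xml:lang="en"><dl><di><dt>term</dt><dd>def</dd></di></dl></html>'
)
loose = ET.fromstring("<html><dl><dt>term</dt><dd>def</dd></dl></html>")
```

Here good yields no problems, while loose is reported for the missing xml:lang and for the dt and dd children that are not grouped in a di.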

>What are the risks?
Indeed, finding the sweet spot in terms of dynamicity is not the
simplest thing to do. But, in light of the upside items Shane
mentions, I would argue that it's worth a try. Obviously, careful
evaluation and risk assessment would need to be performed on candidate
solutions.

hth, /markus

[1] Note: this is not at all related to the discussion whether nl is
appropriate in the XHTML2 document type, but related only to XHTML
M12N 2.0 as an example of module subsetting to fit a given specialized
document type context.









Received on Tuesday, 10 February 2009 16:12:21 UTC