Modules, Modularization, and the XHTML Family

(I just had a call with Markus where we went over some deliverables 
related to module implementations.  During that call, we got onto the 
topic of host language conformance requirements.  In particular, we 
wandered into the topic of subsetting of modules.  I want to give Markus 
full credit here - he really helped focus my thinking.)

Many of you will remember that I am a fierce opponent of subsetting.  The 
conformance rules related to module integrity are largely the result of 
my lobbying back in the day.  So it may come as a bit of a shock when I 
say that I have changed my mind.

The primary argument for modules, and the composition of modules into 
markup languages, is that the structure of the resulting "host" 
languages will be similar enough that they can all be members of the XHTML 
Family.  Languages that are part of the XHTML Family are in theory 
portable among XHTML conforming user agents.  We get to use terms like 
"graceful degradation" and we sound all cool and stuff.

In retrospect, I think I "grabbed the wrong end of the stick".  The key 
to portability of content is NOT that the host language supports 
everything in a module.  The key is that the USER AGENTS support the 
core functionality of the module so that languages that rely upon that 
functionality will have the portability we desire.

"So what?" I hear you saying.  Well, I was thinking that it would be 
possible to broaden the utility and appeal of our modules by loosening 
our rules for how the modules can be used.  Specifically, we could 
identify the places where it is permissible for host languages to 
further restrict the content model, eliminate elements, or eliminate 
attributes from elements - while at the same time making it clear that 
user agents are *required* to support the entire module.

Obviously this can't be a blanket change.  The rule of thumb in 
identifying things that a host language could omit / reduce should be 
"will a document that conforms to this language work correctly in an 
XHTML Family user agent?  And will it work in legacy user agents when 
relying upon their interpretation of XHTML as HTML?"  Omitting the "em" 
element, for example, would work fine.  Lots of documents don't have 
"em" elements.  Omitting the "tr" element from the table module, on the 
other hand, would be a disaster!
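
As a rough illustration of that rule of thumb, here is a minimal Python 
sketch.  Everything in it is an assumption made for the example: the 
`Module` class, the function names, the abbreviated element lists, and 
especially the `required` set (the elements a host language may not drop 
if it uses the module at all).  None of this comes from the 
Modularization spec; it only shows the shape of the two checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """Hypothetical model of a module: its elements, plus those that may not be dropped."""
    name: str
    elements: frozenset
    required: frozenset = frozenset()  # assumption: chosen by hand, per module


def subset_problems(module, host_elements):
    """Return the reasons why host_elements is not a safe subset of module."""
    host = frozenset(host_elements)
    problems = []
    extra = host - module.elements
    if extra:
        problems.append(f"not in {module.name}: {sorted(extra)}")
    # A host language that uses nothing from the module has nothing to break.
    if host:
        missing = module.required - host
        if missing:
            problems.append(f"cannot omit from {module.name}: {sorted(missing)}")
    return problems


def user_agent_conforms(module, supported_elements):
    """User agents get no such latitude: they must support the whole module."""
    return module.elements <= frozenset(supported_elements)


# Abbreviated, illustrative element lists.
TEXT = Module("text", frozenset({"p", "em", "strong", "span"}))
TABLES = Module(
    "tables",
    frozenset({"caption", "table", "td", "th", "tr"}),
    required=frozenset({"table", "tr", "td"}),
)

print(subset_problems(TEXT, {"p", "strong", "span"}))              # host drops "em"
print(subset_problems(TABLES, {"caption", "table", "td", "th"}))   # host drops "tr"
print(user_agent_conforms(TEXT, {"p", "strong", "span"}))          # UA drops "em"
```

The first call reports no problems, the second flags the missing "tr", 
and the third is False because a user agent has to support "em" even if 
a given host language leaves it out.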

If we were to identify the things that can be safely "subsetted" in our 
modules, what would be the upside?  Some obvious items:

1.  More flexibility in the application of our modules.

2.  Greater appeal to the "structured" markup community.  The folks who
     brought us SGML in the first place because they wanted very tight
     document construction rules.

3.  Less need to create lots of little modules since a language designer
     can just ignore the bits they don't like.


What are the risks?  Well, first there is the significant risk we will 
get it wrong.  Identifying candidate elements and attributes is going to 
be painstaking.  Then there is the collateral risk that user agent 
implementors will ignore our requirements and start implementing subsets 
that support only the host languages they care about.  Finally, there is 
the less obvious risk that this will jump-start a plethora of XHTML 
Family host languages with their own "minted" document types that 
existing user agents will misinterpret.

What do people think?  Am I missing something?  Would anyone oppose an 
effort to make our host language conformance rules more flexible?

-- 
Shane P. McCarron                          Phone: +1 763 786-8160 x120
Managing Director                            Fax: +1 763 786-8180
ApTest Minnesota                            Inet: shane@aptest.com

Received on Friday, 6 February 2009 18:21:18 UTC