Re: PIs considered harmful Was: XML-SW, a thought experiment

 Norm,
 had the DocBook schema been designed to allow for arbitrary 
extensibility in another namespace, the added element (or 
attribute) would be the solution. Look at HTML: browsers ignore 
unknown tags. In XML we could ignore unknown elements from 
unknown namespaces _and_ their contents as well, of course.

 But I can see why you don't want to change DocBook's schema now 
that it's set. (I just can't see a reason not to have allowed 
that arbitrary extensibility in the first place.)

 Anyway, the solution I'd propose would be something like:

<variablelist dbhtml:list-format="table">
  ...
</variablelist>

 Yes, it would not validate against the original schema, but this 
is not a raw DocBook document; it is a document extended with 
information for transformations. Why not admit that and create a 
schema that allows attributes (or elements) from the known 
extension namespace (here prefixed dbhtml)?
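
 To sketch how a processor could pick that attribute up (Python 
here purely for illustration; the namespace URI and the 
list_format helper are made up, nothing DocBook defines):

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the dbhtml prefix (an assumption).
DBHTML_NS = "http://example.org/dbhtml"

DOC = """<book xmlns:dbhtml="http://example.org/dbhtml">
  <variablelist dbhtml:list-format="table"><varlistentry/></variablelist>
  <variablelist><varlistentry/></variablelist>
</book>"""

def list_format(elem, default="list"):
    # ElementTree exposes namespaced attributes as "{uri}localname".
    return elem.get("{%s}list-format" % DBHTML_NS, default)

root = ET.fromstring(DOC)
formats = [list_format(v) for v in root.iter("variablelist")]
print(formats)  # ['table', 'list']
```

 A processor that does not know the dbhtml namespace never asks 
for the attribute and so behaves exactly as before.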

 I've used an attribute here and not an element because this 
parameter is actually a property of the list, not a member 
thereof. That's also what is wrong with PIs: a PI usually works 
as a switch acting "on the following siblings", not "on the 
children of the switch" or "on the parent element of the 
switch", both of which would IMHO be more XML-ish.

 Alternatively, you could create another file which would list
which variablelists should be formatted as tables, for example as
tuples of an XPath expression and a list format type. The
stylesheet could then heed that information. I think you might
have suggested this approach below, actually.
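
 A rough sketch of that out-of-band variant, again in Python. The 
RULES list stands in for the separate file, and the id values and 
the resolve_formats name are invented for the example:

```python
import xml.etree.ElementTree as ET

DOC = """<book>
  <chapter id="ch1"><variablelist id="opts"/></chapter>
  <chapter id="ch2"><variablelist id="terms"/></chapter>
</book>"""

# Hypothetical contents of the side file: (XPath expression, list format).
RULES = [
    (".//variablelist[@id='opts']", "table"),
]

def resolve_formats(root, rules, default="list"):
    # Map every element matched by a rule to the format that rule names.
    chosen = {}
    for xpath, fmt in rules:
        for elem in root.findall(xpath):
            chosen[elem] = fmt
    # Lists that no rule matched fall back to the default format.
    return {v.get("id"): chosen.get(v, default)
            for v in root.iter("variablelist")}

formats = resolve_formats(ET.fromstring(DOC), RULES)
print(formats)  # {'opts': 'table', 'terms': 'list'}
```

 The source document stays untouched and valid; the price is that 
the XPath expressions have to be kept in step with it.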

 You already said that you have to change your stylesheet to 
check for that processing instruction, so I don't see why it 
would be substantially worse to have to change the stylesheet to 
check for a known element (or attribute) and not process it using 
the default rules. If this is considerably more complex in XSLT 
(or whatever you use), then it's a problem of XSLT, not of XML, I 
think. 

 If you decided to throw away XML's namespace extensibility in
*all* your stylesheets (by adding default rules that copy and
quote unknown tags into the result document), well, you've closed 
one door on yourselves.
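
 The opposite default could look roughly like this. It is only a 
sketch: it assumes DocBook elements are in no namespace, and both 
namespace URIs and the process function are invented here.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the known extension (an assumption).
DBHTML_NS = "http://example.org/dbhtml"

DOC = """<variablelist xmlns:dbhtml="http://example.org/dbhtml"
                       xmlns:x="http://example.org/other">
  <dbhtml:format-as-table/>
  <x:note><para>skipped together with its contents</para></x:note>
  <varlistentry>kept</varlistentry>
</variablelist>"""

def namespace_of(elem):
    # ElementTree spells qualified names as "{uri}localname".
    tag = elem.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""

def process(elem):
    as_table = False
    kept = []
    for child in elem:
        if child.tag == "{%s}format-as-table" % DBHTML_NS:
            as_table = True          # known extension: acts on its parent
        elif namespace_of(child):
            continue                 # unknown namespace: skip the subtree
        else:
            kept.append(child.tag)   # ordinary element: default handling
    return ("table" if as_table else "list", kept)

result = process(ET.fromstring(DOC))
print(result)  # ('table', ['varlistentry'])
```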

 Lastly, to the security consideration... I'm not moved by that 
argument either, for exactly the same reasons.

                   Jacek Kopecky

                   Senior Architect, Systinet (formerly Idoox)
                   http://www.systinet.com/



On Wed, 13 Feb 2002, Norman Walsh wrote:

 > / Jacek Kopecky <jacek@systinet.com> was heard to say:
 > |  I'm speaking here as a relative newcomer to the depths of XML, 
 > | but I have a feeling that you wish for three things which 
 > | together contradict themselves:
 > |
 > |  1) maintain tight control over your vocabulary,
 > |  2) extend it nevertheless in specific applications,
 > |  3) validate the extended documents according to the original 
 > | tight schema.
 > 
 > That's not actually quite what I want. I want to ignore my
 > instructions for how the document should be processed when I'm testing
 > the validity of the document. They aren't relevant.
 > 
 > |  Why does the specific application not validate against a 
 > | specific schema? You could get the benefit of validating the 
 > | extensions, too.
 > 
 > Let's look at a concrete example.
 > 
 > DocBook has <variablelist>s. They're basically like HTML DLs. Suppose
 > I have a book that contains a whole bunch of these. I write an XSL
 > stylesheet to produce PDF (via XSL FOs) from this book. I print it and
 > the design department reviews it and says, "Yep, perfect, exactly what
 > the publishing specs say. Go ahead and send it to the printer."
 > 
 > Next, I write a stylesheet to produce HTML for online publication of
 > the book. This time the design department says, "You know, norm, a
 > bunch (but not all) of these lists look sortof awkward as HTML lists.
 > Could you make them into tables instead?"
 > 
 > Naturally, I flatly refuse. They aren't tables semantically and it
 > would be wrong to turn them into tables in the XML source just because
 > someone thinks they'd look prettier in HTML. And besides, even if I
 > was willing to do that, I'd have to go through the whole print
 > approval cycle again. I'd rather have a root canal.
 > 
 > What I really want here is, uh, how can I describe this? What I want
 > is an instruction that I can insert into my document that will tell a
 > particular processor that it should do something special. I want a,
 > wait for it, a processing instruction!
 > 
 > So I add a few PIs to my source document:
 > 
 >   <variablelist>
 >     <?dbhtml format="table"?>
 >     ...
 >   </variablelist>
 > 
 > I tweak my HTML stylesheet and voila, I'm finished in an afternoon.
 > And the print stylesheets still do exactly what they should. And the
 > design department is happy with what the HTML stylesheet produces. And
 > I get to go home before bedtime and have a cookie because I met all my
 > deadlines.
 > 
 > The alternative that's most often suggested to PIs is using an element
 > in a foreign namespace:
 > 
 >   <variablelist>
 >     <dbhtml:format-as-table/>
 >     ...
 >   </variablelist>
 > 
 > I'm sorry, that's just not a reasonable suggestion:
 > 
 > 1. I have a $35,000 editing, content, and workflow management system
 > that took six months to build, install, and debug that is built around
 > the DocBook schema. You want me to make a local change to that system
 > to support one formatting request?
 > 
 > 2. I exchange files with 11 authors and 6 translators on 3 continents.
 > You want me to propagate my schema change to all of them?
 > 
 > 3. Some of the folks that I exchange documents with work for stuffy
 > organizations that insist on industry standard schemas. DocBook does
 > not now, nor is it ever likely, to allow random namespaced cruft. You
 > want me to get the DocBook Technical Committee to accept a request to
 > change the DTD to support my formatting request? (Here's a tip, as the
 > chair of that TC, I know what the likely answer is going to be :-)
 > 
 > 4. *Every* stylesheet that processes the document has to go to special
 > effort to deal with or ignore the extra elements. (The stock HTML stylesheets
 > for DocBook will turn this into <font color="red">&lt;dbhtml:format-as-table&gt;</font>,
 > for example.)
 > 
 > The only reasonable answer that I see (please, don't suggest using CSS
 > instead of a table; that may or may not be reasonable depending on the
 > browsers involved and it isn't what the design department told me to
 > do (and if you really wanted me to, I could come up with a similar
 > example that isn't amenable to a CSS solution)), is to move this
 > formatting information completely out of band.
 > 
 > But that's a lot more work and it's a lot more fragile. The PI is
 > entirely harmless (and invisible) to processors that don't care about
 > it, but provides useful information for processors that go out of
 > their way to look for it.
 > 
 > The argument that PIs are a security danger doesn't move me at all.
 > Anyone that implements a system that processes <?runthis cmd="rm -rf
 > ~/"?> knows full well what door they've left open and had better take
 > precautions.
 > 
 >                                         Be seeing you,
 >                                           norm
 > 
 > 

Received on Wednesday, 13 February 2002 10:40:50 UTC