
Some XML syntax/semantic thoughts (PI and RMD)



As I was going over the RMD stuff again, while thinking over the use of the
internal subset to eliminate PIs, I had some thoughts that might let us
simplify the RMD.

   I noticed that the RMD is required in certain places (i.e., you _must_,
rather than _should_, declare an RMD to include any attribute defaults).

   I think that this places an onus on the user to declare things, but does
not give much benefit: if the RMD is incorrect, a parser that doesn't need
a DTD (the kind the RMD is intended to aid) will silently produce incorrect
results. A parser that does use a DTD (and does not need an RMD) cannot
ignore it, but must check where certain things are declared so that it can
issue an error message in case the RMD is incorrect.

   First, I think we should certainly simplify RMD to two values: INTERNAL
and ALL. If we allow multiple attribute-list declarations, this will let us
solve the "default value problem" in a much more natural way. The RMD could
even be left out, and an explicit note made in the standard that
DTD-ignoring applications will not see default attribute values declared
outside the internal subset.
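   To make the "default value problem" concrete, here is a minimal sketch of
what a DTD-ignoring application would see under this scheme. It is written
in Python on top of the standard library's expat binding, which is purely an
assumption for illustration. In this configuration expat does not fetch an
external DTD, so only the internal subset is seen. Several ATTLIST
declarations for the same element are merged, and the first declaration of a
given attribute is binding (the rule XML 1.0 eventually adopted; assumed
here). The function name internal_subset_defaults is hypothetical.

```python
import xml.parsers.expat


def internal_subset_defaults(document: str) -> list[tuple[str, dict[str, str]]]:
    """Return (element name, attributes) pairs, with attribute defaults from
    the internal subset filled in. Hypothetical helper for illustration."""
    # element name -> {attribute name -> default value}
    defaults: dict[str, dict[str, str]] = {}
    elements: list[tuple[str, dict[str, str]]] = []

    def attlist_decl(elname, attname, attr_type, default, required):
        # expat passes default=None for #IMPLIED and #REQUIRED attributes.
        if default is None:
            return
        # Several ATTLIST declarations may name the same element; the first
        # declaration of a given attribute is binding, later ones are ignored.
        defaults.setdefault(elname, {}).setdefault(attname, default)

    def start_element(name, attrs):
        merged = dict(defaults.get(name, {}))
        merged.update(attrs)  # values given in the start tag override defaults
        elements.append((name, merged))

    parser = xml.parsers.expat.ParserCreate()
    # Report only attributes written in the start tag, so the defaulting
    # above is done by this sketch rather than by expat itself.
    parser.specified_attributes = True
    parser.AttlistDeclHandler = attlist_decl
    parser.StartElementHandler = start_element
    # No external-entity handler is set, so no external DTD is read here.
    parser.Parse(document, True)
    return elements
```

Defaults declared only in an external DTD would simply be absent from the
result. That is exactly the behaviour the explicit note in the standard would
warn about.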

    It will still be possible for DTD-parsing applications to print a
warning if the user desires, and this should simplify the author's life a
little bit.

    Now for PIs. I've thought about the PI thing a lot since some off-list
mail with Michael in which he pointed out that PIs are the best way we
have of adding declarations to XML while retaining SGML compatibility.

   This point is completely correct, and given the PI keyword-reservation
discipline we have added in XML, their use can be well controlled too. So I
found myself at a loss to explain my continuing discomfort with PIs, since I
still believe that they are, at bottom, a gross hack.

   But being a gross hack is usually incompatible with being the best (as
opposed to a workable) solution to a design problem.

   But I think I have resolved the cognitive dissonance in my own mind, and
I'd like to suggest that some simple wording and production changes would
make things nicer.

   PIs in SGML are a way to extend the capabilities of SGML processors in
arbitrary ways. As such they are a kind of Pandora's box that can be used
to add arbitrary _non-structural_ markup to a document. As a true believer
in content markup, I find this disturbing.

   XML needs the PI syntax, as that is the only way we can add declarations
to XML and still remain SGML compatible. So I would suggest that we split
the PI into two constructs: an "extension declaration" and a "processing
instruction". Extension declarations can occur only in the external or
internal subset. I would argue that user declarations might not even be
needed, but if people want them that is OK, I guess. A processing
instruction is any <? ?> sequence that occurs in the document instance.

   Just separating the terminology makes the fact that there are two
different functions much clearer. It is even clearer that arbitrary XML
extension declarations cannot be part of the document, and that PI usage by
users in the document instance is a completely different animal.

   So now that I've agreed that PI declarations are good, I'd like to
suggest a change to the syntactic requirements for PIs (as opposed to their
documentation).

   Let's _require_ the use of a declared notation in the document instance
when using PIs. This one thing would make handling random PIs in the
instance easier to conceptualize, and would make it much easier to be sure
whether it's safe to ignore a PI or not.

   This will break old documents with TROFF commands wedged into them, but
I think the additional structure would be worth it.
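   To show how mechanical the "is it safe to ignore this PI?" decision
becomes, here is a hypothetical checker, again a Python sketch on top of the
standard expat binding. It assumes that the PI target is the name of the
notation (the proposal does not say how the notation is attached). It treats
any PI inside the internal subset as an extension declaration, following the
split proposed above, and flags instance PIs whose target is not a declared
notation. The function name and category labels are made up for illustration.

```python
import xml.parsers.expat


def classify_pis(document: str) -> list[tuple[str, str, str]]:
    """Classify PIs under the proposed (hypothetical) rules.
    Returns (kind, target, status) triples in document order."""
    notations: set[str] = set()
    results: list[tuple[str, str, str]] = []
    in_subset = False

    def start_doctype(name, system_id, public_id, has_internal_subset):
        nonlocal in_subset
        in_subset = True

    def end_doctype():
        nonlocal in_subset
        in_subset = False

    def notation_decl(name, base, system_id, public_id):
        notations.add(name)

    def processing_instruction(target, data):
        if in_subset:
            # A PI in the DTD is treated as an extension declaration.
            results.append(("extension-declaration", target, "ok"))
        elif target in notations:
            # An inline PI whose target names a declared notation.
            results.append(("processing-instruction", target, "ok"))
        else:
            # Would be an error under the proposal, e.g. a bare TROFF command.
            # (Simplification: PIs before the DOCTYPE also land here.)
            results.append(("processing-instruction", target, "undeclared-notation"))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = start_doctype
    parser.EndDoctypeDeclHandler = end_doctype
    parser.NotationDeclHandler = notation_decl
    parser.ProcessingInstructionHandler = processing_instruction
    parser.Parse(document, True)
    return results
```

An application can then ignore every PI whose notation it does not
implement, and an undeclared target is caught at parse time rather than
being silently passed along.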

   So my suggestions are:

   A. Make two productions: one for PI as declaration, and one for PI as
inline instruction.

   B. Require inline instructions (and user declarations, if we want them)
to declare a notation as well.
[[[   This is more compatible with HyTime, as I understand the current
stuff, and moves PI from kludge to "structured kludge." I can be happy with
a structured kludge. ]]]

   C. RMD should go away, with a warning that some applications will not
see declarations outside the internal subset. It will no longer be
permissible for an application to ignore the internal subset, even if it
only cares about well-formedness.

   D. We should allow multiple attribute-list declarations for the same
element, to ease the use of the internal subset for specifying default values.

[[[ I think that this simplifies XML (and removes the need to explain the
legality of the odd case where RMD=ALL when there is nothing declared). ]]]

   -- David

I am not a number. I am an undefined character.
_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________