Re: A28: syntax of markup declarations? (LONG)

I'm strongly against Tim's proposal.

>2.1 Quality of Specification
>This is close to my heart as co-editor of the XML spec.  If we use DTD
>syntax, the spec will have to: (a) specify the DTD syntax, (b) specify how
>the declarations in the DTD syntax impose constraints on instances.  If
>we use instance syntax, we lose almost all of (a)

You don't lose it.  You still have to include the DSD for DSDs.  I
don't think that's going to be any shorter than a BNF grammar for the SGML
DTD syntax.  All potential implementors will be familiar with BNF.  Almost
none will be familiar with DSDs.  An implementor coming fresh to the spec is
going to have a hard time understanding DSDs if the only specification of
what is a legal DSD is itself in the form of a DSD.  I think the only real
gain in the spec is that the prolog doesn't have any additional lexical
structure that has to be specified.  Overall I would say that the best spec
would be one that gave a clean BNF for a simplified subset of SGML DTDs.

>2.2 Integrity
>I for one, once we get XML done, plan to start browbeating the document
>management and authoring practitioners of the world along the lines of "now
>that XML exists, you have NO EXCUSE for not using descriptive markup in your
>structured documents!"  If I convince them, and they then start using XML,
>and the first structured document they encounter is a DTD, with its rather
>ad-hoc syntax, then I've lied to them.  I find this completely unacceptable.

I think this is totally bogus. A DTD is structured information.  That
doesn't make it a document, unless every set of structured information
constitutes a document.  It is no part of the philosophy of SGML to insist
that SGML notation be used to represent every kind of structured
information.  Do you feel the urge to convert C programmers over from writing:

int main(int argc, char **argv)
{
  printf("Hello world\n");
  return 0;
}

to:

<procedure identifier="main">
<returns><basic-type kind="int"></basic-type></returns>
<argument identifier="argc">
<basic-type kind="int"></basic-type>
</argument>
<argument identifier="argv">
<pointer-to><pointer-to><basic-type kind="char"></basic-type></pointer-to></pointer-to>
</argument>
<identifier name="printf"></identifier>
<string-literal>Hello world&newline;</string-literal>
<integer-constant value="0"></integer-constant>
</procedure>


I hope not.  Why do you feel the urge to impose such a change for DTDs?

>2.3 Ease of Implementation
>If there is only one syntax for declaration and instance, then the
>prospective XML processor author only needs one lexer and one parser.
>Granted, the DTD language is not the hardest in the world, but two
>lexer/parsers are always harder than one to build.

I don't believe this is true.  With the DSD syntax an implementation has to
construct its internal representation of DTDs from its internal
representation of elements and attributes.  This would probably be more work
than writing the YACC actions that construct the representation of DTDs
using the existing SGML syntax. I don't think YACC is really a good tool for
parsing instances, but I think that it could work quite well for DTDs. I
would concede that the lexer gets a little more complicated using the
existing SGML syntax, so overall I would say that there's not much
difference in implementation difficulty.
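
To give a rough sense of how little code it takes to build an internal
representation straight from the existing declaration syntax, here is a
minimal sketch in Python.  It is a hand-written recursive-descent parser
standing in for the YACC grammar mentioned above, not an actual YACC
specification.  The names Name, Group, tokenize and parse_element_decl are
hypothetical, and the sketch assumes a simplified subset: element
declarations with "," and "|" groups and the "?", "*", "+" occurrence
indicators only (no "&" connector, #PCDATA, EMPTY, ANY, tag minimization or
parameter entities).

```python
import re
from dataclasses import dataclass


@dataclass
class Name:
    name: str
    occurrence: str = ""   # "", "?", "*" or "+"


@dataclass
class Group:
    connector: str         # "," for a sequence, "|" for a choice
    items: list
    occurrence: str = ""


# One token: the declaration opener, a name, or a single punctuation mark.
TOKEN = re.compile(r"\s*(<!ELEMENT|[A-Za-z][\w.-]*|[()|,?*+>])")


def tokenize(text):
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_element_decl(text):
    """Parse '<!ELEMENT name (model)>' into (name, Group)."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of declaration")
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def take_name():
        tok = take()
        if not tok[0].isalpha():
            raise SyntaxError(f"expected a name, got {tok!r}")
        return tok

    def occurrence():
        return take() if peek() in ("?", "*", "+") else ""

    def particle():
        if peek() == "(":
            return group()
        return Name(take_name(), occurrence())

    def group():
        take("(")
        items = [particle()]
        connector = ","
        if peek() in (",", "|"):
            connector = peek()
            # A group uses a single connector throughout.
            while peek() == connector:
                take()
                items.append(particle())
        take(")")
        return Group(connector, items, occurrence())

    take("<!ELEMENT")
    name = take_name()
    model = group()
    take(">")
    return name, model
```

Calling parse_element_decl("<!ELEMENT doc (title, (para | list)+, note?)>")
yields the element name together with a nested Group/Name tree, which is
the kind of structure a validator would work from.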

>2.4 Familiarity
>If we can go to the HTML community and tell them "not only can you now add
>your own structures to documents and still deliver them on the web, but
>here's how you do it, and it's just another bunch of tags", this removes
>another significant barrier to resistance.  I cannot in good conscience
>defend to these people the proposition that to do this something
>straightforward, obvious, and good, they have to learn another language.

Why is it so much easier to learn a bunch of new tags/attributes than it is
to learn a few more declarations?  The most complex part of DTDs is content
models.  I think people will have a much easier time understanding the
current SGML syntax because of its similarity to regular expressions.  The
DSD syntax for content-models is an order of magnitude less readable than
the existing DTD syntax.
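
The similarity to regular expressions is close enough that a simple
content model can be translated into one almost token by token.  The
following Python sketch illustrates this under stated assumptions: the
function names content_model_to_regex and is_valid are hypothetical, only
the "," and "|" connectors and the "?", "*", "+" occurrence indicators are
handled, and a sequence of child elements is encoded as their names, each
followed by a single space.

```python
import re


def content_model_to_regex(model: str) -> re.Pattern:
    """Translate a simple content model into a regular expression that
    matches child element names, each followed by one space."""
    parts = []
    for token in re.findall(r"[A-Za-z][\w.-]*|[()|,?*+]", model):
        if token == ",":
            continue                      # a sequence is plain concatenation
        elif token == "(":
            parts.append("(?:")           # non-capturing group
        elif token in ")|?*+":
            parts.append(token)           # same meaning in both notations
        else:
            # An element name; the trailing space keeps "para" from
            # matching the start of a longer name such as "paragraph".
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))


def is_valid(model: str, children: list) -> bool:
    """Check a list of child element names against a content model."""
    text = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(text) is not None
```

For example, is_valid("(title, para+, note?)", ["title", "para", "para"])
accepts the sequence, while the same model rejects ["title"] because at
least one para is required.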

>3. SGML Interoperability Issues

>(b) I propose hiding the XML markup declarations from SGML processors inside
>    a processing instruction as follows: <?XML DSD ... >.
>    Similarly, I propose <?XML SSD for XML declaration subsets.
>    [note: either we change PIC, or we live with a lot of &gt; in markup 
>    declarations, or we figure out a better way to hide XML markup 
>    declarations]

The fact that the DSD proposal is going to require some sort of gigantic
kluge like this makes it a non-starter for me.  I just don't see how people
can think that a proposal that involves this sort of thing is an improvement
over the existing SGML syntax.

>This is the RIGHT THING TO DO.

The SGML syntax for DTDs is not going to go away.  Anything that ends up
forcing SGML users and implementors to deal with two totally incompatible
syntaxes for exactly the same semantics is not my idea of the RIGHT THING TO DO.