- From: James Clark <jjc@jclark.com>
- Date: Tue, 08 Oct 1996 14:14:36 +0000
- To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
I'm strongly against Tim's proposal.

>2.1 Quality of Specification
>
>This is close to my heart as co-editor of the XML spec. If we use DTD
>syntax, the spec will have to: (a) specify the DTD syntax, (b) specify how
>the declarations in the DTD syntax impose constraints on instances. If
>we use instance syntax, we lose almost all of (a)

You don't lose it. You still have to include the DSD for DSDs. I don't think that's going to be any shorter than a BNF grammar for the SGML DTD syntax. All potential implementors will be familiar with BNF. Almost none will be familiar with DSDs. An implementor coming fresh to the spec is going to have a hard time understanding DSDs if the only specification of what is a legal DSD is itself in the form of a DSD.

I think the only real gain in the spec is that the prolog doesn't have any additional lexical structure that has to be specified. Overall I would say that the best spec would be one that gave a clean BNF for a simplified subset of SGML DTDs.

>2.2 Integrity
>
>I for one, once we get XML done, plan to start browbeating the document
>management and authoring practitioners of the world along the lines of "now
>that XML exists, you have NO EXCUSE for not using descriptive markup in your
>structured documents!" If I convince them, and they then start using XML,
>and the first structured document they encounter is a DTD, with its rather
>ad-hoc syntax, then I've lied to them. I find this completely unacceptable.

I think this is totally bogus. A DTD is structured information. That doesn't make it a document, unless every set of structured information constitutes a document. It is no part of the philosophy of SGML to insist that SGML notation be used to represent every kind of structured information.
Do you feel the urge to convert C programmers over from writing:

int main(int argc, char **argv)
{
  printf("Hello world\n");
  return 0;
}

to

<procedure identifier="main">
<returns><basic-type kind="int"></basic-type></returns>
<argument identifier="argc">
<basic-type kind="int"></basic-type>
</argument>
<argument identifier="argv">
<pointer-to><pointer-to><basic-type kind="char"></basic-type>
</pointer-to></pointer-to>
</argument>
<body>
<expression-statement>
<procedure-call>
<function>
<identifier name="printf"></identifier>
</function>
<argument>
<string-literal>Hello world&newline;</string-literal>
</argument>
</procedure-call>
</expression-statement>
<return-statement>
<integer-constant value=0></integer-constant>
</return-statement>
</body>
</procedure>

? I hope not. Why do you feel the urge to impose such a change for DTDs?

>2.3 Ease of Implementation
>
>If there is only one syntax for declaration and instance, then the
>prospective XML processor author only needs one lexer and one parser.
>Granted, the DTD language is not the hardest in the world, but two
>lexer/parsers are always harder than one to build.

I don't believe this is true. With the DSD syntax an implementation has to construct its internal representation of DTDs from its internal representation of elements and attributes. This would probably be more work than writing the YACC actions that construct the representation of DTDs using the existing SGML syntax. I don't think YACC is really a good tool for parsing instances, but I think that it could work quite well for DTDs. I would concede that the lexer gets a little more complicated using the existing SGML syntax, so overall I would say that there's not much difference in implementation difficulty.
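
As a rough illustration of the implementation point above, here is a minimal Python sketch of the two routes to the same declaration. It rests on several assumptions: the instance-syntax element names (element, seq, choice, ref) and the occurs attribute are invented for this sketch and are not the DSD syntax from the proposal, and both functions keep the content model as a plain string rather than a real parsed structure, which understates the work on the DTD side. It is meant only to show the extra element-to-declaration translation step that the instance route needs.

```python
import re
import xml.etree.ElementTree as ET


def from_dtd(decl):
    """Read '<!ELEMENT name model>' directly into a (name, model) pair."""
    m = re.fullmatch(r"\s*<!ELEMENT\s+(\S+)\s+(.+?)\s*>\s*", decl)
    if m is None:
        raise ValueError("not an element declaration")
    return m.group(1), m.group(2)


def render(node):
    """Turn one hypothetical content-model element back into DTD notation."""
    occurs = node.get("occurs", "")
    if node.tag == "ref":
        return node.get("name") + occurs
    # 'seq' and 'choice' are invented tag names for this sketch.
    separator = {"seq": ", ", "choice": " | "}[node.tag]
    return "(" + separator.join(render(child) for child in node) + ")" + occurs


def from_instance(text):
    """Parse the instance form, then translate the element tree."""
    root = ET.fromstring(text)
    return root.get("name"), render(root[0])


dtd_form = "<!ELEMENT doc (title, para+)>"
instance_form = (
    '<element name="doc"><seq>'
    '<ref name="title"/><ref name="para" occurs="+"/>'
    "</seq></element>"
)
print(from_dtd(dtd_form))
print(from_instance(instance_form))
```

Both calls yield the same pair for this input. The instance route reuses the element parser but still needs the render step, which is the kind of work the paragraph above compares with writing YACC actions.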
>2.4 Familiarity
>
>If we can go to the HTML community and tell them "not only can you now add
>your own structures to documents and still deliver them on the web, but
>here's how you do it, and it's just another bunch of tags", this removes
>another significant barrier to resistance. I cannot in good conscience
>defend to these people the proposition that to do this something
>straightforward, obvious, and good, they have to learn another language.

Why is it so much easier to learn a bunch of new tags/attributes than it is to learn a few more declarations? The most complex part of DTDs is content models. I think people will have a much easier time understanding the current SGML syntax because of its similarity to regular expressions. The DSD syntax for content models is an order of magnitude less readable than the existing DTD syntax.

>3. SGML Interoperability Issues
>(b) I propose hiding the XML markup declarations from SGML processors inside
>    a processing instruction as follows: <?XML DSD ... >.
>    Similarly, I propose <?XML SSD for XML declaration subsets.
>    [note: either we change PIC, or we live with a lot of > in markup
>    declarations, or we figure out a better way to hide XML markup
>    declarations]

The fact that the DSD proposal is going to require some sort of gigantic kluge like this makes it a non-starter for me. I just don't see how people can think that a proposal that involves this sort of thing is an improvement over the existing SGML syntax.

>This is the RIGHT THING TO DO.

The SGML syntax for DTDs is not going to go away. Anything that ends up forcing SGML users and implementors to deal with two totally incompatible syntaxes for exactly the same semantics is not my idea of the RIGHT THING TO DO.

James
Received on Tuesday, 8 October 1996 09:20:56 UTC