W3C home > Mailing lists > Public > w3c-sgml-wg@w3.org > November 1996

Re: Quran test file

From: Len Bullard <cbullard@HiWAAY.net>
Date: Wed, 27 Nov 1996 07:48:34 -0600
Message-ID: <329C46B2.326D@HiWAAY.net>
To: Jon Bosak <bosak@atlantic-83.Eng.Sun.COM>
CC: w3c-sgml-wg@w3.org
Jon Bosak wrote:
> 
> [Len Bullard:]
> 
> | BTW, for the Quran test file, is this right?
> |

> Can't be; you have two content models for v and vn.

Yep.  The stylesheet builder noted that to me and I 
forgot to take it out of the file on the disk.  Duh!!
 
> Interestingly (in light of a previous discussion about implied DTDs),
> the actual DTD for this example was one that I made to cover all three
> "testaments" in the sample set -- the King James Bible, the Quran, and
> the Book of Mormon.  This was a fundamentally misguided thing to do,
> and when I teach beginning SGML concepts I use the story of how I came
> to do this to show students the kinds of problems that come up in DTD
> design and the kinds of silly hacks that people make in attempting to
> deal with them.

Ok.  What is the misguided part?  I usually don't find it good 
to write "union" DTDs because I end up adding too much unnecessary 
markup possibilities to get different structures in the same tree.
It's what I don't like about the tag bag structures we get from 
38784 legacy.  It confuses the user of the editor, fattens the 
files, and overall, just bloats the information.  Since DTDs 
are pretty easy to write and select in editors, I find it better to 
keep smaller sets.  Of course, most documents I have to use 
DTDs for are structurally more complex.

>  The actual "union DTD" that I came up with back in
> 1992 was this:

Thanks. 

> But it is interesting in connection with the earlier
> thread that the actual DTD that was used for this set is one that no
> mechanical process on earth would produce from the instances.

Yes.  OTH, if we fed several documents to such a process, I wonder what 
the results would be.

> You should have seen the source for this thing; it
> took me several days of going over it with a specialized spell checker
> to get it to the point where you see it now.

Yes.  We do a lot of conversion work here where we are asked as a 
primary requirement to preserve look (?#&#^$^) and create a hypertext.
Lockheed also has some very sophisticated automated conversion software
in Orlando (ACS).  Our experience is that the scanning is in some 
ways, one of the most mind numbing bits.  The technology is certainly 
much better than it used to be, but 100%, not yet.

> On the contrary, the length of these examples is exactly why I put
> them out there.  XML makes it trivially easy to construct and process
> small examples.  I want to see what happens when prototype
> applications get fed something the size of the ot.xml file.

Good idea.  IDEAS isn't a prototype, so it works well with the 
exception of having to comment out the <? ?> version statement.
That's an easy feature to add.  On another product, I found the 
file size was a problem in the 16-bit version and that was good to know.

And as I said, good reading.  Much better than most of what I see.

Thanks again.

len
Received on Wednesday, 27 November 1996 08:48:08 EST

This archive was generated by hypermail pre-2.1.9 : Wednesday, 24 September 2003 10:03:44 EDT