Quran test file from Jon Bosak on 1996-11-27 (w3c-sgml-wg@w3.org from November 1996)

From: Jon Bosak <bosak@atlantic-83.Eng.Sun.COM>
Date: Tue, 26 Nov 1996 21:50:41 -0800
To: w3c-sgml-wg@w3.org
CC: bosak@atlantic-83.Eng.Sun.COM
Message-Id: <199611270550.VAA02103@boethius.eng.sun.com>
[Len Bullard:]

| BTW, for the Quran test file, is this right?
| 
| <!ELEMENT  tstmt       - - (tttitle, fm, sbtitle, sura+ ) >
| <!ELEMENT  sura        - - (bktlong, bktshort, epigraph, v+ ) >
| <!ELEMENT   v          - - (vn, p ) >
| <!ELEMENT   epigraph   - - (p+ ) >
| <!ELEMENT   fm         - - (p+ ) >
| <!ELEMENT   bktlong    - - (#PCDATA ) >
| <!ELEMENT   bktshort   - - (#PCDATA ) >
| <!ELEMENT   v          - - (#PCDATA ) >
| <!ELEMENT   vn         - - (#PCDATA ) >
| <!ELEMENT   ttitle     - - (#PCDATA ) >
| <!ELEMENT   sbtitle    - - (#PCDATA ) >
| <!ELEMENT   p          - - (#PCDATA ) >

Can't be; you have two content models for v and vn.
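The kind of error pointed out here can also be caught mechanically. Below is a minimal sketch, not a real SGML parser: it assumes declarations of the simple `<!ELEMENT name - - (model)>` shape used in the quoted fragment and uses a hypothetical helper named `check_dtd`. It lists element types declared more than once and names used in a content model but never declared. (XML 1.0 likewise makes a repeated element type declaration a validity error.)

```python
import re
from collections import Counter

# The element declarations quoted above, copied with the "| " quote prefix removed.
QUOTED_DTD = """
<!ELEMENT  tstmt       - - (tttitle, fm, sbtitle, sura+ ) >
<!ELEMENT  sura        - - (bktlong, bktshort, epigraph, v+ ) >
<!ELEMENT   v          - - (vn, p ) >
<!ELEMENT   epigraph   - - (p+ ) >
<!ELEMENT   fm         - - (p+ ) >
<!ELEMENT   bktlong    - - (#PCDATA ) >
<!ELEMENT   bktshort   - - (#PCDATA ) >
<!ELEMENT   v          - - (#PCDATA ) >
<!ELEMENT   vn         - - (#PCDATA ) >
<!ELEMENT   ttitle     - - (#PCDATA ) >
<!ELEMENT   sbtitle    - - (#PCDATA ) >
<!ELEMENT   p          - - (#PCDATA ) >
"""

# Simplified pattern (an assumption, not full SGML syntax): element name, the two
# tag-minimization flags, then a parenthesized content model.
DECL = re.compile(r"<!ELEMENT\s+(\S+)\s+[-O]\s+[-O]\s+(\(.*?\))\s*>")
NAME = re.compile(r"[A-Za-z][\w.-]*")

def check_dtd(text):
    """Return (names declared more than once, names referenced but never declared)."""
    decls = DECL.findall(text)
    counts = Counter(name for name, _ in decls)
    duplicates = sorted(n for n, c in counts.items() if c > 1)
    referenced = set()
    for _, model in decls:
        # Drop #PCDATA first so "PCDATA" is not mistaken for an element name.
        referenced.update(NAME.findall(model.replace("#PCDATA", "")))
    undeclared = sorted(referenced - set(counts))
    return duplicates, undeclared

dups, missing = check_dtd(QUOTED_DTD)
print("declared more than once:", dups)
print("referenced but not declared:", missing)
```

For the quoted fragment it reports `v` as declared twice and `tttitle` (which does not match the declared `ttitle`) as never declared.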

Interestingly (in light of a previous discussion about implied DTDs),
the actual DTD for this example was one that I made to cover all three
"testaments" in the sample set -- the King James Bible, the Quran, and
the Book of Mormon.  This was a fundamentally misguided thing to do,
and when I teach beginning SGML concepts I use the story of how I came
to do this to show students the kinds of problems that come up in DTD
design and the kinds of silly hacks that people make in attempting to
deal with them.  The actual "union DTD" that I came up with back in
1992 was this:

  <!-- DTD for testaments    J. Bosak    921115, 940326, 940401 -->

  <!ENTITY amp "&#38;">
  <!ENTITY % data0 "#PCDATA|i">
  <!ELEMENT i        - - (#PCDATA)>
  <!ELEMENT p        - -  (%data0;)*>

  <!ELEMENT tstmt    - -  (ttitle,fm,sbttitle?,preface?,(sura | book)+)>
  <!ELEMENT ttitle   - -  ((%data0;),ttitle2?)*>
  <!ELEMENT ttitle2  - -  (%data0;)*>
  <!ELEMENT fm       - -  (p)+>
  <!ELEMENT sbttitle - -  (p)+>
  <!ELEMENT preface  - -  (ptitle, p+)+>
  <!ELEMENT ptitle   - -  (%data0;)*>
  <!ELEMENT book     - -  (bktlong, bktshort, epigraph?, bksum?, chapter+)>
  <!ELEMENT sura     - -  (bktlong, bktshort, epigraph?, bksum?, v+)>
  <!ELEMENT bktlong  - -  (%data0;)*>
  <!ELEMENT bktshort - -  (%data0;)*>
  <!ELEMENT bksum    - -  (p)+>
  <!ELEMENT epigraph - -  (p)+>
  <!ELEMENT chapter  - -  (chtitle, epigraph?, chsum?, v+)>
  <!ELEMENT chtitle  - -  (%data0;)*>
  <!ELEMENT chsum    - -  (p)+>
  <!ELEMENT v        - -  (vn, p)>
  <!ELEMENT vn       - -  (%data0;)*>

As I said, this is instructive chiefly as a bad example; I get a good
exercise out of asking students what would be wrong in an authoring
environment if we allowed these definitions of "book" and "sura" in
the same DTD.  (Another indicator of its awfulness is that I never got
around to using the element "i", which I had intended to use in the
KJV and then abandoned when I realized how much work would be
involved.)  But it is interesting in connection with the earlier
thread that the actual DTD that was used for this set is one that no
mechanical process on earth would produce from the instances.
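To make the contrast with "implied DTDs" concrete, here is a hypothetical, deliberately naive inference pass. It is not any tool from the earlier thread; the sample instance and the function names `collapse_runs` and `infer_models` are invented for illustration. It proposes a content model for each element using only the child sequences and text that actually occur.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# A tiny hypothetical instance shaped like the Quran markup (not an excerpt of the real file).
SAMPLE = """<tstmt>
  <ttitle>The Quran</ttitle>
  <fm><p>front matter</p></fm>
  <sura>
    <bktlong>Sura 1</bktlong><bktshort>1</bktshort>
    <v><vn>1</vn><p>verse text</p></v>
    <v><vn>2</vn><p>verse text</p></v>
  </sura>
</tstmt>"""

def collapse_runs(seq):
    """Turn consecutive repeats of one child name into 'name+'."""
    out = []
    for name in seq:
        if out and out[-1].rstrip("+") == name:
            out[-1] = name + "+"
        else:
            out.append(name)
    return out

def infer_models(root):
    """Map each element name to a content model guessed only from what occurs in the instance."""
    child_seqs = defaultdict(set)  # element name -> set of observed child-name tuples
    has_text = defaultdict(bool)   # element name -> whether non-whitespace text was seen
    for elem in root.iter():
        child_seqs[elem.tag].add(tuple(child.tag for child in elem))
        texts = [elem.text or ""] + [child.tail or "" for child in elem]
        if any(t.strip() for t in texts):
            has_text[elem.tag] = True
    models = {}
    for name, seqs in child_seqs.items():
        if has_text[name]:
            kids = sorted({k for seq in seqs for k in seq})
            # Mixed content if children were seen alongside text, otherwise plain #PCDATA.
            models[name] = "(#PCDATA" + "".join("|" + k for k in kids) + ")*" if kids else "(#PCDATA)"
        elif seqs == {()}:
            models[name] = "EMPTY"
        else:
            optional = () in seqs  # element sometimes occurred with no children at all
            patterns = sorted("(" + ",".join(collapse_runs(s)) + ")" for s in seqs if s)
            model = patterns[0] if len(patterns) == 1 else "(" + "|".join(patterns) + ")"
            models[name] = model + "?" if optional else model
    return models

for name, model in sorted(infer_models(ET.fromstring(SAMPLE)).items()):
    print(f"<!ELEMENT {name} {model}>")
```

Even on this small sample the guess for `tstmt` is `(ttitle,fm,sura)` with exactly one `sura`, because that is all the instance shows. Making `sura` repeatable, adding an optional `epigraph`, or introducing `book` and the never-used `i` are decisions that come from the designer, not from the data.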

| Also, see verse 78 of The Cow.  Arc should be are.

Thanks, I'll try to remember to fix that if I ever get around to
revising the set.  You should have seen the source for this thing; it
took me several days of going over it with a specialized spell checker
to get it to the point where you see it now.

| That one was easy.     Good reading too.
| The files are a bit big for test files though.  
| IDEAS handles them without a burp but the size 
| is overkill for testing.

On the contrary, the length of these examples is exactly why I put
them out there.  XML makes it trivially easy to construct and process
small examples.  I want to see what happens when prototype
applications get fed something the size of the ot.xml file.
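One common way for a prototype to cope with input the size of ot.xml is to stream it instead of building the whole tree in memory. The sketch below uses the standard library's `xml.etree.ElementTree.iterparse`; the helper name `count_elements` is hypothetical, and it assumes verses are tagged `v` as in the DTDs above.

```python
import io
import xml.etree.ElementTree as ET

def count_elements(source, tag="v"):
    """Count <tag> elements, discarding parsed content as it goes.

    `source` may be a file name or a binary file-like object.
    """
    count = 0
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)  # the first event is the start of the root element
    for event, elem in events:
        if event == "end" and elem.tag == tag:
            count += 1
            elem.clear()  # free this element's text and children
            root.clear()  # detach subtrees hung off the root so they can be garbage-collected
    return count

# Usage with a small in-memory document; for a real file, pass its path instead.
sample = b"<tstmt><sura><v><vn>1</vn><p>a</p></v><v><vn>2</vn><p>b</p></v></sura></tstmt>"
print(count_elements(io.BytesIO(sample)))
```

Clearing the root detaches the `sura` that is still open. The parser keeps its own reference, so later verses are still reported correctly, and memory is no longer tied to the size of the whole file.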

Jon
Received on Wednesday, 27 November 1996 00:52:49 UTC