
RE: DTD interrogations

From: Dubuc Jean-Pierre <jdubuc@oqlf.gouv.qc.ca>
Date: Tue, 30 Sep 2008 15:46:22 -0400
To: 'David Woolley' <forums@david-woolley.me.uk>
CC: "'www-html@w3.org'" <www-html@w3.org>
Message-ID: <BD03ABA010D90546AEA09271A47A275C0B0EFFE2F7@mercier.oqlf.gouv.qc.ca>

I think I found the way to formulate this SGML statement.

In the W3C document, this line:
"A | B   Either A or B must occur, but not both."

Should be followed by something like this:
"(A|B)   Either A or B must occur."

The purpose of this addition is to specify that the '|' is "exclusive" without parentheses, but "inclusive" within them.

That way, this element declaration:
<!ELEMENT DL    - - (DT|DD)+>

Can be interpreted as:
"The DL element must contain one or more of either DT or DD elements in any order."

As an example, the following declaration:
<!ELEMENT DL    - - DT+|DD+>

Should be interpreted as:
"The DL element must contain one or more of either DT or DD elements, but not both."

Can any SGML expert confirm that this is the right way to understand the "|" operator, with or without parentheses?



-----Original Message-----
From: David Woolley [mailto:forums@david-woolley.me.uk]
Sent: 25 September 2008 17:41
To: Dubuc Jean-Pierre
Cc: 'www-html@w3.org'
Subject: Re: DTD interrogations

Dubuc Jean-Pierre wrote:
> I'm learning how to read and understand DTDs and came up with these
> questions.
> ref.: http://www.w3.org/TR/REC-html40/intro/sgmltut.html#h-3.3

This is a tutorial, and isn't normative.  What actually defines how to
interpret HTML DTDs is the SGML specification (which isn't a free
download).  However, I would suggest that, to anyone with any familiarity
with regular expressions, the meanings of most things should be clear.

The XML specification is normative for XML DTDs, but few new
developments use XML DTDs.

> According to this statement:
> A | B   Either A or B must occur, but not both.
> This element declaration:
> <!ELEMENT DL    - - (DT|DD)+>
> Should, as I understand it, read:
> The DL element must contain one or more of either DT or DD elements in
> any order, but not both.

Exactly one of DT and DD must occur on every pass through the
expression, but the contents of the parentheses are re-interpreted each
time. Your interpretation would be written as (DT+)|(DD+), although I'm
not sure if the parentheses are needed.  How would you represent
(DT|DD)+ with your interpretation of the notation?
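
As a rough illustration of the regular-expression analogy, the Python
sketch below compares the two content models. The single-letter encoding
(T for DT, D for DD) and the helper name matches are assumptions made
for this example; they are not SGML syntax, and regular expressions only
approximate SGML content models.

```python
import re

# Illustrative encoding (an assumption, not SGML): each child element is one
# letter, so a DL's children become a string. "T" = DT, "D" = DD.
MIXED = re.compile(r"(?:T|D)+")          # analogue of (DT|DD)+
ONE_KIND = re.compile(r"(?:T+)|(?:D+)")  # analogue of (DT+)|(DD+)


def matches(pattern, children):
    """Return True if the whole child sequence satisfies the pattern."""
    return pattern.fullmatch(children) is not None


samples = ["T", "DD", "TDTD", ""]
# For each sample: (accepted by (DT|DD)+, accepted by (DT+)|(DD+))
results = {s: (matches(MIXED, s), matches(ONE_KIND, s)) for s in samples}

for children, outcome in results.items():
    print(repr(children), outcome)
```

Under this encoding, a mixed sequence such as DT DD DT DD satisfies the
first pattern only, because the choice inside the parentheses is made
afresh on each repetition.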

> Also, I'd like to know according to this statement :
> +(A)    A may occur.

+ is not monadic.  (B)+(A) means any number of As may occur within the
structure described by B.  A* means there can be a run of As, but they
must be at the current expression level and consecutive.  At most one
would be A?.
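
The difference between A* and A? can be sketched in the same
regular-expression terms. This is only an analogy, with the helper name
allowed chosen for the example; it does not model the +(A) inclusion,
which has no regular-expression counterpart.

```python
import re


def allowed(model, children):
    """Return True if the whole child sequence satisfies the model."""
    return re.fullmatch(model, children) is not None


sequences = ["", "A", "AAA"]

# A* : a run of zero or more consecutive As.
run_of_a = [allowed(r"A*", s) for s in sequences]

# A? : at most one A.
at_most_one = [allowed(r"A?", s) for s in sequences]

print(run_of_a)
print(at_most_one)
```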

Note that dyadic + and - are not regular expression operators.

David Woolley
Emails are not formal business letters, whatever businesses may want.
RFC1855 says there should be an address here, but, in a world of spam,
that is no longer good advice, as archive address hiding may not work.
Received on Tuesday, 30 September 2008 20:22:53 UTC
