Formal Public Identifiers [Was: HTML 3.2 PR (fwd)]

Walter Ian Kaye <walter@natural-innovations.com> writes:
>At 2:44p -0800 11/15/96, MegaZone wrote:
>
>>DOCTYPE must stay for SGML.  And if you want your document to be truly
>>valid, you use it.  Why is this complicated?  Most decent editors insert
>>it, and experienced users can insert it by hand.
>>
>>No problem.  You'd have to be pretty clueless to have problems with it.

Agreed. And you can't leave it out: it is a required part of every HTML
document (see the HTML specification, IETF RFC 1866). It's not just "for
SGML"; it's what makes your document HTML. If you leave it out you no
longer have an HTML document, you have a text document with some HTML tags
in it.

>What's complicated is all the "-//xxx/" stuff inside the DOCTYPE tag.
>Mere mortals cannot guess what is valid in there, and thus the safest
>things is to leave it out completely. The suggested thing to put there
>has changed so many times over the years and months that it is impossible
>for anyone to know for sure what to put. Who assigns that stuff anyway?
>Does IANA decide? W3C/ERB? You? Me? Captain Kangaroo? Who knows? Not I.

Well, I never thought about myself as more than mere mortal. Hmmm. Maybe I
should think about this...

...on closer examination, I'm not any smarter than the average monkey (but
I have less hair). This is not exactly rocket science. It's the label for
the Document Type Definition (DTD) to which (theoretically) the document
conforms. Each piece of public text (such as a DTD) has a unique label
called a Formal Public Identifier (FPI). The label for a particular DTD
doesn't change one iota once it is published. In fact, that's the whole
point. If the text changes, there should be a new, unique FPI assigned to
it.

   ####  Rev up your engines, we're going on an FPI road trip!  ####

NOTE: The double solidi (forward slashes) in the FPI are field delimiters.

              1    2       3   4    5   6                7
    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Strict//EN" >

    1. "HTML"
       The SGML document type being declared: <HTML> ... </HTML>

    2. "PUBLIC"
       This means the following literal is a formal public identifier.
       Using SYSTEM here would mean the following literal is a system
       identifier, such as a pathname, or even a URL on systems that
       support it (SP does).

    3. "-"
       A minus sign means the owner is an unregistered organization. Three
       forms are possible here: an ISO owner identifier, a registered
       owner (+), or an unregistered owner (-). IETF, for whatever reason,
       isn't registered.

    4. "IETF"
       This is the unique owner identifier or ownerID. This is the party
       responsible for creation/maintenance of the object/document/etc.
       If the DTD comes from IETF, W3C, etc. you'll see their ownerID here.

    5. "DTD"
       This describes the type of object, called a Public Text Class. It
       might be a DTD, a file full of entity declarations, a text document,
       etc. There's a bunch of possibilities; DTDs always get a "DTD" (what
       else?)

    6. "HTML 2.0 Strict"
       This is the Public Text Description, which describes the public text.
       Each piece of public text within the domain of its ownerID must
       have a unique public text description. Here you'll find the object's
       name, plus flavors such as version numbers, "strict", etc.

    7. "EN"
       This is called Public Text Language, describing the natural language
       in which the public text is written. This is the two-letter,
       uppercase-only code from ISO 639. In this case it is "EN" (English).
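
For anyone who'd rather see the seven pieces pulled apart by a program, here
is a minimal Python sketch of the breakdown above. The names FPI, parse_fpi
and parse_doctype are mine, made up for illustration; they are not from any
SGML tool. It assumes the simple PUBLIC form shown above and an owner that
starts with "+//" or "-//". ISO owner identifiers, which carry no such
prefix, are not handled, and this is nowhere near a real SGML parser.

```python
import re
from typing import NamedTuple, Tuple


class FPI(NamedTuple):
    """The five fields inside the quoted literal (items 3 to 7 above)."""
    registration: str  # "+" registered, "-" unregistered
    owner_id: str      # e.g. "IETF"
    text_class: str    # e.g. "DTD"
    description: str   # e.g. "HTML 2.0 Strict"
    language: str      # e.g. "EN"


def parse_fpi(fpi: str) -> FPI:
    """Split an FPI on its "//" field delimiters."""
    parts = fpi.split("//")
    if len(parts) != 4 or parts[0] not in ("+", "-"):
        # Assumption: anything else (including ISO-owned text) is rejected.
        raise ValueError(f"not a recognizable FPI: {fpi!r}")
    registration, owner_id, text, language = parts
    # The public text class is the first word; the rest is the description.
    text_class, _, description = text.partition(" ")
    return FPI(registration, owner_id, text_class, description, language)


# Hypothetical pattern: matches only <!DOCTYPE name PUBLIC "literal" >.
_DOCTYPE = re.compile(r'<!DOCTYPE\s+(\w+)\s+PUBLIC\s+"([^"]*)"\s*>',
                      re.IGNORECASE)


def parse_doctype(decl: str) -> Tuple[str, FPI]:
    """Return (document type name, parsed FPI) for a PUBLIC declaration."""
    match = _DOCTYPE.fullmatch(decl.strip())
    if match is None:
        raise ValueError(f"not a PUBLIC DOCTYPE declaration: {decl!r}")
    return match.group(1), parse_fpi(match.group(2))
```

Feeding it the declaration from the diagram,

    parse_doctype('<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Strict//EN" >')

gives back the name "HTML" plus an FPI whose fields are "-", "IETF", "DTD",
"HTML 2.0 Strict" and "EN", matching items 1 and 3 through 7.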

Any HTML book worth reading (and *any* SGML book) will discuss this stuff
in detail. Martin Bryan's "Author's Guide to SGML" is good if you're not
looking to spend too much money. If you want to fork out US$100, get
Goldfarb's "SGML Handbook" and begin your intense meditation on words from
the prophet hisself.

I hope this clears up any confusion. Pardon my antics. The stomach is calling.

Murray

```````````````````````````````````````````````````````````````````````````````
    Murray Altheim, Program Manager
    Spyglass, Inc., Cambridge, Massachusetts
    email: <mailto:murray@spyglass.com>
    http:  <http://www.cm.spyglass.com/murray/murray.html>
           "Give a monkey the tools and he'll eventually build a typewriter."

Received on Friday, 15 November 1996 20:41:09 UTC