Re: The furore over PUBLIC

Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU> wrote:

> Alternate character encodings [...] are a good example.
> Note which of the following three positions we took:
>   1 fixed encoding:  everything in UTF-8 (alternate form:  everything
>     in UCS-2)
>   2 anarchy:  no requirements, processors do what they like
>   3 required minimum which all processors must support; nothing to
>     prevent them from supporting more.

That is a good example.  Along those lines, I believe that the
required minimum support for PUBLIC identifiers should be:

     "XML processors must supply to the application the PUBLIC ID
      of any entities for which one is specified (including the
      DTD entity)."

That's it.  No attempt at resolution is required, although
processors are of course free to do so if they like.

Remember that with <?XML RMD=NONE?> and <?XML RMD=INTERNAL?>
there is no need to retrieve the DTD external subset.  That's why
"no resolution mechanism at all" is perfectly acceptable as
a minimum requirement: it would let people put PUBLIC IDs
in <!DOCTYPE...> declarations, which is where they are most

>     If I know that *the processor(s)
>     I care about* support the encoding I like, I can use it.  If I
>     care about having *all* processors be able to read my docs, I
>     know what to do, and I can ensure that all processors can handle
>     my docs.

The same holds for PUBLIC IDs.  If I want to ensure that all processors
can handle my documents, I'll use RMD=NONE or a SYSTEM ID.
If I know that the particular processors I care about can resolve the
public identifier -- either because they support automatic FPI
resolution or because they have a built-in knowledge of the
FPI in question, for example a special-purpose CML browser --
that's OK too.

> I have a question for the WG members who want PUBLIC to be syntactically
> legal, but don't want to specify even a non-exclusive minimum
> resolution method.  Serious question, not sarcastic (despite the
> tone).  Why bother?

It would allow XML users to supply a name for their document types,
instead of just an address where a copy of the formal part of the DTD
may be found.  The former is useful; with RMD=NONE, the latter is not.

Why is an FPI useful?  Well, it would let (say) a special-purpose
CML or CDF browser know that a document is, in fact, marked
up as "Chemical Markup Language" or "Channel Data Format" and
not some other DTD that happens to have the same initials.
It would allow browsers to recognize that the stylesheet for
"-//Davenport//DTD DocBook XMLified//EN" that they downloaded
from the O'Reilly website will also work with DocBook XML documents
on Sun's website.  It would let people move a collection of
XML documents from one host to another without having to
change all the <!DOCTYPE...> declarations (which would be
especially annoying knowing that most browsers are never
going to use that SYSTEM ID anyway).

Allowing PUBLIC IDs for all entities, not just DTDs, would make
it possible to mirror and maintain large collections of XML
documents by updating SOCAT files.  True, such collections would
only be usable by XML processors that have SOCAT support, but
for publishers like Sun and the Linux Documentation Project that
tradeoff is probably worth it.

> If PUBLIC is not allowed, I agree life will be tough for you.  You'll
> have lots of non-conforming data that would be conforming if only PUBLIC
> were legal.  You'll have to write your own software to support XML.  On
> the plus side, you'll be able to write in whatever resolution mechanism
> you like, including the ever popular sequential-search-through-the-Web
> and the shell-out-and-search-altavista-for-the-FPI method.  On the minus
> side, you'll have to do it yourself.  So I agree, life will be hard,
> though perhaps still worth living.
> What I don't understand is this.  How would this picture change if
> PUBLIC were legal in XML, without a standard resolution mechanism?

For starters, all our non-conforming data that uses PUBLIC would be
conforming :-)

Implementors will have to do a tiny bit more work (to parse and pass
on PUBLIC IDs).  Those who feel that PUBLIC IDs are worthless without
a resolution mechanism can still use SYSTEM.  Those of us who use
PUBLIC IDs will not be forced to remove important information from
our documents in order to make them XML-conformant.  Who loses here?

--Joe English


Follow-Ups: References: