Re: XML catalog draft

At 02:29 PM 2/9/97 -0600, Alex Milowski wrote:
>Hello all,
>I have been following the discussion of XML catalogs *very* closely.  I'm
>going to start this e-mail with something that many of you may not
>agree with:
>   Catalogs are *not* sufficient for manipulation and transmission of
>   XML (or SGML) documents.


>   The XML (SGML) document is at the same "class" or "level" as the
>   other information components--style-sheets, graphics, transformations,
>   etc.  It is a very important component in the system, but not
>   very useful in a practical way without the other components.  Thus, it
>   is *not* the starting point.
>   The collective information about all the components necessary to
>   handle, process, identify, etc. this document--the meta-document--is
>   the first-class construct.  The meta-document is the starting point.

This is what a HyTime hub document is (or can be used as): a document that
lists all the other documents in the purview of the system at the moment
(e.g., the document management system, retrieval system, etc.).  This
document would normally also provide some means to access the components,
either by linking to them or by transcluding them (so that controllable
access policies can be associated with the hub document--if access does not
start through the hub document, then there's no telling which hub
document's policies apply to a given access session, because you can always
create a new hub document).

The way I think about it is that any collection of documents is always
represented by a (usually virtual but always generatable) "god" document
that declares as entities either all the documents in the system or the
first level of hierarchy of documents in the system (either will do).

Thus, the contents of any document management system can be represented by
creating a document that declares at least the top-level of documents (and
other storage objects) in the system as entities.
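
To make the "always generatable" point concrete, here is a minimal sketch
of producing such a god document on demand from a list of stored objects.
Everything named here is an assumption for illustration: the StoredObject
record, the "god" document type name, and the notation names are
hypothetical. The sketch emits only the entity declarations; notation and
element declarations are omitted, and identifiers are assumed to contain no
double quotes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StoredObject:
    """Hypothetical record for one object held by a document manager."""
    object_id: str                   # unique ID, reused as the entity name
    system_id: str                   # storage location, e.g. a file name
    notation: str                    # data content notation name (assumed declared elsewhere)
    public_id: Optional[str] = None  # optional public identifier


def entity_declaration(obj):
    """Return one external entity declaration named after the object ID."""
    if obj.public_id is not None:
        external_id = f'PUBLIC "{obj.public_id}" "{obj.system_id}"'
    else:
        external_id = f'SYSTEM "{obj.system_id}"'
    return f"<!ENTITY {obj.object_id} {external_id} NDATA {obj.notation}>"


def god_document(objects, doctype="god"):
    """Build a document whose declaration subset declares every object."""
    seen = set()
    lines = [f"<!DOCTYPE {doctype} ["]
    for obj in objects:
        # Entity names must be unique within the god document.
        if obj.object_id in seen:
            raise ValueError(f"duplicate entity name: {obj.object_id}")
        seen.add(obj.object_id)
        lines.append("  " + entity_declaration(obj))
    lines.append("]>")
    lines.append(f"<{doctype}></{doctype}>")
    return "\n".join(lines)
```

Calling god_document() with the manager's current contents yields a
document whose declaration subset doubles as the catalog for those objects.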

Note too that the set of entity declarations required by the god document
becomes the catalog, with the difference that each entry in the catalog
also has a unique name within the god document's entity name space (one
reason to declare all entities in the god document).  In a document
management environment, the entity names in the god document are synonymous
with the unique object IDs of the objects as known to the document
management system.  Likewise, the public IDs of the entities (if defined)
are synonymous with the object IDs, although there may be a many-to-one
relationship of public IDs to object IDs.  In other words,
outside a document management system, I can refer to entities by their
god-document entity name, their public ID (if defined), or by a
document/entity name pair (as any entity name is always unique within a
given document).  If I load these documents into a document management
system, I can use, in addition to the previous three forms of address, the
unique object ID assigned to each document.  Any SGML document management
system that *doesn't* provide these forms of address for document entities
*is not a document manager*. [Quick: how many "SGML" document management
systems on the market today let you address documents by public ID or
entity name?  How many can create, on demand, a god document that
represents their content? Why not?]
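
As a rough illustration of the four forms of address, the sketch below
keeps one index per form and resolves each to the same stored object. The
EntityRegistry class and its method names are hypothetical, not the
interface of any existing product, and the storage "location" is just a
string standing in for whatever the manager actually holds.

```python
class EntityRegistry:
    """Toy index over the four address forms discussed above."""

    def __init__(self):
        self._location = {}    # object ID -> storage location
        self._god_name = {}    # god-document entity name -> object ID
        self._public_id = {}   # public ID -> object ID (many-to-one allowed)
        self._doc_entity = {}  # (document name, entity name) -> object ID

    def register(self, object_id, location, god_name,
                 public_ids=(), doc_entity_pairs=()):
        """Record one object under every name it can be addressed by."""
        if object_id in self._location:
            raise ValueError(f"duplicate object ID: {object_id}")
        if god_name in self._god_name:
            raise ValueError(f"duplicate god-document entity name: {god_name}")
        self._location[object_id] = location
        self._god_name[god_name] = object_id
        for public_id in public_ids:
            self._public_id[public_id] = object_id
        for document, entity in doc_entity_pairs:
            self._doc_entity[(document, entity)] = object_id

    def by_object_id(self, object_id):
        return self._location[object_id]

    def by_god_name(self, name):
        return self._location[self._god_name[name]]

    def by_public_id(self, public_id):
        return self._location[self._public_id[public_id]]

    def by_doc_entity(self, document, entity):
        return self._location[self._doc_entity[(document, entity)]]
```

Several public IDs may point at one object ID here, which mirrors the
many-to-one relationship mentioned above; the god-document entity name is
the one key that is forced to be unique.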

This is why the idea of a BOS is so important: it's central to the basic
problem of document management.  BOSs, access policies, and so on are
simply abstractions of the basic facilities that must be provided by any
document management system.

(One of the interesting implications of this is that any generalized SGML
document manager (one that actually manages documents; I'm not aware of
any in existence) can in fact manage anything for which there is an entity
declaration, simply by creating the type of meta-structure document Alex
presents in his note.  I have, for example, a demo of using Panorama to
manage Frame documents (read-only, of course).)