Re: XML Entity Manager

> I am surprised that nobody has remarked on the woeful absence of discussion
> of the entity management problem both in our goals document and here on
> the list.  So... let me as usual advance the minimalist position:
> 
> XML offers only 4 kinds of external entities: the cross-product of 
> 
> (PUBLIC or SYSTEM) x (PCDATA or NDATA)
> 
> PCDATA/NDATA offer little for discussion, you either parse it or 
> you don't, right?
> 
> SYSTEM entities, somebody at some point suggested outlawing them, and there
> are certainly good reasons for that, but in XML I think we'd like easy stuff
> to be easy, which SYSTEM does; it should be fenced around with stern warnings
> about nonportability.

In every SGML system I have ever built, set up, or used, I have *never* used
system entities.  They always lead to trouble.

I would support banning them, but would concede to allowing them.

There is a third type of entity to look at:  system default.

System default entities are a lookup by name.  They are not very good
for interchange but are an excellent means of achieving system independence
in a document without having to declare the entities explicitly.  This
matters because an internal publishing system can have thousands of
entities (I have a client who has over 10000 separate language entities).
For internal use, declaring every entity that a document *could* use is,
practically speaking, impossible--the parse of the document would take a
*very* long time.

In my opinion, there are three kinds of entity to decide on:

1. System entities (Explicit system ids--non-transportable).
2. System defaulted (Lookup by name).
3. Public entities (Lookup by public identifier).

2 & 3 are covered by the SGML Open catalog format.

In an SGML Open catalog, 2 is:

ENTITY foo "blah"

3 is:

PUBLIC "-//Someone//DTD Something//EN" "somewhere"
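
To make the two lookups concrete, here is a minimal sketch in Python of how an
entity manager might read the two entry types above.  The function name
parse_catalog and the two-table layout are assumptions made for illustration;
a real SGML Open catalog has more entry types and matching rules than this
handles.

```python
import shlex

def parse_catalog(text):
    """Parse ENTITY and PUBLIC entries into two lookup tables.

    Hypothetical helper: only three-token entries of the form
    KEYWORD key "system-id" are understood; everything else is skipped.
    """
    by_name, by_public = {}, {}
    for line in text.splitlines():
        tokens = shlex.split(line)  # honours the double-quoted literals
        if len(tokens) != 3:
            continue  # blank line or an entry type this sketch ignores
        keyword, key, system_id = tokens
        if keyword.upper() == "ENTITY":
            by_name[key] = system_id       # case 2: lookup by name
        elif keyword.upper() == "PUBLIC":
            by_public[key] = system_id     # case 3: lookup by public identifier
    return by_name, by_public

CATALOG = '''
ENTITY foo "blah"
PUBLIC "-//Someone//DTD Something//EN" "somewhere"
'''

by_name, by_public = parse_catalog(CATALOG)
print(by_name.get("foo"))
print(by_public.get("-//Someone//DTD Something//EN"))
```

The point is that the document itself carries only the name or the public
identifier; the mapping to storage lives entirely in the catalog.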

We should also embrace the concept of Formal System Identifiers.  This
would allow a document to push transport issues into the environment (catalogs)
instead of embedding them in the document.

For example, an HTML HREF could be:

<A HREF=DocumentFoo>Foo</A>

and the catalog entry could be:

ENTITY DocumentFoo "<URL>http://mydomain.com/documentfoo.html"
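
As a rough sketch of what the resolution step might look like, the Python
below maps the HREF value through a catalog table and strips the storage
manager tag.  The names split_fsi and resolve_href, the in-memory CATALOG
table, and the fallback of returning an unknown HREF unchanged are all
assumptions for illustration; real Formal System Identifiers are richer than
a bare tag prefix.

```python
import re

# Hypothetical in-memory catalog: entity name -> formal system identifier.
CATALOG = {"DocumentFoo": "<URL>http://mydomain.com/documentfoo.html"}

# Assumed simplification: a single leading tag such as <URL> names the
# storage manager, and the rest of the string is the location.
_FSI = re.compile(r"^<(?P<manager>[A-Za-z]+)>(?P<location>.*)$")

def split_fsi(system_id):
    """Split '<URL>http://...' into ('URL', 'http://...')."""
    match = _FSI.match(system_id)
    if match is None:
        return None, system_id  # plain system identifier, no tag
    return match.group("manager").upper(), match.group("location")

def resolve_href(href, catalog=CATALOG):
    """Resolve an HREF by name through the catalog, if it has an entry."""
    system_id = catalog.get(href)
    if system_id is None:
        return href  # not in the catalog: leave the HREF as written
    manager, location = split_fsi(system_id)
    if manager in (None, "URL"):
        return location
    raise ValueError(f"unsupported storage manager: {manager}")

print(resolve_href("DocumentFoo"))
```

Moving the document then means editing one catalog entry rather than every
link that points at it.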

Of course, there's a little more to location-independent linking than the
above.  (See the HyTime Clink and Nameloc.)

Comments?

==============================================================================
R. Alexander Milowski     http://www.copsol.com/   alex@copsol.com
Copernican Solutions Incorporated                  (612) 379 - 3608

Received on Saturday, 21 September 1996 12:11:09 UTC