Re: the return of the Public Identifier Question

Martin Bryan <mtbryan@sgml.u-net.com> wrote:
> Paul Prescod wrote in response to Paul Grosso's message:
[...]
> >I think that it is just a logical error to have the public identifier and
> >system identifier point to non-equivalent storage objects. 
> 
> It is not a logical error - it is an absolute necessity. 
> 
> Consider the case where an XML-LINK uses an option, such as REPLACE, that
> cannot be supported on a receiving system. The system may require that a
> local variant of the DTD be produced and that this local variant be used to
> override the version pointed to by the system identifier. 
> 
> Similarly I may need to restrict the set of colours that an HTML document
> can be displayed in to ensure that no one reading the document at my site
> who suffers from epilepsy will be affected by a bad colour combination in a
> bgcolour or colour attribute.

The SGML "promise", as I understand it, is that the result of fetching an
entity referred to by a Public Identifier will be the same today, tomorrow
and always, here and in Madagascar and on the Moon.

I say "promise" partly because
(1) I don't see this explicitly stated in ISO 8879:1986,
    so this fails my "conformance to folklore must not be required" test;
(2) it is obviously not enforced technically, and no mechanisms for
    so enforcing it appear to have been envisaged at any time;
(3) the SDATA character entity sets, given as examples in the Annex to
    ISO 8879, are meant to be edited on a per-system basis, as Charles
    has told us, in which case we have a clear contradiction, as you'd
    get a different entity on different systems.

Sometimes, then, a public identifier is like a URN, and refers to a constant
unchanging resource.  Sometimes it's just a label for indirection, for
administrative convenience, and the result of looking it up may be
undefined, may differ depending on your browser or the day of the week,
and so forth.
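
To make the "label for indirection" reading concrete, here is a minimal
sketch of a per-system lookup table.  It is not taken from ISO 8879 or any
particular tool; the catalogs, the public identifier, the paths, and the
rule that a catalog entry overrides the system identifier are all
assumptions made for illustration.

```python
# Hypothetical per-system catalogs: each maps a public identifier to a
# local storage location.  The identifier and paths are invented.
CATALOG_SITE_A = {
    "-//Example//DTD Memo//EN": "/usr/local/sgml/memo.dtd",
}
CATALOG_SITE_B = {
    "-//Example//DTD Memo//EN": "/opt/dtds/memo-restricted.dtd",
}


def resolve(public_id, system_id, catalog):
    """Return the storage location to use for an external identifier.

    Assumed policy: a catalog entry for the public identifier wins;
    otherwise fall back to the system identifier given in the document.
    """
    return catalog.get(public_id, system_id)


if __name__ == "__main__":
    pub = "-//Example//DTD Memo//EN"
    sysid = "http://example.org/dtds/memo.dtd"
    print(resolve(pub, sysid, CATALOG_SITE_A))
    print(resolve(pub, sysid, CATALOG_SITE_B))
    print(resolve(pub, sysid, {}))
```

Under this reading the same public identifier yields a different storage
object at each site, which is exactly what the "fixed name" reading forbids.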

Certainly when you get down to resolving a URL with HTTP you are going
to be subject to the possibility of content negotiation, whether you like
it or not, and you may well get different results from time to time.
Restricting the set of colours could (at least in principle) be done that way.
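
As a rough illustration of why one URL need not always yield one result,
the sketch below picks a variant from an Accept-style preference list.  It
is a simplification rather than a conforming HTTP implementation: it
ignores wildcards such as */* and assumes well-formed q-values, and the
function name and the variant table are hypothetical.

```python
def pick_variant(accept_header, available):
    """Pick the available variant whose media type has the highest q-value.

    `available` maps media types to variants (here, just file names).
    Returns None when nothing acceptable is available.
    """
    prefs = []
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0]
        q = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        prefs.append((q, media_type))
    # Highest preference first; the sort is stable, so header order
    # breaks ties between equal q-values.
    prefs.sort(key=lambda pair: pair[0], reverse=True)
    for q, media_type in prefs:
        if q > 0 and media_type in available:
            return available[media_type]
    return None


if __name__ == "__main__":
    variants = {"text/html": "page.html", "text/plain": "page.txt"}
    print(pick_variant("text/plain;q=0.5, text/html", variants))
    print(pick_variant("text/plain", variants))
```

Two clients asking for the same URL with different preferences get
different storage objects back, with no change to the identifier itself.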

I don't think it's reasonable to require servers to do subsetting based on
features.  The idea is that XML is so simple to implement that you won't
need that.  If some people aren't going to implement REPLACE, it needs to
be removed from the spec.  ISO is a forum for speculative standards that
have not yet been implemented, with features that "may prove useful".
But we aren't writing an ISO standard, and should not be burdened with
that kind of politics here.  Each feature must be known to be essential.

> >You can reliably continue this practice
> >by deleting the system identifier. 
> 
> 99.9% of XML files will be read only - you can't just delete their system
> identifier.

I agree with this.  In particular, if you deliver XML over the Internet
and the receiver has to get out Windows Notepad and edit a DOCTYPE line,
you're sunk.


If PUBLIC identifiers are there, we need to decide whether they are merely
strings, and can be treated as a sort of system identifier with indirection,
for administrative convenience when using poorly designed tools, or whether
they are formal names for fixed objects, or whether they are something else.

If we can't decide that, let's leave out PUBLIC identifiers at least for
this release.

Lee

Received on Thursday, 20 March 1997 13:39:49 UTC