Re: External parsed entities (Re: Inconsistency between IETF and W3C...) from Chris Lilley on 1999-11-30 (xsl-editors@w3.org from October to December 1999)

From: Chris Lilley <chris@w3.org>
Date: Tue, 30 Nov 1999 14:56:06 +0100
To: MURATA Makoto <murata.makoto@fujixerox.co.jp>
CC: Dan Connolly <connolly@w3.org>, timbl@w3.org, simonstl@simonstl.com, ietf-xml-mime@imc.org, Tsmith@parc.xerox.com, xsl-editors@w3.org, masinter@parc.xerox.com
Message-ID: <3843D776.67E325A2@w3.org>
MURATA Makoto wrote:
> 
> Chris Lilley wrote:
> 
> > Yes, agreed.
> >
> > > We have two choices.  One is to use text/xml or application/xml even for
> > > external parsed entities.  The other is to use application/xml-epe
> > > only for those external parsed entities which are not XML documents.  I think
> > > that the latter is a complicated rule.
> >
> > The former also has complications, since it means that application/xml
> > is "sometimes, but not always, well-formed XML". Since the terms valid
> > XML and well-formed XML are defined, but there is no defined term for
> > "stuff that is not well-formed", this is a problem. I think that this is
> > a significant complication.
> 
> Well, I do not think this is complicated.  text/xml or application/xml
> means either external parsed entities or document entities.  This is simple.

And you may or may not be able to send the result to an XML parser. This
is not simple.
> 
> > Whereas for the latter option, it is simple. Is the epe itself a
> > well-formed document? (This is easy to check mechanically.) If yes, label
> > it as application/xml. If no, label it as application/xml-epe (or whatever
> > term is chosen). This seems a simple, readily understood, and
> > machine-processable rule.
> 
> Suppose that you make an XML document which references an external
> parsed entity.  You are very likely to give recipients the URI of that
> document but not that of the external parsed entity.  The external
> parsed entity will thus be fetched only by XML processors during
> parsing.  The fact that it is labelled as text/xml or application/xml
> does not cause any problems.

Provided that it is not also referenced directly from anywhere - which,
if it is also a well-formed document in its own right, it might be.

I think "security through obscurity" is a poor plan when it would be
better just to have unambiguous labelling in the first place.

> But the URI of the external parsed entity may become disclosed and some
> program (e.g., WWW robots) may fetch it as a MIME entity.  This program
> does not know if this MIME entity is an XML document or external parsed entity.
> If it parses as XML, it is an XML document. 

If it does not, then it is a fatal error. However, according to your
proposal, it might still have been correctly labelled. According to
mine, it would have been incorrectly labelled.

It's the principle of least surprise, really.

>  Even if it does not, it may or may
> not be an external parsed entity.  Is this a problem? 

Yes, clearly. You describe a process whereby the MIME type told you no
useful information about the requested resource. That sounds like a
problem to me.
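
The difference for a receiver can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the function
names and the returned strings are hypothetical, "application/xml-epe"
is merely the name floated in this thread and not a registered type, and
the standard library's ElementTree parser stands in for "an XML parser".

```python
import xml.etree.ElementTree as ET


def is_well_formed(body: bytes) -> bool:
    """Return True if body parses as a well-formed XML document."""
    try:
        ET.fromstring(body)
        return True
    except ET.ParseError:
        return False


def interpret(media_type: str, body: bytes, separate_epe_type: bool) -> str:
    """Say what a receiver can conclude from the label plus a parse attempt.

    separate_epe_type=True models the two-type proposal (hypothetical
    "application/xml-epe" label); False models one label for everything.
    """
    if separate_epe_type and media_type == "application/xml-epe":
        # The label alone says: do not hand this to a document parser.
        return "epe"
    if media_type in ("application/xml", "text/xml"):
        if is_well_formed(body):
            return "document"
        # Parse failed. With two types this is a mislabelled resource;
        # with one type the label is still possibly correct.
        return "error" if separate_epe_type else "ambiguous"
    return "not-xml"


print(interpret("application/xml", b"<a/><b/>", separate_epe_type=True))
print(interpret("application/xml", b"<a/><b/>", separate_epe_type=False))
```

With a single label, the failed parse leaves the receiver unable to say
whether the resource was mislabelled; with two labels, the same failure
is unambiguous.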

> > > However, I have assumed that this issue is not very important since
> > > we should anyway avoid external parsed entities altogether on the Internet.
> >
> > (Out of curiosity - why? In the context of HTTP/1.1 keep-alive, it's
> > not very expensive to fetch an epe once. If the epe is shared between
> > two or more documents, there is a net win even with HTTP/1.0)
> 
> Because different processors emit different outputs.  I personally think that
> in the Internet, we should never use (1) default values declared in external
> DTD subsets and external parameter entities, and (2) external parsed entities.

That is your choice if you don't want to use these features. But the
features are a legal part of the XML 1.0 spec and thus, any solution for
a MIME type or types for XML has to address all legal cases, not just
the ones you plan to use. I would have thought that was straightforward.
Or did you mean that you do not plan to use them and you believe that
no-one else either uses or will use them? If that is the case, I can
readily provide counter-examples.


> > Well, there is a move to define a category of "full infoset" parsers -
> > non validating, but which fetch epe's and external DTD subsets - which
> > deals with this problem.
> 
> I am not aware of such a move, and I have been a member of the XML Syntax WG.

Ask Tim Bray about it. He proposed the term, I believe.

> I am aware of a move for so-called "trivial subset".  But I do not know
> what will happen.

No, this is a move in the opposite direction. It recognises that the XML
spec defines a high ground (full validation) and a low ground (no
external DTD or entities fetched) but that in practice, there is a
valuable middle ground (no validation, but all external DTDs, external
parameter entities they refer to, and external parsed entities
referenced from the instance are fetched and used for such purposes as
attribute defaulting, declaration of ID, and suchlike). In practice, it
is this middle ground which is frequently what parsers implement and
which it would be good to rely on for XML applications, yet there is
no defined name for this thing and thus no way to claim conformance to
it.

But this is not the forum for such a topic; I apologise for the
digression.
> 
> > Regardless, it is legal now to use epes, and thus, a rule needs to be
> > established for labelling them; and the rule needs to cover all legal
> > cases, not just some frequently occurring ones.
> 
> I think that the current rule satisfies these criteria.  The only
> caveats are that (1) to know if an XML MIME entity is an XML document,
> you have to parse it, and (2) even if an XML MIME entity does not
> parse as an XML document, you are not sure if it is an external
> parsed entity or not.

I have difficulty accepting these caveats, when there are better options
available. The sniffing procedure you describe, for one thing, seems to
mandate recovery from a fatal error.

> I do not think they are problems.  As for (1), you have to parse it anyway,
> since the MIME header may be wrong. 

That argument could be used, in the limit, to show that there is no need
for any MIME labelling at all. It's not a good direction to take.

> As for (2), are there any requirements to
> distinguish those texts which may become external parsed entities, and those which
> cannot?  To me, what does not parse as an XML document is useless.

But to others, perhaps not - if it parses as an epe and is being used as
such.

So, to be clear, I am suggesting

a) application/xml for XML files. All are required to be well-formed, as
per the XML specification, otherwise it is a fatal error.
b) application/xml-epe for external parsed entities which are not
themselves well-formed instances
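
On the sending side, rules (a) and (b) reduce to a single mechanical
check. A minimal sketch in Python, with assumptions stated up front: the
function name choose_media_type is hypothetical, "application/xml-epe"
is only a placeholder name, and the sketch assumes the input is already
known to be either an XML document or an external parsed entity (it does
not verify that something failing the document parse is a usable epe).

```python
import xml.etree.ElementTree as ET


def choose_media_type(entity: bytes) -> str:
    """Pick a label for an entity under rules (a) and (b).

    Assumes the caller already knows the bytes are either an XML
    document or an external parsed entity; nothing else is checked.
    """
    try:
        # Rule (a): anything that parses as a well-formed document
        # may be labelled application/xml.
        ET.fromstring(entity)
        return "application/xml"
    except ET.ParseError:
        # Rule (b): an epe that is not itself a well-formed instance
        # gets the separate (placeholder) label.
        return "application/xml-epe"


# A complete document, and a fragment with two top-level elements.
print(choose_media_type(b"<doc><p>hello</p></doc>"))
print(choose_media_type(b"<p>one</p><p>two</p>"))
```

The second input has two top-level elements, so it is a legal external
parsed entity but not a well-formed document, and it receives the
separate label.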


> By the way, if there is a strong reason for introducing a specialized
> media type for external parsed entities, we also need another media
> type for external *parameter* entities.

Yes. Which would then mean that -epe would be a bad choice of name ;-)
and another one should be sought. Perhaps -pars and -pram

--
Chris
Received on Tuesday, 30 November 1999 08:57:20 UTC