Re: External parsed entities (Re: Inconsistency between IETF and W3C...) from MURATA Makoto on 1999-11-30 (xsl-editors@w3.org from October to December 1999)

From: MURATA Makoto <murata.makoto@fujixerox.co.jp>
Date: Tue, 30 Nov 1999 19:48:36 +0900
To: Chris Lilley <chris@w3.org>
Cc: Dan Connolly <connolly@w3.org>, timbl@w3.org, simonstl@simonstl.com, ietf-xml-mime@imc.org, Tsmith@parc.xerox.com, xsl-editors@w3.org, masinter@parc.xerox.com
Message-Id: <199911301048.AA03433@archlute.fujixerox.co.jp>

Chris Lilley wrote:

> Yes, agreed.
> 
> > We have two choices.  One is to use text/xml or application/xml even for
> > external parsed entities.  The other is to use application/xml-epe
> > only for those external parsed entities which are not XML documents.  I think
> > that the latter is a complicated rule. 
> 
> The former also has complications, since it means that application/xml
> is "sometimes, but not always, well-formed XML". Since the terms valid
> XML and well-formed XML are defined, but there is no defined term for
> "stuff that is not well-formed", this is a problem. I think that this is
> a significant complication.

Well, I do not think this is complicated.  text/xml or application/xml 
means either an external parsed entity or a document entity.  This is simple. 

> Whereas for the latter option, it is simple. Is the epe itself a
> well-formed document? (This is easy to check mechanically.) If yes, label
> it as application/xml. If no, label it as application/xml-epe (or whatever
> term is chosen). This seems a simple, readily understood, and
> machine-processable rule.
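
As a rough illustration of the rule quoted above, here is a minimal Python sketch, assuming Python's xml.parsers.expat serves as the mechanical well-formedness check. application/xml-epe is only the provisional name used in this thread, and the function names are hypothetical.

```python
import xml.parsers.expat


def is_well_formed_document(data: bytes) -> bool:
    """Return True if data parses as a complete, well-formed XML document."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError:
        return False
    return True


def label_entity(data: bytes) -> str:
    """Sketch of the proposed labelling rule (media type names assumed)."""
    if is_well_formed_document(data):
        return "application/xml"
    return "application/xml-epe"  # provisional name, not a settled type


print(label_entity(b"<chapter><title>Intro</title></chapter>"))  # application/xml
print(label_entity(b"Some text and <b>markup</b>, no single root"))  # application/xml-epe
```

Note that the check is only as good as the parse: an entity that happens to be a well-formed document on its own gets the document label even if it was written to be included elsewhere.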

Suppose that you make an XML document which references an external 
parsed entity.  You are very likely to give recipients the URI of that 
document but not that of the external parsed entity.  The external 
parsed entity will thus be fetched only by XML processors during 
parsing.  The fact that it is labelled as text/xml or application/xml 
does not cause any problems.

But the URI of the external parsed entity may become known, and some 
program (e.g., a WWW robot) may fetch it as a MIME entity.  This program 
does not know whether this MIME entity is an XML document or an external 
parsed entity.  If it parses as XML, it is an XML document.  Even if it 
does not, it may or may not be an external parsed entity.  Is this a 
problem?  I do not see any problems.

> > However, I have assumed that this issue is not very important, since
> > we should avoid external parsed entities on the Internet anyway.
> 
> (Out of curiosity: why? In the context of HTTP/1.1 keep-alive, it is
> not very expensive to fetch an epe once. If the epe is shared between
> two or more documents, there is a net win even with HTTP/1.0.)

Because different processors may emit different output for the same document.  
I personally think that on the Internet, we should never use (1) default values 
declared in external DTD subsets and external parameter entities, or (2) external 
parsed entities.
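
To illustrate how processors can differ, here is a minimal Python sketch using xml.parsers.expat: the same document yields different character data depending on whether the parser resolves external parsed entities. The document, the entity sig.xml, and its content are hypothetical, and the entity is served from an in-memory dict rather than fetched over the network.

```python
import xml.parsers.expat

# Hypothetical document whose body pulls in an external parsed entity.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ENTITY sig SYSTEM "sig.xml">
]>
<memo>Regards, &sig;</memo>"""

# Hypothetical entity content, keyed by system identifier (no network access).
ENTITIES = {"sig.xml": b"<name>Makoto</name>"}


def text_content(doc: bytes, fetch_external: bool) -> str:
    """Collect the character data reported by one parser configuration."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append

    if fetch_external:
        def on_external_ref(context, base, system_id, public_id):
            # Parse the entity's replacement text with a child parser.
            child = parser.ExternalEntityParserCreate(context)
            child.CharacterDataHandler = chunks.append
            child.Parse(ENTITIES[system_id], True)
            return 1  # non-zero tells expat the reference was handled

        parser.ExternalEntityRefHandler = on_external_ref
    # Without a handler, this setup never reads the external entity; the XML
    # Recommendation lets non-validating processors skip such entities.

    parser.Parse(doc, True)
    return "".join(chunks)


print(repr(text_content(DOC, fetch_external=False)))
print(repr(text_content(DOC, fetch_external=True)))
```

Both configurations are conforming non-validating behaviour, yet an application reading the memo sees the signature in one case and not in the other.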

> > If external parsed entities are used, different parsers emit different
> > results.  (See "5. Conformance" of the XML recommendation
> > http://www.w3.org/TR/REC-xml#sec-conformance)
> > 
> > >For maximum reliability in interoperating between different XML processors,
> > >applications which use non-validating processors should not rely on any
> > >behaviors not required of such processors.
> 
> Well, there is a move to define a category of "full infoset" parsers
> (non-validating, but fetching epes and external DTD subsets) which
> deals with this problem.

I am not aware of such a move, even though I have been a member of the XML 
Syntax WG.  I am aware of a move toward a so-called "trivial subset", but I 
do not know what will happen.  

> Regardless, it is legal now to use epes, and thus a rule needs to be
> established for labelling them; and the rule needs to cover all legal
> cases, not just some frequently occurring ones.

I think that the current rule satisfies these criteria.  The only 
caveats are that (1) to know if an XML MIME entity is an XML document, 
you have to parse it, and (2) even if an XML MIME entity does not 
parse as an XML document, you cannot be sure whether it is an external 
parsed entity or not.

I do not think these are problems.  As for (1), you have to parse it anyway, 
since the MIME header may be wrong.  As for (2), are there any requirements to 
distinguish text that may become an external parsed entity from text that 
cannot?  To me, what does not parse as an XML document is useless.  
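
Both caveats can be seen in a minimal Python sketch, again assuming xml.parsers.expat. Parsing is the only way to learn that something is a document; text that fails that test can only be checked heuristically, here by wrapping it in a dummy element. The classify function and the wrapper element are hypothetical, not part of any specification, and the sketch ignores the optional text declaration an external parsed entity may begin with. An undeclared entity reference leaves the answer open, since it may be declared in the DTD of whichever document refers to the entity.

```python
import xml.parsers.expat
from xml.parsers.expat import errors

# Numeric expat code for "undefined entity".
UNDEFINED_ENTITY = errors.codes[errors.XML_ERROR_UNDEFINED_ENTITY]


def _parse(data: bytes):
    """Return None if data parses as a document, else the ExpatError raised."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        return err
    return None


def classify(data: bytes) -> str:
    # Caveat (1): the only way to know it is a document is to parse it.
    if _parse(data) is None:
        return "document"
    # Caveat (2): failing that, try it as element content under a dummy root.
    err = _parse(b"<wrapper>" + data + b"</wrapper>")
    if err is None:
        return "possibly an external parsed entity"
    if err.code == UNDEFINED_ENTITY:
        # The entity might be declared in the referencing document's DTD.
        return "undetermined"
    return "neither"


print(classify(b"<a/>"))                # document
print(classify(b"Hello <b>world</b>"))  # possibly an external parsed entity
print(classify(b"See &chap2;"))         # undetermined
print(classify(b"<a><b></a>"))          # neither
```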

By the way, if there is a strong reason for introducing a specialized 
media type for external parsed entities, we also need another media 
type for external *parameter* entities.

Cheers,

Makoto

Fuji Xerox Information Systems

Tel: +81-44-812-7230   Fax: +81-44-812-7231
E-mail: murata.makoto@fujixerox.co.jp

Received on Tuesday, 30 November 1999 05:46:57 UTC