RE: Xinclude 17-Jul7-2000 draft comments (R.A.O'K) from Jonathan Marsh on 2000-08-08 (www-xml-xinclude-comments@w3.org from August 2000)

From: Jonathan Marsh <jmarsh@microsoft.com>
Date: Tue, 8 Aug 2000 10:52:15 -0700
To: "'Richard A. O'Keefe'" <ok@atlas.otago.ac.nz>, www-xml-xinclude-comments@w3.org
Message-ID: <116DFD732FA92E4D9B647C8EEF6DAF1015E2D6@red-pt-02.redmond.corp.microsoft.com>
> -----Original Message-----
> From: Richard A. O'Keefe [mailto:ok@atlas.otago.ac.nz]

Thanks for the detailed comments!  I have incorporated all the editorial
ones.

> Section 3.1
>   paragraph 2.
>     Appears to require that the prefix of an inclusion href attribute
>     MUST be xinclude: as opposed to some prefix that maps to the right
>     namespace, whatever that may be.

This will be solved by the expected return to element syntax.
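
A minimal sketch of matching on the namespace name rather than on the
literal prefix, in Python with xml.etree.ElementTree.  The namespace URI
below is a placeholder for illustration, not the one the draft defines,
and inclusion_hrefs is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace name (an assumption, not taken from the draft).
XINCLUDE_NS = "http://example.org/xinclude"


def inclusion_hrefs(xml_text):
    """Return the href values of all inclusions, whatever prefix is used."""
    root = ET.fromstring(xml_text)
    # ElementTree expands prefixed attribute names to "{namespace}local".
    key = "{%s}href" % XINCLUDE_NS
    return [el.attrib[key] for el in root.iter() if key in el.attrib]


doc_a = ('<doc xmlns:xinclude="http://example.org/xinclude">'
         '<p xinclude:href="a.xml"/></doc>')
doc_b = ('<doc xmlns:inc="http://example.org/xinclude">'
         '<p inc:href="a.xml"/></doc>')

# Both documents yield the same inclusion, despite the different prefixes.
print(inclusion_hrefs(doc_a), inclusion_hrefs(doc_b))
```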

>     xmlinclude: would be a better prefix to use than xinclude:
>     here and elsewhere in this document.

I am hoping that the final version will simply use the "xml:" namespace, in
which case the xinclude: prefix will disappear.  If not, we will revisit
this issue.

>   paragraph 4.
>     The example is important, but rather confusing.

The intent is to show that the order of processing does not affect the
result.  This drives the whole definition of XInclude as a transformation
from source to result infosets.  XPointers must be resolved against the
source document.  I clarified the text, because it was confusing.
 
> Section 3.2
>   paragraph 1.
>     We are told that "The value of the xinclude:href is a URI reference".
>     There are two main problems with this.
> 
>     a) There is no requirement that the URI "designate a data object".
>        While it's reasonably clear what a URI with a file:, ftp:, http:,
>        or nntp: scheme means, it is not clear what a telnet: or mailto:
>        URI would mean in this context.

It doesn't matter.  XInclude dereferences the URI.  If it doesn't get a
resource back (because the net is down, security doesn't allow it, the
resource doesn't exist, or the scheme isn't a fetchable one), it reports an
error.
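
A minimal sketch of that behaviour in Python, using urllib.request from
the standard library.  InclusionError and fetch_resource are hypothetical
names, and the draft does not prescribe any particular API; the point is
only that every failure to obtain a resource ends up as one error.

```python
import urllib.error
import urllib.request


class InclusionError(Exception):
    """Raised when the referenced resource cannot be obtained."""


def fetch_resource(uri):
    """Dereference uri and return its bytes, or raise InclusionError."""
    try:
        with urllib.request.urlopen(uri) as response:
            return response.read()
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # Network down, access refused, missing resource, and
        # non-fetchable schemes such as mailto: all land here.
        raise InclusionError("cannot include %r: %s" % (uri, exc)) from exc
```

With this sketch, fetch_resource("mailto:someone@example.org") raises
InclusionError, because urllib has no handler for the mailto: scheme.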
        
>     b) The examples show fragment identifiers.  But RFC 1808 says that
>        a "fragment identifier is not part of a URL" and RFC 2396 section
>        4.1 also says it "is not part of a URI".  So
> 	"#xpointer(x/myinclude[1])" is not a URI,
> 	"common.xml#xptr(a/b)" is not a URI (although it contains one),
> 	"source.xml#xpointer(string-range....)" is not a URI.

The text of RFC 2396 doesn't seem to be available right now, but my
recollection is that a "URI Reference" does include the fragment.  I have
tried to use the term URI reference throughout.  Does this clear up
anything?

>     What, *exactly*, is the value of an {Xinclude}:href attribute
>     supposed to be?  If a fragment identifier appears, does it *have*
>     to be an Xpointer fragment, or can it be something else?

A URI reference, that is, an absolute or relative URI with an optional
fragment identifier.
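
To illustrate the distinction, here is a short Python sketch that splits a
URI reference into its URI part and its optional fragment identifier with
urllib.parse.urldefrag.  The helper name split_uri_reference is
hypothetical; the sample strings are the ones quoted above.

```python
from urllib.parse import urldefrag


def split_uri_reference(reference):
    """Split a URI reference into (uri, fragment); either may be empty."""
    uri, fragment = urldefrag(reference)
    return uri, fragment


for ref in ("common.xml#xptr(a/b)",
            "#xpointer(x/myinclude[1])",
            "source.xml"):
    print(repr(ref), "->", split_uri_reference(ref))
```

A same-document reference such as "#xpointer(x/myinclude[1])" has an empty
URI part, and a reference without "#" has an empty fragment.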
 
> Section 3.2.1
>   paragraphs 4, 8 conflict
>     Paragraph 4 says an include element _may_ reference itself as text.
>     Paragraph 8 says an include element may _not_ reference itself or
>     an ancestor.  Presumably they should read
>     [para4]
> 	An include element with parse="text" may reference itself or
> 	any of its ancestors.
>     [para8]
> 	An include element with parse="xml" or omitted may not
> 	reference itself or any of its ancestors.

Yes.
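
A sketch of the agreed rule in Python, assuming the target of the reference
has already been resolved to an element of the same tree.  The function
name reference_allowed is hypothetical.

```python
import xml.etree.ElementTree as ET


def reference_allowed(root, include_el, target_el, parse="xml"):
    """Apply the self/ancestor rule to an already-resolved target."""
    if parse == "text":
        # Text inclusion may reference the include element or an ancestor.
        return True
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    node = include_el
    while node is not None:
        if node is target_el:
            return False  # target is the include element or an ancestor
        node = parent.get(node)
    return True


root = ET.fromstring("<a><b><inc/></b><c/></a>")
inc = root.find("b/inc")
print(reference_allowed(root, inc, root))                # ancestor, as XML
print(reference_allowed(root, inc, root, parse="text"))  # ancestor, as text
```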

> Section 3.3.2
>   paragraph 1 slightly garbled and seems inconsistent with
>   section 3 paragraph 2.
>     Section 3 paragraph 2 says
> 	Well-formed XML entities that do not have defined infosets
> 	(e.g. an external entity with multiple top-level elements)
> 	are outside the scope of this specification
>    Section 3.3.2 paragraph 1 says
> 	An include element might identify a subresource that
> 	contains more than a single information element.
> 	[Which presumably includes a subresource that is an entire
> 	 external entity with multiple top-level elements.]
> 	In this case these information items replace the information
> 	item representing [was "the" missing here?] include element
> 	in the order in which they appear in the included document.

This is intended to handle XPointers, not external entities.  I will
clarify.
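
As a rough illustration of the replacement (a sketch on ElementTree
elements, not on real infosets), several selected items take the place of
the include element in their original order.  splice_items is a
hypothetical helper.

```python
import xml.etree.ElementTree as ET


def splice_items(parent, include_el, items):
    """Replace include_el in parent with items, keeping their order."""
    position = list(parent).index(include_el)
    parent.remove(include_el)
    for offset, item in enumerate(items):
        parent.insert(position + offset, item)
    if items:
        # Keep the text that followed the include element.
        items[-1].tail = include_el.tail


doc = ET.fromstring("<doc><x/><include/><y/></doc>")
selected = [ET.Element("a"), ET.Element("b")]
splice_items(doc, doc.find("include"), selected)
print([child.tag for child in doc])
```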

>   Missing attribute.
>     When a document is fetched using HTTP, it may have an encoding
>     value in the HTTP header.  When a document that is fetched by
>     that or any other means is an XML document, it may (but need not)
>     contain an <?xml?> declaration specifying an encoding.  But if
>     a document is fetched by nfs:, afs:, file:, ftp:, and does not
>     contain an <?xml ... encoding='...'?> declaration or is to be
>     included as text, what encoding does it use?
>
>     There is a clear need for
>       xinclude:encoding
> 	The value of this attribute is an EncName as defined in
> 	XML 1.0 spec., section 4.3.3, rule [81], specifying how
> 	the resource is to be translated.

I'll add this as an issue.
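
For illustration only, one way such an attribute could be used when
including text.  The precedence shown (transport charset, then the
attribute, then a default) is an assumption, not something the draft
specifies, and decode_text_resource is a hypothetical name.

```python
def decode_text_resource(data, transport_charset=None,
                         encoding_attr=None, default="utf-8"):
    """Decode fetched bytes using the first encoding that is known.

    Assumed precedence: charset reported by the transport (for example an
    HTTP header), then the encoding attribute, then the default.
    """
    encoding = transport_charset or encoding_attr or default
    return data.decode(encoding)


raw = "caf\u00e9".encode("iso-8859-1")
# A file: or ftp: fetch reports no charset, so the attribute decides.
print(decode_text_resource(raw, encoding_attr="iso-8859-1"))
```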

>   Optionality.
>     Does an element have to have an xinclude:parse attribute as well,
>     or is it enough for it to have an xinclude:href attribute to be
>     an inclusion?

Only href is required.  I'll clarify. 
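
In code terms, a sketch of the intended reading: href alone makes an
inclusion, and an omitted parse is treated as "xml" (as in the corrected
paragraph 8 above).  The dictionary of attributes and the name
inclusion_settings are assumptions for illustration.

```python
def inclusion_settings(attrs):
    """Return (href, parse) for an inclusion, or None if href is absent."""
    href = attrs.get("href")
    if href is None:
        return None  # without href the element is not an inclusion
    return href, attrs.get("parse", "xml")


print(inclusion_settings({"href": "common.xml"}))
print(inclusion_settings({"href": "notes.txt", "parse": "text"}))
print(inclusion_settings({"parse": "text"}))
```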

> 
> General observation.
> 
>   Inclusion sounds like a simple problem, but this seems like a
>   cumbersome and somewhat confusing way to solve it.
> 
>   I note that it has a number of limitations:
>     - the combined document is not validated.
>     - the included material must be well-formed.
> 
>   It would be interesting to know why a simpler scheme using processing
>   instructions has not been adopted.  (Note that <?xml?> processing
>   instructions already affect parsing, so there is precedent.)

Processing instructions are not in favor at the W3C.  Using a processing
instruction to cause fundamental parsing changes is not supported by the XML
spec.  The <?xml?> declaration is not a processing instruction, according to
XML 1.0.

>   "<?xml-include" (S "type" Eq Type)? (S "encoding" Eq Enc)?
>                   (S ExternalID | S "href" Eq URI) "?>"
>   Where Type is "(xml|cdata)" or '(xml|cdata)'
>   and Enc is "EncName" or 'EncName'
>   and EncName and ExternalID come from the XML 1.0 spec.
> 
>   If href appears in an <?xml-include?> PI, the text to be included is
>   located as in the present draft (whatever that method is).
>   If an ExternalID appears, the resource to be included is the external
>   entity thus identified.
> 
>   If type="cdata", the characters will be treated as character data
>   and not parsed.  If type="xml" or omitted, the characters will be
>   parsed as if they had appeared literally in the place of the PI.
> 
>   This would allow
> 	+--start.inc----------------+
> 	|<html><head><title>        |
> 	+---------------------------+
> 
> 	+--body.inc-----------------+
> 	|</title></head><body>      |
> 	+---------------------------+
> 
> 	+--end.inc------------------+
> 	|</body></html>             |
> 	+---------------------------+
> 
> 	+--sample.xml--------------------------------+
> 	|<?xml-include href="start.inc"?>An example  |
> 	|<?xml-include href="body.inc"?>             |
> 	|<p>A tiny example PIs can handle</p>        |
> 	|<?xml-include href="end.inc"?>              |
> 	+--------------------------------------------+
> 
>   This would straightforwardly map to
> 	<html><head><title>An example
> 	</title></head><body>
> 	<p>A tiny example PIs can handle</p>
> 	</body></html>
>   The resources included are not, and have no particular reason to be,
>   well-formed xml.  What matters is that the combined document is
>   well-formed xml.  Not only that, it can be validated.

This seems like an approach better suited to existing text manipulation
solutions, such as Active Server Pages.  It does not handle the complex
problems of keeping namespace declarations and base URIs straight, as
XInclude does.  I think we are trying to solve a different set of problems.
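
A small Python sketch of the namespace half of that problem: the same
fragment text, pasted textually into two hosts that bind the prefix "x"
differently, ends up naming two different elements.  The URNs and the
helper name are made up for the example.

```python
import xml.etree.ElementTree as ET


def tag_after_text_paste(fragment_text, host_binding_for_x):
    """Paste fragment_text into a host document and parse the result."""
    host = '<host xmlns:x="%s">%s</host>' % (host_binding_for_x, fragment_text)
    # The pasted element picks up whatever the host says "x" means.
    return ET.fromstring(host)[0].tag


fragment = "<x:item/>"
print(tag_after_text_paste(fragment, "urn:example:authors-meaning"))
print(tag_after_text_paste(fragment, "urn:example:host-meaning"))
```

Textual inclusion silently rebinds the prefix, whereas inclusion at the
infoset level carries the element's namespace name along with it.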
 
>   The XML Information Set and similar models would have no difficulty
>   with this either:  it would be as if the processing instructions had
>   never existed.  Yes, that would be a problem for XML editors, but
>   - making life easy for editors has made infosets and DOM difficult
>     for everything else, and
>   - it is a solvable problem.  Place a PI node
> 	(PI "xml-include" "begin <rest of PI>")
>     just before the inclusion and a second PI node
> 	(PI "xml-include" "end <rest of PI>")
>     just after the inclusion, and an XML editor can then recover the
>     original structure from the infoset.


Thanks again for your comments!

- Jonathan
Received on Tuesday, 8 August 2000 13:53:08 UTC