Re: Agenda for XML Core WG telcon of 2016 February 3

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Wed, 03 Feb 2016 15:15:35 +0000
To: Paul Grosso <paul@paulgrosso.name>
Cc: core <public-xml-core-wg@w3.org>
Message-ID: <f5begctx1t4.fsf@troutbeck.inf.ed.ac.uk>

Paul Grosso writes:

> ...
> 4.  XInclude 1.1--see http://www.w3.org/XML/Group/Core#xinclude
> ...
> Henry points out that Section 4.4 [11] references RFC 3023
> which has been superseded by RFC 7303 [12].

Here's the relevant bit of the CR draft [11]:

The encoding of [a text-included] resource is determined by:

  * external encoding information, if available, otherwise

  * if the media type of the resource indicates, according to XML Media
    Types [IETF RFC 3023], that the resource is XML, for example
    text/xml or application/xml or matches text/*+xml or
    application/*+xml, then the encoding is determined as specified in
    [XML 1.0] or [XML 1.1] section 4.3.3, as appropriate, otherwise

  * the value of the encoding attribute if one exists, otherwise

  * UTF-8.

For consistency with RFC 7303 [12], I suggest something more along the
following lines:

  The encoding of [a text-included] resource is determined as follows
  (terminology as defined in sections 2.2 and 2.3 of [RFC 7303]):

    If external (out-of-band) information supplies encoding information,
    it is used; otherwise, by cases, depending on whether, and if so
    what, MIME information is available:

    * For XML MIME entities:
        * follow the guidelines given in section 3.2 of [RFC 7303] or
          its successors;
    * For MIME entities which are not XML MIME entities:
        * As determined by a BOM (see Section 3.3 of [RFC 7303]) if it
          is present;
        * In the absence of a BOM (Section 3.3), as determined by the
          charset parameter if it is present;
        * Otherwise, UTF-8.
    * For non-MIME entities:
        * If external (out-of-band) information identifies them as XML,
          then according to section 4.3.3 of [XML];
        * Otherwise, UTF-8.
    
  It is implementation-defined whether non-MIME entities will be
  "sniffed" to determine whether they might be XML, and if so whether or
  not to proceed according to section 4.3.3 of [XML].

I remain uncertain as to whether we need a Note to the effect that
implementing this _requires_ the ability to, as it were, rewind an input
stream.
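
To make the rewind concern concrete: checking for a BOM means looking at
the first bytes of the resource before a decoder has been chosen, and
those bytes must still be available afterwards. A minimal sketch,
assuming a buffered stream is available (the sniff_bom helper is
hypothetical, and only the UTF-8 and UTF-16 BOMs are checked), uses a
peek rather than an explicit rewind:

```python
import codecs
import io


def sniff_bom(stream):
    """Look for a BOM without consuming any bytes from the stream.

    peek() may return more or fewer bytes than asked for, so the
    result is trimmed to the longest BOM checked here (3 bytes).
    """
    head = stream.peek(3)[:3]
    for bom, name in (
        (codecs.BOM_UTF8, "UTF-8"),
        (codecs.BOM_UTF16_BE, "UTF-16BE"),
        (codecs.BOM_UTF16_LE, "UTF-16LE"),
    ):
        if head.startswith(bom):
            return name
    return None


payload = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
stream = io.BufferedReader(io.BytesIO(payload))
encoding = sniff_bom(stream)
data = stream.read()  # still starts at the BOM; nothing was consumed
```

A processor reading from an unbuffered, non-seekable source would have
to keep the sniffed bytes itself and hand them on to whichever decoder
is finally chosen.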

Note also that, as far as I can see, this approach shares one property
with the existing text: if 'broken' XML is served with an XML Media
Type, processors may throw errors even when including it as text.

ht

[11] https://www.w3.org/TR/xinclude-11/#text-included-items
[12] https://tools.ietf.org/html/rfc7303
-- 
       Henry S. Thompson, School of Informatics, University of Edinburgh
      10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
                Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                       URL: http://www.ltg.ed.ac.uk/~ht/
 [mail from me _always_ has a .sig like this -- mail without it is forged spam]
Received on Wednesday, 3 February 2016 15:16:31 UTC
