Re: Regarding XML Proposed Erratum 71 from Martin Duerst on 2001-06-18 (xml-editor@w3.org from April to June 2001)

From: Martin Duerst <duerst@w3.org>
Date: Mon, 18 Jun 2001 16:49:06 +0900
To: Paul Grosso <pgrosso@arbortext.com>, Francois Yergeau <FYergeau@alis.com>
Cc: xml-editor@w3.org, w3c-xml-core-wg@w3.org, w3c-i18n-ig@w3.org, connolly@w3.org
Message-Id: <4.2.0.58.J.20010618163240.0389d600@sh.w3.mag.keio.ac.jp>

At 09:14 01/06/14 -0500, Paul Grosso wrote:
>At 13:14 2001 06 14 +0900, Martin Duerst wrote:
> >Dear XML core WG,
> >
> >By chance, I just discovered Proposed Erratum 71:
> >
> >http://www.w3.org/XML/Group/2000/10/proposed-xml10-2e-errata#PE71
> >
> >It is true that this is a bit vague in not saying who is
> >responsible for the escaping, but this has been fixed by
> >PE 51/E4 to say that the XML processor is responsible:
> >
> >http://www.w3.org/XML/Group/2000/10/proposed-xml10-2e-errata#PE51
> >http://www.w3.org/XML/xml-V10-2e-errata#E4
>
>
>Right, but I think this erratum is wrong, so I'm asking to
>reopen this issue.

If you think erratum http://www.w3.org/XML/xml-V10-2e-errata#E4
is wrong, then that's not a problem with that erratum, but it's
a problem with the XML Rec as it came out in Feb 1998:

http://www.w3.org/TR/1998/REC-xml-19980210#sec-external-ent:
An XML processor should handle a non-ASCII character in a URI
by representing the character in UTF-8 as one or more bytes,
and then escaping these bytes with the URI escaping mechanism
(i.e., by converting each byte to %HH, where HH is the hexadecimal
notation of the byte value).

As you see, it starts with "An XML processor". Erratum E4 just
restored that, after it got lost when working out the details
of the conversion in the second edition.
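
For illustration, the escaping described in the quoted 1998 text could be
sketched as follows. This is only a minimal sketch, not text from any spec:
the function name escape_non_ascii and the example.org URI are assumptions
made up for the example, and the sketch touches non-ASCII characters only,
as in the quoted sentence.

```python
def escape_non_ascii(system_id: str) -> str:
    """Escape each non-ASCII character as its UTF-8 bytes, each written %HH.

    Hypothetical helper: it follows only the quoted sentence and leaves
    every ASCII character exactly as it was.
    """
    out = []
    for ch in system_id:
        if ord(ch) < 128:
            # ASCII characters pass through unchanged in this sketch.
            out.append(ch)
        else:
            # Represent the character in UTF-8, then escape each byte as %HH.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


# Hypothetical system identifier containing two non-ASCII characters.
print(escape_non_ascii("http://example.org/résumé.dtd"))
# prints http://example.org/r%C3%A9sum%C3%A9.dtd
```

The string stays a plain string, as production [11] requires, until the
point where it is converted for use as a URI.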

>A system id should be a string to the
>XML processor, and that's what production 11 makes clear.

Yes, the XML processor sees this as a string according to [11].
But the question is what the XML processor does with it.

>Escaping may be necessary before doing something URI-ish
>with the string, but that should be done by the process
>doing something URI-ish, not the XML processor.  Norm
>explains how an entity resolution process is one example
>of why the XML processor should not do the escaping.

The XML System Identifier *is* a URI (modulo some syntactical
differences that are dealt with as described).

Doing something 'uri-ish' with it just means dealing with
it according to its nature. The term 'uri-ish' is therefore
not appropriate. It would be much better to say that you want
to do something 'catalog-ish' with the XML System Identifier
URI.

In this respect, XML is definitely different from SGML.
Changing that would change the nature of XML quite a bit.

Also, the process that does URI resolution with a system
identifier has to get a URI that fits the strict URI syntax,
or it may cause an error. That's why the XML processor does
the conversion, before handing it off.

I agree with you that catalog-ish resolution doesn't
have to do the escaping, but it's the business of the
catalog spec to deal with that, not XML.

Regards,   Martin.

Received on Monday, 18 June 2001 03:56:50 UTC