[Bjoern Hoehrmann] Re: IRIEverywhere-27

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

- --=-=-=
Content-Transfer-Encoding: quoted-printable

Could someone convince me that we _shouldn't_ do what Bjoern suggests
here, i.e. include an appeal to Unicode Normalization [C] in the new
universal IRI->URI algorithm?

Doing so will remove a potentially serious roadblock for XLink 1.1.

I'm far from being an expert in this area, but to my naive eyes his
argument (which is a reply to my claim that Character Encoding rules
don't apply to IRI->URI mapping for XML specs, because they all start
from infoset values, which are defined as sequences of Unicode code
points) seems valid.

ht


- --=-=-=
Content-Type: message/rfc822
Content-Disposition: inline

From: Bjoern Hoehrmann <derhoermi@gmx.net>
To: ht@inf.ed.ac.uk (Henry S. Thompson)
Cc: www-tag@w3.org
Date: Tue, 13 Dec 2005 19:10:03 +0100
Message-ID: <btutp1pa3msmgtnq43111fp5v195s6ka24@hive.bjoern.hoehrmann.de>
References: <20051212154138.7a71f0ff.Vincent.Quint@inrialpes.fr> <4bdtp1p5tq99u6f7bqlrgarfg980kjvu1t@hive.bjoern.hoehrmann.de> <op.s1p3xerix1753t@ibm-60d333fc0ec.customers.eurospot.com> <qfitp19u9qs4eib2ss166gbphht88481ki@hive.bjoern.hoehrmann.de> <f5bzmn5t11e.fsf@erasmus.inf.ed.ac.uk>
In-Reply-To: <f5bzmn5t11e.fsf@erasmus.inf.ed.ac.uk>
Subject: Re: IRIEverywhere-27
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable


* Henry S. Thompson wrote:
>Precisely.  IRI-to-URI processing for XML is only coherently
>understood as a process that is defined in the terms provided by the
>Infoset spec., where _all_ values are sequences of Unicode code
>points.

Well, then you have this step,

  If the IRI is written on paper, read aloud, or otherwise represented
  as a sequence of characters independent of any character encoding,
  represent the IRI as a sequence of characters from the UCS normalized
  according to Normalization Form C (NFC, [UTR15]).
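
For illustration, the quoted step followed by the usual UTF-8
percent-encoding could be sketched as below. This is a minimal sketch
under stated assumptions, not the algorithm as specified: the function
name iri_to_uri and the example.org IRIs are hypothetical, every
non-ASCII character is escaped (the IRI draft restricts this to
particular character ranges), and whether the NFC step should apply to
infoset values at all is exactly what this thread is asking.

```python
import unicodedata


def iri_to_uri(iri: str) -> str:
    """Map an IRI held as Unicode code points to a URI (simplified sketch)."""
    # The quoted step: normalize the character sequence to NFC.
    normalized = unicodedata.normalize("NFC", iri)

    pieces = []
    for ch in normalized:
        if ord(ch) < 0x80:
            # Assumption: ASCII characters pass through unchanged.
            pieces.append(ch)
        else:
            # Encode as UTF-8, then write each octet as %HH.
            pieces.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(pieces)


# Precomposed and decomposed spellings of the same text.
precomposed = "http://example.org/caf\u00e9"
decomposed = "http://example.org/cafe\u0301"
print(iri_to_uri(precomposed))
print(iri_to_uri(decomposed))
```

With the normalization step both spellings map to the same URI;
without it the decomposed form would keep the letter "e" and escape
the combining accent separately, so the two URIs would differ.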

I.e., you always normalize. Could you elaborate on why e.g. the XML Core
Working Group did not adopt this step in the various specifications that
define string-to-URI conversion (XML 1.0, XML 1.1, XInclude, XLink, ..)?
The normalization step has been in the various IRI drafts for more than
7 years now and

  The XML Core WG would also like TAG input on the wisdom of early
  adoption given the "Internet Draft" status of the IRI draft [10]. So
  far adoption has relied on "copy and paste", but there is potential
  for these definitions to get out of sync.

out-of-sync specifications were a concern when the issue was raised.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/



- --=-=-=
Content-Transfer-Encoding: quoted-printable



-- 
 Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     Half-time member of W3C Team
    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                   URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]

- --=-=-=--
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFD1kkokjnJixAXWBoRAtfNAJ9Uu65piy18Acq7NUJYtHRugMHz/QCdHCpW
yUCGE/3sq1+a5JpT3g9ziOY=
=zPH9
-----END PGP SIGNATURE-----
--=-=-=--

Received on Tuesday, 24 January 2006 15:35:10 UTC