Re: ZWJ&XML

From: John Cowan <cowan@ccil.org>
Date: Wed, 13 Sep 2006 12:21:25 -0400
To: Mark Davis <mark.davis@icu-project.org>
Cc: unicode@unicode.org, www-international@w3.org
Message-ID: <20060913162125.GN10145@ccil.org>

Mark Davis scripsit:

> As I recall, the problem with XML 1.1 adoption was that XML 1.1 was
> not fully backwards compatible with XML 1.0: there were XML 1.0
> documents that were not valid XML 1.1. 

In the sense that "XML 1.0" names a countably infinite set of abstract
objects, true; in the sense that "XML 1.0" names a set
of texts physically fixed in a tangible medium, I venture to doubt it.
Specifically, I doubt that any Real World XML 1.0 documents contained
any instances of U+007F through U+009F not as character references.
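For anyone who wants to test that doubt against an actual corpus, here is a
minimal sketch of such a scan. The function name is hypothetical, and leaving
out U+0085 (NEL) is my assumption, based on XML 1.1 accepting it literally as
a line end. Character references such as &#x80; are plain ASCII in the source
text, so they are not reported.

```python
import re

# Literal code points U+007F..U+009F, minus U+0085 (NEL).
# Assumption: NEL is left out because XML 1.1 accepts it literally.
RAW_C1 = re.compile("[\u007f-\u0084\u0086-\u009f]")


def raw_c1_controls(text):
    """Return (offset, 'U+XXXX') for each literal DEL or C1 control in text."""
    return [(m.start(), f"U+{ord(m.group()):04X}") for m in RAW_C1.finditer(text)]


if __name__ == "__main__":
    # The first document contains a literal U+0080; the second uses a reference.
    print(raw_c1_controls("<a>\u0080</a>"))
    print(raw_c1_controls("<a>&#x80;</a>"))
```

A document for which this returns an empty list contains none of the
characters in question outside character references.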

In exactly the same sense, Unicode 2.0 was not backward compatible with
Unicode 1.1, a fact which does not seem to have seriously impeded its
adoption.

The issues with XML 1.1 were in fact political; I say no more.

> As for ZWJ/NJ - the original intent was for these to not make any
> semantic difference. There is a UTC action to collect cases where they
> are being used to make a clear semantic difference (eg XXX means "sea
> gull" and XX<ZWNJ>X means "republican"), so any feedback on such cases
> would be useful.

IIRC the leading case is the plural ending in Persian.  It's not just
a matter of a clear semantic difference:  there is no semantic difference
between "they're" and "theyre" in English, but the latter is unambiguously
wrong in the standard orthography.
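A small sketch of what that looks like at the code point level. The word
chosen (ketab-ha, "books") is my own illustration rather than anything from
the thread, and the helper name is hypothetical. It shows that a filter which
drops format characters turns the spelling with ZWNJ into the joined spelling.

```python
import unicodedata

ZWNJ = "\u200c"                      # ZERO WIDTH NON-JOINER
stem = "\u06a9\u062a\u0627\u0628"    # ketab, "book" (illustrative word)
plural_suffix = "\u0647\u0627"       # the plural ending -ha

with_zwnj = stem + ZWNJ + plural_suffix   # suffix kept visually separate
joined = stem + plural_suffix             # suffix joined to the stem


def strip_format_chars(s):
    """Drop every character whose general category is Cf (format)."""
    return "".join(ch for ch in s if unicodedata.category(ch) != "Cf")


if __name__ == "__main__":
    print(unicodedata.name(ZWNJ))
    print(with_zwnj == joined)
    print(strip_format_chars(with_zwnj) == joined)
```

The two strings differ only by U+200C, so any process that treats Cf
characters as ignorable silently converts one spelling into the other.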

-- 
If you have ever wondered if you are in hell,         John Cowan
it has been said, then you are on a well-traveled     http://www.ccil.org/~cowan
road of spiritual inquiry.  If you are absolutely     cowan@ccil.org
sure you are in hell, however, then you must be
on the Cross Bronx Expressway.          --Alan Feuer, NYTimes, 2002-09-20
Received on Wednesday, 13 September 2006 16:21:30 GMT
