
Suggestion for an anyString datatype

From: Rick Jelliffe <ricko@topologi.com>
Date: Thu, 28 Mar 2002 16:55:36 +1100
Message-ID: <00ad01c1d61d$2e83f4c0$4bc8a8c0@AlletteSystems.com>
To: <www-xml-schema-comments@w3.org>
From: "Mike Brown" on XML-DEV <mike@skew.org>

> I think the solution should not have to involve changing the semantics or the
> level of abstraction at which a character reference operates. They should not
> tread some middle ground between the fairly discrete levels of abstraction
> (between characters, code points, encodings) that have been established in XML
> 1.0 and that are, IMHO, not crying out to be broken just to make it easier for 
> XML to carry binary payloads.

But I think the real issue here is a flaw in XML Schemas: the bin64 datatype
was introduced to allow transmission of data that could not be fitted into XML's
constraints, but once it has been received there is no way to restore it
to its original form: we can un-encode it, but into what?  The recent discussion
by the XML Core WG about opening up XML to include more control characters
is predicated on the failure of bin64, it seems.*

And there needs to be a change in the type hierarchy to introduce an
anyString type, where anyString allows any character in Unicode (except
0x00, and except surrogates by definition) and has a facet
  transmissionEncoding ( plain | bin64 | bin16 | q ) "plain" 
which expresses the lexical form of the data being sent. 

So anyString is the primitive, and string is the derived type with 
transmissionEncoding set to "plain".   The PSVI of a document
must provide the unencoded text of the document. 
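
To make the idea concrete, here is a rough sketch in Python of how a
receiver might map the lexical form back to the anyString value. It is
only an illustration under assumptions the proposal does not fix: that
bin64 means base64, bin16 means hexadecimal, q means quoted-printable,
and that the decoded bytes are UTF-8 text. The function name
decode_any_string is hypothetical.

```python
import base64
import binascii
import quopri


def decode_any_string(lexical: str, transmission_encoding: str = "plain") -> str:
    """Recover the anyString value from its lexical (transmitted) form.

    Assumptions, not taken from the proposal: bin64 = base64,
    bin16 = hex, q = quoted-printable, payload bytes are UTF-8.
    """
    if transmission_encoding == "plain":
        value = lexical
    elif transmission_encoding == "bin64":
        value = base64.b64decode(lexical, validate=True).decode("utf-8")
    elif transmission_encoding == "bin16":
        value = binascii.unhexlify(lexical).decode("utf-8")
    elif transmission_encoding == "q":
        value = quopri.decodestring(lexical.encode("ascii")).decode("utf-8")
    else:
        raise ValueError(f"unknown transmissionEncoding: {transmission_encoding!r}")

    # Value-space check: any Unicode character except 0x00 and surrogates.
    for ch in value:
        if ch == "\x00" or 0xD800 <= ord(ch) <= 0xDFFF:
            raise ValueError(f"character U+{ord(ch):04X} is not allowed in anyString")
    return value
```

With this sketch, decode_any_string("AQI=", "bin64") and
decode_any_string("0102", "bin16") both give the same two-character
value (U+0001 followed by U+0002), which plain XML 1.0 text could not
carry, while a payload containing 0x00 is rejected.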

Rick Jelliffe

* On the subject of control characters, I believe it is important for an
XML 2.0 to move in the opposite direction: to ban the C1 controls.
See http://www.topologi.com/public/XML_Naming_Rules.html
Received on Thursday, 28 March 2002 00:45:17 UTC
