
definition of "character"

From: by way of Martin Duerst <mike@skew.org>
Date: Fri, 26 Sep 2003 09:52:33 -0400
To: uri@w3.org

From the outset, there are many mentions of "character":


   "A Uniform Resource Identifier (URI) is a compact string of

1.1 Overview of URIs

   "A URI is an identifier that consists of a sequence of

1.3 Syntax Notation

   "Although the ABNF defines syntax in terms of the US-ASCII
   character encoding [ASCII], the URI syntax should be
   interpreted in terms of the character that the ASCII-encoded
   octet represents..."

2 Characters

   "A URI consists of a restricted set of characters..."

2.1 Encoding of Characters

   "As described above, the URI syntax is defined in terms of

Yet nowhere is "character" actually defined. I don't feel that
the disclaimers regarding syntax notation are adequate to impress
upon the reader that a character is an abstract unit in a written
language, rather than a relatively concrete data type that manifests
as octets, as so many readers of this spec probably believe. The many
perceptions of character are discussed at length in section 3.1 of
http://www.w3.org/TR/charmod/, although that document seems to stop
short of settling on a canonical definition for the W3C's purposes.
(Martin, care to comment?)

Early in section 2, I think you should at least make an attempt
to define "character", as used in this spec, for the sake of the
masses who are not used to distinguishing between the various levels
of abstraction covered in UTR#17.
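
To illustrate the distinction I mean, here is a minimal sketch in Python.
It is not taken from the spec. The helper name percent_encode and the
choice of sample characters are my own assumptions, and the codecs used
(utf-8, latin-1, cp037) are just convenient examples of character
encodings that map the same abstract character to different octets.

```python
# Illustrative sketch only: one abstract character, several octet forms.

char = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE: a single abstract character

# The same character becomes different octet sequences under different encodings.
utf8_octets = char.encode("utf-8")      # two octets
latin1_octets = char.encode("latin-1")  # one octet


def percent_encode(octets: bytes) -> str:
    """Hypothetical helper: escape every octet as %HH (uppercase hex)."""
    return "".join(f"%{b:02X}" for b in octets)


escaped_utf8 = percent_encode(utf8_octets)
escaped_latin1 = percent_encode(latin1_octets)

# Even an ASCII-range character is not "the same thing" as its ASCII octet:
# in an EBCDIC code page (cp037) the character "A" is a different octet.
ascii_octets = "A".encode("ascii")
ebcdic_octets = "A".encode("cp037")

print(len(char), utf8_octets, latin1_octets)
print(escaped_utf8, escaped_latin1)
print(ascii_octets, ebcdic_octets)
```

The point is that len(char) is 1 in every case. Only the octets differ, so
a reader who equates "character" with "octet" has no way to tell which of
these octet sequences the spec is talking about.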

Received on Friday, 26 September 2003 09:53:00 UTC
