definition of "character" from by way of Martin Duerst on 2003-09-26 (uri@w3.org from September 2003)

From: by way of Martin Duerst <mike@skew.org>
Date: Fri, 26 Sep 2003 09:52:33 -0400
To: uri@w3.org
Message-Id: <4.2.0.58.J.20030926095228.02dc59b8@localhost>

From the outset, there are many mentions of "character":

Abstract

   "A Uniform Resource Identifier (URI) is a compact string of
   characters..."

1.1 Overview of URIs

   "A URI is an identifier that consists of a sequence of
   characters..."

1.3 Syntax Notation

   "Although the ABNF defines syntax in terms of the US-ASCII
   character encoding [ASCII], the URI syntax should be
   interpreted in terms of the character that the ASCII-encoded
   octet represents..."

2 Characters

   "A URI consists of a restricted set of characters..."

2.1 Encoding of Characters

   "As described above, the URI syntax is defined in terms of
   characters..."

Yet nowhere is "character" actually defined. I don't feel that
the disclaimers regarding syntax notation are adequate to impress
upon the reader that a character is an abstract unit in a written
language, rather than a relatively concrete data type that manifests
as octets, as so many readers of this spec probably believe. The many
perceptions of "character" are discussed at length in section 3.1 of
http://www.w3.org/TR/charmod/, although that document seems to stop
short of settling on a canonical definition for the W3C's purposes.
(Martin, care to comment?)

Early in section 2, I think you should at least make an attempt
to define "character", as used in this spec, for the sake of the
masses who are not used to distinguishing between the various levels
of abstraction covered in UTR#17.
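
To illustrate the distinction, here is a minimal sketch. Python is
only my choice for illustration, not something the spec uses, and the
sample character and the helper name percent_encode are assumptions
of mine. It shows one abstract character producing different octets,
and therefore different percent-encoded forms, depending on which
character encoding is chosen:

```python
from urllib.parse import quote

# One abstract character: LATIN SMALL LETTER E WITH ACUTE.
# (The choice of character is an assumption made for illustration.)
ch = "\u00e9"

# The same character manifests as different octets under different
# character encodings.
utf8_octets = ch.encode("utf-8")      # two octets: C3 A9
latin1_octets = ch.encode("latin-1")  # one octet:  E9


def percent_encode(octets: bytes) -> str:
    """Hypothetical helper: percent-encode every octet that is not an
    unreserved URI character."""
    return quote(octets, safe="")


print(len(ch))                        # number of characters
print(utf8_octets.hex(), latin1_octets.hex())
print(percent_encode(utf8_octets), percent_encode(latin1_octets))
```

The character count stays at one throughout; only the octets, and
hence the escaped form that would appear in a URI, differ.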

-Mike

Received on Friday, 26 September 2003 09:53:00 UTC