Re: Link-6: Addressing at the sub-element level from Martin Bryan on 1997-05-19 (w3c-sgml-wg@w3.org from May 1997)

From: Martin Bryan <mtbryan@sgml.u-net.com>
Date: Mon, 19 May 1997 12:12:37 +0100
To: w3c-sgml-wg@w3.org
Message-Id: <1.5.4.32.19970519111237.006dd018@mail.u-net.com>

At 11:38 18/5/97 -0700, Tim Bray wrote:
>A lot of people want to support addressing by char count, token
>count, or regexp, within #PCDATA (or [danger, Will Robinson!] mixed)
>content.
>
>This is obviously a good idea for many applications.  It is also 
>somewhat more difficult than you'd expect in the context of 
>wide Unicode characters.  Opinions are solicited as to whether
>this should be done for V1.0, and if so, which ones should be done,
>and if so, what to do about the internationalization issues.
>
Nobody seems to be complaining about having to tokenize attribute values
that contain Unicode values. In what way is this any easier than the
tokenization of Unicode PCDATA? (Or is this just being overlooked at present?)

We have to bite the Unicode tokenization issue at some stage. The mixed
content thing seems to be confusing the issue. Are we agreed that counting
of tokenized Unicode characters within individual pieces of PCDATA is no
more difficult than counting them in attribute values? If so, would it help
to apply the rules used for white space recognition in attribute values to
counting of white space within mixed content?
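
To make the comparison concrete, here is a rough sketch of what I mean, not a proposal for the spec. It assumes white space means only space, tab, CR and LF, and that a "character" is one Unicode code point; the function names are hypothetical. The same routine is applied to an attribute value and to a single piece of PCDATA:

```python
# Sketch only: the white space set and the helper names are assumptions,
# not anything taken from the XML or linking drafts.
XML_WHITESPACE = " \t\r\n"


def tokenize(text):
    """Split text into tokens on runs of the assumed white space characters."""
    tokens = []
    current = []
    for ch in text:
        if ch in XML_WHITESPACE:
            # A white space character ends the token in progress, if any.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def nth_token(text, n):
    """Return the n-th token (1-based), or None if there is no such token."""
    tokens = tokenize(text)
    return tokens[n - 1] if 1 <= n <= len(tokens) else None


def char_count(text):
    """Count characters as code points (what len() gives for a Python 3 str)."""
    return len(text)


# The same three tokens, once as an attribute value and once as PCDATA.
attribute_value = "  caf\u00e9  \u65e5\u672c\u8a9e\tthird "
pcdata_piece = "caf\u00e9 \u65e5\u672c\u8a9e\nthird"

attr_tokens = tokenize(attribute_value)
pcdata_tokens = tokenize(pcdata_piece)
second_token = nth_token(pcdata_piece, 2)
```

Under these assumptions both inputs yield the same token list, so the counting rule does not care which of the two the text came from. What the sketch leaves out is exactly the hard part: combining characters, and scripts that do not separate words with white space.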
----
Martin Bryan, The SGML Centre, Churchdown, Glos. GL3 2PU, UK 
Phone/Fax: +44 1452 714029   WWW home page: http://www.sgml.u-net.com/

Received on Monday, 19 May 1997 07:17:08 UTC