- From: Peter Murray-Rust <Peter@ursus.demon.co.uk>
- Date: Sun, 18 May 1997 17:06:26 GMT
- To: w3c-sgml-wg@w3.org
In message <3.0.32.19970518113134.00b18a30@pop.intergate.bc.ca> Tim Bray writes: > A lot of people want to support addressing by char count, token I'm one :-). I'm also asking for regexp. The applications below don't need a UNICODE regexp... [...] I hope this isn't too anglophone, but I would imagine that many non-textual applications did not require UNICODE, or if they used it would not see the char-by-char addressing as a problem. The sorts of things I want to address into are: "1.2 2.3 3.4" (1) "GIVEQ..." (sequence of human insulin) (2) or resolve: "error at line 27, char 23" In the first I'd like to be able to address white-space separated tokens. Because occasionally tokens have white space, I'd like the ability to use non-whitespace as separators when necessary. The reason for preferring (1) rather than <VAR>1.2</VAR><VAR>2.3</VAR><VAR>3.4</VAR> is not that it saves space (I don't buy this) or even that it makes the difference between selling XML to newcomers (I do buy this :-), but that the components may all have common attributes. Thus imagine: <ARRAY XML-TYPE="FLOAT" DICTNAME="AGE" CONVENTION="CENSUS" UNITS="year" TITLE="Age">1.2 2.3 3.4</ARRAY> and replace it by the non-tokenised form. This is not only immensely larger but doesn't announce the communality of the attributes - i.e. an application might have to search through all of the elements to make sure they all had the same data type. Since this comes up in the MathML DTD (which uses <SEP/> and also is required for databases I think we need a means of encapsulating array-like data. I have no knowledgeable comments to make about addressing into 'text' :-). P. > > > Cheers, Tim Bray > tbray@textuality.com http://www.textuality.com/ +1-604-708-9592 > > -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/
Received on Sunday, 18 May 1997 12:22:48 UTC