Re: Link-6: Addressing at the sub-element level

In message <3.0.32.19970518113134.00b18a30@pop.intergate.bc.ca> Tim Bray writes:
> A lot of people want to support addressing by char count, token

I'm one :-).  I'm also asking for regexp.  The applications below
don't need a UNICODE regexp...

[...]

I hope this isn't too anglophone, but I would imagine that many non-textual
applications do not require UNICODE, or if they use it would not see
char-by-char addressing as a problem.  The sorts of things I want to
address into are:
"1.2 2.3 3.4"  (1)
"GIVEQ..." (sequence of human insulin) (2)

or resolve: "error at line 27, char 23"

In the first I'd like to be able to address white-space separated tokens.
Because occasionally tokens have white space, I'd like the ability to use
non-whitespace as separators when necessary.
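
As a rough illustration of the three kinds of addressing asked for above
(by token, by char count, and by line/char position), here is a minimal
Python sketch. The function names, the 1-based counting and the use of a
regular expression as the separator are all assumptions made for the
example; nothing here is proposed syntax for the link spec.

```python
import re

def token_at(text, index, sep=None):
    """Return the index-th token of text, counting from 1.

    With sep=None, tokens are separated by runs of whitespace.
    Otherwise sep is a regular expression naming the separator,
    for data whose tokens themselves contain white space.
    """
    tokens = text.split() if sep is None else re.split(sep, text)
    return tokens[index - 1]

def char_at(text, index):
    """Return the index-th character of text, counting from 1."""
    return text[index - 1]

def offset_of(text, line, char):
    """Resolve a 1-based (line, char) pair to a 0-based character offset."""
    lines = text.splitlines(keepends=True)
    return sum(len(l) for l in lines[:line - 1]) + (char - 1)
```

With these, token_at("1.2 2.3 3.4", 2) picks out "2.3" from (1),
char_at("GIVEQ", 3) picks out "V" from the start of (2), and
token_at("New York|Los Angeles", 2, sep=r"\|") handles tokens that
contain white space.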

The reason for preferring (1) rather than
<VAR>1.2</VAR><VAR>2.3</VAR><VAR>3.4</VAR>
is not that it saves space (I don't buy this) or even that it makes a
difference when selling XML to newcomers (I do buy this :-), but
that the components may all have common attributes.  Thus imagine:

<ARRAY XML-TYPE="FLOAT" DICTNAME="AGE" CONVENTION="CENSUS" UNITS="year"
  TITLE="Age">1.2 2.3 3.4</ARRAY>
and replace it by the non-tokenised form.  This is not only immensely larger
but doesn't announce the commonality of the attributes - i.e. an application
might have to search through all of the elements to make sure they
all had the same data type.
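
To make the contrast concrete, here is a minimal Python sketch using the
standard xml.etree.ElementTree module. The CONVERTERS table, the function
names and the <LIST> wrapper around the per-element form are hypothetical,
introduced only for this example. In the tokenised form the type is read
once from the ARRAY element; in the non-tokenised form every child has to
be inspected before the application can trust a common type.

```python
import xml.etree.ElementTree as ET

ARRAY = ('<ARRAY XML-TYPE="FLOAT" DICTNAME="AGE" CONVENTION="CENSUS" '
         'UNITS="year" TITLE="Age">1.2 2.3 3.4</ARRAY>')

# Hypothetical mapping from XML-TYPE values to Python converters.
CONVERTERS = {"FLOAT": float}

def read_array(xml_text):
    """Tokenised form: one attribute lookup covers every value."""
    elem = ET.fromstring(xml_text)
    convert = CONVERTERS[elem.get("XML-TYPE")]
    values = [convert(tok) for tok in elem.text.split()]
    return values, dict(elem.attrib)

def common_type(xml_text, attr="XML-TYPE"):
    """Non-tokenised form: scan every child element.

    Returns the shared attribute value, or None if the children disagree.
    """
    types = {child.get(attr) for child in ET.fromstring(xml_text)}
    return types.pop() if len(types) == 1 else None
```

Here read_array(ARRAY) yields the three floats together with the shared
attributes in one step, while common_type has to visit each <VAR> before
it can say anything about the data type.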

Since this comes up in the MathML DTD (which uses <SEP/>) and is also
required for databases, I think we need a means of encapsulating array-like
data.

I have no knowledgeable comments to make about addressing into 'text' :-).

	P.

 

> Cheers, Tim Bray
> tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

Received on Sunday, 18 May 1997 12:22:48 UTC