Re: clarification re. tokens

At 3:14 PM +0000 3/10/97, Lou Burnard wrote:
>The rule is that the first SGML name character found starts a new token,
>and the first non-name character terminates it, with the proviso that any
>character whose status as a name character is not defined in the current
>SGML declaration is treated as a non-name character (i.e. as white space).
>See TEI P3 p 419, from which the XML spec has been derived.

This rule (unnatural as it is) was chosen for compatibility with HyTime's
TOKEN addressing mode -- I don't know its status nowadays, but I would be
happier with "space-delimited groups" than with any other form of token. But
I'd rather leave it out, not because it isn't useful, but because it's hard
to make any definition of "token" that will work across the many contexts
where tokens may make sense -- scripts with their punctuation conventions,
programming and scripting languages, and no doubt others. We're better off
with REGEXPs, I think...

>This means that the underscore is, by
>default, regarded as white space, so the location specification TOKEN (3 5)
>starts with the "not" of "not_". The example is perhaps a little confusing
>because it requests a *span* of tokens (from the 3rd i.e. "not" to the 5th
>i.e. "very"), and a span, naturally enough, includes all the characters from
>the start of its beginning to the end of its end, including therefore one of
>the underscores, but not the other. Seemed a good idea at the time!

It shows all the features -- so it's actually good that way. But the
definition is weird enough that its effects are not necessarily clear from
a casual reading.
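
To make the effect concrete, here is a minimal Python sketch of the rule as
Lou describes it. It rests on assumptions that are mine, not the spec's: name
characters are taken to be ASCII letters, digits, "." and "-" (so the
underscore counts as white space), the sample string is hypothetical rather
than the example under discussion, and the names tokens and token_span are
made up for illustration.

```python
import re

# Assumption: name characters are ASCII letters, digits, "." and "-".
# Anything else, including "_", is treated as a token separator.
NAME_CHAR_RUN = re.compile(r"[A-Za-z0-9.\-]+")


def tokens(text):
    """Return (start, end) offsets of each maximal run of name characters."""
    return [(m.start(), m.end()) for m in NAME_CHAR_RUN.finditer(text)]


def token_span(text, first, last):
    """Return the text from the start of token `first` to the end of
    token `last` (both 1-based), including everything in between."""
    spans = tokens(text)
    start = spans[first - 1][0]
    end = spans[last - 1][1]
    return text[start:end]


# Hypothetical sample text, not the example from the draft.
sample = "This is _not_ a very good example"
print(token_span(sample, 3, 5))  # prints: not_ a very
```

The span starts at the "n" of "not", so the leading underscore is dropped,
but the trailing one falls between the start of token 3 and the end of
token 5 and so is included.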

  -- David


David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________