clarification re. tokens

From:	OXVAXD::LOU          "Lou Burnard" 10-MAR-1997 14:51:18.60
To:	MX%"Peter@ursus.demon.co.uk"
CC:	LOU
Subj:	Re: 4.5 TEI Extended Pointer Locators?

|Fnd I am mystified by the example in A.1.1.13 (the 3rd, 4th 5th tokens
|in "This is _not_ a very good idea"  are apparently:
|"not_", "a" "very"
|The _ acts as a delimiter if it leads a token but not if it trails it?
|(Any default use of _ as a delimiter will cause scientists a lot of
|grief, I suspect).]

The rule is that the first SGML namecharacter found starts a new token, and the
first non namecharacter terminates it, with the proviso that any character
whose status as a namecharacter is not defined in the current SGML declaration
is treated as a non-name-char (i.e. as white space). See TEI P3 p 419, from
which the XML spec has been derived). This means that the underscore is, by
default, regarded as white space, so the location specification TOKEN (3 5)
starts with the "not" of "not_". The example is perhaps a little confusing
because it requests a *span* of tokens (from the 3rd i.e. "not"  to the 5th
i.e. "very"), and a span, naturally enough, includes all the characters from
the start of its beginning to the end of its end, including therefore one of
the underscores, but not the other. Seemed a good idea at the time!


Lou

Received on Monday, 10 March 1997 10:14:31 UTC