- From: David Durand <dgd@cs.bu.edu>
- Date: Mon, 10 Mar 1997 12:15:29 -0500
- To: Lou Burnard <lou@vax.ox.ac.uk>, W3C-SGML-WG@w3.org
At 3:14 PM +0000 3/10/97, Lou Burnard wrote: >The rule is that the first SGML namecharacter found starts a new token, >and the >first non namecharacter terminates it, with the proviso that any character >whose status as a namecharacter is not defined in the current SGML declaration >is treated as a non-name-char (i.e. as white space). See TEI P3 p 419, from >which the XML spec has been derived). This rule (unnatural as it is) was chosen for compatiblity with HyTimes's TOKEN addressing mode -- I don't know its status nowadays, but I would be happier with "space-delimited groups" than any other form of token. But I'd rather leave it out, not becuase it isn't usefulk, but because it's hard to make any definition of "token" that will work across the many contexts where they may make sense -- scripts with their punctuation conventions, programming and scripting languages, and no doubt others. We're better off with REGEXPs I think... >This means that the underscore is, by >default, regarded as white space, so the location specification TOKEN (3 5) >starts with the "not" of "not_". The example is perhaps a little confusing >because it requests a *span* of tokens (from the 3rd i.e. "not" to the 5th >i.e. "very"), and a span, naturally enough, includes all the characters from >the start of its beginning to the end of its end, including therefore one of >the underscores, but not the other. Seemed a good idea at the time! It shows all the features -- so it's actually good that way. But the definition is weird enough that its effects are not necessarily clear from a casual reading. -- David > >Lou _________________________________________ David Durand dgd@cs.bu.edu \ david@dynamicDiagrams.com Boston University Computer Science \ Sr. Analyst http://www.cs.bu.edu/students/grads/dgd/ \ Dynamic Diagrams --------------------------------------------\ http://dynamicDiagrams.com/ MAPA: mapping for the WWW \__________________________
Received on Monday, 10 March 1997 12:14:23 UTC