- From: Peter Murray-Rust <Peter@ursus.demon.co.uk>
- Date: Mon, 10 Mar 1997 16:07:50 GMT
- To: w3c-sgml-wg@w3.org
[From a message from Lou Burnard. I'm not sure if this went to the WG, but if it didn't, Lou, I hope you don't mind my replyinh there.] In message <009B10EA.F5FEC08B.4@vax.ox.ac.uk> Lou Burnard writes: > |Fnd I am mystified by the example in A.1.1.13 (the 3rd, 4th 5th tokens > |in "This is _not_ a very good idea" are apparently: > |"not_", "a" "very" > |The _ acts as a delimiter if it leads a token but not if it trails it? > |(Any default use of _ as a delimiter will cause scientists a lot of > |grief, I suspect).] > > The rule is that the first SGML namecharacter found starts a new token, and the ^^^^ The rule is in TEI, not SGML I assume? Therefore we don't *have* to accept it precisely for XML...? > first non namecharacter terminates it, with the proviso that any character > whose status as a namecharacter is not defined in the current SGML declaration > is treated as a non-name-char (i.e. as white space). See TEI P3 p 419, from > which the XML spec has been derived). This means that the underscore is, by > default, regarded as white space, so the location specification TOKEN (3 5) I understand the logic! This is going to create great fun with numbers. Let's try something like: <NUMBERS>-.2 +1.3 0.13E-14 -2.3E+14</NUMBERS> These are all (I think) valid IEE representations. If the rule is applied as stated then (I think) namechars include '-' but not '+'. Remarkably the first 3 numbers appear to survive, but the fourth gets its exponent deleted. In fact, given this rule, I cannot see how numbers can be transmitted in text (and I'm sure that there are even worse things we want to transmit! I would very much like to avoid having to use the construction: <NUMBER>-.2</NUMBER><NUMBER>+1.3</NUMBER>, etc. This will drive scientists away quite rapidly. Many languages (including XML) have a concept of whitespace, so please can we qualify TOKEN with (say) DELIMITER="XML-S" or similar if we want it. > starts with the "not" of "not_". The example is perhaps a little confusing > because it requests a *span* of tokens (from the 3rd i.e. "not" to the 5th > i.e. "very"), and a span, naturally enough, includes all the characters from > the start of its beginning to the end of its end, including therefore one of > the underscores, but not the other. Seemed a good idea at the time! ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ It always does! The challenge is how can we use it for free format material which is not solely 'text'. I think we must address this. P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/
Received on Monday, 10 March 1997 11:58:04 UTC