- From: Rzepa, Henry <h.rzepa@ic.ac.uk>
- Date: Thu, 3 Aug 2000 10:15:20 +0100
- To: html-tidy@w3.org
- Cc: g.gkoutos@ic.ac.uk
I wrote a) In chemistry, element attributes such as script="..." often contain semantic line breaks. for example # comment where the # starts on a new line and the comment ends with a line break b) Even worse, the scripts can use the < or > operators. Dave R replied > >Do you have any examples of such legacy scripts, and why these >need to be included in attribute values rather than as the >content of script elements, or in linked scripts? and Richard O'Keefe added; > I don't see any way around problem (a). > Problem b). > So script='cout << "foo"' should be perfectly ok in HTML. >... > HTML Tidy automatically translates < > to < > in attribute > value literals, so problem (b) would seem to be solved. I agree a) is semantically wrong. A (temporary) solution might have been to simply suppress ALL processing of the value of an attribute, thus preserving line breaks, but running the risk of producing invalid XHTML. Our basic problem was that a lot of chemists have produced a great many HTML pages containing scripts with problem a) and we need to produce a solution. This answers Dave's question about why they had to be inlined rather than linked scripts, ie its what 1000s of pages unfortunately contain! The best I think will be to pre-process the page, identifying the scripts that show this, and replace them with a line-break independent version Problem b) is actually much more complicated than we realised. Richard is correct in stating that Tidy translates >, but in fact the answer is "sometimes", ie not always. Its this behaviour that perhaps someone who is familiar with the Tidy code might wish to comment upon. The best way of illustrating that is to give two examples Example 1 Fails with Tidy with the error "Error: missing quotemark for attribute value" Example 2 Works as Richard Indicates Example 1 <EMBED SRC="caffeine.pdb" name="caffeine" bgcolor=#D28543 align=abscenter width=250 height=300 spiny=360 startspin=true display3D=ball&stick SCRIPT=" # Avoid Colour Problems! select (atomno>=15) and (atomno<=24) select atomno=(atomno>=1) and (atomno<=14) select (atomno>=16) and (atomno<=23) "> Example 2 <EMBED SRC="caffeine.pdb" name="caffeine" bgcolor=#D28543 align=abscenter width=250 height=300 spiny=360 startspin=true display3D=ball&stick SCRIPT=" # Avoid Colour Problems! select (atomno>=15) and (atomno<=24) select (atomno>=16) and (atomno<=23) "> More generally, we have identified the following conditions c) Only the presence of > fails Tidy, < always is translated to an entity d) In the case that an occurrance of > exists, only up to 3 new existances of either < or > are translated. More cause failure. That is why example 2 works and not 1. e) After the first occurance of >, in the case that neither < or > occur again , only 8 new script lines are allowed with unlimited number of commands on each line. f) If our script attribute value is put in a single line separated by ; (that is the syntax support in our script language) in the case of an > occurance, up to 10 occurances of < or > are allowed but not more. The number of other commands is unlimited. Clearly, there must be buffer and counter limits in Tidy relating to handling the value of an attribute. Of the cases above, fixing d) would help a great deal, and extending the limits found in e) and f) would be nice. -- Henry Rzepa. +44 (0)20 7594 5774 (Office) +44 (0)20 7594 5804 (Fax) Dept. Chemistry, Imperial College, London, SW7 2AY, UK. http://www.ch.ic.ac.uk/rzepa/
Received on Thursday, 3 August 2000 05:15:50 UTC