Re: Tidying the value of an attribute?

I wrote:

a) In chemistry, element attributes such as script="..."
 often contain semantic line breaks, for example
 # comment
where the # starts on a new line and the comment ends with a line break.

b) Even worse, the scripts can use the < or > operators.

Dave R replied:
>
>Do you have any examples of such legacy scripts, and why these
>need to be included in attribute values rather than as the
>content of script elements, or in linked scripts?

and Richard O'Keefe added:

>    I don't see any way around problem (a).
>
Problem b).
>    So script='cout << "foo"'  should be perfectly ok in HTML.
>...
>    HTML Tidy automatically translates < > to &lt; &gt; in attribute
>    value literals, so problem (b) would seem to be solved.


I agree that a) is semantically wrong. A (temporary) solution might have been
simply to suppress ALL processing of attribute values, thus preserving
line breaks, but at the risk of producing invalid XHTML.
Our basic problem is that a lot of chemists have produced a great many HTML
pages containing scripts with problem a), and we need to produce a solution.
This answers Dave's question about why the scripts are inlined rather than
linked, i.e. it is what 1000s of pages unfortunately contain!

The best approach, I think, will be to pre-process each page, identifying the
scripts that show this problem and replacing them with a line-break-independent
version.
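
A minimal sketch of such a pre-processing step, in Python, might look like the
following. It assumes that commands can be joined with ; (the separator our
script language supports) and that whole-line # comments can simply be dropped,
since on a single line a comment would swallow every command after it. The
function name flatten_script is hypothetical, and trailing comments that follow
a command on the same line are not handled.

```python
def flatten_script(value: str) -> str:
    """Collapse a multi-line script value into one ';'-separated line."""
    commands = []
    for line in value.splitlines():
        line = line.strip()
        # Skip blank lines and whole-line '#' comments (assumption: they
        # carry no meaning for the script interpreter).
        if not line or line.startswith("#"):
            continue
        commands.append(line)
    return "; ".join(commands)


example = """
# Avoid Colour Problems!
select (atomno>=15) and (atomno<=24)
select (atomno>=16) and (atomno<=23)
"""

print(flatten_script(example))
```

The result no longer depends on line breaks, so it should survive any
whitespace normalisation of the attribute value.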

Problem b) is actually much more complicated than we realised. Richard
is correct in stating that Tidy translates >, but in fact the answer is
"sometimes", i.e. not always. It is this behaviour that perhaps someone who
is familiar with the Tidy code might wish to comment upon. The best way
of illustrating it is to give two examples.

Example 1 fails in Tidy with the error "Error: missing quotemark for attribute value".
Example 2 works as Richard indicates.

Example 1

<EMBED SRC="caffeine.pdb" name="caffeine" bgcolor=#D28543
align=abscenter width=250 height=300 spiny=360
startspin=true display3D=ball&stick SCRIPT="
# Avoid Colour Problems!
select (atomno>=15) and (atomno<=24)
select atomno=(atomno>=1) and (atomno<=14)
select (atomno>=16) and (atomno<=23)
">

Example 2

<EMBED SRC="caffeine.pdb" name="caffeine" bgcolor=#D28543 align=abscenter width=250 height=300 spiny=360 startspin=true display3D=ball&stick SCRIPT="
# Avoid Colour Problems!
select (atomno>=15) and (atomno<=24)
select (atomno>=16) and (atomno<=23)
">

More generally, we have identified the following conditions


c) Only the presence of > makes Tidy fail; < is always translated to an entity.
d) Once a > has occurred, only up to 3 further occurrences of either < or >
are translated. More cause failure. That is why example 2 works and example 1 does not.
e) After the first occurrence of >, if neither < nor > occurs again, only 8 further
script lines are allowed, with an unlimited number of commands on each line.
f) If the script attribute value is put on a single line with commands separated
by ; (the syntax supported by our script language), then where a > occurs,
up to 10 occurrences of < or > are allowed, but not more.
The number of other commands is unlimited.


Clearly, there must be buffer and counter limits in Tidy relating to handling the value
of an attribute. Of the cases above, fixing  d) would help a great deal, and extending
the limits found in  e) and  f) would be nice.
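
Until those limits are changed, one possible stop-gap is to convert < and > to
entities inside the SCRIPT attribute before the page reaches Tidy. The Python
sketch below does this under several assumptions: the attribute value is
enclosed in double quotes and contains no double quotes itself, and the browser
decodes the entities again before handing the value to the plug-in. Whether
Tidy then accepts the pre-escaped value in every case has not been tested here.
The name escape_script_attrs is hypothetical.

```python
import re

# Matches script="..." (any case); [^"]* also spans line breaks.
SCRIPT_ATTR = re.compile(r'(\bscript\s*=\s*")([^"]*)(")', re.IGNORECASE)


def escape_script_attrs(html: str) -> str:
    """Replace < and > with entities inside double-quoted SCRIPT attributes."""

    def repl(match: re.Match) -> str:
        value = match.group(2).replace("<", "&lt;").replace(">", "&gt;")
        return match.group(1) + value + match.group(3)

    return SCRIPT_ATTR.sub(repl, html)


page = '<EMBED SRC="caffeine.pdb" SCRIPT="\nselect (atomno>=15) and (atomno<=24)\n">'
print(escape_script_attrs(page))
```

Only text between the quotes of the SCRIPT attribute is touched; the > that
closes the EMBED tag is left alone.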
-- 

Henry Rzepa. +44 (0)20 7594 5774 (Office) +44 (0)20 7594 5804 (Fax)
Dept. Chemistry, Imperial College, London, SW7  2AY, UK. 
http://www.ch.ic.ac.uk/rzepa/

Received on Thursday, 3 August 2000 05:15:50 UTC