Subject=Re: Parsing Scripts - was Issue #65

There hasn't been much discussion, which leads me to think this doesn't
impact a lot of people.

Using Tidy from November 2014 (assuming no work has been done on this bug
since then), given the input:

<!DOCTYPE html>
<html>
<head><title></title>
<body>
<script>
var a = '<script';
</script>
</body>
</html>

I get the output:

<!DOCTYPE html>
<html>
<head>
<meta name="generator" content="HTML Tidy (Balthisar Tidy) for HTML5 for
Mac OS X dated 2014/11/22">
<title></title>
</head>
<body>
<script>
var a = '<script';
<\/script>
<\/body>
<\/html>
</script>
</body>
</html>


...and so I wonder if this is something a new configuration option should
handle, or if it's an inherent bug?

I think the question comes down to this: are we trying to identify errors in
strings? Currently Tidy seems to simply not take into account that something
is quoted, and it interprets the string contents as markup.

The danger I see in adding a new configuration option is expecting the user
to know the difference. Yes, users SHOULD know the difference, but they
don't always.

If it's a configuration option, then the default should definitely be on
the side of safety -- ignore anything that's in legal quotes. Advanced users
could turn off the option when required.

I hope we spur some more discussion before making a huge decision.

Given that this is a very old bug, I also suggest we move this beyond the
5.0.0 milestone.



-- 
---
Jim Derry
Clinton Township, MI, USA
Nanjing, Jiangsu, China PRC

Received on Saturday, 31 January 2015 18:07:03 UTC