
Subject: Re: Parsing Scripts - was Issue #65

From: Jim Derry <balthisar@gmail.com>
Date: Sun, 1 Feb 2015 02:06:35 +0800
Message-ID: <CABUm+BdDdGmrezLFczMyvp3WMPmMkR4ZBsa0fOpdMjqy8S_djg@mail.gmail.com>
To: "To=" <54C79596.4050902@geoffair.info>, public-htacg-contrib@w3.org
There hasn't been much discussion, which leads me to think this doesn't
affect a lot of people.

Using Tidy from November 2014 (assuming no work has been done on this bug
since then), given the input:

<!DOCTYPE html>
<html>
<head><title></title>
<body>
<script>
var a = '<script';
</script>
</body>
</html>

I get the output:

<!DOCTYPE html>
<html>
<head>
<meta name="generator" content="HTML Tidy (Balthisar Tidy) for HTML5 for
Mac OS X dated 2014/11/22">
<title></title>
</head>
<body>
<script>
var a = '<script';
<\/script>
<\/body>
<\/html>
</script>
</body>
</html>


...and so I wonder if this is something a new configuration option should
handle, or if it's an inherent bug?

I think the question comes down to this: are we trying to identify errors
inside strings? Currently Tidy does not seem to take into account that
something is quoted, and it interprets the string contents as markup.

The danger I see in adding a new configuration option is that it expects
users to know which setting they need. Yes, users SHOULD know the
difference, but they don't always.

If it's a configuration option, then the default should definitely be on
the side of safety -- ignore anything that's in legal quotes. Advanced users
could turn off the option when required.

I hope we spur some more discussion before making a huge decision.

Given that this is a very old bug, I also suggest we move this beyond the
5.0.0 milestone.



-- 
---
Jim Derry
Clinton Township, MI, USA
Nanjing, Jiangsu, China PRC
Received on Saturday, 31 January 2015 18:07:03 UTC
