- From: Arthur Clifford <art@artspad.net>
- Date: Wed, 7 Apr 2010 19:03:38 -0700
- To: <public-html-comments@w3.org>
I mentioned <code> when I probably meant <samp>, but in earlier posts I did mention <pre>. For user-specified code that needs to be displayed as code and not interpreted, there are (or were) tags for doing that. These tags should be treated as having CDATA as their child node, so that the CDATA markup itself is not necessary.

The problem is what happens when you want HTML, XML, or any other syntax-heavy code inside one of those tags without it being treated as parsable/executable code. The solution to date is to use entities and other escapes, which render as visual characters so that the result looks like the intended syntax-heavy code.

In the context of blogs, the user-specified data (say, HTML sample code) is going to be sent to a server, stored, and eventually returned in a blog page somewhere. In that process any text provided will be, or should be, run through sanitizers. The general trend is to escape everything that would confuse the HTML/XML parser.

The debate, then, is whether it is time to consider an alternative where sanitizing is not necessary because the content block is known to be treated as plain text. If so, what is the approach? T.J. was suggesting the XML-industry-standard CDATA approach. I was suggesting contentLength (or maybe just length) as an attribute, thus removing the need to escape anything in the text content.

My thinking, in the user-submitted blog context, is that the server is going to be yanking the content from somewhere and dynamically putting it in a page, so it would be really easy for it to get the length programmatically when sending back responses. The only reason to use contentLength would be that you know you have characters that would confuse the browser, so such an attribute would have to be optional.

I suggested a mime attribute; I should have called it a syntax attribute.
That is because if pre, samp, or whatever is treated as CDATA, so that the text is known to be just text, then there is also the possibility of letting a savvy user agent make sample code even more readable by providing syntax coloring and indenting, which can be very useful on developer blogs, especially for longer examples.

The other suggestion was to provide an attribute for a special end sequence, borrowing from the PHP heredoc technique, where you have:

    <<<CUSTOM_ENDING
    Text
    More stuff
    CUSTOM_ENDING

So, I'm thinking:

    <pre end="CUSTOM_ENDING">
    Text
    More stuff
    CUSTOM_ENDING
    </pre>

The user agent would have to know to leave out CUSTOM_ENDING, though. If you wanted to include or exclude entity sanitizing in the pre and samp tags, you could have an attribute for that as well.

Any solution that is chosen would need to accommodate folks who are already sanitizing things, as well as work with the newer technique. Obviously there's a common practice in place, but is it a good practice, or one that has been necessary because nobody has taken the time to address this issue in detail?

T.J., FYI: I agree in principle with the need for CDATA-equivalent functionality, but I'd rather see pre or samp or an equivalent HTML tag be used and updated to include new optional attributes/parameters.

Of course, until every browser complies correctly, you'll probably have to detect the browser version and sanitize content anyway :/

Art
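To make the two ideas above concrete (a length attribute and a custom end sequence), here is a minimal Python sketch of how a server might emit such a block and how a reader might pull the raw text back out without any escaping. It is only an illustration under stated assumptions: the function names wrap_with_length, wrap_with_end, and read_raw are hypothetical, the length is assumed to count characters rather than bytes, and no browser implements either attribute.

```python
import re

# Hypothetical opening tag carrying one of the two proposed attributes.
_OPEN = re.compile(r'<pre (length|end)="([^"]*)">')


def wrap_with_length(raw: str) -> str:
    """Server side: emit the text unescaped, declaring its length up front.

    Assumption: the length counts characters, not bytes.
    """
    return f'<pre length="{len(raw)}">{raw}</pre>'


def wrap_with_end(raw: str, marker: str = "CUSTOM_ENDING") -> str:
    """Server side: emit the text unescaped, closed by a heredoc-style marker."""
    if marker in raw:
        raise ValueError("marker occurs in the content; pick another one")
    return f'<pre end="{marker}">{raw}{marker}</pre>'


def read_raw(markup: str) -> str:
    """Reader side: recover the text without interpreting anything inside it."""
    m = _OPEN.match(markup)
    if m is None:
        raise ValueError("not a <pre> block with a length or end attribute")
    kind, value = m.groups()
    start = m.end()
    if kind == "length":
        n = int(value)
        raw = markup[start:start + n]   # take exactly n characters as text
        rest = markup[start + n:]
    else:
        stop = markup.find(value, start)
        if stop == -1:
            raise ValueError("end marker not found")
        raw = markup[start:stop]        # the marker itself is left out
        rest = markup[stop + len(value):].lstrip()
    if not rest.startswith("</pre>"):
        raise ValueError("content does not end where it was declared to")
    return raw


sample = '<b>bold</b> & </pre> <script>alert(1)</script>'
print(read_raw(wrap_with_length(sample)) == sample)
print(read_raw(wrap_with_end(sample)) == sample)
```

Note that the sample text contains a literal </pre> and a script tag, and both variants hand it back unchanged. The length variant breaks if anything in between alters the text, while the end-marker variant requires choosing a marker that does not occur in the content.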
Received on Thursday, 8 April 2010 02:03:54 UTC