RE: HTML 5 from Arthur Clifford on 2010-04-08 (public-html-comments@w3.org from April 2010)

From: Arthur Clifford <art@artspad.net>
Date: Wed, 7 Apr 2010 21:10:13 -0700
To: <public-html-comments@w3.org>
Message-ID: <016c01cad6d1$6580f6e0$0e14a8c0@iMacPCVirtualMachine>
My apologies for confusing people; thanks for pointing that out, Eduard.

"Ok, this is quite at the same level as CDATA, at least on technical
benefits and costs: both approaches need UAs to update their parsers,
and require some non-trivial updates to the spec text. Your idea,
however, sacrifices the flexibility to have both CDATA and parsed
content within the same element, which might get useful on some
circumstances. For example, you could have something like this:
<code><![CDATA[
   (some code here...)
]]>
<mark class="error" title="Compiler error 1234: that ain't
work"><![CDATA[ offending code line here ]]></mark>
<![CDATA[
   (and more code here...)
]]></code>
which would look in the DOM as:
<code>
    TEXT
    <mark>
        TEXT
    TEXT
reflecting the natural structure of the content (ie: a single code
block, text, <mark>, text inside it, and text inside the <mark>). Any
form to implement this with your proposal would either sacrifice the
mark, or have multiple <code> nodes in the DOM despite the whole thing
being only a single block in nature.
Besides the better flexibility; CDATA has the slight advantadges of
being based on previously existing web-related technologies, so some
degree of implementation experience is already available; and it
allows making the feature consistent between "soup" and XML
serializations (remember that XHTML inherently has CDATA, which is
part of XML)."

How would having CDATA blocks be any more beneficial than multiple code
blocks? The idea is to mark up a chunk of text as text. If you wanted a
section of HTML that combined code and markup, you would make a div tag and
put whatever markup you want in it, alongside code/pre blocks. The content
of each block just needs to be treated as text, preferably without having to
escape anything. So:

<div>
<h3>Here's some example of my code, why doesn't this work?</h3>
<code syntax="html">
	<html>
	<body>
	<span>hello world</code><html></body>
</code>
<span class="sig" ><a href="mailto:confused@users.com">Confused
User</a></span>
</div>

DOM:
div
  h3
  code
  span
    a

Or:

<div>
<h3>Bad code</h3>
<code syntax="html" end="END">
	<html>
	<body>
	<span>hello world</code><html></body>
	END
</code>
<h3>Good code</h3>
<code syntax="html" end="END">
	<html>
	<body>
		<span>hello world</span>
	</body>	
	</html>
	END
</code>
<span class="sig" ><a href="mailto:confused@users.com">Confused
User</a></span>
</div>

DOM:
div
  h3
  code
  h3
  code
  span
    a
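
A rough sketch of how a parser might honor the end="END" idea above,
assuming the sentinel sits on its own line directly before the real closing
tag. The `end` attribute is the hypothetical one from the example; the
function name `extract_raw_code` and the regular expressions are
illustrative only, not part of any spec or browser:

```python
import re

# Matches an opening <code ...> tag that carries the hypothetical end="..." attribute.
OPENER = re.compile(r'<code\b[^>]*\bend="([^"]+)"[^>]*>')


def extract_raw_code(html: str) -> list[str]:
    """Return the raw, unparsed text of every <code end="..."> block."""
    blocks = []
    pos = 0
    while True:
        opening = OPENER.search(html, pos)
        if opening is None:
            break
        sentinel = opening.group(1)
        # The block ends only where the sentinel appears alone on a line
        # immediately followed by </code>; any earlier </code> is plain text.
        closer = re.compile(
            r'^[ \t]*' + re.escape(sentinel) + r'[ \t]*\r?\n</code>',
            re.MULTILINE,
        )
        closing = closer.search(html, opening.end())
        if closing is None:
            break  # unterminated block: stop rather than guess
        blocks.append(html[opening.end():closing.start()])
        pos = closing.end()
    return blocks


sample = """<div>
<h3>Bad code</h3>
<code syntax="html" end="END">
    <html>
    <body>
    <span>hello world</code><html></body>
    END
</code>
</div>"""

blocks = extract_raw_code(sample)
print(blocks[0])
```

The stray </code> inside the sample stays part of the text because it is not
preceded by the sentinel line, so nothing in the block needs escaping.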

That's what the pre and 

As to XML: I know Wikipedia is hardly the best source to quote, but as it
mentions:
"CDATA-type element content
An SGML DTD may declare an element's content as being of type CDATA. Within
a CDATA-type element, no markup will be processed. It is similar to a CDATA
section in XML, but has no special boundary markup, as it applies to the
entire element." 
  	
So it is also standard practice to define an element as being of TYPE CDATA
so that UAs know not to process markup inside it. I would bet that XHTML
defines the pre tag as CDATA and that the HTML/XHTML standard says to render
it using a monospaced font/style. If you want XML syntax, use XHTML.
Personally (and yes, it is just my opinion) I find the use of CDATA sections
in XML to be a hack solution for when a schema is poorly defined or
incomplete. All you are doing is marking something up as text; you aren't
defining the document structure.
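
To illustrate element-level CDATA with something that exists today: HTML 4
declares script and style content as CDATA, and Python's standard
html.parser follows that by treating their content as raw text. The sketch
below only shows that behavior; the class name `Recorder` and the sample
markup are made up for the illustration:

```python
from html.parser import HTMLParser


class Recorder(HTMLParser):
    """Records every start tag seen and the raw text found inside <script>."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.script_chunks = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Text may arrive in several pieces, so collect and join later.
        if self._in_script:
            self.script_chunks.append(data)


recorder = Recorder()
recorder.feed("<script><b>not markup</b></script><p><b>markup</b></p>")
recorder.close()

script_text = "".join(recorder.script_chunks)
print(recorder.tags)
print(script_text)
```

The <b> inside the script element never shows up as a start tag; it is
delivered as text, while the <b> inside the paragraph is parsed normally.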

I don't care either way about the content length option; it obviously
wouldn't help manual data entry at all. But if a connection is flaky you are
going to have weird results no matter what. If anything, you'd want some way
to guarantee that content that should be treated as text, and anything after
it, is never rendered as anything other than text. THAT is actually the best
argument in favor of escaping and sanitizing: if anything hiccups, you end
up with non-functioning escaped text rather than potentially harmful
scripts. Somebody brought up injection concerns earlier. However, a browser
*should* know when packets or data were dropped in transfer and cease
rendering content as a basic safety measure. I'm not deeply familiar with
the HTTP standard, but isn't there something in the handshaking between
client and server to deal with that?
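
A small sketch of the fail-safe point, using Python's html.escape as a
stand-in for whatever escaping a site applies: no matter where escaped text
gets cut off, no angle bracket survives, so a truncated transfer cannot turn
it back into markup. The helper `body_is_complete` is hypothetical and only
mirrors the idea that a client can compare a declared Content-Length with
the bytes it actually received:

```python
import html

snippet = '<script>alert("boom")</script>'
escaped = html.escape(snippet)

# Simulate a transfer cut off at every possible position.
prefixes = [escaped[:n] for n in range(len(escaped) + 1)]
all_inert = all("<" not in p and ">" not in p for p in prefixes)


def body_is_complete(declared_length: int, body: bytes) -> bool:
    """True when the received body matches the declared length."""
    return len(body) == declared_length


full_body = escaped.encode("utf-8")
print(escaped)
print(all_inert)
print(body_is_complete(len(full_body), full_body))
print(body_is_complete(len(full_body), full_body[:10]))
```

A cut-off entity such as "&l" may display oddly, but it is still just text.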

Art
Received on Thursday, 8 April 2010 04:10:11 UTC