[Bug 23145] Add <textarea> content restrictions for XHTML5

https://www.w3.org/Bugs/Public/show_bug.cgi?id=23145

--- Comment #1 from Michael[tm] Smith <mike@w3.org> ---
The spec already makes it very clear that no elements are allowed as children
of <textarea>:

  http://www.w3.org/html/wg/drafts/html/master/forms.html#the-textarea-element

It says, "Content model: Text".

It doesn't matter whether that <textarea> is in a text/html document or an
XML/XHTML document. It's still only allowed to have text. If any elements were
allowed as children of <textarea>, the "Content model" field there would
explicitly list which elements, instead of saying "Text".

(In reply to comment #0)
> Seemingly, the spec allows any content for <textarea>.

No, it very clearly only allows Text.

> For instance:
> 
>    <textarea><html></textarea>

That's not "any content". In a text/html document, that is just text. It doesn't
matter at all that it's text which looks like markup. There is nothing special
about that "<html>" text. It could just as well be
<textarea><<html>></textarea> or <textarea><>html<></textarea> or
<textarea>>html<</textarea> or whatever.

In a text/html document, the string of characters "<html>" is not always a
start tag. It may just be text, depending on where it occurs in the document.
When it is text, it doesn't matter that it happens to look like a start tag.
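
To illustrate the point, here is a rough sketch only. It is not a real HTML
parser and nothing in it comes from the spec. The hypothetical helper
textarea_text() mimics how a text/html parser, once inside <textarea>, treats
everything as text until it reaches "</textarea". It ignores character
references and the other details a real parser handles.

```python
import re

def textarea_text(html_source):
    """Return the characters between the first <textarea ...> start tag
    and the next "</textarea", or None if there is no <textarea>."""
    start = re.search(r"<textarea\b[^>]*>", html_source, re.IGNORECASE)
    if start is None:
        return None
    rest = html_source[start.end():]
    # Only "</textarea" followed by whitespace, "/" or ">" ends the text.
    end = re.search(r"</textarea(?=[\s/>])", rest, re.IGNORECASE)
    return rest if end is None else rest[:end.start()]

for source in (
    "<textarea><html></textarea>",
    "<textarea><<html>></textarea>",
    "<textarea><>html<></textarea>",
    "<textarea>>html<</textarea>",
):
    print(repr(textarea_text(source)))
```

In each of the four cases the helper returns the inner characters unchanged,
"<" and ">" included, because nothing inside starts with "</textarea".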

> But in XML, the first example would count as NOT well-formed. And if you do
> it the well formed way, like so:
> 
>    <textarea><html /></textarea>
> 
> then the <html> element will be parsed as an element, and not as text (thus:
> difference from how it is parsed in HTML).

Yeah, that's because unlike in a text/html document, where "<html />" can
sometimes just be text, in an XML document, it can never be. So it's not text
in your example, and so it's not allowed as a child of <textarea>, because the
spec says that the content model for <textarea> is Text.
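
A quick way to see the XML side of this is to feed both examples to an XML
parser. The sketch below uses Python's standard xml.etree.ElementTree. The
describe() helper is just an illustrative name, and namespaces are left out
to keep it short.

```python
import xml.etree.ElementTree as ET

def describe(xml_source):
    """Report how an XML parser sees the content of the root element."""
    try:
        root = ET.fromstring(xml_source)
    except ET.ParseError:
        return "not well-formed"
    children = [child.tag for child in root]
    return f"text={root.text!r} children={children}"

print(describe("<textarea><html></textarea>"))
print(describe("<textarea><html /></textarea>"))
```

The first input is rejected as not well-formed. The second parses, but the
<textarea> ends up with an "html" child element and no text at all.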

> CONCLUSION:
> 
> It seems like an omission that the spec does not state that elements are
> not permitted as child of <textarea>.

It does state that, very clearly, by defining the content model of <textarea>
as Text. That explicitly disallows any elements. So there's no omission.

> For contrast, then, for the <iframe>
> element, it is specified that it must be empty if used in XHTML. 
> 
> http://www.w3.org/html/wg/drafts/html/master/embedded-content-0.html#iframe-content-model
> 
> Thus, XHTML requires different use. And the same goes for <textarea> - which
> does not need to be empty, but which does need to have escaped content.

No, <textarea> doesn't need to have escaped content. It just needs to have
text.

> May be you should simply say that it is not required to escape the "<" in
> HTML,

A statement like that would be fine as a non-normative Note in the section on
<textarea>. But if it's added, it should also be added as a Note in the section
on <title>.

However, it's not really necessary to add it at all, because the details on
what kind of Text is allowed where in text/html documents are already provided
in the "HTML syntax" section of the spec:

  http://www.w3.org/html/wg/drafts/html/master/syntax.html

That section says:

  [1] http://www.w3.org/html/wg/drafts/html/master/syntax.html#syntax-text
  "Text is allowed inside elements, attribute values, and comments. Extra
  constraints are placed on what is and what is not allowed in text based
  on where the text is to be put, as described in the other sections."

Then it says,

  [2] http://www.w3.org/html/wg/drafts/html/master/syntax.html#elements-0
  "There are five different kinds of elements: void elements, raw text
  elements, escapable raw text elements, foreign elements, and normal
  elements."

And it lists "textarea, title" as being "escapable raw text elements".

Then it says,

  [3] http://www.w3.org/html/wg/drafts/html/master/syntax.html#normal-elements
  "Normal elements can have text, character references, other elements,
  and comments, but the text must not contain the character "<" (U+003C) or an
  ambiguous ampersand."

So that means that for text in normal elements, if you want to use the
character "<", you have to escape it as a character reference.

Then it says,

  [4] http://www.w3.org/html/wg/drafts/html/master/syntax.html#escapable-raw-text-elements
  "Escapable raw text elements can have text and character references, but
  the text must not contain an ambiguous ampersand...
  "The text in raw text and escapable raw text elements must not contain any
  occurrences of the string "</".

So unlike for "normal elements", "Escapable raw text elements" don't have any
restriction about the "<" character. Therefore you don't need to escape it.
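
As a hedged sketch of that difference, the hypothetical text_problems()
function below checks only the two restrictions quoted in [3] and [4]. It
skips the ambiguous-ampersand rule, and it treats every element other than
textarea and title as a normal element, which is a simplification.

```python
# The two elements the spec lists as escapable raw text elements.
ESCAPABLE_RAW_TEXT_ELEMENTS = {"textarea", "title"}

def text_problems(element_name, text):
    """Return a list of violations of the quoted text restrictions.

    Simplification: anything not in ESCAPABLE_RAW_TEXT_ELEMENTS is
    treated as a normal element.
    """
    problems = []
    if element_name.lower() in ESCAPABLE_RAW_TEXT_ELEMENTS:
        # Escapable raw text: a bare "<" is fine, "</" is not.
        if "</" in text:
            problems.append('contains "</"')
    elif "<" in text:
        # Normal element: "<" must be written as a character reference.
        problems.append('contains "<"')
    return problems

print(text_problems("textarea", "<html>"))
print(text_problems("p", "<html>"))
```

The same "<html>" string passes for textarea but is flagged for p.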

> but that it is required to escape it in XHTML.

That's because unlike in text/html, in XML there is no such thing as "escapable
raw text". But that fact is already documented in the spec in the "HTML syntax"
section and "XHTML syntax" section.
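
For the XML case, a small sketch of what that escaping buys you, again using
only the Python standard library. The xml_textarea() helper is a made-up name
for illustration, and namespaces are omitted.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def xml_textarea(text):
    """Build a <textarea> element as XML source, escaping &, < and >."""
    return "<textarea>" + escape(text) + "</textarea>"

source = xml_textarea("<html>")
root = ET.fromstring(source)

print(source)     # <textarea>&lt;html&gt;</textarea>
print(root.text)  # <html>
print(len(root))  # 0, i.e. no child elements
```

Once escaped, the XML parser reports the same "<html>" characters as text,
with no child element, which is what a content model of Text requires.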

Received on Wednesday, 4 September 2013 01:58:35 UTC