[Bug 24451] New: editorial comments on LCWD from bugzilla@jessica.w3.org on 2014-01-31 (public-html-admin@w3.org from January 2014)

From: <bugzilla@jessica.w3.org>
Date: Fri, 31 Jan 2014 05:20:02 +0000
To: public-html-admin@w3.org
Message-ID: <bug-24451-2495@http.www.w3.org/Bugs/Public/>
https://www.w3.org/Bugs/Public/show_bug.cgi?id=24451

            Bug ID: 24451
           Summary: editorial comments on LCWD
           Product: HTML WG
           Version: unspecified
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HTML/XHTML Compatibility Authoring Guide (ed: Eliot
                    Graff)
          Assignee: eliotgra@microsoft.com
          Reporter: liam@w3.org
        QA Contact: public-html-bugzilla@w3.org
                CC: eliotgra@microsoft.com, mike@w3.org,
                    public-html-admin@w3.org,
                    public-html-wg-issue-tracking@w3.org

There are a lot of comments here, but I think they are mostly or all editorial,
except for a Process comment at the end, so I have sent them all in one
comment. If you prefer I can separate them into multiple Bugzilla entries.

This is a useful and good document - I'm pleased to see it move forward, but I
found quite a few minor typos and some slightly confusing passages, as noted...



Status of this Document

Please don't refer to "legacy XML" - I think you mean just "XHTML 1.x".
XML in general is not deprecated by W3C.

"this recommendation" - it's not yet a W3C Recommendation, although I do hope
it becomes one!


2.1 Principles

s/requiremetn/requirement/


3.1 Processing instructions and 3.2

Forbidding the XML Declaration - 3. says, "character encoding MAY be left
undeclared in XML" but 3.1 forbids the XML declaration, which is where an
encoding would be declared in XML. The document goes on to clarify, but I
suggest changing

"As such, character encoding MAY be left undeclared in XML with the result that
UTF-8 is still supported"

to

"Documents served with an XML content type therefore do not need to use any of
the HTML encoding declaration methods, although if the document might be
interpreted as text/html it SHOULD do so."

However, the green NOTE further down restates this, so just removing the "As
such" sentence would also be fine.

Further down you note that the I18N WG recommends [that one] always include an
encoding declaration, which is helpful but may leave the reader confused as to
whether this applies to HTML or to XHTML.


3.3 The DOCTYPE


The note that the string may be in mixed case or uppercase letters and still be
well-formed XML is perhaps confusing since it starts talking about valid XML
and then, later in the same sentence, moves to well-formed XML.

Suggest,

Note
For valid XML the document element named in the document type declaration must
exactly match the top-level element of the document, including in case.  This
rule is relaxed for well-formed, rather than valid, XML documents. Since XHTML
requires a lower-case <code>html</code> element, Polyglot documents
<rfc>should</rfc> use lower-case <code>html</code> for the element named in the
DOCTYPE declaration.


but not sure if it's worth the extra length.
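
The difference between validity and well-formedness can be seen with a
non-validating parser. The following is a minimal Python sketch, not text
proposed for the specification. It assumes the standard library's ElementTree,
whose expat-based parser checks well-formedness only, and the sample document
is made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# The DOCTYPE names "HTML" but the document element is "html".
# A validating parser would reject the mismatch; a parser that only
# checks well-formedness (such as expat, used here) accepts it.
doc = (
    '<!DOCTYPE HTML>'
    '<html xmlns="%s"><head><title>t</title></head><body/></html>' % XHTML_NS
)

root = ET.fromstring(doc)
print(root.tag)
```

The parse succeeds even though the case differs, which is why the suggested
note talks about a rule being relaxed rather than absent.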


It would probably be worth saying something about customized XHTML DTDs here,
with element and entity declarations inside the document type definition subset
within the document, or that point to an alternate DTD.


3.4 Namespaces


In XML it is the URI, not the prefix, that is the namespace, so the first
paragraph (3.4.1, [HTML5] introduces..) is, formally, meaningless.

What is meant, I think, is that the HTML 5 specification requires that HTML
processors implicitly associate the prefixes html, svg and math with their
respective URIs, which are as follows [...].
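
A small Python sketch may make the prefix/URI distinction concrete. It is an
illustration only, using the standard library's ElementTree; the prefixes "h"
and "zzz" are arbitrary names chosen for the example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Three spellings of the same element: no prefix, and two arbitrary prefixes.
default_form = ET.fromstring('<html xmlns="%s"/>' % XHTML_NS)
prefixed_form = ET.fromstring('<h:html xmlns:h="%s"/>' % XHTML_NS)
other_prefix = ET.fromstring('<zzz:html xmlns:zzz="%s"/>' % XHTML_NS)

# The parser reports the expanded name {URI}local-name in every case;
# the prefix itself carries no meaning.
same_name = default_form.tag == prefixed_form.tag == other_prefix.tag
print(default_form.tag, same_name)
```

All three documents yield the same expanded name, so it is the URI, and not
the string "html", "h" or "zzz", that identifies the namespace.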

Regarding the paragraph, [[
Note that there are other prefixed attributes that can be used beyond
xlink:href (such as xml:base). Polyglot markup does not declare these prefixes
via xmlns. The prefixes are implicitly declared in XML and are automatically
applied to the appropriate attributes in HTML
]]

Is this a note or is it normative? It says it's a note but does not use Note
markup. Also, "such as xml:base" seems far too wishy-washy for a specification.
Is foaf:email such a prefixed attribute? What about xml:id?

I _think_ what is meant is,

The "xml" namespace prefix used e.g. in xml:base, xml:lang, xml:space and
xml:id
does not need to be declared in XML documents. See CSS namespaces
[CSS3NAMESPACE] for how to use CSS selectors with these attributes.

The following paragraph seems to be attempting to say this.

I don't think the "Note" means to say anything about attributes associated with
namespace URIs other than the URI normally associated with the "xml" prefix.
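
The special status of the "xml" prefix can be checked with a namespace-aware
XML parser. This is a hedged Python sketch using the standard library; the
helper name parses() and the sample attribute values are invented for the
example.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"


def parses(markup):
    """Return True if markup is namespace-well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# The xml prefix is bound implicitly, so no xmlns:xml declaration is needed.
elem = ET.fromstring('<p xml:lang="en" xml:id="intro"/>')
lang = elem.get("{%s}lang" % XML_NS)

xml_prefix_ok = parses('<p xml:lang="en"/>')
# Any other prefix, such as foaf, must be declared or the parse fails.
foaf_prefix_ok = parses('<p foaf:email="a@example.org"/>')
print(lang, xml_prefix_ok, foaf_prefix_ok)
```

Only the "xml" prefix gets this treatment in XML; an undeclared foaf: prefix
is an error, which is why "such as xml:base" is too loose.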

I do like the "can be sued as CSS selectors" and have contacted my attorneys
already :-)


3.5.1 Required elements and tags

The first paragraph seems superfluous, but maybe it's needed for HTML people?

In the next paragraph there's an extra comma in "optional tags, may create".

s/in their code/in their markup/

Remove the extra comma in " with regard to tags, is"

3.5.1.1

"Every polyglot markup document therefore contains an html, head, title, and
body element, represented in the code with their tags." -- that's true in HTML
too, as the previous section just explained, although they are not represented
in the "code" [please let's call it markup, not code]. Maybe you have an extra
comma there just before "represented"?

s/following source code/following markup/


3.5.1.2 Required tags examples

I think this section is talking about required _elements_, not required _tags_.
Of course, in XML, the presence of an element is never inferred, so tags are
always required at the start and end of element boundaries.


3.5.2 Excluded elements and tags

This should just be Excluded elements. All three XML tags (start, end, null)
are used in XHTML and polyglot HTML.


Delete spurious comma in "Elements with features designed for HTML alone, are
non-polyglot from the outset."

(the rationale for excluding noscript is a nonsense of course: there's also no
mechanism for producing img or a or table in XML directly. But we'll let that
pass)

In this section (3.5.2) you say that noscript is not allowed, then have a
non-normative note that says there are other elements that are also not allowed
but which you do not list. Since this is non-normative, how should the reader
know which elements have features designed for HTML? I'd say "a" and "img" are
the obvious candidates, but surely you don't mean these?


3.5.3.1 Element names.

"Polyglot markup uses the correct case for element names."

I think this sentence translates to, "conforming documents conform to this
specification", and can be deleted. I'd suggest making the bullet list that
follows it be three simple paragraphs instead.

3.5.3.2 Attribute names

Again, since no conforming document could use an "incorrect" case, I'd delete
the first sentence, and maybe promote the bullet list items to paragraphs.


3.6 Element Contents

The term strictly speaking in both SGML and XML is "Element content",
although I think everyone will understand "Element contents" not to be a
reference to an element called Contents :-)

3.6.1 "Example: Polyglot markup uses the minimized tag syntax for void
elements"

It uses the empty element tag syntax.

You (mis)use the "minimized form" term again in the Example and, confusingly,
use the undefined term "self-closing" in the note. Please either use the same
term in all places or define all the terms.

3.6.2 Raw text elements

XML does not have "comment tags" or "cdata tags". SGML does have CDATA
elements, but that's not what you mean here.

A better way to put it is that in HTML the content of the script and style
elements is treated as if it were CDATA, so that & and < are not special except
when they occur as the end tag to close the element.
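
The contrast can be sketched in Python. This is an illustration under stated
assumptions: the standard library's html.parser is only a tokenizer and not a
conforming HTML5 parser, but it does treat script content as raw text, and the
class name TextCollector and the sample script are made up for the example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

SOURCE = "<script>if (a < b && c) { run(); }</script>"


class TextCollector(HTMLParser):
    """Collect the character data reported by the HTML tokenizer."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


collector = TextCollector()
collector.feed(SOURCE)
collector.close()
# In HTML, "<" and "&" inside script are plain characters.
html_text = "".join(collector.chunks)

# In XML the same bytes are markup errors unless escaped or put in CDATA.
try:
    ET.fromstring(SOURCE)
    xml_well_formed = True
except ET.ParseError:
    xml_well_formed = False

print(repr(html_text), xml_well_formed)
```

The HTML tokenizer hands back the script text unchanged, while the XML parser
rejects the document, which is the behaviour the section needs to describe.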

The "As a result" paragraph doesn't seem to add anything except suggesting that
the editor of this document prefers HTML in some way :-)

In the last column of the table, </script and </style should have the same
description as for HTML - they terminate the corresponding element.


3.6.2.2.1 Safe CDATA usage rules

s/These rules assumes that CDATA is of limited use for CSS./These rules assume
that CDATA is of limited use for CSS and therefore focus on JavaScript used
with the script element./

HTML's restrictions on <script>/<style> -- probably you should say what they
are, and I suggest using an "and" instead of a virgule/slash here, as it looks
like part of markup syntax.

"Before the CDATA section there can only be one node - preferably only one
line of code" -- by code here do you mean JavaScript code? There aren't any
nodes at all in an XML document, nor in an HTML document until it's parsed, and
then you get nodes in the DOM representation (XML systems mostly don't use DOM
at all). So I don't understand this phrase.

EXAMPLE 12

has a </script> but no <script>, is that intended?

"Disadvantage: Less safe for templating since the comment could become treated
as part of the template." I think this needs an explanation. Are you referring
to XSLT templates here?

You probably need an example in which the string ]]> occurs as part of the
text, to demonstrate how to handle it.

You may want to mention the problem of CDATA injection in which a malicious
user creates data that looks like ]]> nasty stuff here <![CDATA[
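
One common way to handle ]]> on the XML side is to split it across two
adjacent CDATA sections. The Python sketch below illustrates that technique
only; it is not taken from the specification, the function name
wrap_in_cdata() and the payloads are invented, and it says nothing about how
an HTML parser would treat the same text inside script.

```python
import xml.etree.ElementTree as ET


def wrap_in_cdata(text):
    """Wrap text in CDATA, splitting any "]]>" across two sections."""
    # "]]" closes one section and ">" opens the next, so the terminator
    # never appears intact inside a section.
    safe = text.replace("]]>", "]]]]><![CDATA[>")
    return "<![CDATA[" + safe + "]]>"


# Legitimate script text that happens to contain "]]>".
payload = "if (a[b[0]]>1) { go(); }"
markup = "<script>" + wrap_in_cdata(payload) + "</script>"
round_tripped = ET.fromstring(markup).text

# Hostile data that tries to close the section and inject an element.
attack = "]]><evil/><![CDATA["
attacked = ET.fromstring("<script>" + wrap_in_cdata(attack) + "</script>")

print(round_tripped == payload, len(attacked), attacked.text == attack)
```

With the split in place the parser sees only character data: the script text
round-trips unchanged and the would-be evil element never becomes markup.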


3.6.3 Escapable raw text elements


Delete the spurious comma after "permitted".

You could also delete the comma after "safe text content".

s/permittd/permitted/


3.6.5 Normal Elements

Add a missing comma after "iframe element" to end the parenthetical clause in
"Normal elements have no special restrictions other than those that normally
apply to polyglot markup. But note that some elements, such as the iframe
element must be empty"

When you say these elements must be empty,
1. which elements exactly?
2. do you mean EMPTY, using the empty element tag syntax <iframe/> ?
3. If not, what do you mean?


3.7.1 newlines

You probably need to explain that the problem is that HTML/SGML-based systems
will delete the initial newline on parsing, but XML parsers will not.

3.8 Attributes

"the literal character '\t'" -- that's actually four characters. Do you mean a
literal tab character or do you mean that in HTML one can use \t to represent a
tab? (I have no idea which you mean)

It might be worth noting that JavaScript and CSS in attribute values are
affected by attribute value normalization, because a comment will end up
commenting out not to the end of the source line but to the end of the entire
attribute value. (whether CSS has comments to end of line is up for debate, but
browsers behave as if it does, which is all most authors care about)
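
The effect of XML attribute value normalization on a script comment can be
shown with a short Python sketch. It is an illustration using the standard
library's ElementTree, and the onclick values are made up for the example.

```python
import xml.etree.ElementTree as ET

# The attribute value contains a literal newline and a literal tab.
markup = '<a onclick="// say hello\nalert(1)\tdone"/>'
value = ET.fromstring(markup).get("onclick")

# A character reference survives normalization as a real newline.
referenced = ET.fromstring('<a onclick="// say hello&#10;alert(1)"/>')
kept = referenced.get("onclick")

print(repr(value))
print(repr(kept))
```

In the first case the newline and tab both become spaces, so the "//" comment
now swallows alert(1) as well; in the second case the newline written as
&#10; is preserved and the comment ends where the author intended.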


In 3.8.1 Disallowed attributes you say that xml:space and xml:base are not
allowed in HTML but are allowed on SVG and MathML elements - do you mean, even
when those SVG or MathML elements occur within HTML documents? (if so, you
should probably say so; as it stands it could be taken to mean that they are
allowed by those specs but not when SVG or MathML are used inside HTML)


3.8.3.1 The id attribute

Note that for valid XHTML the value of every id attribute must be unique within
the document and must be a legal XML name, starting with a letter.

[[
Polyglot markup always uses character references for the less than sign (<) and
ampersand (&) when they are used as characters, except when those characters
appear inside a CDATA section.
]]
s/ inside a CDATA section/ inside a CDATA section or a comment/
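
The quoted rule can be sketched in Python as follows. This is an illustration
only: the helper name escape_text() and the sample string are invented, and it
uses &amp; and &lt;, which are predefined in XML and also recognized in HTML.

```python
import html
import xml.etree.ElementTree as ET


def escape_text(text):
    """Escape "&" and "<" for use as element content."""
    # Ampersand first, so the "&" introduced for "<" is not escaped again.
    return text.replace("&", "&amp;").replace("<", "&lt;")


original = "1 < 2 && 3 > 2"
escaped = escape_text(original)

# Decode the same string the XML way and the HTML way.
xml_view = ET.fromstring("<p>" + escaped + "</p>").text
html_view = html.unescape(escaped)

print(escaped)
print(xml_view == original, html_view == original)
```

Both decodings give back the original text, which is the property polyglot
markup is after; inside a CDATA section or a comment no such escaping applies.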


3.10 Comments

"Polyglot markup does not begin a comment with either ">" or "->". "
That's good because neither HTML nor XML do this - they use <! and <!--
respectively.


3.11.1 s/XHTM/XHTML/

3.11.2 CSS

I think the example at the start should be [attr]{property:value;}

Remove spurious comma in "required by polyglot markup, are namespaced"

[[
 As result, a selector such as [xmlns]{rule:foo} will only work in HTML – it
will not work in XHTML, where it is a namespace attribute.
]]

The selector is not a namespace attribute. I think you mean, where the
attribute has an associated namespace.

[[
And the same goes for prefixed attributes – even if one escapes the colon
([xml\:lang]{rule:foo}), such selectors will only work in HTML, except that for
the namespace declaration for the xlink: prefix, then it works like in XML even
in the HTML syntax and must thus be selected in a namespaced way in both
syntaxes.
]]

This sentence is confusing for me and hard to read. Part of the problem is that
the editor seems unaware of the distinction between a prefix, a namespace and a
namespace URI, but most of the problem is that it's a run-on sentence. "it
works like in XML" -- what works "like in XML"? Suggest rewriting as multiple
sentences. I can't comment on correctness because I don't understand it, sorry.

I think this section overall is good and correct, but needs a slight polishing.
Hey, it's a draft :-)

3.12 Templating restrictions

This section appears to be empty.


*

What is the relationship between Polyglot and XML 1.1? Is NEL allowed in
whitespace in HTML? What about C0 and C1 controls?

*

Please remember to send a formal request to the XML Core Working Group to
review this document; they/we may decline, or may accept these (personal)
comments and endorse them, or do something else, but they must obviously be
consulted just as the XML Working Group would consult the HTML Working Group in
similar circumstances.

Thank you, and thank you for working on this important and helpful document.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
Received on Friday, 31 January 2014 05:20:05 UTC