
[Bug 11909] The principles of Polyglot Markup - validity? well-formed? DOM-equality?

From: <bugzilla@jessica.w3.org>
Date: Sun, 30 Jan 2011 20:33:32 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1Pjdxk-00053C-Pf@jessica.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=11909

--- Comment #8 from Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> 2011-01-30 20:33:31 UTC ---
(In reply to comment #7)

> you seem to want to document the set of well formed documents that produce a
> compatible DOM if parsed as XML and that is a vastly more complicated set to
> describe.

This bug is about the principles for polyglotness - and thus the principles for
the polyglot spec. I think the principles should be documented. I have proposed
10 principles in Comment 5 and Comment 6. Some of those are principles that we
clearly agree about. Others are new, I think.

Do you agree with the principles? If yes, should we place them in the doc?

Comprehensive principles would be able to cover "the set of well formed
documents that produce a compatible DOM if parsed as XML". So are you
therefore against having/listing principles?

I think principles can a) make the document clearer {that is: more _writeable_
as opposed to _unwritable_}, b) help us make some decisions about what the
document should say, and c) help authors get a 'polyglot mindset'.

> > Isn't it a good start to just list them?
> 
> No, it would be at best misleading.

There is a tendency in some of the things you have said to simply reject having
any examples at all - for fear that they would not be complete. E.g. you have
not filed any bugs against the documentation of the requirement to escape
tabs, linefeeds and carriage returns in attributes. But you immediately thought
it a bad idea to put in the document that & and < need to be escaped.
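
To make the two escaping requirements concrete, here is a minimal sketch in
Python. It assumes the standard library's xml.sax.saxutils is an acceptable
stand-in for whatever serializer an author actually uses; the sample values are
hypothetical. quoteattr() replaces & and < and also writes tab, linefeed and
carriage return as numeric character references, so that XML attribute-value
normalization does not turn them into plain spaces.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical sample values, chosen only to exercise the escaping rules.
title = "Tom & Jerry:\tline one\nline two"
text = "if a < b && c"

# Attribute value: & is escaped, tab and linefeed become character references,
# and the result is wrapped in quotes ready to follow `name=`.
attr = quoteattr(title)

# Element content: only &, < and > are replaced.
body = escape(text)

print(attr)
print(body)
```

This is only an illustration of the rules, not a complete polyglot serializer.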

I think it is the task of this document to be much more complete than XHTML 1.0
Appendix C became.

A proven method for discerning between requirements and examples is to use the
phrases "this is normative" and "this is not normative". For example:

This is normative: Polyglots are XML well-formed.
This is not normative: Thus, for example, & and < must be escaped - except
inside CDATA sections. See XML 1.0 for the complete definition of well-formed.
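
The non-normative example above can be checked mechanically. The following is a
rough sketch, assuming Python's bundled expat-based parser is a fair proxy for
"XML well-formed"; the function name is_well_formed and the sample strings are
made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the XML parser accepts the markup without error."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


XHTML = 'xmlns="http://www.w3.org/1999/xhtml"'

# & and < escaped in text, but left raw inside a CDATA section.
escaped = f"<p {XHTML}>a &lt; b &amp; c <![CDATA[x < y & z]]></p>"

# The same text with raw & and < outside CDATA.
unescaped = f"<p {XHTML}>a < b & c</p>"

print(is_well_formed(escaped))
print(is_well_formed(unescaped))
```

Such a check only covers well-formedness; it says nothing about whether the
HTML parser would build the same DOM.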

> > Just create a header saying "the
> > following features are autogenerated if you don't insert them, and must
> > therefore be explicitly added for DOM compatibility". And then list the
> > features/elements.
> 
> That's nowhere near close to a usable specification.

I don't understand why it is not a useful list. But anyway, the most important
thing for myself is to document the principles. If a list is too much, then
don't have it. That said, the polyglot spec is currently full of lists.

> The HTML parser doesn't
> just insert elements it moves things around in ways that are fully specified
> but that you can not specify here without duplicating much of the html5 parsing
> spec.

I don't see that giving such a list means that you have to specify all
that. However, the most important thing for me is to list the principles,
rather than particular lists of elements etc.

Of course, HTML5 itself contains many lists and categories. So e.g. if we want
to say something general about block elements, then it is OK for me to point to
HTML5 for a list of them.

> You'd have to specify all the ways in which p (and other) elements are
> auto-closed, all the ways in which form elements get moved around tables, all
> the html elements that force-close math and svg. You'd have to specify not to
> use image. The list is endless, and unless it was complete the end result would
> be that authors would be able to generate documents that complied with all the
> constraints in the polyglot spec, but which were not parsed in compatible ways
> by xml and html parsers.

I understand that your view is that if we say A, then we must list the entire
alphabet. It might also be that you are right - that the task I had in mind
is too complicated.

Again, the most important thing for myself is to document the principles,
rather than lists.

> If you restrict to conforming documents the complete set of constraints is more
> or less listed in the current document (there may be some bugs here and there
> but nothing that would make the document ten times bigger). If you do not
> restrict to conforming documents the downside is that the spec becomes
> unwritable and the only possible upside is that people are informed how to make
> non conforming documents using xml tools, but I don't see that making non
> conforming documents should be a valid use case.

One problem that we must deal with is the fact that what a conforming HTML
document is, is a moving target. I will also remind you that HTML5 has the
concept of "applicable specification" which can e.g. add other elements and
namespace prefixes. Or do you really, honestly, want only documents that
conform to HTML5 "proper" to ever earn the right to be called polyglot?

It is also the case that authors would want to use polyglot markup in order to
achieve the benefits of doing so. Thus it is not *only* about meeting some
formal requirement. So, for example, if a document gets quirks-mode parsing in
IE6-9 because the author inserted <!--comments--> before <!DOCTYPE html>, then
that author has got a practical problem.
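
As an illustration of how an author could catch that practical problem, here is
a small sketch in Python. The function name doctype_comes_first is hypothetical,
and the rule it encodes (only a byte order mark or whitespace may precede
<!DOCTYPE html>) is an assumption made for this example, not text from the
polyglot document.

```python
def doctype_comes_first(markup: str) -> bool:
    """Return True if nothing but a BOM or whitespace precedes the doctype."""
    # Assumption: a leading byte order mark and whitespace are harmless.
    head = markup.lstrip("\ufeff").lstrip()
    return head.startswith("<!DOCTYPE html>")


good = "<!DOCTYPE html>\n<html xmlns='http://www.w3.org/1999/xhtml'></html>"
risky = "<!--comments--><!DOCTYPE html>\n<html></html>"

print(doctype_comes_first(good))
print(doctype_comes_first(risky))
```

This is a string-level heuristic only; it does not model how any particular
browser chooses its rendering mode.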

So far the document doesn't speak about quirks mode. But I suggest, as one of
the principles for polyglot markup, that it leads to no-quirks mode. Not to
cover any set of documents. But to document and incorporate no-quirks mode into
the polyglot concept.

Received on Sunday, 30 January 2011 20:33:35 GMT