Re: Revised HTML/XML Task Force Report

From: Robin Berjon <robin@berjon.com>
Date: Tue, 12 Jul 2011 17:24:19 +0200
Cc: "www-tag@w3.org List" <www-tag@w3.org>, public-html-xml@w3.org
Message-Id: <4313987F-D481-4326-A333-EAFDFDB61A94@berjon.com>
To: Larry Masinter <masinter@adobe.com>

Hi Larry,

I'm copying public-html-xml to move the thread there. I think it would be better to move replies to that list alone.

On Jul 12, 2011, at 02:04 , Larry Masinter wrote:
> "It seems that the world at large is unlikely to adopt polyglot markup as the standard way to encode all HTML documents, so this solution has limited applicability."
> 
> I think it would be helpful to be precise here. I'm not sure what "this solution" refers to in this sentence, or in which ways the applicability is limited. What are the limits? For example, I think it would be useful to consider ways in which Polyglot markup could be modified to increase the applicability of using it for content that is recognized as being later processed by XML tool chains.

The stated problem is consuming HTML content with an XML tool chain. The paragraph before that one states that there are two possible approaches: polyglot markup and fronting the tool chain with an HTML parser. The given paragraph clearly speaks of polyglot markup; that's what "this solution" refers to. It's quite possible that the TF has read the text too many times to see the difficulty here, so given this context some suggested editorial improvements would be welcome!
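To make the second approach (fronting the tool chain with an HTML parser) a bit more concrete, here is a minimal sketch using only Python's standard library. The names TreeFront and html_to_tree are hypothetical, made up for illustration. Note that html.parser is only a tokenizer: it does not implement the HTML parsing algorithm (implied end tags, error recovery, and so on), so a real front end would presumably use a conforming HTML parser instead.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML void elements: they never take an end tag.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}


class TreeFront(HTMLParser):
    """Hypothetical adapter: HTML tokens in, an ElementTree out."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()
        self.open_tags = []  # elements we have started but not yet ended

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. "disabled") arrive with value None.
        clean = {name: value if value is not None else "" for name, value in attrs}
        self.builder.start(tag, clean)
        if tag in VOID_ELEMENTS:
            self.builder.end(tag)  # close immediately, XML style
        else:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # "<br/>"-style syntax: open, then close unless already closed as void.
        self.handle_starttag(tag, attrs)
        if tag not in VOID_ELEMENTS:
            self.handle_endtag(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag: ignore it
        # Close everything up to and including the matching element.
        while self.open_tags:
            current = self.open_tags.pop()
            self.builder.end(current)
            if current == tag:
                break

    def handle_data(self, data):
        if self.open_tags:  # drop text outside the root element
            self.builder.data(data)


def html_to_tree(markup):
    """Parse HTML-ish markup and return a well-formed ElementTree root."""
    front = TreeFront()
    front.feed(markup)
    front.close()
    while front.open_tags:  # close anything the author left open
        front.builder.end(front.open_tags.pop())
    return front.builder.close()


root = html_to_tree("<p>First line<br>second line</p>")
print(ET.tostring(root, encoding="unicode"))
```

The point of the design is that the XML tools downstream only ever see the tree (or its serialization), so they never have to cope with HTML syntax themselves.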

As for the limited applicability, it stems from the problem itself. You have a world wide web full of non-polyglot HTML. You want to consume it with XML tools. Getting all of those documents to switch to polyglot is highly unrealistic, and therefore this solution has an applicability limited to the cases in which you control the HTML production (or whoever controls it has been nice to you). That's where the limit really is (the two following paragraphs expand on that).
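As a small illustration of why this matters, here is a sketch, assuming Python's xml.etree.ElementTree stands in for "an XML tool". The two sample strings and the helper parses_as_xml are made up for this example. Perfectly ordinary HTML with an unclosed <br> is rejected outright, while the polyglot flavour of the same content goes through.

```python
import xml.etree.ElementTree as ET

# Ordinary HTML as found on the open web: <br> has no end tag.
NON_POLYGLOT = "<p>First line<br>second line</p>"

# The same content written as polyglot markup.
POLYGLOT = '<p xmlns="http://www.w3.org/1999/xhtml">First line<br/>second line</p>'


def parses_as_xml(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


for label, markup in (("non-polyglot", NON_POLYGLOT), ("polyglot", POLYGLOT)):
    print(label, "->", "accepted" if parses_as_xml(markup) else "rejected")
```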

There are indeed some limitations stemming, for instance, from the difference between the namespaces you'll get, how the content of <script> will be parsed, etc., but if you're in control you can plan for those and work around them. So the issue really isn't so much what you can do, but rather the practicality of this solution in an open world.
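For the record, here is a rough sketch of those two differences, again assuming Python's xml.etree.ElementTree as the XML side; the sample strings and the helper is_well_formed are invented for the example. An HTML parser places elements in the XHTML namespace, whereas an XML parser only does so when xmlns is declared; and a bare "<" inside <script> is fine for an HTML parser but not well-formed XML.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Namespaces: an XML parser only knows what the markup declares.
declared = ET.fromstring('<p xmlns="%s">hi</p>' % XHTML_NS)
undeclared = ET.fromstring("<p>hi</p>")
print(declared.tag)    # qualified with the XHTML namespace
print(undeclared.tag)  # no namespace at all


def is_well_formed(markup):
    """Return True if the markup is accepted by the XML parser."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# <script>: a raw "<" is plain text to an HTML parser, but a syntax error in XML.
SCRIPT_WITH_LT = "<script>if (a < b) run();</script>"

# One possible workaround when you control production: avoid the character.
SCRIPT_REWRITTEN = "<script>if (b > a) run();</script>"

print(is_well_formed(SCRIPT_WITH_LT))
print(is_well_formed(SCRIPT_REWRITTEN))
```

Both are easy to plan for when you produce the markup yourself, which is exactly the point above.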

> "Even this is not a 100% solution as is still possible to encounter HTML documents that cannot be represented perfectly in XML"
> 
> It would be helpful to elaborate how common such documents are and what kinds of problems one might encounter from the lack of "perfect" representation.

I'm not sure that documenting these is this TF's job. I would say that the list of those issues and how to address them should be in the polyglot document.

-- 
Robin Berjon - http://berjon.com/ - @robinberjon
Received on Tuesday, 12 July 2011 15:24:43 GMT
