- From: Michael[tm] Smith <mike@w3.org>
- Date: Mon, 21 Jan 2013 23:47:40 +0900
- To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
- Cc: Henri Sivonen <hsivonen@iki.fi>, public-html WG <public-html@w3.org>, "www-tag@w3.org List" <www-tag@w3.org>
"Martin J. Dürst" <duerst@it.aoyama.ac.jp>, 2013-01-21 20:14 +0900:

> On 2013/01/21 18:46, Henri Sivonen wrote:
>
> Very clear explanation. But just a question: What would be the effort of
> checking for polyglot markup?

I think that's a reasonable question to ask but I think an even more reasonable question to ask is whether the supposed benefits of adding it are worth the effort at all.

> I don't know the internal structure of your validator, but at least in
> some ideal implementation, "validates as polyglot" could just be defined
> as "validates as HTML" AND "validates as application/xhtml+xml".

So people can already determine that with the validator just by manually running their documents through it twice: once with the HTML option selected, and then again with the XHTML option selected.

> So even for implementing polyglot validation, we might not need a
> document describing polyglot markup :-).
>
> The problems with the above simple plan that I managed to come up in the
> five minutes I wrote this mail are: a) although a document might be valid
> both ways, the DOMs wouldn't match; b) merging errors may be quite tricky
> (but maybe not necessary); and c) there may be additional user interface
> overhead (but it could be as simple as changing the HTML/XHTML choice from
> radio buttons to checkmarks.

In the simplest implementation, the validator would need to automatically parse and validate the document twice: once with the HTML parser and once with the XML parser. But the error messages would not be merged. It would show the messages from the HTML pass and then the messages from the XML pass. So the information shown to the user would be pretty much the same as if they just did the two validation passes manually. So the only thing they'd be gaining would be the relatively minor convenience of having the validator automate one extra step for them.
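A minimal sketch of that simplest two-pass approach, in Python, might look like the following. Everything in it is an assumption made for illustration, not the validator's actual code: `html_pass` is a toy stand-in that only checks for a doctype, `xml_pass` only checks XML well-formedness using the standard library, and `two_pass_report` is a hypothetical name. The point it shows is that messages are emitted pass by pass, as they are produced, with no merging.

```python
import xml.etree.ElementTree as ET


def html_pass(doc):
    """Toy stand-in for the HTML pass: only checks for a doctype."""
    if not doc.lstrip().lower().startswith("<!doctype html"):
        yield (1, "HTML: missing <!DOCTYPE html>")


def xml_pass(doc):
    """Toy stand-in for the XML pass: only checks well-formedness."""
    try:
        ET.fromstring(doc)
    except ET.ParseError as err:
        line, _column = err.position
        yield (line, f"XML: {err}")


def two_pass_report(doc):
    """Yield (line, message) pairs: all HTML-pass messages first,
    then all XML-pass messages. Nothing is buffered or merged."""
    yield from html_pass(doc)
    yield from xml_pass(doc)


if __name__ == "__main__":
    for line, message in two_pass_report("<p>unclosed<br>\n"):
        print(f"line {line}: {message}")
```

Because `two_pass_report` is a generator, each message can be shown as soon as its pass produces it, which is the same information a user would get from running the two passes by hand.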
As far as implementing it to merge the error messages from separate passes, that would take quite a lot of effort. The validator does streaming parsing and validation and error reporting. Merging the error messages on the server side would require them to not be emitted in a streaming way but instead stored in memory and processed further before emitting them. It would be possible to merge them on the client side, in JavaScript but that also would take significant effort.

Another way to implement it would be to not do two parsing/validation passes at all but instead to add some additional error-checking/reporting to the HTML parser. That would require less effort than doing the two-pass thing, and actually Sam Ruby has already contributed patches that do some of that. But it's not complete as far as reporting all the things it should report to conform with the Polyglot spec.

The partial support was for quite a while actually implemented (using Sam's patches) and exposed as an option in the validator. But it was not clear that very many users were actually using it. And after it was removed, we didn't get any bug reports asking where it had gone. So I don't think we have any evidence to suggest that having it is a high priority for a lot of users.

--Mike

-- 
Michael[tm] Smith http://people.w3.org/mike
Received on Monday, 21 January 2013 14:47:51 UTC