[whatwg] several messages about XML syntax and HTML5

On Mon, 2006-12-18 at 23:26 +1300, Matthew Paul Thomas wrote:

> Humans don't work that way. If the words "HTML (WARNING)" or "XHTML 
> (WARNING)" started appearing next to over 90 percent of search results, 
> people would not think that something was wrong with 90 percent of Web 
> pages. They would think that something was wrong with the search 
> engine. 

I see no reason why that should be the case; and short of actual user
tests with well-designed warnings, I don't suppose we'll ever be sure.

I would, however, definitely suggest better messages, since "WARNING"
verges on being meaningless. Perhaps "HTML (corrupted)" and "XHTML
(corrupted)" for documents that cite (or imply) a standard document type
but clearly fail to conform to it, "text/html (non-standard variant)"
for text/html documents that do not cite (or imply) a standard document
type, and "XHTML (broken)" for non-well-formed XHTML.

I can imagine end-users ignoring such warnings because they don't
understand or care. But a search engine isn't doing its job properly if
it fails to explain its own messages. That's the potential usability
flaw, not the inclusion of the messages themselves.

I think you underestimate the brand power of Google, Yahoo, and MSN.
Rightly or wrongly, end-users trust these guys. If Google says 90% of
the web is corrupted, but Google otherwise functions normally, then 90%
of the web is corrupted.

Site authors and developers, for their part, would be most unlikely
to ignore such warnings from one of the big three search engines,
because they're incredibly embarrassing. That would mean the 90%
figure would shrink fast; it would become an SEO priority.

> And they would be right.

How so? Search engines have long provided format information about
search results. This is little different. It would make even more sense
if the search engine offered to Tidy the corrupted content up (just as
Google offers to transform PDF and Word documents to HTML).

--
Benjamin Hawkes-Lewis

Received on Monday, 18 December 2006 02:57:08 UTC