[whatwg] several messages about XML syntax and HTML5

Henri Sivonen wrote:

> Search engines should not list ill-formed application/xhtml+xml at  
> all, because a user following the link would see the YSoD.

Ah, but what about XHTML 1.0 served as text/html, which is in a weird
twilight zone where it is neither "HTML" nor quite the same as
"text/html (non-standard)"? (But then I suppose one could argue such
XHTML doesn't need to be well-formed either. Maybe just labelling all
such documents as "HTML compatible" would be better.)

> However, in cases of slightly broken text/html, the user could still find the  
> page useful. The search engines are in the business of providing  
> results that users find useful, so search engines should list  
> slightly broken text/html documents.

I don't follow this. How can search engines distinguish between
"slightly broken" and "very broken" text/html? How can search engines
prejudge how a given breakage will affect how the user wants to
use the page (as a blind user, as a microformats user, as a minority
browser user, etc.)?

> The point is that you shouldn't show users something that they  
> don't understand or care about.

What, like ads? ;) Or, more seriously, like the page-size information
that Google search displays? My guess (and I admit it's only
that) is that "39k" means nothing to average users, even those on
dial-up who might care. Anyhow, this all prejudges what users care
about. If I'm an ordinary user, it's handy to know a page may not be
working because it's broken, not because of some flaw in my browser. And
a /lot/ of pages on the web don't work. Understanding might be a
problem, but that's true of most of the stuff on search engines. The
non-technical users I talk to can't understand the difference between
the address bar, the search bar, and the search input on their homepage.

> Google, Yahoo and MSN aren't in the business of enforcing a
> standards-compliance agenda.

Nothing I said implied they were. The apparent absence of validity
warnings from Google's Accessible Search may be more surprising, but I
think the chance of any of them implementing such warnings in their main
search results is zero, regardless of the merits of the case either way.
(It would be /way/ too embarrassing since many, if not most, of those
companies' own webpages don't validate.) I just don't think the
particular argument against it put forward earlier in this thread (about
it scaring users away from Google search) stands up.

> On the contrary, they compete on how well they can  
> rank the relevance of search results even in the absence of the  
> supposedly search-engine-helping semantic markup.

Generally true, though some important aspects of valid markup do help
search engines; e.g. the requirement of an ALT attribute for IMG
provides search engines with additional text data.
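
To make the ALT point concrete, here is a minimal sketch of how an indexer
could harvest that text. It uses Python's standard html.parser module; the
names AltTextCollector and collect_alt_text are made up for illustration,
and nothing here is meant to describe how any real search engine works.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Hypothetical helper: gathers ALT text from IMG elements."""

    def __init__(self):
        super().__init__()
        self.alt_texts = []   # non-empty ALT values, in document order
        self.missing_alt = 0  # IMG elements with no ALT attribute at all

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <IMG ALT=...>
        # and <img alt=...> are handled the same way.
        if tag != "img":
            return
        attributes = dict(attrs)
        if "alt" not in attributes:
            self.missing_alt += 1
            return
        # A valueless attribute is reported as None; treat it as empty.
        text = (attributes["alt"] or "").strip()
        if text:
            self.alt_texts.append(text)


def collect_alt_text(markup):
    """Return (list of ALT strings, count of IMG elements lacking ALT)."""
    collector = AltTextCollector()
    collector.feed(markup)
    collector.close()
    return collector.alt_texts, collector.missing_alt


sample = (
    '<p>Logo: <IMG SRC="logo.png" ALT="Acme Corp logo">'
    '<img src="spacer.gif" alt="">'
    '<img src="chart.png"></p>'
)
texts, missing = collect_alt_text(sample)
print(texts, missing)
```

In the sample, only the first image contributes indexable text; the third
has no ALT at all, which is the case a validator would flag and the one
where the indexer gets nothing extra to work with.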

--
Benjamin Hawkes-Lewis

Received on Monday, 18 December 2006 15:42:28 UTC