
Re: Information on HTML 5

From: Julian Reschke <julian.reschke@gmx.de>
Date: Thu, 14 Feb 2013 14:03:23 +0100
Message-ID: <511CE09B.8030802@gmx.de>
To: Ian Hickson <ian@hixie.ch>
CC: Bjoern Hoehrmann <derhoermi@gmx.net>, www-archive@w3.org

On 2013-02-14 06:42, Ian Hickson wrote:
> On Wed, 13 Feb 2013, Bjoern Hoehrmann wrote:
>>
>> Pages like http://www.rfc-editor.org/info/rfc4329 unfortunately have
>> code like `<title>Information on RFC&nbsp4329</title>` currently, which
>> interestingly shows up exactly like that in Google search results
>
> Not for me; do you have a sample (Google) URL showing this?
>
> This page seems to have no "nbsp" in the titles:
>
> https://www.google.com/search?rls=en&q=http://www.rfc-editor.org/info/rfc4329&ie=UTF-8&oe=UTF-8#hl=en&client=safari&tbo=d&rls=en&sclient=psy-ab&q=http:%2F%2Fwww.rfc-editor.org%2Finfo%2Frfc4329&oq=http:%2F%2Fwww.rfc-editor.org%2Finfo%2Frfc4329&gs_l=serp.3...157993.157993.4.158293.1.1.0.0.0.0.86.86.1.1.0.les%3B..0.0...1c.2.3.psy-ab.N0gVqvbVekM&pbx=1&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.&bvm=bv.42452523,d.cGE&fp=d5bbde17dd6cca0a&biw=1397&bih=1323
>
>
>> even though browsers treat the kaput reference as `&nbsp;`.
>
> As per the spec.
>
>
>> Surely at least if documents switch on the right doctype mode, Google
>> will use a standards compliant HTML parser? Maybe this will suffice to
>> find out in a while...
>
> There's only one parser per the standard, it defines how you parse pages
> regardless of DOCTYPE (there's maybe two things in the parser I think
> that are affected by the precise DOCTYPE).
> ...

The point is that Google Search apparently does not parse HTML the way
the spec describes (at least with respect to "kaput references" such as
`&nbsp` without the trailing semicolon). It would be interesting to know
why.
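
To illustrate the difference, here is a minimal Python sketch. It relies
on the standard library's `html.unescape`, which decodes character
references following the HTML5 rules, including legacy names such as
`nbsp` that are recognized without a semicolon. The `strict_unescape`
helper is hypothetical: it only stands in for a decoder that insists on
the semicolon, and is not a claim about how Google's parser works.

```python
import html
import re
from html.entities import html5

# Title text as served by the rfc-editor.org page quoted above.
TITLE = "Information on RFC&nbsp4329"

# Spec-style decoding: "&nbsp" is treated as "&nbsp;" even without the ";".
spec_style = html.unescape(TITLE)


def strict_unescape(text: str) -> str:
    """Hypothetical decoder that only accepts semicolon-terminated names."""
    return re.sub(
        r"&([A-Za-z][A-Za-z0-9]*);",
        # Unknown names are left untouched.
        lambda m: html5.get(m.group(1) + ";", m.group(0)),
        text,
    )


# Leaves the "kaput reference" in place, as in the reported search result.
strict_style = strict_unescape(TITLE)

print(repr(spec_style))
print(repr(strict_style))
```

With the spec-style decoder the title contains a real no-break space
(U+00A0) before "4329"; with the strict one the literal text `&nbsp4329`
survives, which matches what Bjoern reported seeing.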

Best regards, Julian
Received on Thursday, 14 February 2013 13:03:59 GMT
