Re: Information on HTML 5 from Julian Reschke on 2013-02-14 (www-archive@w3.org from February 2013)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Thu, 14 Feb 2013 14:03:23 +0100
To: Ian Hickson <ian@hixie.ch>
CC: Bjoern Hoehrmann <derhoermi@gmx.net>, www-archive@w3.org
Message-ID: <511CE09B.8030802@gmx.de>

On 2013-02-14 06:42, Ian Hickson wrote:
> On Wed, 13 Feb 2013, Bjoern Hoehrmann wrote:
>>
>> Pages like http://www.rfc-editor.org/info/rfc4329 unfortunately have
>> code like `<title>Information on RFC&nbsp4329</title>` currently, which
>> interestingly shows up exactly like that in Google search results
>
> Not for me; do you have a sample (Google) URL showing this?
>
> This page seems to have no "nbsp" in the titles:
>
> https://www.google.com/search?rls=en&q=http://www.rfc-editor.org/info/rfc4329&ie=UTF-8&oe=UTF-8#hl=en&client=safari&tbo=d&rls=en&sclient=psy-ab&q=http:%2F%2Fwww.rfc-editor.org%2Finfo%2Frfc4329&oq=http:%2F%2Fwww.rfc-editor.org%2Finfo%2Frfc4329&gs_l=serp.3...157993.157993.4.158293.1.1.0.0.0.0.86.86.1.1.0.les%3B..0.0...1c.2.3.psy-ab.N0gVqvbVekM&pbx=1&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.&bvm=bv.42452523,d.cGE&fp=d5bbde17dd6cca0a&biw=1397&bih=1323
>
>
>> even though browsers treat the kaput reference as `&nbsp;`.
>
> As per the spec.
>
>
>> Surely at least if documents switch on the right doctype mode, Google
>> will use a standards compliant HTML parser? Maybe this will suffice to
>> find out in a while...
>
> There's only one parser per the standard; it defines how you parse pages
> regardless of DOCTYPE (there are maybe two things in the parser, I think,
> that are affected by the precise DOCTYPE).
> ...

The point is that the Google search engine apparently does not parse
HTML in the way described by the spec (at least with respect to "kaput
references"). It would be interesting to know why.
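
To make the difference concrete, here is a minimal sketch in Python. It
assumes that the standard library's html.unescape() is an acceptable
stand-in for the spec's handling of named character references without a
trailing semicolon. The strict_decode() helper is hypothetical: it only
illustrates a decoder that insists on the semicolon, and is not a claim
about what Google's code actually does.

```python
import html
import re
from html.entities import html5  # maps names such as "nbsp;" to characters


def spec_style_decode(text):
    """Decode character references the lenient way (html.unescape)."""
    return html.unescape(text)


def strict_decode(text):
    """Hypothetical decoder that only accepts references ending in ';'."""
    def replace(match):
        # Unknown names are left untouched.
        return html5.get(match.group(1) + ";", match.group(0))
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)


title = "Information on RFC&nbsp4329"

# Lenient decoding yields a no-break space (U+00A0) before "4329".
print(repr(spec_style_decode(title)))

# Strict decoding leaves the "kaput reference" as literal text.
print(repr(strict_decode(title)))
```

A consumer behaving like strict_decode() would show the raw "&nbsp4329"
text, which matches what Bjoern reports seeing in the search results.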

Best regards, Julian

Received on Thursday, 14 February 2013 13:03:59 UTC