- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Thu, 14 Feb 2013 14:03:23 +0100
- To: Ian Hickson <ian@hixie.ch>
- CC: Bjoern Hoehrmann <derhoermi@gmx.net>, www-archive@w3.org
On 2013-02-14 06:42, Ian Hickson wrote:
> On Wed, 13 Feb 2013, Bjoern Hoehrmann wrote:
>>
>> Pages like http://www.rfc-editor.org/info/rfc4329 unfortunately have
>> code like `<title>Information on RFC&nbsp4329</title>` currently, which
>> interestingly shows up exactly like that in Google search results
>
> Not for me; do you have a sample (Google) URL showing this?
>
> This page seems to have no "nbsp" in the titles:
>
> https://www.google.com/search?rls=en&q=http://www.rfc-editor.org/info/rfc4329&ie=UTF-8&oe=UTF-8#hl=en&client=safari&tbo=d&rls=en&sclient=psy-ab&q=http:%2F%2Fwww.rfc-editor.org%2Finfo%2Frfc4329&oq=http:%2F%2Fwww.rfc-editor.org%2Finfo%2Frfc4329&gs_l=serp.3...157993.157993.4.158293.1.1.0.0.0.0.86.86.1.1.0.les%3B..0.0...1c.2.3.psy-ab.N0gVqvbVekM&pbx=1&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.&bvm=bv.42452523,d.cGE&fp=d5bbde17dd6cca0a&biw=1397&bih=1323
>
>
>> even though browsers treat the kaput reference as `&nbsp;`.
>
> As per the spec.
>
>
>> Surely at least if documents switch on the right doctype mode, Google
>> will use a standards compliant HTML parser? Maybe this will suffice to
>> find out in a while...
>
> There's only one parser per the standard, it defines how you parse pages
> regardless of DOCTYPE (there's maybe two things in the parser I think
> that are affected by the precise DOCTYPE).
> ...

The point being that the Google Search Engine apparently does not parse
HTML in the way described by the spec (at least with respect to "kaput
references"). It would be interesting to know why.

Best regards, Julian
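For illustration, here is a minimal sketch of the spec behaviour under discussion: a named character reference that lacks its terminating semicolon is still decoded when the name is one of the legacy ones such as `nbsp`. It uses Python's standard `html.unescape`, which applies the HTML5 character reference rules to plain text. The sample title string is an assumption reconstructed from the thread, not a copy of the live page, and the sketch says nothing about how Google's indexer works internally.

```python
import html

# Assumed sample: title text with a named reference missing its ";".
title_source = "Information on RFC&nbsp4329"

# html.unescape follows the HTML5 rules, so the legacy name "nbsp" is
# recognised without a semicolon and replaced by U+00A0.
decoded = html.unescape(title_source)

# The well-formed reference decodes to the same character.
well_formed = html.unescape("Information on RFC&nbsp;4329")

print(repr(decoded))
print(decoded == well_formed)
```

A consumer that displays the literal text `&nbsp4329` has therefore skipped or altered this decoding step, which is the discrepancy the message points at.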
Received on Thursday, 14 February 2013 13:03:59 UTC