- From: Mark Birbeck <mark.birbeck@x-port.net>
- Date: Sat, 22 Jan 2005 18:09:56 -0000
- To: "'Trejkaz'" <trejkaz@trypticon.org>
- Cc: <www-html@w3.org>
Trejkaz,

> On Sunday 23 January 2005 02:07, Mark Birbeck wrote:
> > For example, all the blogging software companies could indicate that
> > the 'type' of the page was a blog. Or they could mark up the comments
> > area as 'comments'.
>
> I would hope it's the latter, unless you intend to penalise the owner of the
> weblog the same way that the spammers are penalised.

You could deduce various things about the page -- or 'infer' as they like to say in the RDF-world ;) -- if you knew its type.

Having said that, I favour a generalised version of the second solution, where a block is marked up to indicate that it originates from another site. This would work for RSS feeds, portals, imported weather reports, as well as blog comments. What Google does with that information is up to it -- it could reduce the rank, choose to 'not follow' or whatever.

I've mentioned the 'inclusion' problem before, but here's an example. If you look at the RSS feed for World News on the BBC:

  <http://news.bbc.co.uk/rss/newsonline_world_edition/uk_news/rss091.xml>

you'll see that there are stories such as:

  <item>
    <title>Iraq exile vote runs into trouble</title>
    <description>
      Iraqis living abroad are being given more time to register
      for the 30 January poll because of a low turnout.
    </description>
    <link>http://news.bbc.co.uk/go/click/rss/0.91/public/-/2/hi/middle_east/4198071.stm</link>
  </item>
  <item>
    <title>Stately home fire 'still burning'</title>
    <description>
      Pockets of fire are still burning at Allerton Castle in North
      Yokrshire more than 12 hours after the devastating blaze started.
    </description>
    <link>http://news.bbc.co.uk/go/click/rss/0.91/public/-/2/hi/uk_news/england/north_yorkshire/4197345.stm</link>
  </item>

(The spelling mistake on Yorkshire is the BBC's, not mine!)
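As a rough illustration of what 'inclusion' means mechanically, the sketch below reads feed items like the ones above and emits them inside a block that is marked as coming from elsewhere. It is only a sketch under stated assumptions, not anything the BBC or any search engine publishes: Python is just one way a server might do this, the function name render_included_block and the class name 'included' are hypothetical, and the feed string is a cut-down copy of the first item with its link joined onto one line. The cite attribute on blockquote is standard HTML 4.

```python
import xml.etree.ElementTree as ET
from html import escape

# Cut-down copy of the feed excerpt (one item only, link on a single line).
FEED = """<rss version="0.91"><channel>
<item>
<title>Iraq exile vote runs into trouble</title>
<link>http://news.bbc.co.uk/go/click/rss/0.91/public/-/2/hi/middle_east/4198071.stm</link>
</item>
</channel></rss>"""


def render_included_block(feed_xml, source_url):
    """Return an HTML fragment listing the feed's items inside a marked block."""
    root = ET.fromstring(feed_xml)
    rows = []
    for item in root.iter("item"):
        title = escape(item.findtext("title", default="").strip())
        link = escape(item.findtext("link", default="").strip(), quote=True)
        rows.append('  <li><a href="%s">%s</a></li>' % (link, title))
    # class="included" is a hypothetical marker, not an agreed convention;
    # the cite attribute records where the block was imported from.
    return '<blockquote class="included" cite="%s">\n<ul>\n%s\n</ul>\n</blockquote>' % (
        escape(source_url, quote=True),
        "\n".join(rows),
    )


print(render_included_block(
    FEED,
    "http://news.bbc.co.uk/rss/newsonline_world_edition/uk_news/rss091.xml",
))
```

Every link inside the emitted block was chosen by the feed's publisher rather than by the page's author, and that is the distinction the marker is meant to expose.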
Now, if both you and I make an explicit link to the first story from our web-sites or blog pages, Google rightly takes note of that and assigns some significance to the fact that we have both referred to the same article. If both of our web-sites were about Iraq, it may even assign the *target* article a higher ranking for searches on 'Iraq'.

However, if on our servers we automatically read the BBC's RSS feed, apply some XSLT and then deliver the page, we haven't really linked to the story, we've linked to the feed. We've actually given up control of part of our web-site to a third party, and in this case it seems to me that it would be wrong to give the link the same weight as in the first situation, where we both linked explicitly to that article.

Now, it may be the case that the quick turnaround on RSS feeds means that situations like this don't distort the numbers that much, but I think the example is worth thinking about since it is actually the same problem as the 'comments on blogs' one: search engines need to differentiate between "first order" and "second order" links, but at the moment they can't.

It seems to me that on my portal, web-site, or blog, I can easily indicate that the links to the news stories are from an external source -- I could use the @class attribute or the new @role attribute on a <div>, or a <link> in the header that points to all the included blocks, or even <blockquote> and <cite>. (The latter has the bonus that stating the source allows Google to do further clever things, by assigning weight to more frequently 'quoted' sources or feeds.)

So, taking a spam comment from the example that [1] actually links to, namely [2] (Dave Barry is probably fed up of all the extra traffic!):

  <div class="comments-body">
    <p>
      approved less) gotta you. for by not DEA. 325 pain acute Each is <br />
      mg FDA love Care or the 50 is http://www.propecia-i.com by
      Ortho-McNeil. You brand for pill pills. mg by
      <a href='http://www.propecia-i.com'>Propecia</a> short-term
      controlled sexual in owned Designed and days 2004. (5
    </p>
    <span class="v1">
      <br />Posted by: <a href="http://www.propecia-i.com">Propecia</a>
      on November 9, 2004 08:52 AM
      </p>
    </span>
  </div>

We could represent this in XHTML 1/HTML 4 as:

  <div class="comments-body">
    <blockquote>
      ...
    </blockquote>
    <cite class="v1">
      <br />Posted by: <a href="http://www.propecia-i.com">Propecia</a>
      on November 9, 2004 08:52 AM
    </cite>
  </div>

We have now indicated that this block does not originate with us. The search engines can conclude whatever they want from that.

Given that so much of the web nowadays is made up of blogs, discussion forums, web-rendered newsgroups, RSS feeds, portals and portlets, and so on, it seems to me that this is a far better way of addressing the problem. (And in XHTML 2 it's even easier.)

Regards,

Mark

[1] http://www.google.com/googleblog/2005/01/preventing-comment-spam.html
[2] http://weblog.herald.com/column/davebarry/archives/012729.html

Mark Birbeck
CEO
x-port.net Ltd.

e: Mark.Birbeck@x-port.net
t: +44 (0) 20 7689 9232
w: http://www.formsPlayer.com/
b: http://internet-apps.blogspot.com/

Download our XForms processor from
http://www.formsPlayer.com/
Received on Saturday, 22 January 2005 18:10:47 UTC