RE: Suggestion: 'rel="unrelated"' from Mark Birbeck on 2005-01-22 (www-html@w3.org from January 2005)

From: Mark Birbeck <mark.birbeck@x-port.net>
Date: Sat, 22 Jan 2005 18:09:56 -0000
To: "'Trejkaz'" <trejkaz@trypticon.org>
Cc: <www-html@w3.org>
Message-ID: <013701c500ad$97711540$6f01a8c0@W100>

Trejkaz,

> On Sunday 23 January 2005 02:07, Mark Birbeck wrote:
> > For example, all the blogging software companies could indicate that
> > the 'type' of the page was a blog. Or they could mark up the comments
> > area as 'comments'.
>
> I would hope it's the latter, unless you intend to penalise the owner of
> the weblog the same way that the spammers are penalised.

You could deduce various things about the page -- or 'infer' as they like to
say in the RDF-world ;) -- if you knew its type.

Having said that, I favour a generalised version of the second solution,
where a block is marked up to indicate that it originates from another site.
This would work for RSS feeds, portals, imported weather reports, as well as
blog comments. What Google does with that information is up to it -- it
could reduce the rank, choose to 'not follow' or whatever.

I've mentioned the 'inclusion' problem before, but here's an example. If you
look at the RSS feed for World News on the BBC:

  <http://news.bbc.co.uk/rss/newsonline_world_edition/uk_news/rss091.xml>

you'll see that there are stories such as:

  <item>
    <title>Iraq exile vote runs into trouble</title> 
    <description>
      Iraqis living abroad are being given more time to register
      for the 30 January poll because of a low turnout.
    </description> 
    <link>http://news.bbc.co.uk/go/click/rss/0.91/public/
      -/2/hi/middle_east/4198071.stm</link> 
  </item>
  <item>
    <title>Stately home fire 'still burning'</title> 
    <description>
      Pockets of fire are still burning at Allerton Castle in North
      Yokrshire more than 12 hours after the devastating blaze started.
    </description>
    <link>http://news.bbc.co.uk/go/click/rss/0.91/public/
      -/2/hi/uk_news/england/north_yorkshire/4197345.stm</link> 
  </item>

(The spelling mistake on Yorkshire is the BBC's, not mine!)

Now, if both you and I make an explicit link to the first story from our
web-sites or blog pages, Google rightly takes note of that and assigns some
significance to the fact that we have both referred to the same article. If
both of our web-sites were about Iraq, it may even assign the *target*
article a higher ranking for searches on 'Iraq'.

However, if on our servers we automatically read the BBC's RSS feed, apply
some XSLT and then deliver the page, we haven't really linked to the story;
we've linked to the feed. We've actually given up control of part of our
web-site to a third party, and in this case it seems to me that it would be
wrong to give the link the same weight as in the first situation, where we
both linked explicitly to that article.

Now, it may be the case that the quick turnaround on RSS feeds means that
situations like this don't distort the numbers that much, but I think the
example is worth thinking about since it is actually the same problem as the
'comments on blogs' one; search engines need to differentiate between "first
order" and "second order" links, but at the moment they can't.

It seems to me that on my portal, web-site, or blog, I can easily indicate
that the links to the news stories are from an external source -- I could
use the @class attribute or the new @role attribute on a <div>, or a <link>
in the header that points to all the included blocks, or even <blockquote>
and <cite>. (The latter having the bonus that stating the source allows
Google to do further clever things, by assigning weight to more frequently
'quoted' sources or feeds.)
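
As a rough sketch of the publishing side, here is how included feed items
might be wrapped in a <blockquote> that names its source. It uses Python
rather than XSLT purely for brevity; the function name
render_included_feed, the list layout and the one-item sample are my own
assumptions for illustration, not anything a search engine or the BBC
prescribes.

```python
import xml.etree.ElementTree as ET
from html import escape

# Feed URL as given in this message.
FEED_URL = "http://news.bbc.co.uk/rss/newsonline_world_edition/uk_news/rss091.xml"

# One item copied from the feed excerpt above, so the sketch runs offline.
SAMPLE_FEED = """<rss version="0.91"><channel>
  <item>
    <title>Iraq exile vote runs into trouble</title>
    <link>http://news.bbc.co.uk/go/click/rss/0.91/public/
      -/2/hi/middle_east/4198071.stm</link>
  </item>
</channel></rss>"""


def render_included_feed(rss_xml: str, source_url: str) -> str:
    """Render RSS item links inside a <blockquote> that cites the feed.

    Hypothetical helper: the blockquote/cite wrapper is the convention
    proposed in this message, not an established standard for feeds.
    """
    root = ET.fromstring(rss_xml)
    rows = []
    for item in root.iter("item"):
        # Collapse runs of whitespace in the title.
        title = " ".join((item.findtext("title") or "").split())
        # Long URLs may be wrapped across lines, so drop all whitespace.
        link = "".join((item.findtext("link") or "").split())
        rows.append(
            f'    <li><a href="{escape(link, quote=True)}">'
            f"{escape(title, quote=False)}</a></li>"
        )
    return "\n".join(
        [
            f'<blockquote cite="{escape(source_url, quote=True)}">',
            "  <ul>",
            *rows,
            "  </ul>",
            "</blockquote>",
        ]
    )


if __name__ == "__main__":
    print(render_included_feed(SAMPLE_FEED, FEED_URL))
```

The point of the wrapper is only that every link it contains is visibly
attributed to the feed rather than to the page author.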

So, taking a spam comment from the example page [2] that Google's post [1]
links to (Dave Barry is probably fed up of all the extra traffic!):

  <div class="comments-body">
    <p>
      approved less) gotta you. for  by not DEA. 325 pain acute
      Each is <br /> mg FDA love Care or the 50 is
      http://www.propecia-i.com  by Ortho-McNeil. You
      brand for pill pills. mg by
      <a href='http://www.propecia-i.com'>Propecia</a>
      short-term controlled sexual in owned Designed
      and days 2004. (5
    </p>
    <span class="v1">
      <br />Posted by:
      <a href="http://www.propecia-i.com">Propecia</a> on November
      9, 2004 08:52 AM
      </p>
    </span>
  </div>

We could represent this in XHTML 1/HTML 4 as:

  <div class="comments-body">
    <blockquote>
      ...
    </blockquote>
    <cite class="v1">
      <br />Posted by:
      <a href="http://www.propecia-i.com">Propecia</a> on November
      9, 2004 08:52 AM
    </cite>
  </div>

We have now indicated that this block does not originate with us. The search
engines can conclude whatever they want from that. Given that so much of the
web nowadays is made up of blogs, discussion forums, web-rendered
newsgroups, RSS feeds, portals and portlets, and so on, it seems to me that
this is a far better way of addressing the problem.

(And in XHTML 2 it's even easier.)
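
To make the consuming side concrete, here is a minimal sketch of how a
crawler might separate "first order" from "second order" links once
pages are marked up this way. It is only an illustration in Python: the
class name LinkOrderParser, the choice to treat both <blockquote> and
<cite> as quoted regions, and the BBC link in the sample page are all my
assumptions, and say nothing about what Google actually does.

```python
from html.parser import HTMLParser

# Elements treated as "not originating with the page author" (an assumption).
QUOTED_TAGS = {"blockquote", "cite"}


class LinkOrderParser(HTMLParser):
    """Collect hrefs, split by whether they sit inside a quoted region."""

    def __init__(self):
        super().__init__()
        self.quote_depth = 0      # how many quoted elements are currently open
        self.first_order = []     # links the page author made directly
        self.second_order = []    # links inside blockquote/cite

    def handle_starttag(self, tag, attrs):
        if tag in QUOTED_TAGS:
            self.quote_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if not href:
                return
            if self.quote_depth > 0:
                self.second_order.append(href)
            else:
                self.first_order.append(href)

    def handle_endtag(self, tag):
        # Ignore stray closing tags that would take the depth below zero.
        if tag in QUOTED_TAGS and self.quote_depth > 0:
            self.quote_depth -= 1


# Sample page: one explicit link (hypothetical) plus the marked-up comment.
PAGE = """
<p>See <a href="http://news.bbc.co.uk/">the BBC</a>.</p>
<div class="comments-body">
  <blockquote>
    <p><a href='http://www.propecia-i.com'>Propecia</a></p>
  </blockquote>
  <cite class="v1">
    <br />Posted by:
    <a href="http://www.propecia-i.com">Propecia</a>
  </cite>
</div>
"""

if __name__ == "__main__":
    parser = LinkOrderParser()
    parser.feed(PAGE)
    print("first order: ", parser.first_order)
    print("second order:", parser.second_order)
```

What weight, if any, the second list should carry is then a policy
decision for the search engine, which is exactly where it belongs.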

Regards,

Mark

[1] http://www.google.com/googleblog/2005/01/preventing-comment-spam.html
[2] http://weblog.herald.com/column/davebarry/archives/012729.html


Mark Birbeck
CEO
x-port.net Ltd.

e: Mark.Birbeck@x-port.net
t: +44 (0) 20 7689 9232
w: http://www.formsPlayer.com/
b: http://internet-apps.blogspot.com/

Received on Saturday, 22 January 2005 18:10:47 UTC