- From: Dan Brickley <danbri@w3.org>
- Date: Sat, 22 Jan 2005 21:52:55 +0000
- To: www-html@w3.org
- Cc: dean@w3.org, shellen@google.com, kmarks@technorati.com, tantek@technorati.com
I've been following the threads in http://lists.w3.org/Archives/Public/www-html/2005Jan/ about the http://www.google.com/googleblog/2005/01/preventing-comment-spam.html proposal, and the draft definition at http://developers.technorati.com/wiki/RelNoFollow. I share the various www-html misgivings about the name, but am otherwise quite optimistic.

Mark's review comments suggest this is just for blog content, but I think that's not quite it. There are a number of scenarios (eg. the automatic hyperlinking of URLs in emails archived by W3C at lists.w3.org, or in Wikis) where untrusted or semi-trusted hypertext content is published on the Web.

While a normal hyperlink doesn't 'formally' assert that the linking document (or its author or hosting site) endorses the referenced content, it is nevertheless true that in the normal course of civilised Web behaviour there are statistical trends that can be very usefully exploited. Most Web content doesn't contain random links to unrelated or horribly spammy sites. Rather, it tends to link to something that the document author(s) have considered pertinent to their own content.

I welcome the "nofollow" effort because it shows there is willingness from a broad group to work towards making the intent behind hyperlinks a little more evident to computers.

The name, I think, does mislead. It is in the tradition of robots.txt and the <meta name="robots" content="noindex,nofollow"> construct (talking of which, is [1] the best existing documentation for deployed practice?). The "don't follow this link" sense seems overly imperative. I wonder if the name could be retained, but given a more passive, declarative reading.

I don't believe we should get into the business of taxonomising user-agent types (classic Web browsers, blog readers, voice browsers, offline readers, search engines, directories, feedster/technorati/etc...).
If the intent were really that the link not be followed by certain kinds of agent, the spec ought really to say which kind. That seems a fruitless path, since it gets stuck in implementation details instead of focussing on the core business: providing all such agents with evidence that assists evaluation of the significance of the hyperlink and the chances that it references something dodgy.

IMHO "nofollow" is a small part of a bigger story, but one that's worth getting spec'd up properly (eg. so something like it could be used in SVG, etc).

http://developers.technorati.com/wiki/RelNoFollow has (currently; it's a wiki) the following text:

> nofollow
> Indicates that the referred resource was not necessarily linked to
> by the author of the page, and thus said reference should not
> afford the referred resource any additional weight or ranking by user
> agents.

The name and definition seem somewhat disconnected, and the reference to 'weight' and 'ranking' alludes to popular knowledge of techniques such as those used by Google, without really defining them. It also appeals to a notion of "author of page" that may well not be universally applicable (eg. who is "the author" of the page whose URI is [2]? Björn? webmaster@w3.org? nobody?).

I would like to see "nofollow" (or a renaming) couched purely in terms of a relationship between two documents, minimising references to other entities such as author(s), publisher(s), search engines, user agents etc.

Here's an experiment to that end, following something of the style used in http://www.w3.org/TR/html401/types.html#type-links

nofollow:
"Refers to a document whose contents do not necessarily follow in any way from the topic or themes of the current document. This type of relationship is typically expressed within a document that includes hypertext content whose origin is unknown or untrusted."

(elaborating)
Examples include user-supplied comments, feedback forms, aggregated content, weblog trackback excerpts, Wiki systems, HTML views of mail/news content, discussion boards, and Web-based email clients.

The "nofollow" relationship is designed to provide a simple construct that can be used when (re)publishing pieces of hypertext by adding "nofollow" to a rel attribute. The absence of such an attribute in no way implies an endorsement of the linked document; "nofollow" simply provides one very basic mechanism for representing skepticism about referenced content. Richer metadata (RDF/XML, PICS etc) can be used in applications that need more than the basic information provided by "nofollow".

Having attempted this, I'm not 100% convinced myself yet. I can see "nofollow" fever taking off in a way that could obscure the original motivating use cases. Eg. political blogs using it to cancel out link-karma to sites they critique; newspaper sites using it to avoid boosting or appearing to endorse blog articles, etc.

I also like Mark Birbeck's suggestion that entire DIV'd or class'd sections of a page might be marked in this way, rather than focussing solely on hyperlinks.

In passing, http://developers.technorati.com/wiki/VoteLinks is pretty interesting, although a much bigger undertaking than "nofollow"...

cheers,

Dan

[1] http://www.robotstxt.org/wc/meta-user.html
    The Robots META tag is a simple mechanism to indicate to visiting Web Robots if a page should be indexed, or links on the page should be followed.
[2] http://lists.w3.org/Archives/Public/www-html/2005Jan/0089.html
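
As a rough illustration of the "adding nofollow to a rel attribute" step mentioned above, here is a minimal sketch of how a republishing tool (a mail archiver, wiki or comment system) might mark links in an untrusted fragment. It is not taken from the Google or Technorati proposals; the names `NoFollowRewriter` and `add_nofollow` are hypothetical, it uses only Python's standard `html.parser` module, and it assumes the fragment is ordinary HTML without script or style content.

```python
from html import escape
from html.parser import HTMLParser


class NoFollowRewriter(HTMLParser):
    """Re-serialise an HTML fragment, adding "nofollow" to the rel of each <a href>.

    Hypothetical helper for illustration only; not part of any proposal.
    """

    def __init__(self):
        # Keep character references as written so they can be echoed back unchanged.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _emit_tag(self, tag, attrs, closing=""):
        attrs = list(attrs)
        if tag == "a" and any(name == "href" for name, _ in attrs):
            # Collect existing rel tokens (rel is a space-separated list).
            rel_tokens = []
            for name, value in attrs:
                if name == "rel" and value:
                    rel_tokens.extend(value.split())
            if "nofollow" not in (t.lower() for t in rel_tokens):
                rel_tokens.append("nofollow")
            # Replace any existing rel attribute with the merged one.
            attrs = [(n, v) for n, v in attrs if n != "rel"]
            attrs.append(("rel", " ".join(rel_tokens)))
        parts = [tag]
        for name, value in attrs:
            if value is None:
                parts.append(name)  # attribute given without a value
            else:
                parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + closing + ">")

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, closing=" /")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def add_nofollow(fragment: str) -> str:
    """Return the fragment with rel="nofollow" present on every <a href>."""
    parser = NoFollowRewriter()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

Existing rel values are kept and "nofollow" is appended as one more token, which fits the reading of rel as a list of link types; anchors without an href are left alone since they do not link anywhere.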
Received on Saturday, 22 January 2005 22:04:40 UTC