[whatwg] Trying to work out the problems solved by RDFa

On Fri, 9 Jan 2009, Ben Adida wrote:
> 
> Is inherent resistance to spam a condition (even a consideration) for 
> HTML5?

We have to make sure that whatever we specify in HTML5 is actually going
to be useful for the purpose it is intended for. If a feature intended for
wide-scale automated data extraction is especially susceptible to spamming 
attacks, then it is unlikely to be useful for wide-scale automated data 
extraction.


> If so, where is the concern around <title>, which is clearly featured in 
> search engine results?

Nobody is suggesting that user agents derive any behavior from <title>, so
it doesn't matter whether <title> is spammed or not. The only effect would
be some spam in the user's session history. Furthermore, <title> is
page-wide, meaning that the page's own author would have to spam the page
for it to be spammed. It is less likely for a user to intentionally visit a
spammy page than for a user to visit a page that happens to contain spammy
content embedded within it (e.g. in blog comments).

If browsers were expected to crawl all pages for all links and then 
populate the browser's interface with the most popular links, then one 
would quickly expect everyone's browsers to be advertising Viagra, porn 
sites, and the like. However, browsers don't do this kind of processing -- 
indeed, this kind of processing appears to be exactly what RDFa proponents 
are trying to enable (though to what end, I'm still trying to find out, 
since nobody has actually replied to all the questions I asked yet [1]).

Note that search engines aren't the problem here -- large operations like 
search engines are quite capable of running the massive processing 
required to filter spam. The problem is automated processing on the 
client, where those resources aren't available.

[1] http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-December/018023.html

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Friday, 9 January 2009 15:37:30 UTC