Re: Getting beyond the ping pong match (was RE: Cleaning House)

On May 6, 2007, at 00:46, Jukka K. Korpela wrote:

> On Sun, 6 May 2007, Henri Sivonen wrote:
>> When Google started inferring authority from links, did they ask  
>> anyone if they had the right?
> I have no idea of what you mean by "inferring authority", but why  
> would that matter?

I meant computing PageRank. It is based on an observation of how  
markup is actually used, not on a standard stating that markup should  
be used to communicate what is inferred.

> If someone makes wrong or questionable inferences, is that an  
> excuse for making such inferences part of a specification?

In some cases, inferences may be useful if they are predominantly  
right and wrong only relatively rarely. When making inferences from  
Web content, it isn't useful to reject an idea based on a single  
counterexample (or a handful thereof) because if you did, you  
couldn't infer anything. (Which, I presume, would be less useful than  
making a lot of right inferences and a few wrong ones.)

>>> (For example, in my page about intellectual rights, I may well  
>>> have marked parts _discussing_ copyright issues with such an  
>>> attribute,
>> I know you have pages discussing copyright but do you really use  
>> class='copyright' for something other than copyright notices?
> I don't think my actual usage matters the least here. I simply  
> presented a plausible example.

Actual usage does matter when assessing what inferences can be made  
in *practice*.

>> Do you expect the usage of class='copyright' for something other  
>> than copyright notices to be a common practice to a degree that it  
>> would be unreasonable to assume that class='copyright' marks a  
>> copyright notice?
> I'm not making assumptions. _You_ would make a wild assumption if  
> you assume that class='copyright' marks a copyright notice.

Considering that research shows that "copyright" is one of the most  
common class values, it is a reasonable hypothesis that it is used to  
mark up copyright notices and that there is demand for markup for  
copyright notices. If it is indeed the case that the vast majority of  
class='copyright' is meant to mark up copyright notices, paving this  
cowpath requires fewer people to change their ways than badgering all  
of them into switching to something new and theoretically purer would.

Of course, in making this call, one should first determine whether  
there are consumer use cases for identifying human-readable copyright  
notices and, if so, how much they are harmed by noise (i.e.  
class='copyright' not marking a copyright notice).

Henri Sivonen

Received on Sunday, 6 May 2007 12:54:54 UTC