Re: Research for class="copyright" from Lachlan Hunt on 2007-05-06 (www-html@w3.org from May 2007)

From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
Date: Mon, 07 May 2007 04:22:21 +1000
To: Terje Bless <link@pobox.com>
CC: public-html@w3.org, www-html@w3.org
Message-ID: <463E1CDD.9070300@lachy.id.au>
Terje Bless wrote:
> 
> lachlan.hunt@lachy.id.au (Lachlan Hunt) wrote:
> 
>> Here's a study of 500 pages done by Philip Taylor […]
>>
>> http://canvex.lazyilluminati.com/misc/copyright.html
>>
>> A quick review of the results shows that the majority of them are using
>> class="copyright" for actual copyright notices.
> 
> It also shows that even given that miniscule and statistically 
> insignificant sample there are a disproportionate number that do _not_ 
> use the class 'copyright' for anything even remotely resembling a 
> copyright notice.

What?  Let's look at the numbers and see what you're calling a 
disproportionate number.

There were a total of 86 pages listed, out of 500 searched.  The sites 
that were not listed didn't have class=copyright, or similar, so they 
were excluded.

Of those 86 pages, we can further reduce the count to unique domains 
only.  For example, there were a couple of Wikipedia pages and a couple 
of sites that used a class or id value that contained "copyright" more 
than once per page.  After eliminating duplicates, we're left with 58.

Of those 58, 46 sites used class="copyright" only for elements that 
actually did contain copyright information in the markup.

An additional 2 sites actually did use the elements for copyright 
information, but the notice is added to the page by a script, which 
wasn't executed by the parser used for the survey.

There were 6 that used class=copyright multiple times, for content 
including both copyright notices and non-copyright notices:

* http://www.theelvisweddingchapel.com/
* http://www.themeatrix.com/
* http://www.thex-files.com/
* http://www.aig.com/
* http://www.theooze.com/
* http://www.anomalist.com/


Of the remaining 4 sites:

* http://www.theday.com/
* http://www.history.com/
* http://www.thejournal.com/
* http://www.dickblick.com/

theday.com can be eliminated because it actually used 
id="ctl00_lblCopyRightYear", and dickblick.com can be eliminated because 
it used id="PageTemplate_sharedFooter1_footerCopyrightRow".

They were counted because the survey looked for class or id attributes 
that contained "copyright" within the value, even if it wasn't an exact 
match.
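
To illustrate the difference between that loose match and an exact 
class match, here is a minimal sketch in Python.  It is not the tool 
used for the survey; the class name CopyrightAttrFinder and the helper 
find_copyright_attrs are hypothetical, and the case-insensitive 
comparison is an assumption based on "CopyRight" having been matched.

```python
from html.parser import HTMLParser


class CopyrightAttrFinder(HTMLParser):
    """Collect class/id values mentioning "copyright" (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.loose = []  # substring match in class or id (survey-style)
        self.exact = []  # class attribute contains the exact token "copyright"

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in ("class", "id") or value is None:
                continue
            lowered = value.lower()  # assumption: case-insensitive matching
            if "copyright" in lowered:
                self.loose.append((tag, name, value))
            if name == "class" and "copyright" in lowered.split():
                self.exact.append((tag, name, value))


def find_copyright_attrs(html):
    """Return (loose_matches, exact_matches) for a markup string."""
    finder = CopyrightAttrFinder()
    finder.feed(html)
    finder.close()
    return finder.loose, finder.exact


sample = (
    '<span id="ctl00_lblCopyRightYear">2007</span>'
    '<p class="copyright">&copy; Example</p>'
)
loose, exact = find_copyright_attrs(sample)
print(len(loose), len(exact))
```

With this sample, the id value is picked up only by the loose match, 
which is how a page like theday.com ends up in the list without using 
class="copyright" at all.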

history.com actually had the following markup:

<p class="copyright">
<p>&copy; 1996-2007, A&amp;E Television Networks. All rights reserved.</p>

It's clear that their intention was to use it for copyright information, 
but they made a mistake by inserting an additional <p> start tag.
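
The effect of the extra start tag can be sketched as follows.  A second 
<p> start tag implicitly closes the open p element, so the element 
carrying class="copyright" ends up empty and the notice lands in a p 
without the class.  This is only a rough approximation in Python of 
that one parsing rule, not a full HTML parser, and ParagraphCollector 
is a hypothetical name.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Record the class and text of each <p>, closing an open <p> implicitly."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []  # list of [class_value, text]
        self.in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A new <p> start tag ends any paragraph that is still open,
            # so text from here on belongs to the new element.
            self.in_p = True
            self.paragraphs.append([dict(attrs).get("class"), ""])

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.paragraphs[-1][1] += data


markup = (
    '<p class="copyright">\n'
    "<p>&copy; 1996-2007, A&amp;E Television Networks. "
    "All rights reserved.</p>"
)
collector = ParagraphCollector()
collector.feed(markup)
collector.close()
paragraphs = [(cls, text.strip()) for cls, text in collector.paragraphs]
for cls, text in paragraphs:
    print(repr(cls), repr(text))
```

The first entry has the class but no text, and the second has the text 
but no class, which is why the survey's parser saw an empty 
class="copyright" element on that page.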

thejournal.com made a similar mistake, where they assigned the 
class="copyright" to the wrong div.

So, to summarise:

54/58 used a class or id value that contained "copyright" in the value 
for copyright notices (including the 6 that used it for non-copyright 
notices as well)

2 were eliminated because they didn't use class="copyright"
2 look like they had the intention of using it for copyright, but made a 
mistake.
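
As a quick sanity check of the arithmetic above, the tally can be 
written out in a few lines of Python.  The variable names are my own 
labels for the categories; the counts are the ones given in this 
message.

```python
# Counts taken from the breakdown above (unique domains only).
unique_sites = 58
markup_only = 46      # class="copyright" used only for copyright notices
script_inserted = 2   # notice added by a script the survey parser didn't run
mixed_use = 6         # used for both copyright and non-copyright content
id_only_match = 2     # matched on an id value, no class="copyright"
markup_mistake = 2    # intended for copyright, but the markup was wrong

used_for_copyright = markup_only + script_inserted + mixed_use

# Every unique site should fall into exactly one category.
assert used_for_copyright + id_only_match + markup_mistake == unique_sites

share = used_for_copyright / unique_sites
print(f"{used_for_copyright}/{unique_sites} = {share:.1%}")  # 54/58 = 93.1%
```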

To me, that looks like strong evidence in favour of defining 
class=copyright.

> Thank you. It was gracious of you to cite a study that actually 
> disproves your claim.

I never claimed that there were no sites that misused the value.  I only 
asked for evidence to be supplied by those making the claims that there 
was misuse, which would then show whether or not the misuse was of any 
significance.  From this survey, the results show that the misuse is of 
little significance.

-- 
Lachlan Hunt
http://lachy.id.au/
Received on Sunday, 6 May 2007 18:22:40 UTC