Re: copyright issues for proxy caches and archival services

On Tue, 4 Mar 1997, Fred Douglis wrote:

> I understand the issue of copyright and caching has come up on this list in 
> the past, before I subscribed.  I don't want to open a can of worms, but I'd 
> like to get a sense of how people think this will go.  
> 
> I'm interested in two respects: one that's probably been debated, and is being
> debated, and another more obscure case.  The obvious debate is whether proxy
> caches (and browsers, for that matter) violate copyright by caching documents.
> See for instance http://www.ipmag.com/schlacht.html for a discussion of this,

schlacht.html (23-Aug-96) says that there are no current standards for
recognizing expiry information, but "Expires" has been defined in HTTP/1.0
for some considerable time (and I know that Netscape Navigator has
implemented it correctly for a while). Points about advertisements, news
stories, etc. are addressed by proper use of Expires in HTTP/1.0 and
Cache-Control: max-age in HTTP/1.1. Recent servers such as Apache 1.2
allow for easier control of Expires and other HTTP headers outside of CGI
scripts.
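
As a rough illustration of how a cache might turn these headers into a
freshness lifetime, here is a minimal Python sketch. The function name
freshness_lifetime and the plain-dict header representation are my own
assumptions, not taken from any particular cache; real caches also handle
heuristics, header case and several other directives.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers, now=None):
    """Return how many seconds a response may be served from cache.

    `headers` is a plain dict with exact-case header names (an assumption
    made to keep the sketch short). Returns None when the origin server
    gave no explicit expiry information.
    """
    # HTTP/1.1: Cache-Control: max-age takes precedence over Expires.
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.strip().isdigit():
            return int(value)

    # HTTP/1.0: Expires is an absolute date, compared against Date.
    expires = headers.get("Expires")
    if expires is None:
        return None
    try:
        expiry = parsedate_to_datetime(expires)
        date_hdr = headers.get("Date")
        if date_hdr:
            base = parsedate_to_datetime(date_hdr)
        else:
            base = now or datetime.now(timezone.utc)
    except (TypeError, ValueError):
        return 0  # unparseable date: treat the response as already stale
    return max(0, int((expiry - base).total_seconds()))
```

A response dated 12:00 GMT with an Expires of 13:00 GMT the same day would
be considered fresh for 3600 seconds under this sketch.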

There has been much discussion of hit-metering in the HTTP and cache
communities. I believe the question of how to meter hits while still
caching pages efficiently is not completely resolved. Workarounds
include using a non-cacheable redirect to a page or image and counting hits
to that. In any case, advertisers are used to uncertainty in media such
as newspapers, billboards, TV, etc.
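
The redirect workaround could be sketched roughly as follows. This is only
an illustration: counting_redirect, the hits counter and the
www.example.com target are hypothetical names, and the header choices are
one plausible way of marking the redirect itself as uncacheable while the
page it points to stays cacheable.

```python
from collections import Counter

# Hypothetical in-memory hit counter, keyed by requested path.
hits = Counter()


def counting_redirect(path, target_base="http://www.example.com"):
    """Count one hit, then describe an uncacheable redirect to the real page.

    Returns (status, headers). The redirect is marked stale so that every
    request reaches the origin and is counted; the target page or image
    can still be served from a cache.
    """
    hits[path] += 1
    headers = {
        "Location": target_base + path,
        # HTTP/1.0 caches: an Expires date in the past means "already stale".
        "Expires": "Thu, 01 Jan 1970 00:00:00 GMT",
        # HTTP/1.1 caches: do not reuse this response without revalidation.
        "Cache-Control": "no-cache",
    }
    return 302, headers
```

The cost is one small extra request per view, which is the trade-off
between accurate counts and efficient caching mentioned above.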


> I understand that folks on this list have discussed using headers to state 
> explicitly that something can or can't be cached.  Of course some headers do 
> this already in one way or another, but a big question is what the default 
> should be.

Making things totally uncacheable by default would have a huge negative
effect on the Web - and you think it's slow now? Caching is an important
tool, especially for users outside the continental USA, where otherwise
millions of identical copies of rotating email icons etc. would consume
expensive undersea bandwidth. See
http://vancouver-webpages.com/CacheNow/ for a pro-cache viewpoint and
sundry links (URLs of related sites welcome).

> I would be very interested in having some standard for controlling caching, 
> archival, and markup, with respect to copyright.  This makes sense as optional 
> HTTP headers, rather than (say) embedding them in the content -- though I 
> suppose that would work too.  While such optional headers could be established 
> de facto by coming into common use, defining them in the standard should lead 
> to more widespread use as well as perhaps having a greater weight should 
> issues actually arise (how does an optional header with no predefined meaning 
> actually waive one's copyright and give permission to do anything?).

Cache-control is well covered in the HTTP spec.
Archival (by search engines) is controlled somewhat by the /robots.txt file
and the ROBOTS META tag.
I have suggested
ftp://ds.internic.net/internet-drafts/draft-daviel-web-copy-control-00.txt
as a standalone META tag, and there is also
ftp://ds.internic.net/internet-drafts/draft-reagle-pics-copyright-00.txt
using the PICS format. Both address these issues but have received little
discussion. They provide for a lightweight tag to simply say whether
an object may be copied or quoted, which could be read by an automated agent
such as a search engine.
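
To show how an automated agent might honour the existing /robots.txt and
ROBOTS META mechanisms, here is a small Python sketch using only the
standard library. The robots.txt content, the "ExampleBot" agent name and
the helper names are made up for illustration; it does not implement the
tags proposed in either draft above.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(url, agent="ExampleBot"):
    """Check a URL against the robots.txt rules above."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


class RobotsMeta(HTMLParser):
    """Collect the directives of any <META NAME="ROBOTS"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def may_index(html):
    """Return False if the page asks robots not to index it."""
    meta = RobotsMeta()
    meta.feed(html)
    return "noindex" not in meta.directives
```

An agent would call may_fetch() before retrieving a URL and may_index() on
the retrieved page before archiving or quoting it.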

The Dublin Core initiative 
http://purl.org/metadata/dublin_core_elements
provides a RIGHTS element whose use is currently not well defined.


> Comments are solicited.  I do hope this doesn't turn into a flame war; I 
> certainly am not advocating strict copyright enforcement and I dearly hope 
> that copyright law will catch up with the technology at least as far as 
> caching goes.  But in the meantime, and with respect to the other copyright 
> issues, I would like to push forward.

IMO, if you're trying to sell something copyrighted on the Web then
you use authentication or a secure server (transactions on which cannot
be cached). Then you hope no-one buys it and copies it to a free server
somewhere that doesn't abide by the Berne Convention. If it's time-value
material such as current stock quotes that's not so much of a problem.

Electronic copy-control measures are in development at various places;
keyword ECMS, e.g. http://www.mcps.co.uk/hometest , 
http://www.imprimatur.alcs.co.uk
Some of these use decryption tools in a special browser that does not
permit saving or printing (except perhaps as a bitmapped screen dump -
horribly inefficient for text). Indeed, such a device could conceivably
coexist with an HTTP cache - the cached object would be unreadable
without the key, which a user would pay for, assuming that many keys
could decrypt the same object.

There is a question of what people expect when they put up an insecure Web 
page which says "Copyright". Obviously they want people to see it, which
ipso facto involves many copies being loaded into DRAM, video RAM, 
probably private disk cache and perhaps public or semi-public cache.
Clear copyright violations, IMO, involve:
 Transferring the work to another medium such as CD-ROM or paper.

 Reusing an embedded object such as a photograph, movie, sound file,
 applet etc. in another work, if forbidden by the copyright (some
 authors might allow this with attribution), whether or not the
 embedded object is linked or copied. From a global point of view, it
 may be more efficient if such objects are linked. (From the host's point
 of view, linked objects cost bandwidth, even if it's just 304s.)

 Stripping advertisements from a page and re-serving the modified
 page.

 Plagiarism: lifting chunks of content and changing the author's
 name.

Activities which IMO do not involve a moral copyright violation (though
perhaps a legal one) include:
 Search engine indexing - the engine is effectively quoting a small portion,
 which could be allowed under the fair use doctrine. This is more problematic
 for objects such as reviews which cannot be quoted. Current engine practice
 only quotes the beginning of the document, though.
 Caching - a request for the original URL gets a cached, unmodified copy.
 Mirroring - a request for a local copy (different URL) gets a
 mirrored, unmodified copy.
In these cases, the organisation which is doing the caching or mirroring
derives little if any direct benefit from the activity. A user viewing the
cached or mirrored objects is probably unaware of the copy, since the 
objects do not necessarily endorse or link to the copy host. The copy host
is merely providing an enhanced service for local users, in effect making 
the copyright holder look good by providing their objects at greater 
bandwidth than is available directly from the origin. I agree that 
objects must not be allowed to get too stale, but cache and mirror software
deals with this. (Note - mirroring requires careful treatment of relative
URLs, and may affect PICS information.)
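
On the relative-URL point, a mirror has to decide for each link whether it
falls inside the mirrored tree. A minimal Python sketch of that decision,
assuming a hypothetical origin and mirror location (ORIGIN, MIRROR and
mirror_link are illustrative names only):

```python
from urllib.parse import urljoin

# Hypothetical locations: the tree being mirrored and where the copy lives.
ORIGIN = "http://origin.example.com/docs/"
MIRROR = "http://mirror.example.org/mirrors/origin/docs/"


def mirror_link(page_url, href):
    """Map a link found on an origin page to the URL the mirror should use.

    Links that resolve inside the mirrored tree are pointed at the mirror;
    anything else keeps pointing at its original, absolute location.
    """
    absolute = urljoin(page_url, href)  # resolve relative to the origin page
    if absolute.startswith(ORIGIN):
        return MIRROR + absolute[len(ORIGIN):]
    return absolute
```

For example, "../b.html" on http://origin.example.com/docs/a/page.html maps
into the mirror, while "/top.html" lies outside the mirrored tree and is
left pointing at the origin rather than silently breaking.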

Andrew Daviel
mailto:andrew@vancouver-webpages.com

Received on Saturday, 8 March 1997 00:34:03 UTC