- From: Andrew Daviel <andrew@andrew.triumf.ca>
- Date: Sat, 8 Mar 1997 00:32:10 -0800 (PST)
- To: Fred Douglis <douglis@research.att.com>
- Cc: http-wg@cuckoo.hpl.hp.com
On Tue, 4 Mar 1997, Fred Douglis wrote:

> I understand the issue of copyright and caching has come up on this list in
> the past, before I subscribed. I don't want to open a can of worms, but I'd
> like to get a sense of how people think this will go.
>
> I'm interested in two respects: one that's probably been debated, and is being
> debated, and another more obscure case. The obvious debate is whether proxy
> caches (and browsers, for that matter) violate copyright by caching documents.
> See for instance http://www.ipmag.com/schlacht.html for a discussion of this,

schlacht.html (23-Aug-96) says that there are no current standards for
recognizing expiry information, but "Expires" has been defined in HTTP/1.0
for some considerable time (and I know it has been implemented correctly by
Netscape Navigator for a while). Points about advertisements, news stories,
etc. are addressed by proper use of Expires in HTTP/1.0 and Cache-Control:
max-age in HTTP/1.1. Recent servers such as Apache 1.2 allow for easier
control of Expires and other HTTP headers outside of CGI scripts.

There has been much discussion of hit-metering in the HTTP and cache
communities. I believe it is still not completely resolved (how to meter
hits while continuing to cache pages efficiently). Workarounds include
using a non-cacheable redirect to a page or image and counting hits on
that. In any case, advertisers are used to uncertainty in media such as
newspapers, billboards, TV, etc.

> I understand that folks on this list have discussed using headers to state
> explicitly that something can or can't be cached. Of course some headers do
> this already in one way or another, but a big question is what the default
> should be.

Making things totally uncacheable by default would have a huge negative
effect on the Web - and you think it's slow now? Caching is an important
tool, especially for users outside the continental USA, where otherwise
millions of identical copies of rotating email icons etc. would consume
expensive undersea bandwidth. See http://vancouver-webpages.com/CacheNow/
for a pro-cache viewpoint and sundry links (URLs of related sites welcome).

> I would be very interested in having some standard for controlling caching,
> archival, and markup, with respect to copyright. This makes sense as optional
> HTTP headers, rather than (say) embedding them in the content -- though I
> suppose that would work too. While such optional headers could be established
> de facto by coming into common use, defining them in the standard should lead
> to more widespread use as well as perhaps having a greater weight should
> issues actually arise (how does an optional header with no predefined meaning
> actually waive one's copyright and give permission to do anything?).

Cache-control is well covered in the HTTP spec. Archival (by search
engines) is controlled somewhat by the /robots.txt file and the ROBOTS META
tag. I have suggested
ftp://ds.internic.net/internet-drafts/draft-daviel-web-copy-control-00.txt
as a standalone META tag, and there is also
ftp://ds.internic.net/internet-drafts/draft-reagle-pics-copyright-00.txt
using the PICS format; both address these issues but have received little
discussion. These provide for a lightweight tag to simply say whether an
object may be copied or quoted, which could be read by an automated agent
such as a search engine. The Dublin Core initiative
http://purl.org/metadata/dublin_core_elements provides a RIGHTS element
whose use is currently not well defined.
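To make the above concrete, here is a rough sketch of the sort of response
a server might send using the mechanisms mentioned - the URL, dates,
max-age value and META content are invented for illustration, not a
recommendation:

   HTTP/1.1 200 OK
   Date: Sat, 08 Mar 1997 08:00:00 GMT
   Expires: Sun, 09 Mar 1997 08:00:00 GMT
   Cache-Control: max-age=86400
   Last-Modified: Fri, 07 Mar 1997 12:00:00 GMT
   Content-Type: text/html

   <HTML><HEAD><TITLE>Some page</TITLE>
   <META NAME="ROBOTS" CONTENT="NOINDEX">
   ...

Expires tells HTTP/1.0 caches, and max-age tells HTTP/1.1 caches, how long
the object may be served without revalidation (one day here); the ROBOTS
META tag asks indexing robots not to index the page. None of this says
anything about copyright as such, which is what the copy-control and PICS
drafts above try to address.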
> Comments are solicited. I do hope this doesn't turn into a flame war; I
> certainly am not advocating strict copyright enforcement and I dearly hope
> that copyright law will catch up with the technology at least as far as
> caching goes. But in the meantime, and with respect to the other copyright
> issues, I would like to push forward.

IMO, if you're trying to sell something copyrighted on the Web then you use
authentication or a secure server (transactions on which cannot be cached).
Then you hope no-one buys it and copies it to a free server somewhere that
doesn't abide by the Berne Convention. If it's time-value material such as
current stock quotes that's not so much of a problem.

Electronic copy-control measures are in development at various places
(keyword: ECMS), e.g. http://www.mcps.co.uk/hometest and
http://www.imprimatur.alcs.co.uk . Some of these use decryption tools in a
special browser that does not permit saving or printing (except perhaps as
a bitmapped screen dump - horribly inefficient for text). Indeed such a
device could conceivably coexist with HTTP caching - the cached object
would be unreadable without the key, which a user would pay for, assuming
that many keys could decrypt the same object.

There is a question of what people expect when they put up an insecure Web
page which says "Copyright". Obviously they want people to see it, which
ipso facto involves many copies being loaded into DRAM, video RAM, probably
private disk cache and perhaps public or semi-public cache.

Clear copyright violations, IMO, involve:

- Transferring the work to another medium such as CD-ROM or paper.

- Reusing an embedded object such as a photograph, movie, sound file,
  applet etc. in another work (if forbidden by the copyright; some authors
  might allow this with attribution), whether or not the embedded object is
  linked or copied. From a global point of view, it may be more efficient
  if such objects are linked. (From the host's point of view, linked
  objects still cost bandwidth, even if it's just 304s - see the sketch
  near the end of this message.)

- Stripping advertisements from a page and re-serving the modified page.

- Plagiarism: lifting chunks of content and changing the author's name.

Activities which IMO do not involve moral copyright violation (legal,
maybe) include:

- Search engine indexing - the engine is effectively quoting a small
  portion, which could be allowed under the fair use doctrine. This is more
  problematic for objects such as reviews which cannot be quoted; current
  engine practice only quotes the beginning of the document, though.

- Caching - a request for the original URL gets a cached, unmodified copy.

- Mirroring - a request for a local copy (different URL) gets a mirrored,
  unmodified copy.

In these cases, the organisation which is doing the caching or mirroring
derives little if any direct benefit from the activity. A user viewing the
cached or mirrored objects is probably unaware of the copy, since the
objects do not necessarily endorse or link to the copy host. The copy host
is merely providing an enhanced service for local users, in effect making
the copyright holder look good by providing their objects at greater
bandwidth than is available directly from the origin. I agree that objects
must not be allowed to get too stale, but cache and mirror software deals
with this. (Note - mirroring requires careful treatment of relative URLs,
and may affect PICS information.)
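As an aside on the "just 304s" point above: a cache or browser that already
holds a copy of a linked object revalidates it with a conditional GET, and
the origin answers with headers only. A sketch of the exchange (URL and
dates invented for illustration):

   GET /images/logo.gif HTTP/1.0
   If-Modified-Since: Fri, 07 Mar 1997 12:00:00 GMT

   HTTP/1.0 304 Not Modified
   Date: Sat, 08 Mar 1997 08:00:00 GMT

No body is transferred, so the cost to the linked-to host is a few hundred
bytes and a connection per check - small, but not zero.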
Andrew Daviel mailto:andrew@vancouver-webpages.com

Received on Saturday, 8 March 1997 00:34:03 UTC