- From: Steven Rowat <steven_rowat@sunshine.net>
- Date: Mon, 21 Sep 2009 13:02:29 -0700
- To: www-tag@w3.org
Ten Use-Cases of Individual Content Authors Requiring Rights/Commerce
Metadata: success in HTML4; HTML5; and CCN
Introduction:
In this document I list 10 individual content-author use cases that
require rights/commerce metadata, and attempt a preliminary
exploration of how well they will be served by HTML4/5 versus
content-centric networking (CCN) [1,2,3]. I place this list and its
discussion on the TAG list because it expands on ideas I presented in
a previous TAG post [4] and a related bug [5], and because the June
F2F meeting of the TAG [6] discussed metadata at length, and made
reference to CCN and the work of Van Jacobson in that regard. As well,
HTML5 and Metadata are major items on the September F2F agenda [7]. I
have also seen widespread recent calls for metadata-related use cases,
including from the TAG, the HTML WG, and the RDFa group.
After the use-case list I present the following:
-> Discussion of HTML4 and HTML5 ability to deliver the metadata
preferences of the 10 individual authors
-> Predictions of the Content-centric model (a thumbnail sketch)
-> Proposal (informal) for W3C further involvement
-> Notes
-> References
Ten Use-Cases of Individual Content Authors Requiring Rights/Commerce
Metadata
This is partly an imaginary list, but I believe all are highly
plausible. I've lived situations very similar to three or four of
them, and for three or four others I personally know the people who
live them. Certainly the list is not perfect or exhaustive;
nonetheless, in my view it suffices to indicate that there is a great
range of individuals who have their own content to supply to the
world: some who wish to sell it, and some who do not but who
nonetheless have other rights-metadata needs.
In each of the ten cases the author and their content are described in
general terms, and then the metadata needs for their rights/commerce
preferences are given. In each case the so-called Moral Rights of the
author (authorship; inviolability of the content), which are not
legally transferable [8], are listed in the first line of metadata
attributes; the various desired commercial rights are on the lines
following.
1. An independent medical researcher in the USA who produces a pdf
report about side-effects of a new prescription drug.
He specifies:
authorship; no content modification;
payment per download; no downstream commercializing.
2. A journalist in Africa with an ogg or mp4 video of atrocities in an
ongoing war.
She specifies:
anonymity; no content modification;
free; no downstream commercializing.
3. A writer with a complete novel in pdf, doc, html, and other text
formats.
He specifies:
authorship as pseudonym; no content modification;
payment per download;
downstream commercializing allowed with constraints of: no
advertising (direct sale only); payment of 20% of gross per copy
re-sold; any additional commercial rights for other media must obtain
agreement of the original author.
4. A folk-musician in Siberia who records local throat-singing into
mp3 or ogg vorbis files.
He specifies:
authorship; no content modification;
free streaming of first 10% of content online;
payment per download at sliding scale proportional to user-country's
average yearly income per person;
no downstream commercializing.
5. A whistleblower leaking documents from inside the government
showing evidence of torture practices, in text or pdf.
She specifies:
anonymity; no content modification;
free, but donation requested (user chooses);
no commercializing.
6. A software engineer with a program for a particular OS, in zip or
other compressed format.
He specifies:
authorship; no content modification;
payment per download with constraints/permissions of: demo use is
free on one machine for a month; payment thereafter is due for each
machine in use with the program;
downstream commercializing is possible with constraint of: payment
to original author of 25% of net profits from resale or from any other
commercializing, specifically including advertising.
7. A carpenter with photos and text describing how to build a solar
outhouse, in html or pdf.
He specifies:
authorship; no content modification;
payment per download (pdf); payment per page view (HTML);
no downstream commercializing.
8. A visual artist (oils and acrylics) with high- and low-resolution
JPEGs of the paintings.
She specifies:
authorship; no content modification;
free access online to low-res images of paintings;
payment per download for hi-res images of paintings, calculated by
total surface area of the image;
downstream commercializing permitted with explicit author agreement.
9. An inventor of a patented simplified mechanical tool with a
description in pdf and html text accompanied by embedded jpegs.
He specifies:
authorship; no content modification;
free HTML access to a summary of the tool specification;
payment per pdf download of the complete specification including
patent document;
downstream commercializing allowed with constraints of: no resale of
the patent or description itself; manufacturing of the tool for sale
in any given country is permitted after explicit agreement with the
inventor for that country.
10. A digital game programmer with a new multi-player game for online
and/or offline use.
He specifies:
authorship; content modification allowed with constraint of: no
commercialization once modified;
payment for online use by subscription (monthly, yearly); or single
payment for downloaded version;
downstream commercializing permitted for unmodified download version
only, with constraints of: 30% of gross sales receipts paid to
original author, as well as 10% of site advertising revenue (if any)
from the downstream page selling the game.
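As a rough illustration (not drawn from any existing standard), preferences
like those above can be written down as a small data record, after which the
percentage-based terms become simple arithmetic. The sketch below is in
Python. The field names and the sales figures are hypothetical, chosen only
for illustration; the royalty percentages are the ones given in use cases 3
and 10.

```python
from dataclasses import dataclass
from fractions import Fraction  # exact arithmetic for money shares


@dataclass(frozen=True)
class RightsMetadata:
    """Hypothetical record of one author's rights/commerce preferences."""
    attribution: str            # "authorship", "pseudonym" or "anonymity"
    modification_allowed: bool  # moral right: inviolability of the content
    payment: str                # free-text summary of the payment terms
    resale_share_of_gross: Fraction = Fraction(0)  # owed to author on resale
    ad_revenue_share: Fraction = Fraction(0)       # owed from reseller's ads
    resale_allowed: bool = False


def royalty_due(meta, gross_sales, ad_revenue=Fraction(0)):
    """Amount a downstream seller owes the original author."""
    if not meta.resale_allowed:
        raise ValueError("downstream commercializing is not permitted")
    return (meta.resale_share_of_gross * gross_sales
            + meta.ad_revenue_share * ad_revenue)


# Use case 3: the novelist (20% of gross per copy re-sold).
novel = RightsMetadata(
    attribution="pseudonym",
    modification_allowed=False,
    payment="per download",
    resale_share_of_gross=Fraction(20, 100),
    resale_allowed=True,
)

# Use case 10: the game programmer (30% of gross plus 10% of ad revenue).
game = RightsMetadata(
    attribution="authorship",
    modification_allowed=True,   # but no commercialization once modified
    payment="subscription or single payment",
    resale_share_of_gross=Fraction(30, 100),
    ad_revenue_share=Fraction(10, 100),
    resale_allowed=True,
)

# Invented figures: one reseller grosses 500 on the novel; another grosses
# 1000 on the game while earning 200 from advertising on the sales page.
novel_royalty = royalty_due(novel, Fraction(500))
game_royalty = royalty_due(game, Fraction(1000), Fraction(200))
print(novel_royalty, game_royalty)
```

A record for a use case with no downstream commercializing (use cases 1, 2,
4, 5 and 7) would simply leave resale_allowed at its default, and
royalty_due would refuse to compute a figure for it.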
Discussion:
I think all can agree that there currently exist all over the globe
individuals with their own digitized content (of any kind: science,
art, music, education, journalism, programming, etc.) who wish to
distribute that content on-line and who wish various levels of control
over their legal rights and/or sale of that content.
I believe the unresolved questions about this fact that are relevant
for the W3C to consider are these two:
first, how many of these individuals exist (1 million? 100
million? a billion?);
second, how well served can any of them be by the existing
tools in, or extended from: HTML4; HTML5 as recently proposed [9]; or CCN.
The first question -- of how many such individuals there are -- is
both important and elusive. If there are many -- say 50-100 million or
more worldwide -- then it may be worthwhile applying a major
architectural change to the web as a whole to accommodate their needs.
I reasoned that studying ten widely different use-cases and their
needs would give at least some initial clues as to how many people
like this there are.
And after constructing the list, I think it's reasonable to suggest
that this is neither a local issue nor a small one; the exchange of
money, and the exchange of information, are things that all human
beings engage in. So a truly simple and direct system enabling both
together could be used by a significant proportion of human beings:
the authors represented above could eventually be billions of people.
We have no way of knowing until the system is available and works
well. At present it doesn't.
The second question, then, is how well HTML4, HTML5, and the projected
abilities of CCN, could accomplish this.
For HTML4, on prima-facie evidence, the answer appears to be:
extremely poorly or not at all. HTML4 has been the web standard for 12
years and there has been no successful development of what used to be
called 'micropayments', or of digital rights controls based on open standards.
A full language has been developed to enable this (ODRL [10]) but it
has not been implemented widely. Yet twelve years is an age in
internet time; apparently some thing or things are preventing
rights/commerce from proceeding at the individual/browser level.
It seems best at this point to skip to the first of the reasons why
this might be. It was expressed recently and very succinctly by
Maciej Stachowiak on the HTML WG list, in answer to my concerns:
"HTML5 does not provide anything specific to enable selling of
content, but then, neither did HTML4. E-commerce and revenue models
are out of scope for HTML." [11]
So, in both HTML4 and HTML5 there is no attempt to specifically
include the rights/commerce preference controls listed for the 10 use
cases. And such individual author controls have not developed in any
useful way (with HTML4) in 12 years. Based on this, it appears that
the same thing would happen for HTML5: no progress will occur (for
individuals). As I've argued elsewhere [4, 5, 12], those with deep
pockets can monetize the web under HTML4 and do so more easily under
HTML5; and are doing so; but not individuals.
And why has the attempt not been made? I've come to believe there are
more fundamental reasons: reasons why neither HTML4 nor HTML5
attempted to facilitate information commerce in its most core sense,
content going from one person to another. And here it seems
appropriate to turn to the content-centric networking theory, which
provides several such reasons why individual rights/preference
controls might be so difficult as to be actually impossible in the
current architecture.
Predictions of the Content-centric model
The following summary is based almost exclusively on Van Jacobson's
descriptions and discussion of past, current, and hypothetical-future
data flows in our society [1,2]; he defines three states; "1" is
historic; "2" is present; "3" is projected:
1. telephony (specified path)
2. internet (point to point calls, using TCP)
3. content-centric (multi-point to multi-point)
These are expressed in detail in his Google tech talk of 2006 [1] and
his print interview in 2009 [2]. Summaries are given on the PARC page
[3] (where he is leading a group who are developing CCN for eventual
deployment throughout the internet) and PC Magazine in 2007 [13].
I will attempt a thumbnail summary of his ideas here:
1. Internet via TCP (stage 2 above) was originally designed to
share scarce resources (hardware, like printers), but the actual use
of the internet evolved instead into the sharing of
plentiful digital content via software. This is a completely different
problem.
2. Thus the architecture was never designed to do what it's being
asked to do. It does its original job well, but a new goal has evolved.
3. The internet attains this new goal badly at present. There are
major difficulties on the internet in several areas, including:
a. Security
b. Scalability
c. Complexity of interoperability.
4. These are not improving, and will remain problematic and will
prevent certain desired goals from being reached, unless the
architecture is changed.
5. Content-centric networking (stage 3 above) can solve all of
these; or at least improve them dramatically relative to stage 2.
a. Security: each packet will be named; the naming will be
registered and secure (in the same way that IP addresses are now).
This is contrasted with the current system, where only the end points
are secure, and false data is regularly inserted between those points,
with false location data.
b. Scalability: since location of the named data is
irrelevant, it does not have to come from where it was first created,
and can smoothly be supplied in any quantity by internet caching and
copying.
c. Complexity of interoperability: since data is named and
secure, it can be carried across borders more easily; firewalls and
ways of checking credentials are less relevant; secure content can be
moved through any OS or medium and still perform the same function.
6. The change from the current internet to content-centric
networking could facilitate just as major a change as the one from
telephony to internet was.
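A toy sketch of points 5a and 5b may make the idea more concrete. It is
written in Python and is a deliberate simplification: a piece of content is
named here by its SHA-256 digest, so that any copy from any cache can be
checked against its name. This is only a stand-in for the registered, secure
naming described in [1,2]; the class and function names are invented for
illustration and are not taken from any CCN implementation.

```python
import hashlib


def content_name(data: bytes) -> str:
    """Name a piece of content by the SHA-256 digest of its bytes."""
    return hashlib.sha256(data).hexdigest()


class Cache:
    """Hypothetical network node holding copies of named content."""

    def __init__(self):
        self._store = {}

    def put(self, name: str, data: bytes) -> None:
        self._store[name] = data

    def get(self, name: str):
        return self._store.get(name)  # None if this node has no copy


def fetch(name: str, caches):
    """Ask each node in turn; accept the first copy that matches its name."""
    for cache in caches:
        data = cache.get(name)
        if data is not None and content_name(data) == name:
            return data  # verified, wherever it came from
    return None


# The author publishes one copy; the name is all a reader needs.
original = b"report on drug side-effects"
name = content_name(original)

origin, nearby, rogue = Cache(), Cache(), Cache()
origin.put(name, original)
nearby.put(name, original)            # an ordinary cached copy
rogue.put(name, b"tampered report")   # false data under the same name

# The rogue copy is rejected, and the nearby cache satisfies the request
# without the origin being contacted at all.
result = fetch(name, [rogue, nearby])
print(result == original)
```

The point of the sketch is that the reader asks for the content by name
rather than by location, so it does not matter which node answers as long as
the answer verifies.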
There is far more that I have not attempted to explain here, and
more still that I did not understand. However, given that the
alternative is a stalemate, I feel I understood enough to say: we need
to take the chance and start actively studying what is required to
move to the CCN model. According to PARC [3] it can be done incrementally.
Finally, in terms of the specific problem that I find myself pursuing
in this essay: supposing that CCN does what it is predicted to do,
will these things help individual authors who supply internet content
that requires rights/commerce data control? Consider the three main
problem areas in #5 above, security; scalability; and complexity:
a. Security
Yes: security will increase dramatically, and the
current lack of security for money/privacy in transactions is
obviously a large impediment to developing a widespread information
rights/commerce system. Conceivably it is the single largest reason
why such a system does not yet exist.
b. Scalability
Yes: Van Jacobson expresses it well:
"Right now, if you're not Google or YouTube — somebody
who's big enough — there's a curse in creating popular content. If you
make something that a lot of people look at and say, "Oooh, this is
really cool!" you've just blown your Web site off the air because the
only way that content can be distributed is from its original source.
"...If you move to a content-centric model, then you
can stop disenfranchising creators because they pay no cost and you
actually stop disenfranchising all the intermediaries, too." [2]
In other words, an original author needs to make no
more than one Registered copy of the content in question; the
internet, which is after all a huge copying machine, will take care of
the rest, even if there is a spike in demand.
c. Complexity of interoperability.
Yes, although the gains here appear to be more
internet-wide and less specific to the problem faced by individual
creators. But still, they may be considerable for both; for instance,
Van Jacobson said:
" ...you're opening yourself up to a world of grief if
you don't have what [Tony Hoare] called 'referential transparency'. If
you can refer to only the container and not the thing that's
contained, then contents can change on you. You have security issues;
you have decidability issues; you have robustness issues. You don't
really know what the bits are, and you can't reason about what the
bits are, because all you can name is a container." [2]
I interpret this to mean that the 'complexity' issue is
often identical with the security issue; attempts to solve the
security issue create more complexity; if it is solved innately by the
architecture, then complexity is reduced, increasing overall
efficiency of the whole system.
Proposal (informal):
Based on the discussion above, I would like to see:
1. A W3C liaison group [*see note 1] formed to consult with
Jacobson's PARC group and determine:
a. What steps could be taken to test implementations of CCN in the
existing internet.
b. What the PARC group still needs in terms of information,
use-cases, or testing resources that the W3C might be able to provide,
in order to enable such implementations.
c. What is a fuller list of advantages and disadvantages, relative
to the current architecture, of different forms of CCN implementation.
2. Based on the results of #1, if implementation in some form seems
likely to bring sufficient advantages, the same or another W3C group
could study:
a. What is the optimum form of the metadata to be placed into (or
accompany) the content packets in CCN. [*see Note 2]
b. What other handshaking might be required to fulfill the actions
intended to flow from this metadata (in browsers, ISPs, etc.) and how
W3C can facilitate this. [*see Note 3]
c. Whether HTML5 (or even 4) can be tailored to allow a full or
partial CCN implementation directly.
d. If not, whether HTML5/4 can at least install hooks that would
allow graceful degradation/interoperability with CCN metadata
protocols, in order to effectively anticipate a large-scale shift to
CCN at a later date.
Notes:
1. In terms of the issue directly addressed in this essay (individual
content creators' control of their work), I suggest that W3C groups
studying the CCN option should have a majority without
conflict-of-interest; in other words, only a minority can be making
their living from direct sales on the web, from corporations
monetizing the web, or from consulting about web monetization. If it
turns out that this is impractical, that most of the interested
parties are involved in some way in web monetization, then at least
the group should have a numerical balance between individual authors
distributing their own content and professionals who code for others
or are members of corporations that do.
2. For example, I believe for the current use-cases ODRL [10] can
carry all the commercial calculations (as well as the moral rights).
However metadata from other vocabularies such as Dublin Core [14] and
FOAF [15] and the extensibility to many others should also be
available; in other words, a protocol like RDFa [16] and/or Microdata
[17] will likely be required to present the metadata, just as it is
currently in HTML4 and is planned for HTML5. (Or possibly a form of
Adobe's XMP [18, 19], if it can become an open standard).
3. For example, an interesting idea provided by Peter Dolan recently
in a late-night talk about CCN and commercialization, is that ISPs are
in a unique position relative to individual content authors and users
both:
a) ISPs taken as a whole already hold credit and private contact
information for both authors and users.
b) ISPs already are capable of counting the flow of packets.
c) ISPs already are accustomed to performing monetary transactions.
d) ISPs already are capable of performing secure transactions when
necessary.
He suggested that they would therefore be best suited to assume the role
of counting and validating the outflow of authors' registered works
and aggregating the inflow of user payments via a special ISP clearing
house set up for the purpose, in the same way a bank does for cheques.
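The clearing-house idea can be sketched in a few lines. The following Python
is purely illustrative: the class, its methods, and the identifiers and
amounts used are hypothetical, and it ignores everything a real system would
need (authentication, currencies, disputes). It shows only the two roles
named above: counting the outflow of registered works, and aggregating the
inflow of user payments for each author.

```python
from collections import defaultdict
from decimal import Decimal


class ClearingHouse:
    """Hypothetical ISP clearing house for registered works."""

    def __init__(self):
        self._author_of = {}                   # work id -> author id
        self._price_of = {}                    # work id -> price per download
        self.deliveries = defaultdict(int)     # work id -> copies delivered
        self.balances = defaultdict(Decimal)   # author id -> amount owed

    def register(self, work_id, author_id, price):
        """Record who wrote a work and what one download costs."""
        self._author_of[work_id] = author_id
        self._price_of[work_id] = Decimal(price)

    def record_download(self, work_id):
        """Count one delivery and credit the author with the payment."""
        if work_id not in self._author_of:
            raise KeyError(f"unregistered work: {work_id}")
        self.deliveries[work_id] += 1
        self.balances[self._author_of[work_id]] += self._price_of[work_id]

    def settle(self, author_id):
        """Pay out and reset the author's aggregated balance."""
        return self.balances.pop(author_id, Decimal("0"))


# Invented example: the carpenter of use case 7 charges 2.50 per pdf.
house = ClearingHouse()
house.register("outhouse-plans.pdf", "carpenter", "2.50")
for _ in range(3):
    house.record_download("outhouse-plans.pdf")

count = house.deliveries["outhouse-plans.pdf"]
payout = house.settle("carpenter")
print(count, payout)
```

Keeping the delivery count separate from the balance mirrors the cheque
analogy: the count is the record to be validated, and the balance is what is
eventually cleared to the author.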
References:
[1] "A New Way to Look At Networking",
Van Jacobson, Google Tech Talks, 2006
http://www.youtube.com/watch?v=gqGEMQveoqg&feature=PlayList&p=68A083F6EAFEFE01&index=29
[2] Interview with Van Jacobson, 2009: content-centric networking
http://mags.acm.org/queue/200901/
[3] PARC's "Networking" web page:
http://www.parc.com/work/focus-area/networking/
[4] "HTML 5's proposed basis in DOM/JS skews web control and
monetization towards corporations and away from individual
authors/researchers, to the detriment of society."
http://lists.w3.org/Archives/Public/www-tag/2009Sep/0028.html
[5] Bug 7546 ""HTML 5" Editor's draft misnamed and suboptimal for HTML
content authors unless refactored into HTML (main) and DOM API
(appendix)."
http://www.w3.org/Bugs/Public/show_bug.cgi?id=7546
[6] TAG F2F June
http://www.w3.org/2001/tag/2009/06/24-minutes.html
[7] TAG September Agenda (Preliminary)
http://www.w3.org/2001/tag/2009/09/23-agenda
[8] "Musicians and the Law in Canada", Paul Sanderson (Carswell;
2000); p 12.
[9] HTML5 (Editor's Draft)
http://dev.w3.org/html5/spec/Overview.html
[10] ODRL (Open Digital Rights Language)
http://odrl.net/2.0/WD-ODRL-Core-Metadata.html
http://odrl.net/2.0/DS-ODRL-Model.html
[11] http://lists.w3.org/Archives/Public/public-html/2009Sep/0814.html
[12] http://lists.w3.org/Archives/Public/public-html/2009Sep/0827.html
[13] Five Ideas That Will Reinvent Modern Computing:
Extreme Peer-to-Peer
http://www.pcmag.com/article2/0,2817,2147451,00.asp
[14] DC (Dublin Core)
http://dublincore.org/documents/dcmi-terms/
[15] FOAF (Friend Of A Friend)
http://xmlns.com/foaf/spec/
[16] RDFa Primer
http://www.w3.org/TR/xhtml-rdfa-primer/
[17] HTML5 Draft Standard, section 5: Microdata
http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html
[18] Extensible Metadata Platform (XMP)
http://en.wikipedia.org/wiki/Extensible_Metadata_Platform
[19] Extensible Metadata Platform (XMP)
http://www.adobe.com/products/xmp/
Steven Rowat
Received on Monday, 21 September 2009 20:04:00 UTC