Ten Use-Cases of Individual Content Authors Requiring Rights/Commerce Metadata: success in HTML4; HTML5; and CCN from Steven Rowat on 2009-09-21 (www-tag@w3.org from September 2009)

From: Steven Rowat <steven_rowat@sunshine.net>
Date: Mon, 21 Sep 2009 13:02:29 -0700
To: www-tag@w3.org
Message-ID: <4AB7DBD5.9080108@sunshine.net>
Ten Use-Cases of Individual Content Authors Requiring Rights/Commerce 
Metadata: success in HTML4; HTML5; and CCN

Introduction:

In this document I list 10 individual content-author use cases that 
require rights/commerce metadata, and attempt a preliminary 
exploration of how well they will be served by HTML4/5 versus 
content-centric networking (CCN) [1,2,3]. I place this list and its 
discussion on the TAG list because it expands on ideas I presented in 
a previous TAG post [4] and a related bug [5], and because the June 
F2F meeting of the TAG [6] discussed metadata at length, and made 
reference to CCN and the work of Van Jacobson in that regard. As well, 
HTML5 and metadata are major items on the September F2F agenda [7]. I 
have also recently seen widespread calls for metadata-related 
use-cases, including from the TAG, the HTML WG, and the RDFa group.

After the use-case list I present the following:
		-> Discussion of the ability of HTML4 and HTML5 to deliver the 
metadata preferences of the 10 individual authors
		-> Predictions of the content-centric model (a thumbnail sketch)
		-> Proposal (informal) for further W3C involvement
		-> Notes
		-> References


Ten Use-Cases of Individual Content Authors Requiring Rights/Commerce 
Metadata

This is partly an imaginary list, but I believe all are highly 
plausible. I've lived situations very similar to three or four of 
them, and for three or four others I personally know the people who 
live them. Certainly the list is not perfect or exhaustive; 
nonetheless, in my view it suffices to show that a wide range of 
individuals have their own content to supply to the world: some who 
wish to sell it, and some who do not wish to sell it but nonetheless 
have other rights-metadata needs.

In each of the ten cases the author and their content are described in 
general terms, and then the metadata needs for their rights/commerce 
preferences are given. In each case the so-called Moral Rights of the 
author (authorship; inviolability of the content), which are not 
legally transferable [8], are listed in the first line of metadata 
attributes; the various desired commercial rights are on the lines 
following.
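The two-part layout just described (moral rights on the first line, commercial terms after it) can be sketched as a small data structure. This is a minimal, hypothetical illustration in Python, not part of any standard: the class and field names are my own assumptions, and the commercial terms are kept as free text rather than expressed in a real rights language such as ODRL [10]. It reproduces the metadata of use-cases 1 and 2 below.

```python
from dataclasses import dataclass
from enum import Enum


class Attribution(Enum):
    """How the author wishes to be credited (a moral right)."""
    NAME = "authorship"
    PSEUDONYM = "authorship as pseudonym"
    ANONYMOUS = "anonymity"


@dataclass(frozen=True)
class MoralRights:
    """First metadata line: authorship and integrity of the content."""
    attribution: Attribution
    modification_allowed: bool = False


@dataclass(frozen=True)
class RightsMetadata:
    """Moral rights plus the commercial terms on the following line(s).

    Hypothetical field names; commercial terms are plain text here,
    not terms from any real rights-expression vocabulary.
    """
    moral: MoralRights
    commercial_terms: tuple[str, ...] = ()
    downstream_commercializing: bool = False

    def lines(self) -> list[str]:
        """Render the two lines in the format used by the use-cases."""
        integrity = ("content modification allowed"
                     if self.moral.modification_allowed
                     else "no content modification")
        downstream = ("downstream commercializing allowed"
                      if self.downstream_commercializing
                      else "no downstream commercializing")
        return [
            f"{self.moral.attribution.value}; {integrity}",
            "; ".join(self.commercial_terms + (downstream,)),
        ]


# Use-case 1 (medical researcher) and use-case 2 (journalist).
researcher = RightsMetadata(MoralRights(Attribution.NAME),
                            ("payment per download",))
journalist = RightsMetadata(MoralRights(Attribution.ANONYMOUS), ("free",))

for line in researcher.lines():
    print(line)
```

Printing the researcher's record gives back the two lines of use-case 1 ("authorship; no content modification" and "payment per download; no downstream commercializing"), which is the point: the same information can be held in a machine-readable form and still be rendered for people.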

1. An independent medical researcher in the USA who produces a pdf 
report about side-effects of a new prescription drug.
	He specifies:
		authorship; no content modification;
		payment per download; no downstream commercializing.

2. A journalist in Africa with an ogg or mp4 video of atrocities in an 
ongoing war.
	She specifies:
		anonymity; no content modification;
		free; no downstream commercializing.

3. A writer with a complete novel in pdf, doc, html, and other text 
formats.
	He specifies:
		authorship as pseudonym; no content modification;
		payment per download;
		downstream commercializing allowed with constraints of: no 
advertising (direct sale only); payment of 20% of gross per copy 
re-sold; any additional commercial rights for other media require the 
agreement of the original author.

4. A folk musician in Siberia who records local throat-singing into 
mp3 or ogg vorbis files.
	He specifies:
		authorship; no content modification;
		free streaming of the first 10% of content online;
		payment per download on a sliding scale proportional to the average 
yearly income per person in the user's country;
		no downstream commercializing.
	
5. A whistleblower leaking documents from inside the government 
showing evidence of torture practices, in text or pdf.
	She specifies:
		anonymity; no content modification;
		free, but donation requested (user chooses);
		no commercializing.

6. A software engineer with a program for a particular OS, in zip or 
other compressed format.
	He specifies:
		authorship; no content modification;
		payment per download with constraints/permissions of: demo use is 
free on one machine for a month; payment thereafter is due for each 
machine in use with the program;
		downstream commercializing is possible with constraint of: payment 
to original author of 25% of net profits from resale or from any other 
commercializing, specifically including advertising.

7. A carpenter with photos and text describing how to build a solar 
outhouse, in html or pdf.
	He specifies:
		authorship; no content modification;
		payment per download (pdf); payment per page view (HTML);
		no downstream commercializing.

8. A visual artist (oils and acrylics) with high- and low-resolution 
JPEGs of the paintings.
	She specifies:
		authorship; no content modification;
		free access online to low-res images of paintings;
		payment per download for hi-res images of paintings, calculated by 
total surface area of the image;
		downstream commercializing permitted with explicit author agreement.

9. An inventor of a patented simplified mechanical tool with a 
description in pdf and html text accompanied by embedded jpegs.
	He specifies:
		authorship; no content modification;
		free HTML access to a summary of the tool specification;
		payment per pdf download of the complete specification including 
patent document;
		downstream commercializing allowed with constraints of: no resale of 
the patent or description itself; manufacturing of the tool for sale 
in any given country is permitted after explicit agreement with the 
inventor for that country.

10. A digital game programmer with a new multi-player game for online 
and/or offline use.
	He specifies:
		authorship; content modification allowed with constraint of: no 
commercialization once modified;
		payment for online use by subscription (monthly, yearly); or single 
payment for downloaded version;
		downstream commercializing permitted for unmodified download version 
only, with constraints of: 30% of gross sales receipts paid to 
original author, as well as 10% of site advertising revenue (if any) 
from the downstream page selling the game.
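Several of these terms are simple arithmetic that software could apply automatically once the metadata is machine-readable. The sketch below is a hypothetical illustration in Python encoding three of them: the sliding scale of use-case 4, the area-based price of use-case 8, and the royalty split of use-case 10. The base price, reference income, and price per megapixel are assumed values, and "surface area" in use-case 8 is read here as the pixel area of the JPEG, which is only one possible reading.

```python
def sliding_scale_price(base_price: float, country_income: float,
                        reference_income: float) -> float:
    """Use-case 4: scale a base price by the buyer country's average
    yearly income per person, relative to an assumed reference income."""
    if reference_income <= 0:
        raise ValueError("reference_income must be positive")
    return round(base_price * country_income / reference_income, 2)


def area_price(width_px: int, height_px: int,
               price_per_megapixel: float) -> float:
    """Use-case 8: price a hi-res image by its area (read here as pixels)."""
    megapixels = width_px * height_px / 1_000_000
    return round(megapixels * price_per_megapixel, 2)


def game_royalty(gross_sales: float, ad_revenue: float = 0.0) -> float:
    """Use-case 10: 30% of gross sales plus 10% of downstream ad revenue."""
    return round(0.30 * gross_sales + 0.10 * ad_revenue, 2)


print(sliding_scale_price(10.00, 5_000, 50_000))  # 1.0
print(area_price(4000, 3000, 0.50))               # 6.0
print(game_royalty(1000.00, 200.00))              # 320.0
```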


Discussion:

I think all can agree that there currently exist all over the globe 
individuals with their own digitized content (of any kind: science, 
art, music, education, journalism, programming, etc.) who wish to 
distribute that content on-line and who wish various levels of control 
over their legal rights and/or sale of that content.

I believe two unresolved questions about this are relevant for the W3C 
to consider:
       first, how many of these individuals exist (1 million? 100 
million? a billion?);
       second, how well any of them can be served by the existing 
tools in, or extended from: HTML4; HTML5 as recently proposed [9]; or CCN.

The first question -- of how many such individuals there are -- is 
both important and elusive. If there are many -- say 50-100 million or 
more worldwide -- then it may be worth applying a major 
architectural change to the web as a whole to accommodate their needs. 
I reasoned that studying ten widely different use-cases and their 
needs would give at least some initial clues as to how many people 
like this there are.

And after constructing the list, I think it's reasonable to suggest 
that this is neither a local issue nor a small one; the exchange of 
money, and the exchange of information, are things that all human 
beings engage in. So a truly simple and direct system enabling both 
together could be used by a significant proportion of human beings: 
the authors represented above could eventually be billions of people. 
We have no way of knowing until the system is available and works 
well. At present it doesn't.

The second question, then, is how well HTML4, HTML5, and the projected 
abilities of CCN, could accomplish this.

For HTML4, on prima facie evidence, the answer appears to be: 
extremely poorly or not at all. HTML4 has been the web standard for 12 
years, and in that time there has been no successful open-standards 
development of what used to be called 'micropayments', or of digital 
rights controls. A full language has been developed to enable this 
(ODRL [10]), but it has not been widely implemented. Twelve years is an 
age in internet time; apparently something is preventing 
rights/commerce from proceeding at the individual/browser level.

The first of the reasons why this might be was expressed very 
succinctly on the HTML WG list recently by Maciej Stachowiak, who 
wrote, in answer to my concerns:

"HTML5 does not provide anything specific to enable selling of 
content, but then, neither did HTML4. E-commerce and revenue models 
are out of scope for HTML." [11]

So, in both HTML4 and HTML5 there is no attempt to specifically 
include the rights/commerce preference controls listed for the 10 use 
cases. And such individual author controls have not developed in any 
useful way (with HTML4) in 12 years. Based on this, it seems likely 
that the same will happen with HTML5: no progress will occur for 
individuals. As I've argued elsewhere [4, 5, 12], those with deep 
pockets can monetize the web under HTML4, can do so more easily under 
HTML5, and are already doing so; individuals cannot.

And why has the attempt not been made? I've come to believe there are 
more fundamental reasons: reasons why neither HTML4 nor HTML5 
attempted to facilitate information commerce in its most core sense, 
content going from one person to another. And here it seems 
appropriate to turn to the content-centric networking theory, which 
suggests several reasons why individual rights/preference controls 
might be so difficult as to be effectively impossible in the current 
architecture.

Predictions of the Content-centric model:

The following summary is based almost exclusively on Van Jacobson's 
descriptions and discussion of past, current, and hypothetical-future 
data flows in our society [1,2]. He defines three stages, of which "1" 
is historic, "2" is present, and "3" is projected:
      1. telephony (specified path)
      2. internet (point to point calls, using TCP)
      3. content-centric (multi-point to multi-point)

These are expressed in detail in his Google tech talk of 2006 [1] and 
his print interview in 2009 [2]. Summaries are given on the PARC page 
[3] (PARC is where he leads a group developing CCN for eventual 
deployment throughout the internet) and in PC Magazine in 2007 [13].

I will attempt a thumbnail summary of his ideas here:
     1. Internet via TCP (stage 2 above) was originally designed to 
share scarce resources (hardware, like printers), but actual use of the 
internet has instead evolved into the sharing of plentiful digital 
content via software. This is a completely different problem.
     2. Thus the architecture was never designed to do what it's being 
asked to do. It does its original job well, but a new goal has evolved.
     3. The internet achieves this new goal badly at present. There are 
major difficulties on the internet in several areas, including:
            a. Security
            b. Scalability
            c. Complexity of interoperability.
     4. These problems are not improving; they will remain and will 
prevent certain desired goals from being reached unless the 
architecture is changed.
     5. Content-centric networking (stage 3 above) can solve all of 
these; or at least improve them dramatically relative to stage 2.
            a. Security: each packet will be named; the naming will be 
registered and secure (in the same way that IP addresses are now). 
This is contrasted with the current system, where only the end points 
are secure, and false data is regularly inserted between those points, 
with false location data.
            b. Scalability: since location of the named data is 
irrelevant, it does not have to come from where it was first created, 
and can smoothly be supplied in any quantity by internet caching and 
copying.
            c. Complexity of interoperability: since data is named and 
secure, it can be carried across borders more easily; firewalls and 
ways of checking credentials are less relevant; secure content can be 
moved through any OS or medium and still perform the same function.
     6. The change from the current internet to content-centric 
networking could facilitate just as major a change as the one from 
telephony to internet was.
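Points 5a and 5b rest on one idea: if a name is securely bound to the exact bytes it names, it no longer matters which machine supplies those bytes. A minimal sketch of that binding follows, in Python, under a simplifying assumption: it uses a plain SHA-256 digest, whereas the CCN design Jacobson describes signs each piece of named content with the publisher's key. The name format is hypothetical, and the expected digest must come from a trusted source (the "registered" naming of 5a), never from the possibly altered copy itself.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class NamedData:
    """A piece of content carrying its own name and a digest of its bytes."""
    name: str     # hypothetical hierarchical name, e.g. "/author/report.pdf"
    digest: str   # SHA-256 hex digest, standing in for a publisher signature
    data: bytes


def publish(name: str, data: bytes) -> NamedData:
    """Author side: bind the name to the exact bytes once, at creation."""
    return NamedData(name, hashlib.sha256(data).hexdigest(), data)


def verify(packet: NamedData, expected_digest: str) -> bool:
    """User side: check the bytes, no matter which cache supplied them."""
    return hashlib.sha256(packet.data).hexdigest() == expected_digest


original = publish("/author/report.pdf", b"side-effects report, v1")
# A copy served by some intermediate cache, altered in transit:
tampered = NamedData(original.name, original.digest, b"altered report")

print(verify(original, original.digest))  # True
print(verify(tampered, original.digest))  # False
```

Because the check depends only on the bytes and the registered digest, a copy from a nearby cache is as trustworthy as one from the author's own server, which is what lets location become irrelevant.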

There is far more that I have not attempted to explain here, and more 
still that I did not understand. However, given that the alternative 
is a stalemate, I feel I understood enough to say: we need 
to take the chance and start actively studying what is required to 
move to the CCN model. According to PARC [3] it can be done incrementally.

Finally, in terms of the specific problem that I find myself pursuing 
in this essay: supposing that CCN does what it is predicted to do, 
will these things help individual authors who supply internet content 
that requires rights/commerce data control? Consider the three main 
problem areas in #5 above (security, scalability, and complexity):
            a. Security
               Yes: security will increase dramatically, and the 
current lack of security for money/privacy in transactions is 
obviously a large impediment to developing a widespread information 
rights/commerce system. Conceivably it is the single largest reason 
why such a system does not yet exist.
            b. Scalability
               Yes: Van Jacobson expresses it well:
               "Right now, if you're not Google or YouTube — somebody 
who's big enough — there's a curse in creating popular content. If you 
make something that a lot of people look at and say, 'Oooh, this is 
really cool!' you've just blown your Web site off the air because the 
only way that content can be distributed is from its original source.
               "...If you move to a content-centric model, then you 
can stop disenfranchising creators because they pay no cost and you 
actually stop disenfranchising all the intermediaries, too." [2]
               In other words, an original author needs to make no 
more than one registered copy of the content in question; the 
internet, which is after all a huge copying machine, will take care of 
the rest, even if there is a spike in demand.
            c. Complexity of interoperability.
               Yes, although the gains here appear to be more 
internet-wide and less specific to the problem faced by individual 
creators. But still, they may be considerable for both; for instance, 
Van Jacobson said:
               " ...you're opening yourself up to a world of grief if 
you don't have what [Tony Hoare] called 'referential transparency'. If 
you can refer to only the container and not the thing that's 
contained, then contents can change on you. You have security issues; 
you have decidability issues; you have robustness issues. You don't 
really know what the bits are, and you can't reason about what the 
bits are, because all you can name is a container." [2]
               I interpret this to mean that the 'complexity' issue is 
often inseparable from the security issue: attempts to solve the 
security issue after the fact create more complexity, whereas if it is 
solved innately by the architecture, complexity is reduced and the 
efficiency of the whole system increases.
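The scalability point (b above) can be made concrete with a toy count of how many requests reach the author's own server when nearby copies are, or are not, reused. This is a deliberately simplified, hypothetical model in Python (one cache per group of users, unlimited storage, no expiry), not a description of how CCN routers actually cache.

```python
def origin_requests(requests: list[tuple[str, str]], caching: bool) -> int:
    """Count how many requests must be served by the author's own server.

    Each request is (node, name): `node` stands for the cache nearest the
    user (hypothetically, an ISP's cache) and `name` is the content name.
    With caching, a node fetches a name from the origin only once and
    serves later requests from its own copy.
    """
    held: dict[str, set[str]] = {}
    at_origin = 0
    for node, name in requests:
        copies = held.setdefault(node, set())
        if caching and name in copies:
            continue          # served by the nearby copy
        at_origin += 1        # fetched from the author's server
        if caching:
            copies.add(name)
    return at_origin


# 10,000 users behind 5 caches all ask for the same popular recording.
demand = [(f"cache-{i % 5}", "/author/song.ogg") for i in range(10_000)]
print(origin_requests(demand, caching=False))  # 10000
print(origin_requests(demand, caching=True))   # 5
```

In this toy model the author's server sends five copies instead of ten thousand, which is the sense in which the network, rather than the author, absorbs a spike in demand.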


Proposal (informal):
Based on the discussion above, I would like to see:

	1. A W3C liaison group [*see Note 1] formed to consult with 
Jacobson's PARC group and determine:
		a. What steps could be taken to test implementations of CCN in the 
existing internet.
		b. What the PARC group still needs in terms of information, 
use-cases, or testing resources that the W3C might be able to provide, 
in order to enable such implementations.
		c. A fuller list of the advantages and disadvantages, relative to 
the current architecture, of different forms of CCN implementation.
	2. Based on the results of #1, if implementation in some form seems 
likely to bring sufficient advantages, the same or another W3C group 
could study:
		a. What form of metadata is optimal for placing in (or 
accompanying) the content packets in CCN. [*see Note 2]
		b. What other handshaking might be required to fulfill the actions 
intended to flow from this metadata (in browsers, ISPs, etc.) and how 
W3C can facilitate this. [*see Note 3]
		c. Whether HTML5 (or even HTML4) can be tailored to allow a full or 
partial CCN implementation directly.
		d. If not, whether HTML5/4 can at least provide hooks that would 
allow graceful degradation/interoperability with CCN metadata 
protocols, so as to anticipate a large-scale shift to CCN at a later 
date.



Notes:

1. In terms of the issue directly addressed in this essay (individual 
content creators' control of their work), I suggest that W3C groups 
studying the CCN option should have a majority without 
conflict of interest; in other words, only a minority of members should 
be making their living from direct sales on the web, from corporations 
monetizing the web, or from consulting about web monetization. If it 
turns out that this is impractical, because most of the interested 
parties are involved in some way in web monetization, then at least 
the group should have a numerical balance between individual authors 
distributing their own content and professionals who code for others 
or are members of corporations that do.

2. For example, I believe that for the current use-cases ODRL [10] can 
carry all the commercial calculations (as well as the moral rights). 
However, metadata from other vocabularies such as Dublin Core [14] and 
FOAF [15], and the extensibility to many others, should also be 
available; in other words, a syntax like RDFa [16] and/or Microdata 
[17] will likely be required to present the metadata, as RDFa already 
does in XHTML and as is planned for HTML5. (Or possibly a form of 
Adobe's XMP [18, 19], if it can become an open standard.)

3. For example, an interesting idea raised recently by Peter Dolan, in 
a late-night talk about CCN and commercialization, is that ISPs are in 
a unique position relative to both individual content authors and 
users:
	a) ISPs taken as a whole already hold credit and private contact 
information for both authors and users.
	b) ISPs are already capable of counting the flow of packets.
	c) ISPs are already accustomed to performing monetary transactions.
	d) ISPs are already capable of performing secure transactions when 
necessary.
	He suggested that they would therefore be best suited to assume the 
role of counting and validating the outflow of authors' registered 
works and aggregating the inflow of user payments via a special ISP 
clearing house set up for the purpose, in the same way that banks 
clear cheques.
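The clearing-house role in Note 3 amounts largely to netting many small payments into a few settlements. The following is a hypothetical sketch, in Python, of only that netting step; the ISP and author identifiers are invented, and collecting payments from users and paying authors are left out.

```python
from collections import defaultdict


def clear(transfers):
    """Net a period's payments the way a cheque clearing house would.

    transfers: iterable of (payer_isp, payee_isp, author, cents), one per
    paid download or page view. Amounts are integer cents to avoid
    floating-point rounding.
    Returns (total owed to each author, net position of each ISP).
    """
    author_totals = defaultdict(int)
    isp_net = defaultdict(int)
    for payer_isp, payee_isp, author, cents in transfers:
        author_totals[author] += cents
        isp_net[payer_isp] -= cents   # collected from its user, owed onward
        isp_net[payee_isp] += cents   # received on behalf of its author
    return dict(author_totals), dict(isp_net)


day = [
    ("isp-a", "isp-b", "author-1", 100),  # user at isp-a buys author-1's work
    ("isp-b", "isp-a", "author-2", 50),   # user at isp-b buys author-2's work
    ("isp-a", "isp-b", "author-1", 100),
]
authors, positions = clear(day)
print(authors)    # {'author-1': 200, 'author-2': 50}
print(positions)  # {'isp-a': -150, 'isp-b': 150}
```

Three payments between two ISPs collapse into a single settlement of 150 cents from isp-a to isp-b, and the net positions always sum to zero, as they would for a bank clearing house.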
	

References:

[1] "A New Way to Look at Networking", Van Jacobson, Google Tech Talks, 2006
http://www.youtube.com/watch?v=gqGEMQveoqg&feature=PlayList&p=68A083F6EAFEFE01&index=29

[2] Interview with Van Jacobson, 2009: content-centric networking
http://mags.acm.org/queue/200901/

[3] PARC's "Networking" web page:
http://www.parc.com/work/focus-area/networking/

[4] "HTML 5's proposed basis in DOM/JS skews web control and 
monetization towards corporations and away from individual 
authors/researchers, to the detriment of society."
http://lists.w3.org/Archives/Public/www-tag/2009Sep/0028.html

[5] Bug 7546: "'HTML 5' Editor's draft misnamed and suboptimal for HTML 
content authors unless refactored into HTML (main) and DOM API 
(appendix)."
http://www.w3.org/Bugs/Public/show_bug.cgi?id=7546

[6] TAG F2F meeting, June 2009 (minutes)
http://www.w3.org/2001/tag/2009/06/24-minutes.html

[7] TAG F2F meeting, September 2009 (preliminary agenda)
http://www.w3.org/2001/tag/2009/09/23-agenda

[8] "Musicians and the Law in Canada", Paul Sanderson (Carswell; 
2000); p 12.

[9] HTML5 (Editor's Draft)
http://dev.w3.org/html5/spec/Overview.html

[10] ODRL (Open Digital Rights Language)
http://odrl.net/2.0/WD-ODRL-Core-Metadata.html
http://odrl.net/2.0/DS-ODRL-Model.html

[11] Maciej Stachowiak, message to the public-html list, September 2009
http://lists.w3.org/Archives/Public/public-html/2009Sep/0814.html

[12] Steven Rowat, message to the public-html list, September 2009
http://lists.w3.org/Archives/Public/public-html/2009Sep/0827.html

[13] "Five Ideas That Will Reinvent Modern Computing: Extreme 
Peer-to-Peer", PC Magazine, 2007
http://www.pcmag.com/article2/0,2817,2147451,00.asp

[14] DC (Dublin Core)
http://dublincore.org/documents/dcmi-terms/

[15] FOAF (Friend Of A Friend)
http://xmlns.com/foaf/spec/

[16] RDFa Primer
http://www.w3.org/TR/xhtml-rdfa-primer/

[17] HTML5 Draft Standard, section 5: Microdata
http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html

[18] Extensible Metadata Platform (XMP), Wikipedia article
http://en.wikipedia.org/wiki/Extensible_Metadata_Platform

[19] Extensible Metadata Platform (XMP), Adobe product page
http://www.adobe.com/products/xmp/



Steven Rowat
Received on Monday, 21 September 2009 20:04:00 UTC