- From: Steven Rowat <steven_rowat@sunshine.net>
- Date: Mon, 21 Sep 2009 13:02:29 -0700
- To: www-tag@w3.org
Ten Use-Cases of Individual Content Authors Requiring Rights/Commerce Metadata: success in HTML4; HTML5; and CCN

Introduction:

In this document I list 10 individual content-author use cases that require rights/commerce metadata, and attempt a preliminary exploration of how well they will be served by HTML4/5 versus content-centric networking (CCN) [1,2,3].

I place this list and its discussion on the TAG list because it expands on ideas I presented in a previous TAG post [4] and a related bug [5], and because the June F2F meeting of the TAG [6] discussed metadata at length and made reference to CCN and the work of Van Jacobson in that regard. As well, HTML5 and Metadata are major items on the September F2F agenda [7]. I have also recently seen widespread calls for metadata-related use-cases, including from the TAG, the HTML WG, and the RDFa group.

After the use-case list I present the following:

-> Discussion of HTML4 and HTML5 ability to deliver the metadata preferences of the 10 individual authors
-> Predictions of the Content-centric model (a thumbnail sketch)
-> Proposal (informal) for W3C further involvement
-> Notes
-> References

Ten Use-Cases of Individual Content Authors Requiring Rights/Commerce Metadata

This is partly an imaginary list, but I believe all are highly plausible. I've lived situations very similar to three or four of them, and for three or four others I personally know the people who live them. Certainly the list is not perfect or exhaustive; nonetheless, in my view it suffices to indicate that there is a great range of individuals who have their own content to supply to the world: some who wish to sell it, and some who do not wish to sell it but nonetheless have other rights-metadata needs.

In each of the ten cases the author and their content are described in general terms, and then the metadata needs for their rights/commerce preferences are given.
In each case the so-called Moral Rights of the author (authorship; inviolability of the content), which are not legally transferable [8], are listed in the first line of metadata attributes; the various desired commercial rights are on the lines following.

1. An independent medical researcher in the USA who produces a pdf report about side-effects of a new prescription drug. He specifies:
   authorship; no content modification;
   payment per download;
   no downstream commercializing.

2. A journalist in Africa with an ogg or mp4 video of atrocities in an ongoing war. She specifies:
   anonymity; no content modification;
   free;
   no downstream commercializing.

3. A writer with a complete novel in pdf, doc, html, and other text formats. He specifies:
   authorship as pseudonym; no content modification;
   payment per download;
   downstream commercializing allowed with constraints of: no advertising (direct sale only); payment of 20% of gross per copy re-sold; any additional commercial rights for other media require the agreement of the original author.

4. A folk-musician in Siberia who records local throat-singing into mp3 or ogg vorbis files. He specifies:
   authorship; no content modification;
   free streaming of first 10% of content online;
   payment per download at sliding scale proportional to user-country's average yearly income per person;
   no downstream commercializing.

5. A whistleblower leaking documents from inside the government showing evidence of torture practices, in text or pdf. She specifies:
   anonymity; no content modification;
   free, but donation requested (user chooses);
   no commercializing.

6. A software engineer with a program for a particular OS, in zip or other compressed format. He specifies:
   authorship; no content modification;
   payment per download with constraints/permissions of: demo use is free on one machine for a month; payment thereafter is due for each machine in use with the program;
   downstream commercializing is possible with constraint of: payment to original author of 25% of net profits from resale or from any other commercializing, specifically including advertising.

7. A carpenter with photos and text describing how to build a solar outhouse, in html or pdf. He specifies:
   authorship; no content modification;
   payment per download (pdf);
   payment per page view (HTML);
   no downstream commercializing.

8. A visual artist (oils and acrylics) with high- and low-resolution JPEGs of the paintings. She specifies:
   authorship; no content modification;
   free access online to low-res images of paintings;
   payment per download for hi-res images of paintings, calculated by total surface area of the image;
   downstream commercializing permitted with explicit author agreement.

9. An inventor of a patented simplified mechanical tool with a description in pdf and html text accompanied by embedded jpegs. He specifies:
   authorship; no content modification;
   free HTML access to a summary of the tool specification;
   payment per pdf download of the complete specification including patent document;
   downstream commercializing allowed with constraints of: no resale of the patent or description itself; manufacturing of the tool for sale in any given country is permitted after explicit agreement with the inventor for that country.

10. A digital game programmer with a new multi-player game for online and/or offline use. He specifies:
   authorship; content modification allowed with constraint of: no commercialization once modified;
   payment for online use by subscription (monthly, yearly), or single payment for downloaded version;
   downstream commercializing permitted for unmodified download version only, with constraints of: 30% of gross sales receipts paid to original author, as well as 10% of site advertising revenue (if any) from the downstream page selling the game.

Discussion:

I think all can agree that there currently exist all over the globe individuals with their own digitized content (of any kind: science, art, music, education, journalism, programming, etc.) who wish to distribute that content on-line and who wish various levels of control over their legal rights and/or sale of that content.

I believe the unresolved questions about this fact that are relevant for the W3C to consider are these two: first, how many of these individuals exist (1 million? 100 million? a billion?); second, how well any of them can be served by the existing tools in, or extended from, HTML4, HTML5 as recently proposed [9], or CCN.

The first question -- of how many such individuals there are -- is both important and elusive. If there are many -- say 50-100 million or more worldwide -- then it may be worthwhile applying a major architectural change to the web as a whole to accommodate their needs. I reasoned that studying ten widely different use-cases and their needs would give at least some initial clues as to how many people like this there are. And after constructing the list, I think it's reasonable to suggest that this is neither a local issue nor a small one; the exchange of money, and the exchange of information, are things that all human beings engage in. So a truly simple and direct system enabling both together could be used by a significant proportion of human beings: the authors represented above could eventually be billions of people.
We have no way of knowing until the system is available and works well. At present it doesn't.

The second question, then, is how well HTML4, HTML5, and the projected abilities of CCN could accomplish this.

For HTML4, on prima-facie evidence, the answer appears to be: extremely poorly or not at all. HTML4 has been the web standard for 12 years and there has been no successful development of what used to be called 'micropayments', or of digital rights controls by open standards. A full language has been developed to enable this (ODRL [10]) but it has not been implemented widely. Yet twelve years is an age in internet time; apparently some thing or things are preventing rights/commerce from proceeding at the individual/browser level.

It seems best at this point to skip to the first of the reasons why this might be, which was recently expressed very succinctly by Maciej Stachowiak on the HTML WG list in answer to my concerns. He wrote:

"HTML5 does not provide anything specific to enable selling of content, but then, neither did HTML4. E-commerce and revenue models are out of scope for HTML." [11]

So, in both HTML4 and HTML5 there is no attempt to specifically include the rights/commerce preference controls listed for the 10 use cases. And such individual author controls have not developed in any useful way (with HTML4) in 12 years. Based on this, it appears that the same thing would happen for HTML5: no progress will occur (for individuals). As I've argued elsewhere [4, 5, 12], those with deep pockets can monetize the web under HTML4, can do so more easily under HTML5, and are doing so; individuals cannot.

And why has the attempt not been made? I've come to believe there are more fundamental reasons: reasons why neither HTML4 nor HTML5 attempted to facilitate information commerce in its most core sense, content going from one person to another.
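
As a concrete illustration of what the "rights/commerce preference controls" of the use cases amount to as data, the following is a minimal sketch in Python. It is not ODRL and not part of any HTML or CCN specification; the class name, field names, and the royalty_due helper are all hypothetical, and the prices are invented figures. It records the preferences of use cases 2 and 3 and computes the 20%-of-gross payment that use case 3 asks of a downstream reseller.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class RightsPreferences:
    """Hypothetical record of one author's rights/commerce preferences."""
    attribution: str              # "authorship", "pseudonym" or "anonymity"
    modification_allowed: bool    # moral right: inviolability of the content
    price_per_download: Decimal   # 0 means free
    resale_allowed: bool          # is downstream commercializing permitted?
    resale_share_of_gross: Decimal = Decimal("0")  # fraction owed per copy re-sold
    resale_advertising_allowed: bool = True


def royalty_due(prefs: RightsPreferences, gross_per_copy: Decimal, copies: int) -> Decimal:
    """Amount a downstream reseller owes the original author."""
    if not prefs.resale_allowed:
        raise PermissionError("author does not permit downstream commercializing")
    return prefs.resale_share_of_gross * gross_per_copy * copies


# Use case 3 (the novelist); the download price is an invented figure.
novelist = RightsPreferences(
    attribution="pseudonym",
    modification_allowed=False,
    price_per_download=Decimal("5.00"),
    resale_allowed=True,
    resale_share_of_gross=Decimal("0.20"),
    resale_advertising_allowed=False,
)

# Use case 2 (the journalist): anonymous, free, no downstream commercializing.
journalist = RightsPreferences(
    attribution="anonymity",
    modification_allowed=False,
    price_per_download=Decimal("0"),
    resale_allowed=False,
)

# A reseller selling 100 copies at a gross price of 8.00 each (invented numbers).
print(royalty_due(novelist, Decimal("8.00"), 100))
```

Writing the preferences down is the easy part, as the sketch suggests; nothing in it says who would carry, verify, or enforce such a record, which is the architectural question taken up next.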
And here it seems appropriate to turn to the content-centric networking theory, which provides several such reasons why individual rights/preference controls might be so difficult as to be actually impossible in the current architecture.

Predictions of the Content-centric model

The following summary is based almost exclusively on Van Jacobson's descriptions and discussion of past, current, and hypothetical-future data flows in our society [1,2]. He defines three states; "1" is historic; "2" is present; "3" is projected:

1. telephony (specified path)
2. internet (point to point calls, using TCP)
3. content-centric (multi-point to multi-point)

These are expressed in detail in his Google tech talk of 2006 [1] and his print interview in 2009 [2]. Summaries are given on the PARC page [3] (where he is leading a group who are developing CCN for eventual deployment throughout the internet) and in PC Magazine in 2007 [13].

I will attempt a thumbnail summary of his ideas here:

1. Internet via TCP (stage 2 above) was originally designed to share scarce resources (hardware, like printers), but the actual use of the internet evolved instead into the sharing of plentiful digital content via software. This is a completely different problem.

2. Thus the architecture was never designed to do what it's being asked to do. It does its original job well, but a new goal has evolved.

3. The internet attains this new goal badly at present. There are major difficulties on the internet in several areas, including:
   a. Security
   b. Scalability
   c. Complexity of interoperability.

4. These are not improving, and will remain problematic and will prevent certain desired goals from being reached, unless the architecture is changed.

5. Content-centric networking (stage 3 above) can solve all of these, or at least improve them dramatically relative to stage 2.
   a. Security: each packet will be named; the naming will be registered and secure (in the same way that IP addresses are now). This is contrasted with the current system, where only the end points are secure, and false data is regularly inserted between those points, with false location data.
   b. Scalability: since location of the named data is irrelevant, it does not have to come from where it was first created, and can smoothly be supplied in any quantity by internet caching and copying.
   c. Complexity of interoperability: since data is named and secure, it can be carried across borders more easily; firewalls and ways of checking credentials are less relevant; secure content can be moved through any OS or medium and still perform the same function.

6. The change from the current internet to content-centric networking could facilitate just as major a change as the one from telephony to internet was.

There is far more that I have not attempted to explain here, and far more again that I didn't understand. However, given that the alternative is a stalemate, I feel I understood enough to say: we need to take the chance and start actively studying what is required to move to the CCN model. According to PARC [3] it can be done incrementally.

Finally, in terms of the specific problem that I find myself pursuing in this essay: supposing that CCN does what it is predicted to do, will these things help individual authors who supply internet content that requires rights/commerce data control? Consider the three main problem areas in #5 above: security, scalability, and complexity.

a. Security

Yes: security will increase dramatically, and the current lack of security for money/privacy in transactions is obviously a large impediment to developing a widespread information rights/commerce system. Conceivably it is the single largest reason why such a system does not yet exist.

b. Scalability

Yes: Van Jacobson expresses it well:

"Right now, if you're not Google or YouTube -- somebody who's big enough -- there's a curse in creating popular content.
If you make something that a lot of people look at and say, "Oooh, this is really cool!" you've just blown your Web site off the air because the only way that content can be distributed is from its original source.

"...If you move to a content-centric model, then you can stop disenfranchising creators because they pay no cost and you actually stop disenfranchising all the intermediaries, too." [2]

In other words, an original author needs to make no more than one Registered copy of the content in question; the internet, which is after all a huge copying machine, will take care of the rest, even if there is a spike in demand.

c. Complexity of interoperability

Yes, although the gains here appear to be more internet-wide and less specific to the problem faced by individual creators. But still, they may be considerable for both; for instance, Van Jacobson said:

" ...you're opening yourself up to a world of grief if you don't have what [Tony Hoare] called 'referential transparency'. If you can refer to only the container and not the thing that's contained, then contents can change on you. You have security issues; you have decidability issues; you have robustness issues. You don't really know what the bits are, and you can't reason about what the bits are, because all you can name is a container." [2]

I interpret this to mean that the 'complexity' issue is often identical with the security issue: attempts to solve the security issue create more complexity, whereas if security is solved innately by the architecture, complexity is reduced, increasing the overall efficiency of the whole system.

Proposal (informal):

Based on the discussion above, I would like to see:

1. A W3C liaison group [*see Note 1] formed to consult with Jacobson's PARC group and determine:
   a. What steps could be taken to test implementations of CCN in the existing internet.
   b. What the PARC group still needs in terms of information, use-cases, or testing resources that the W3C might be able to provide, in order to enable such implementations.
   c. What is a fuller list of advantages and disadvantages, relative to the current architecture, of different forms of CCN implementation.

2. Based on the results of #1, if implementation in some form seems likely to bring sufficient advantages, the same or another W3C group could study:
   a. What is the optimum form of the metadata to be placed into (or accompany) the content packets in CCN. [*see Note 2]
   b. What other handshaking might be required to fulfill the actions intended to flow from this metadata (in browsers, ISPs, etc.) and how W3C can facilitate this. [*see Note 3]
   c. Whether HTML5 (or even 4) can be tailored to allow a full or partial CCN implementation directly.
   d. If not, whether HTML5/4 can at least install hooks that would allow graceful degradation/interoperability with CCN metadata protocols, in order to effectively anticipate a large-scale shift to CCN at a later date.

Notes:

1. In terms of the issue directly addressed in this essay (individual content creators' control of their work), I suggest that W3C groups studying the CCN option should have a majority without conflict-of-interest; in other words, only a minority should be making their living from direct sales on the web, from corporations monetizing the web, or from consulting about web monetization. If it turns out that this is impractical, because most of the interested parties are involved in some way in web monetization, then at least the group should have a numerical balance between individual authors distributing their own content and professionals who code for others or are members of corporations that do.

2. For example, I believe for the current use-cases ODRL [10] can carry all the commercial calculations (as well as the moral rights).
However, metadata from other vocabularies such as Dublin Core [14] and FOAF [15], and the extensibility to many others, should also be available; in other words, a protocol like RDFa [16] and/or Microdata [17] will likely be required to present the metadata, just as it is currently in HTML4 and is planned for HTML5. (Or possibly a form of Adobe's XMP [18, 19], if it can become an open standard.)

3. For example, an interesting idea provided by Peter Dolan recently, in a late-night talk about CCN and commercialization, is that ISPs are in a unique position relative to both individual content authors and users:
   a) ISPs taken as a whole already hold credit and private contact information for both authors and users.
   b) ISPs already are capable of counting the flow of packets.
   c) ISPs already are accustomed to performing monetary transactions.
   d) ISPs already are capable of performing secure transactions when necessary.
   He suggested that they would therefore be best suited to assume the role of counting and validating the outflow of authors' registered works and aggregating the inflow of user payments via a special ISP clearing house set up for the purpose, in the same way a bank does for cheques.

References:

[1] "A New Way to Look At Networking: Van Jacobson, Google Tech Talks, 2006"
http://www.youtube.com/watch?v=gqGEMQveoqg&feature=PlayList&p=68A083F6EAFEFE01&index=29

[2] Interview with Van Jacobson, 2009: content-centric networking
http://mags.acm.org/queue/200901/

[3] PARC's "Networking" web page:
http://www.parc.com/work/focus-area/networking/

[4] "HTML 5's proposed basis in DOM/JS skews web control and monetization towards corporations and away from individual authors/researchers, to the detriment of society."
http://lists.w3.org/Archives/Public/www-tag/2009Sep/0028.html

[5] Bug 7546 ""HTML 5" Editor's draft misnamed and suboptimal for HTML content authors unless refactored into HTML (main) and DOM API (appendix)."
http://www.w3.org/Bugs/Public/show_bug.cgi?id=7546

[6] TAG F2F June
http://www.w3.org/2001/tag/2009/06/24-minutes.html

[7] TAG September Agenda (Preliminary)
http://www.w3.org/2001/tag/2009/09/23-agenda

[8] "Musicians and the Law in Canada", Paul Sanderson (Carswell; 2000); p 12.

[9] HTML5 (Editor's Draft)
http://dev.w3.org/html5/spec/Overview.html

[10] ODRL (Open Digital Rights Language)
http://odrl.net/2.0/WD-ODRL-Core-Metadata.html
http://odrl.net/2.0/DS-ODRL-Model.html

[11] http://lists.w3.org/Archives/Public/public-html/2009Sep/0814.html

[12] http://lists.w3.org/Archives/Public/public-html/2009Sep/0827.html

[13] Five Ideas That Will Reinvent Modern Computing: Extreme Peer-to-Peer
http://www.pcmag.com/article2/0,2817,2147451,00.asp

[14] DC (Dublin Core)
http://dublincore.org/documents/dcmi-terms/

[15] FOAF (Friend Of A Friend)
http://xmlns.com/foaf/spec/

[16] RDFa Primer
http://www.w3.org/TR/xhtml-rdfa-primer/

[17] HTML5 Draft Standard, section 5: Microdata
http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html

[18] Extensible Metadata Platform (XMP)
http://en.wikipedia.org/wiki/Extensible_Metadata_Platform

[19] Extensible Metadata Platform (XMP)
http://www.adobe.com/products/xmp/

Steven Rowat
Received on Monday, 21 September 2009 20:04:00 UTC