[whatwg] Creative Commons Rights Expression Language from Henri Sivonen on 2008-08-22 (public-whatwg-archive@w3.org from August 2008)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Fri, 22 Aug 2008 10:50:59 +0300
Message-ID: <81D913D2-DCBC-471B-81CF-291B2B0515D8@iki.fi>
On Aug 21, 2008, at 21:53, Ben Adida wrote:
> Not to mention that our design approach was specifically tailored to  
> be HTML5-friendly.


It really isn't HTML5-friendly, since it depends on the namespace  
mapping context at a node.

> Henri Sivonen writes:
>> and those additions use a Namespace-dependent
>> anti-pattern, so they aren't portable to HTML.
>
> Namespaces are an anti-pattern, really? Says who?

The anti-pattern I was referring to was qnames-in-content. (But, I'm
not saying that Namespaces in XML were not themselves an anti-pattern. :-)

> The web is inherently
> namespaced. Everything you go to is scoped to a URL prefix. There isn't
> one "Paris" or one "New York," there is wikipedia/paris, and
> nyc.gov/NewYork.

At least in the case of New York, the settlers had the good sense to  
choose a short disambiguating prefix instead of thinking they were off  
in a different default namespace like Texas and free to reuse local  
names causing problems with global map search usability later.

> So is it the ":" that bothers you? Is that really relevant?

It's not the colon per se, although now that XML and HTML do DOM-wise  
different things with the colon, the colon is trouble for element and  
attribute names.

Here's what bothers me about namespaces:
  1) I need to write namespace URIs several times a day, but the URIs
aren't memorable. Mistyping a namespace URI would cost even more time in
debugging than looking URIs up and copying and pasting them, so I look
them up and copy and paste them, and it's a huge waste of time.
  2) The indirection layer from prefix to URI confuses people.
  3) Namespaces not inheriting to attributes confuses people. (I have  
had to give a crash course in how namespaces work on W3C telecons and  
f2f meetings! Others have had to do it as well. This point is so  
confusing that people whose job is working on Web specs get it wrong.  
I've been told about a professor teaching a class about XML who got it  
wrong.)
  4) Instead of comparing a name against a string literal, you have to
compare two data items against two literals. That is, instead of doing
"foo-bar".equals(name), you have to do
"http://www.example.com/2008/08/namespace#".equals(uri) && "bar".equals(localName).
  5) Removing uri,local pairs from XML parsing context makes it hard  
to write the full name in a compact form. Witness the NSResolver  
complications with XPath and Selectors DOM APIs.
  6) That the prefix is semantically not important confuses people who  
go and write uninteroperable software thinking that they should be  
comparing the prefix instead of the URI.
  7) The design of namespaces considers parsing. It doesn't consider  
serialization. Writing an XML serializer that doesn't suck isn't  
trivial, and one will spend most of the development time on dealing  
with Namespaces. (The prefixes aren't important but people still have  
aesthetic opinions about how they should be generated...)
  8) Namespaces dropped the HTML ball a decade ago, letting the HTML
and XML DOMs diverge.
  9) Namespaces stuff their syntax into attributes instead of having
syntax of their own, which means that certain magic attribute names
need blacklisting both in parsing and in serialization.
10) Namespaces slow down parsing. (By over 20% with Xerces-J and the  
Wikipedia front page!)
11) I've spent *a lot* of time writing code that is Namespace-wise  
excruciatingly correct. Yet, Namespaces have never actually solved a  
problem for me. My software developer friends complain to me about how  
Namespaces cause them grief. No one can remember Namespaces solving a  
real problem. It's like feeding a white elephant.
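To make points 4 and 6 concrete, here is a minimal Java sketch using the JDK's built-in DOM parser (JAXP). The namespace URI is the made-up example URI from point 4, and the class and method names are hypothetical. It shows the single comparison a flat name needs, the two comparisons a namespaced name needs, and how comparing the prefixed name instead of the URI gives the wrong answer for an equivalent document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class NameComparison {
    // Made-up namespace URI from point 4; not a real vocabulary.
    static final String NS = "http://www.example.com/2008/08/namespace#";

    /** Flat name: one string, one comparison. */
    static boolean isFooBarFlat(String name) {
        return "foo-bar".equals(name);
    }

    /** Namespaced name: two strings, two comparisons; the prefix must be ignored. */
    static boolean isFooBarNamespaced(Element e) {
        return NS.equals(e.getNamespaceURI()) && "bar".equals(e.getLocalName());
    }

    /** Parses a document and returns its root; JAXP is not namespace-aware by default. */
    static Element parseRoot(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        // Same element, different prefixes: the URI-based check matches both...
        Element a = parseRoot("<foo:bar xmlns:foo=\"" + NS + "\"/>");
        Element b = parseRoot("<x:bar xmlns:x=\"" + NS + "\"/>");
        System.out.println(isFooBarNamespaced(a) + " " + isFooBarNamespaced(b)); // true true
        // ...while comparing the prefixed name (point 6's mistake) fails for b.
        System.out.println("foo:bar".equals(b.getTagName()));                    // false
        System.out.println(isFooBarFlat("foo-bar"));                             // true
    }
}
```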

Qnames in content have further problems: They complicate APIs and the  
application layer when the mapping context needs to leak to the  
application instead of being a parser-internal thing. Under scripted  
DOM scenarios, there's the issue of the mapping context not getting  
captured at node creation time thereby making the meaning of qnames  
brittle under tree mutations. Finally, serializing XML that *may* have  
qnames in content without the serializer knowing which values are  
qnames (i.e. writing a generic serializer) is complex. (See also the  
TAG finding about problems with digital signatures.)
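A minimal Java sketch of the tree-mutation problem, using the JDK's DOM and `Node.lookupNamespaceURI`. The markup, the second namespace URI and the `expand` helper are hypothetical; `expand` just concatenates the namespace URI and the local part, roughly the way RDFa CURIEs are expanded. The attribute value never changes, but its meaning depends on where the element currently sits in the tree.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class QNameInContent {
    /** Expands a "prefix:local" value using the mappings in scope at e (CURIE-style concatenation). */
    static String expand(Element e, String qname) {
        int colon = qname.indexOf(':');
        String prefix = colon < 0 ? null : qname.substring(0, colon);
        String uri = e.lookupNamespaceURI(prefix); // walks up the ancestors at call time
        return uri == null ? null : uri + qname.substring(colon + 1);
    }

    /** Resolves the same attribute value three times while a "script" moves the element. */
    static List<String> resolveWhileMoving() throws Exception {
        String xml = "<root>"
                + "<a xmlns:cc=\"http://creativecommons.org/ns#\"><span property=\"cc:license\"/></a>"
                + "<b xmlns:cc=\"http://example.org/other#\"/>" // hypothetical second mapping
                + "</root>";
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        Document doc = f.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element span = (Element) doc.getElementsByTagName("span").item(0);
        Element b = (Element) doc.getElementsByTagName("b").item(0);
        String value = span.getAttribute("property"); // "cc:license", never modified below

        List<String> results = new ArrayList<>();
        results.add(expand(span, value)); // inside <a>
        b.appendChild(span);              // moved under <b>, which maps cc differently
        results.add(expand(span, value));
        b.removeChild(span);              // detached: no cc mapping in scope
        results.add(expand(span, value));
        return results;
    }

    public static void main(String[] args) throws Exception {
        // Three different meanings for the same attribute value:
        // http://creativecommons.org/ns#license, http://example.org/other#license, null
        resolveWhileMoving().forEach(System.out::println);
    }
}
```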

> Just look at what microformats are forced to do, which is effectively
> re-inventing ad-hoc namespaces with "-" separators.

That's different. When the prefixes are fixed and go inside a name
token without an indirection layer and without the name becoming a
tuple, that's fine. You can still do "foo-bar".equals(name).

> The "namespaces are bad" argument is the most mind-boggling web-tech
> meme I've seen in a while.

It's Namespaces in XML that are bad, not *necessarily* lower-case 'n'
namespaces. Also, qnames-in-content are even worse than just Namespaces
in XML.

>> making them identify which CC
>> license they mean, making them understand what permissions they are
>> giving irrevocably to others upon granting a license and making them
>> understand what licenses used by others mean (NonCommercial,
>> anyone?). Syntax doesn't solve any of these.
>
> I appreciate the strategy advice, but let's stick to the tech. I don't
> think it would be relevant to question Google's business plan when Ian
> makes a tech proposal :)

If Hixie made a proposal about HTML syntax citing Google's needs, but  
there was something else going on at Google making the syntax moot, I  
think it would be relevant. (I guess metadata aiding  
translate.google.com is the recent example.)

>> Also note that even CC leadership omits the license URI.
>
> So you want a URI in the video content itself? What good would that do?

It's not me wanting it, it's the CC licenses:
"You must include a copy of, or the Uniform Resource Identifier (URI)  
for, this License with every copy of the Work You Distribute or  
Publicly Perform."

> With ccREL (and specifically RDFa), the surrounding HTML can easily  
> say "*this* video is licensed under *that* license."


I meant the license URIs of the photos used in the video.

Either way, putting RDFa in an HTML file means that the license data
doesn't travel with the video if I download it from Blip.tv or get it  
via a podcast client.

HTML5 already has a way to express that the HTML document as a whole  
is under a certain Creative Commons license: rel=license. This doesn't  
allow you to say things about *another* resource, but that's OK,  
because out-of-band metadata and data often travel their separate  
ways. I think it would be better to develop simple ways of putting the
"license", license-URI key-value pair inside other popular file
formats. After all, you don't need triples in this case, just a key-value
pair, and it's implied that it is *about* the file it is in.
Having to spec this for many formats isn't as appealing as speccing
one way for all formats, but the one way put forward isn't really that
great. (A *graph* in XMP is overkill when key-value pairs would do.)

For example, in PDF, do people *really* need all this cruft:
<?xpacket begin="" id=""?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:xapRights="http://ns.adobe.com/xap/1.0/rights/">
<xapRights:Marked>True</xapRights:Marked>
<xapRights:WebStatement rdf:resource="http://codev2.cc/download+remix/" />
</rdf:Description>
...
<rdf:Description rdf:about=""
xmlns:cc="http://creativecommons.org/ns#">
<cc:license rdf:resource="http://creativecommons.org/licenses/by-sa/2.5/" />
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>

...instead of putting the key-value pair "License", "http://creativecommons.org/licenses/by-sa/2.5/"
in the document information dictionary of the PDF file?
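For comparison, this is roughly what the key-value approach looks like for PNG, sketched in Java with the JDK's Image I/O API. PNG is used here only as an example format. The `License` keyword is a hypothetical convention (it isn't one of the keywords predefined by the PNG spec), and the class and method names are made up. The entire license payload is a single tEXt chunk holding the key and the license URI.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.ImageTypeSpecifier;
import javax.imageio.ImageWriter;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.metadata.IIOMetadataNode;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.ImageOutputStream;
import org.w3c.dom.NodeList;

public class PngLicense {
    // Name of the native PNG metadata format in Java Image I/O.
    static final String FORMAT = "javax_imageio_png_1.0";

    /** Encodes img as PNG with one tEXt entry: keyword "License" (hypothetical), value licenseUri. */
    static byte[] writeWithLicense(BufferedImage img, String licenseUri) throws IOException {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("png").next();
        IIOMetadata meta = writer.getDefaultImageMetadata(
                ImageTypeSpecifier.createFromRenderedImage(img), writer.getDefaultWriteParam());
        IIOMetadataNode entry = new IIOMetadataNode("tEXtEntry");
        entry.setAttribute("keyword", "License");
        entry.setAttribute("value", licenseUri);
        IIOMetadataNode text = new IIOMetadataNode("tEXt");
        text.appendChild(entry);
        IIOMetadataNode root = new IIOMetadataNode(FORMAT);
        root.appendChild(text);
        meta.mergeTree(FORMAT, root); // add the key-value pair to the default metadata

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ImageOutputStream out = ImageIO.createImageOutputStream(bytes)) {
            writer.setOutput(out);
            writer.write(new IIOImage(img, null, meta));
        } finally {
            writer.dispose();
        }
        return bytes.toByteArray();
    }

    /** Returns the value of the first tEXt entry whose keyword is "License", or null. */
    static String readLicense(byte[] png) throws IOException {
        ImageReader reader = ImageIO.getImageReadersByFormatName("png").next();
        try (ImageInputStream in = ImageIO.createImageInputStream(new ByteArrayInputStream(png))) {
            reader.setInput(in);
            IIOMetadataNode root = (IIOMetadataNode) reader.getImageMetadata(0).getAsTree(FORMAT);
            NodeList entries = root.getElementsByTagName("tEXtEntry");
            for (int i = 0; i < entries.getLength(); i++) {
                IIOMetadataNode e = (IIOMetadataNode) entries.item(i);
                if ("License".equals(e.getAttribute("keyword"))) {
                    return e.getAttribute("value");
                }
            }
            return null;
        } finally {
            reader.dispose();
        }
    }

    public static void main(String[] args) throws IOException {
        BufferedImage img = new BufferedImage(1, 1, BufferedImage.TYPE_INT_RGB);
        byte[] png = writeWithLicense(img, "http://creativecommons.org/licenses/by-sa/2.5/");
        System.out.println(readLicense(png));
    }
}
```

Because the pair lives inside the file, it travels with the image wherever the file goes, which is the property the RDFa-in-HTML approach lacks.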

>> Getting back to the comment thread on intertwingly.net, a later
>> comment contained this gem:
>> http://intertwingly.net/blog/2008/02/09/Mashups-Smashups#c1202810109
>> My sarcasm detector isn't quite working, so I can't tell if the
>> comment was *meant* to mock RDF, but the follow-up comment is spot
>> on:
>> http://intertwingly.net/blog/2008/02/09/Mashups-Smashups#c1202870522
>
> I think your argument is "copyright is hard, so RDF sucks."

No, my argument is:
Copyright is hard. Sprinkling URIs and angle brackets doesn't make  
people grok copyright. RDF adds even more hardness that normal people  
don't grok.

> Lots of things about RDF are complicated, and lots of things about  
> copyright are complicated.

Together they don't cancel each other out.

> I'd say that Creative Commons has helped make copyright *easier* to  
> understand, not harder, though of course there are
> cases where we have failed and where we're trying to improve.

That may be, but I wouldn't attribute it to RDF.

> Now, what does that have to do with expressing user intent in
> machine-readable language, exactly? Is it harder to understand copyright
> *because* of RDF and RDFa? I don't think so. I don't think those two
> things are even related.

No, RDF doesn't make copyright itself harder. It just adds something  
else that's hard, so it's not helping.

> The point of ccREL and RDFa is to help express, in a machine-readable
> way, the act of copyright licensing, attribution, and such. It's meant
> to make machines helpful in expressing and interpreting these  
> statements.

I think trying to break complex licenses (especially ones that don't  
originate from CC) into URI-identifiable components and letting  
software interpret these for the user seems risky compared to doing  
something simpler like having a finite catalog of licenses recognized  
by software and mapping them to logos that the user can identify after  
*actually reading* the licenses first without the software pretending  
to relieve the user from finding out what the licenses mean.
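A minimal Java sketch of the "finite catalog" alternative. The catalog contents, class name and labels are hypothetical. The software only does an exact lookup of a known license URI and shows a label (standing in for a logo) that the user has learned by reading the license; it never tries to decompose a license into permissions, and anything outside the catalog is reported as unrecognized rather than interpreted.

```java
import java.util.Map;

public class LicenseCatalog {
    // Hypothetical finite catalog: exact license URIs the software recognizes,
    // each mapped to a label the user has learned by actually reading the license.
    static final Map<String, String> KNOWN = Map.of(
            "http://creativecommons.org/licenses/by/2.5/", "CC BY 2.5",
            "http://creativecommons.org/licenses/by-sa/2.5/", "CC BY-SA 2.5",
            "http://creativecommons.org/licenses/by-nc/2.5/", "CC BY-NC 2.5");

    /** Exact lookup only; no attempt to infer permissions from the URI or its parts. */
    static String label(String licenseUri) {
        return KNOWN.getOrDefault(licenseUri, "unrecognized");
    }

    public static void main(String[] args) {
        System.out.println(label("http://creativecommons.org/licenses/by-sa/2.5/")); // CC BY-SA 2.5
        System.out.println(label("http://creativecommons.org/licenses/by-sa/2.5"));  // unrecognized
    }
}
```

The exact-match lookup is deliberate: a URI that differs only by, say, a missing trailing slash is reported as unrecognized instead of being guessed at.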

For example, the CC licenses have a pretty significant component
lurking there that isn't covered by the RDF terms (or by the "human-readable"
deeds): the anti-TPM clause. What if a tool happily tells
someone that just giving me attribution for my photos is sufficient  
for using a photo taken by me in a book without telling them that my  
photos come with a poison pill that prohibits publishing the book on  
Kindle?

> just like they don't need to understand the deep legal contract.

(I disagree, but that's off-topic for WHATWG, except to the extent of  
pointing out that the RDF modeling doesn't cover significant aspects  
of the licenses like the anti-TPM clause.)

> [.. a number of comments regarding the specifics of the RDFa  
> syntax ...]
>
> We discussed the syntax in a public group, and we came to consensus. I
> don't see that you raised any issues or comments until 2 weeks ago,
> which was long past our deadline for comments.

If RDFa is considered immutable at this point, I guess HTML5 is put in  
a "take it or leave it" situation. :-/ I'd choose leaving it if taking  
it comes with the qnames-in-content and Namespaces in XML baggage.

> There could always be an alternate syntax, but the one we have was
> obtained through an open process of consensus. I suspect the same holds
> true for HTML5: lots of options, pick one that works and is relatively
> clean, and form consensus.

Actually, HTML5 hasn't been developed by consensus.

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/
Received on Friday, 22 August 2008 00:50:59 UTC