Re: RDFa Use Cases from Ben Adida on 2009-02-19 (public-rdf-in-xhtml-tf@w3.org from February 2009)

From: Ben Adida <ben@adida.net>
Date: Wed, 18 Feb 2009 17:05:00 -0800
To: Ian Hickson <ian@hixie.ch>
CC: Manu Sporny <msporny@digitalbazaar.com>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>
Message-ID: <499CB03C.9010502@adida.net>

Ian Hickson wrote:
> (Also helpful for my own edification would be a reply to the e-mail I sent 
> earlier this week:
>    http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009Feb/0121.html
> ...which unfortunately appears to have killed its thread.)

I'll answer some parts of it, but first a point about what I believe is
off topic.

Some very advanced use cases of RDF, e.g. the triplestore across web
pages with reasoning, have been used to argue that RDF is bad. This is a
bit like arguing against <a href=> because, imagine if someone built a
giant search engine that had this awesome pagerank algorithm and then
started displaying ads alongside search results and then allowed folks
to embed those ads on their pages and paid those folks a cut of the ads
and then... wait a minute have you thought of clickjacking? No? Then
clearly HTML anchors are evil.

Let's stick to the simple use cases for now. Any enabling technology, if
it's useful to users in finding information, will also be useful to
spammers. To paraphrase Bruce Schneier, stop being surprised when the
bad guys hijack the good guys' infrastructure. Someone like Google will
figure out a way to use the structure of the data network to make sense
of it all. Maybe semantic pagerank, who knows.

Now, I'll focus on the CC use case and your specific points regarding CC.

> ...the process needs to be:
> 
>  1. Find problems.
>  2. Propose solutions that solve one or more of those problems.
>  3. Evaluate the solutions against each problem.
>  4. If a solution is found that addresses many of the problems, adopt it.

I believe that is what we've done, with the added item (1.5) find
existing technologies that solve a large chunk of the problem, i.e. RDF.

> I don't know that I've ever heard a _good_ example!

I thought you said SearchMonkey was a good example? It's certainly a
great example for Creative Commons.

Do *you* have to believe that an example is good before HTML5 considers
it? Is it not enough for big publishers like Yahoo to tell you that the
example is good?

> IMHO, the syntax and data model is the easy part. If you had trouble 
> getting adoption of your vocabulary with a trivial dedicated syntax, I 
> don't think you're likely to have any more luck now that your vocabulary 
> comes with a general-purpose data model and half a dozen different 
> syntaxes. But your mileage may vary, I guess.

Syntax and data model are easy only if you ignore extensibility. When we
came up with CC, it was only for images and music. But we knew we would
eventually delve into science, and we would need to CC-license
scientific publications and datasets, describe dataset transfer
agreements, etc.

We knew we needed extensibility from the start, so that our tools built
today can still work in some fashion tomorrow, and so that markup
written today can still function in tomorrow's tools to some degree.

We've also expanded in ways we didn't initially predict, with
cc:attributionName and cc:attributionURL: when you click from a web page
to its Creative Commons license link, we parse the referring URL for
RDFa and display, in the deed, to whom you should give credit.
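
To make the attribution lookup above concrete, here is a minimal sketch in
Python. It is not the actual Creative Commons deed code and not a
conforming RDFa processor: it assumes the page uses the literal prefix
"cc:" and never resolves the xmlns:cc declaration. The class name, the
function name, and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class AttributionSketch(HTMLParser):
    """Collects license, cc:attributionName and cc:attributionURL.

    Assumption: the prefix is literally "cc:"; a real RDFa processor
    would resolve prefixes against their namespace declarations.
    """

    def __init__(self):
        super().__init__()
        self.found = {}        # short key -> extracted value
        self._name_tag = None  # tag whose text content is the name
        self._nesting = 0      # same-named tags opened inside that element
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._name_tag == tag:
            self._nesting += 1
        a = dict(attrs)
        rels = (a.get("rel") or "").split()
        props = (a.get("property") or "").split()
        if "license" in rels:
            self.found["license"] = a.get("href")
        if "cc:attributionURL" in rels:
            self.found["attributionURL"] = a.get("href")
        if "cc:attributionName" in props and self._name_tag is None:
            # Start buffering the element's text content.
            self._name_tag, self._nesting, self._text = tag, 0, []

    def handle_data(self, data):
        if self._name_tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != self._name_tag:
            return
        if self._nesting:
            self._nesting -= 1
        else:
            self.found["attributionName"] = "".join(self._text).strip()
            self._name_tag = None


def extract_attribution(html):
    """Return whatever attribution data the sketch can find in `html`."""
    parser = AttributionSketch()
    parser.feed(html)
    parser.close()
    return parser.found


# Hypothetical referring page.
SAMPLE = '''<div xmlns:cc="http://creativecommons.org/ns#">
  Photo by <a rel="cc:attributionURL" property="cc:attributionName"
  href="http://example.org/alice">Alice Example</a>, licensed under
  <a rel="license" href="http://creativecommons.org/licenses/by/3.0/">CC BY 3.0</a>.
</div>'''

if __name__ == "__main__":
    print(extract_attribution(SAMPLE))
```

A page that carries only rel="license" still yields a usable result here,
and one that adds the two attribution properties yields more, with the
same markup conventions and the same parser.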

Had we hacked up a one-time data-model and syntax, we wouldn't have been
able to add this feature without also asking uninterested users to
change their markup, or forcing tools to parse two different syntaxes.

> This line of argumentation (that small problems should share solutions so 
> as to leverage each others' work) is not convincing to me.

I'm not arguing for this as a general philosophy for every design
process, but it's a little bit disconcerting that you think small
solutions should never leverage each other. Isn't it an important aspect
of standards work? Isn't that what inventing a new VIDEO tag is for, to
ensure that folks who want to embed video (each for different purposes)
can each do so more easily?

It seems you draw a hard line at generalizing applications that use
different vocabularies. It's okay to generalize embedding video, but
it's not okay to generalize embedded metadata. That line is, in my
opinion, artificial. Maybe it's just because you haven't worked as much
with interoperable structured data? Certainly, I haven't worked much
with embedded video, so I don't know the reasoning behind a new VIDEO tag.

> Personally I prefer to address today's problems today and tomorrow's 
> problems tomorrow, so that as we meet new problems, they are addressed 
> with surgical precision, rather than trying to come up with systems that 
> can solve everything forever. But again, to each his own.
> 
> This line of argumentation (that we should design systems that solve all 
> future needs, whether foreseeable or not) is also not convincing to me.

You're arguing based on a false dichotomy of extremes: why so black and
white? We don't expect to solve every problem from the start. With RDF
and RDFa, we have a solution that makes it *easier* to combine our work
with that of others, and *easier* to evolve our own solution over time.

So things are easier, though of course not automatic.

And since we're reusing a lot of existing technology (RDF), and we're
collaborating with Manu, Mark, and others to define RDFa, it costs us a
lot less than making up our own approach. The test cases alone, built
by Manu and Michael without any contribution from CC, are worth our
use of RDFa.

> You presumably do want some user agents some where at some time to do 
> something with these triples, otherwise what's the point? Whether this is 
> through extensions, or through browsers in ten years when the state of the 
> art is at the point where something useful can be done with any RDFa, or 
> through search engines processing RDFa data, there has to be _some_ user 
> agent somewhere that uses this data, otherwise what's the point?

I would like to hear your comments on the parallel I've drawn to the
@rel attribute. Browsers don't need to do anything with it except make
it available in the DOM. Google can use it to tweak its search
algorithm. But surely, you're not trying to explore every possible
Google implementation detail for rel="canonical" to spec HTML5? It's
pretty obvious that specifying @rel enables the kind of application that
Google is developing.
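
As a rough illustration of the @rel parallel, the Python sketch below
separates the generic step (collecting every rel token with its target)
from one consumer's choice of what to do with the result. The names
RelCollector, collect_rels and canonical_url are hypothetical, and
nothing here reflects how Google actually processes rel="canonical".

```python
from html.parser import HTMLParser


class RelCollector(HTMLParser):
    """Records every (rel token, href) pair on <a> and <link> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        a = dict(attrs)
        href = a.get("href")
        # rel holds space-separated tokens; compare them case-insensitively.
        for token in (a.get("rel") or "").lower().split():
            self.links.append((token, href))


def collect_rels(html):
    """Generic step: expose the rel data without interpreting it."""
    collector = RelCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def canonical_url(links):
    """One hypothetical consumer: the first rel="canonical" target, if any."""
    return next((href for token, href in links if token == "canonical"), None)


# Hypothetical page.
PAGE = (
    '<head><link rel="canonical" href="http://example.org/page">'
    '<link rel="stylesheet" href="/s.css"></head>'
    '<body><a rel="license nofollow" '
    'href="http://creativecommons.org/licenses/by/3.0/">license</a></body>'
)

if __name__ == "__main__":
    links = collect_rels(PAGE)
    print(links)
    print(canonical_url(links))
```

The collecting step knows nothing about canonical URLs, licenses, or
stylesheets; each consumer decides later which tokens matter to it.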

The same applies to RDFa. Making it available enables SearchMonkey,
ccREL, and other applications, because the data is now structured and
thus more useful for ... well, whatever someone wants to do with
structured data, just like Google chooses what to do with @rel long
after the spec's ink has dried.

Do you agree with this comparison?

-Ben
Received on Thursday, 19 February 2009 01:05:45 UTC