Re: RDFa and Web Directions North 2009 from Ian Hickson on 2009-02-13 (public-rdfa@w3.org from February 2009)

From: Ian Hickson <ian@hixie.ch>
Date: Fri, 13 Feb 2009 22:39:43 +0000 (UTC)
To: Kjetil Kjernsmo <kjetil@kjernsmo.net>, Karl Dubost <karl@la-grange.net>
Cc: public-rdfa@w3.org, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, Kingsley Idehen <kidehen@openlinksw.com>, Sam Ruby <rubys@intertwingly.net>, Dan Brickley <danbri@danbri.org>, Michael Bolger <michael@michaelbolger.net>, Tim Berners-Lee <timbl@w3.org>, Dan Connolly <connolly@w3.org>
Message-ID: <Pine.LNX.4.62.0902132227160.952@hixie.dreamhostps.com>

On Fri, 13 Feb 2009, Kjetil Kjernsmo wrote:
> 
> We have not seen a lot of problems with it as a paradigm thus far. On 
> the contrary, it is very liberating to work with such a flexible data 
> model when you're used to the straitjacket that is XML and relational 
> databases.

Certainly as an internal data model triples are a fine addition to the 
toolbox. But we're not talking about internal data models here, we're 
talking about a serialisation format.

> > It should be noted that there are pretty simple solutions to both of 
> > the above, though. For example, for case 1 Amazon could just say 
> > "anything with class=price indicates the price for the item described 
> > by the nearest ancestor block with class=item" or some such,
> 
> Yeah, they could have done that the past decade, and they didn't. And 
> even if they did, it wouldn't have helped a lot, as you'd need a 
> programmer for every little simple task that involved getting to the 
> data.
> 
> > or they could expose the information in a much simpler way by having a 
> > "&format=json" mode for their pages that is purely machine-readable 
> > data.
> 
> Same thing.
> 
> > Or they could do what they in fact do do, which is expose this using a 
> > dedicated API:
> >
> > http://docs.amazonwebservices.com/AWSEcommerceService/2006-05-17/ApiReference/ItemLookupOperation.html
> 
> Those are the exact kind of things we are trying to avoid. They are much 
> too costly to work with. They are the reason people go w00t when they 
> see a two source mash-up, while I remain unimpressed.

If Amazon couldn't even be bothered to add a class for "price" in the last 
decade, why do we believe they will add RDFa?

How does RDFa solve the problem of theirs that I described, and that you 
cut from the above quotes, namely that they want to track usage on a 
per-developer basis?
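
To make the class=price convention quoted above concrete, here is a 
minimal sketch of a consumer for it, using only Python's standard 
html.parser module. The names PriceExtractor and extract_prices are 
hypothetical, and the markup it expects is an assumption for 
illustration, not anything Amazon is known to publish.

```python
from html.parser import HTMLParser

# Void elements have no end tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PriceExtractor(HTMLParser):
    """Attach each class=price element to its nearest class=item ancestor."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one dict per currently open element
        self.items = []  # one list of price strings per class=item element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Nearest enclosing item, inherited from the parent element.
        nearest = self.stack[-1]["item"] if self.stack else None
        entry = {"tag": tag, "item": nearest, "owner": None, "text": None}
        if "price" in classes and nearest is not None:
            entry["owner"], entry["text"] = nearest, []
        if "item" in classes:
            self.items.append([])
            entry["item"] = len(self.items) - 1
        self.stack.append(entry)

    def handle_data(self, data):
        # Text belongs to the innermost open class=price element, if any.
        for entry in reversed(self.stack):
            if entry["text"] is not None:
                entry["text"].append(data)
                break

    def handle_endtag(self, tag):
        if tag not in [e["tag"] for e in self.stack]:
            return  # stray end tag, ignore it
        # Close the matching element and any unclosed descendants.
        while self.stack:
            entry = self.stack.pop()
            if entry["text"] is not None:
                price = "".join(entry["text"]).strip()
                self.items[entry["owner"]].append(price)
            if entry["tag"] == tag:
                break


def extract_prices(html):
    """Return a list of price lists, one per class=item block, in order."""
    parser = PriceExtractor()
    parser.feed(html)
    parser.close()
    return parser.items
```

A class=price element with no class=item ancestor is simply ignored, 
which is one possible reading of "nearest ancestor"; a real convention 
would have to say what happens in that case.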

> > For example, merging MP3/ID3 data (dedicated vocabulary with dedicated 
> > format embedded in MP3 files) with an iTunes library data dump 
> > (dedicated vocabulary with XML format) would not be easier if they 
> > were both expressed as RDF using different vocabularies. If anything, 
> > frankly, the problem would get harder.
> 
> How would it be harder? Actually, we just did stuff like that. We had 
> ID3, Ogg Vorbis comments, EXIF data (mostly useless), two different XML 
> dumps of two different big media archives with hundreds of thousands of 
> records. Pretty straightforwardly modelled, and a bit of small-o 
> ontology, and there you go. The easy part of that job was to resolve 
> vocabulary differences. Getting the data out was the hard part.

How did you resolve issues such as different vocabularies having different 
enumerations of music genres?

Why was getting data out of those varied formats hard? For ID3, for 
example, it's three lines of perl to get most of the common information. 
There are also libraries that support MP3, M4A, Ogg, and FLAC all at once. 
Processing XML is also a pretty easy task these days.
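
As an illustration of how little code the common case needs, here is a 
minimal sketch in Python (not the perl mentioned above) that reads an 
ID3v1 tag, which is the fixed 128-byte block at the end of an MP3 file. 
It does not handle ID3v2, and the function name read_id3v1 is 
hypothetical.

```python
import struct


def read_id3v1(data):
    """Return the ID3v1 fields found in the raw bytes of a file, or None."""
    tag = data[-128:]
    if len(tag) < 128 or tag[:3] != b"TAG":
        return None  # no ID3v1 tag present
    # Layout after the "TAG" marker: title, artist, album (30 bytes each),
    # year (4), comment (30), genre (1 byte, an index into a fixed list).
    title, artist, album, year, comment, genre = struct.unpack(
        "<30s30s30s4s30sB", tag[3:])

    def text(field):
        # Fields are padded with NULs or spaces; Latin-1 is assumed here.
        return field.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(title),
        "artist": text(artist),
        "album": text(album),
        "year": text(year),
        "comment": text(comment),
        "genre": genre,  # left numeric; mapping it to a name is vocabulary work
    }
```

To use it on a file, pass the file's contents, for example 
read_id3v1(open(path, "rb").read()), where path is whatever MP3 you have 
at hand.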

Note that RDFa doesn't actually solve the problem of common data models 
unless everyone uses it. For music, for instance, it seems unlikely that 
every MP3, Ogg, M4A, etc. tool would switch to using RDF, so there will 
always be a multiple-data-model problem.

On Sat, 14 Feb 2009, Karl Dubost wrote:
> >
> > Note that you can already "ask questions" on the Web. For example, I 
> > just searched for "which country napolean", which is neither the right 
> > question nor correctly spelt (though that wasn't intentional), and 
> > Google answered:
> > 
> >   Did you mean: which country napoleon
> > 
> >   Search Results
> > 
> >   French invasion of Russia - Wikipedia, the free encyclopedia
> >   [...]
> > 
> >      Napoleon I of France - Wikipedia, the free encyclopedia
> >      [...]
> > 
> > Microsoft Live Search actually gave even better results here 
> > ("Napoleon I of France" is the first answer).
> 
> "What Python" 
> http://www.google.com/search?hl=fr&q=what+python&btnG=Recherche+Google&lr=
> 
> Not very useful for someone looking for information about the animal. 
> This is not to say that your example is wrong, but that you will 
> *always* be able to find counter-examples.

I didn't come up with the example above; I based it purely on the example 
that was being put forward to show why RDFa was a solution.

How would RDF solve this ambiguity problem? As a user, what query would I 
put into my user agent to find out "what python"? Why would it work better 
with RDF than with natural language processing?

Why is that a better solution than just having the user say "no, I meant 
the animal"?:

   http://www.google.com/search?q=what+python+animal

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Friday, 13 February 2009 22:40:22 UTC