Yahoo's RDF vocabularies (was: Re: embed RDFa --> embed coolness into Yahoo search results)

Ben Adida wrote:
> Yahoo has launched even more RDFa coolness: embed RDFa on your site to
> describe your flash games and videos, and they show up embedded in Yahoo
> search results *for everyone*, *by default*.

Overall, this is great news. It's very nice to see Yahoo! adopting RDFa
this deeply into their search service. I do have some gripes about the
SearchMonkey vocabularies, however.

> PS: the only thing that's a bit unfortunate is that they didn't reuse
> Digital Bazaar's media vocabulary. I hope we can find a way to create
> equivalences at some point... that's the goal of RDF, after all.

I've had a bit of time to look at Yahoo's published vocabularies and I'm
quite concerned by them and by Yahoo!'s general direction with
vocabulary design.

Here's a list of the issues that I was able to find. I found many more
than are outlined here. It would be good to talk with whoever designed
their vocabularies. You can find an overview of Yahoo!'s vocabularies
here:

http://developer.yahoo.com/searchmonkey/smguide/profile_vocab.html

Issues specific to Yahoo's Media vocabulary:

Vocabulary is not machine-readable, not validate-able
-----------------------------------------------------

Yahoo's SearchMonkey media vocabulary defined here:

http://search.yahoo.com/searchmonkey/media/

is not machine-readable. No RDF ranges, subClassOf relationships,
comments, or types are specified. New RDF vocabularies, especially ones
from large companies like Yahoo, should be machine-readable; otherwise
it's going to be nearly impossible to validate against them.
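
To illustrate what a machine-readable vocabulary would buy us, here is a
rough sketch in Python. The range declarations in it are hypothetical:
they are my guesses at what Yahoo might intend, since the published
vocabulary declares none. The function names are made up for this
example as well.

```python
import re

MEDIA = "http://search.yahoo.com/searchmonkey/media/"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical declarations (property IRI -> expected datatype).
# The published vocabulary states no ranges; these are assumptions.
DECLARED_RANGES = {
    MEDIA + "duration": XSD + "integer",
    MEDIA + "width": XSD + "integer",
    MEDIA + "height": XSD + "integer",
}


def conforms(datatype, lexical):
    """Check a literal against the one datatype this sketch handles."""
    if datatype == XSD + "integer":
        return re.fullmatch(r"[+-]?[0-9]+", lexical) is not None
    return True  # other datatypes are not checked in this sketch


def validate(triples, ranges):
    """Return (subject, predicate, message) for every triple that fails."""
    problems = []
    for subject, predicate, value in triples:
        if predicate not in ranges:
            problems.append((subject, predicate, "no range declared"))
        elif not conforms(ranges[predicate], value):
            problems.append((subject, predicate, "value does not match range"))
    return problems


triples = [
    ("#clip", MEDIA + "duration", "120"),
    ("#clip", MEDIA + "width", "wide"),    # not an integer
    ("#clip", MEDIA + "bitrate", "128"),   # no range declared above
]
problems = validate(triples, DECLARED_RANGES)
```

With an empty set of declarations, which is the situation today, every
triple comes back as "no range declared" and nothing can be checked.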

Monolithic Vocabulary Design
----------------------------

Rather than break Media out into multiple different vocabularies, Yahoo
has shoved audio, video, text, photos, thumbnails, and re-invented sets
into one monolithic vocabulary, which will surely get more and more
bloated as the years go by.

Rather than create a nice vocabulary stack (like what we've been doing
for the past several years):

+--------------+
|Music Ontology|
+--------------+-------+
|     Audio    | Video |
+--------------+-------+
|        Media         |
+----------------------+

They've instead created a mega vocabulary that doesn't seem to be backed
up by any usage data... or rather, it certainly isn't backed up by the
data we collected on the subjects of audio and video. Perhaps I'm
missing some sort of grand architecture, but when you have media:Article
and media:Text (neither of which subclasses the other), it suggests
that not a great deal of design work went into the vocabularies.

Confounding Media with Media Format
-----------------------------------

Yahoo defines the following properties in media:

* media:bitrate
* media:channels
* media:duration
* media:fileSize
* media:framerate
* media:height
* media:samplingrate
* media:type
* media:width

Most of these are quite specific to web-based media formats and have
nothing to do with media in the physical world, off the Web. Many of
them can't be used to describe media:Text or media:Article. These
attributes describe a file format rather than the media itself, and
should be separated out into a different media format vocabulary.

* media:views

This one has more to do with social news sites than media.

Specification of medium using both class and property
-----------------------------------------------------

Yahoo defines both this:

media:Image, media:Audio, media:Video

and this:

media:medium - The type of object: image | audio | video | document |
executable.

What's the point of having both a 'medium' property and classes that
define the medium? As I see it, media:medium shouldn't exist at all.
Use one or the other, not both. Using both is confusing and will
inevitably lead to more pain for Yahoo down the line, when consumers
have to look at not only @typeof information, but also medium
information.
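
Here is a small Python sketch of the extra check that every consumer
ends up needing once both mechanisms exist. The mapping between classes
and medium values is my assumption; the vocabulary does not state it.
The function name is made up for this example.

```python
MEDIA = "http://search.yahoo.com/searchmonkey/media/"

# Assumed correspondence between the classes and the media:medium values
# quoted above; the vocabulary itself does not spell this out.
CLASS_TO_MEDIUM = {
    MEDIA + "Image": "image",
    MEDIA + "Audio": "audio",
    MEDIA + "Video": "video",
}


def medium_conflicts(types, medium):
    """Return the @typeof classes that disagree with the media:medium value.

    types  -- class IRIs taken from @typeof
    medium -- the media:medium value, or None if the property is absent
    """
    if medium is None:
        return []
    return [
        t for t in types
        if t in CLASS_TO_MEDIUM and CLASS_TO_MEDIUM[t] != medium
    ]


consistent = medium_conflicts([MEDIA + "Video"], "video")
contradictory = medium_conflicts([MEDIA + "Video"], "audio")
```

If only the classes (or only the property) existed, the contradictory
case could not be expressed in the first place and this code would not
need to be written.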

Naming conflict, right off the bat
----------------------------------

Yahoo has defined the following prefixes: commerce, media

These conflict directly with ones that we've already created, which
isn't that big of a deal - in fact, it shows that RDFa is resilient even
in these scenarios. However, it also means that almost all of the
solutions that have been proposed for addressing the "cut-paste
fragility" issue that the WHATWG has raised are now much more difficult
to implement correctly. Which commerce and which media vocabularies do
we resolve to?
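
A rough Python sketch of the problem: the same CURIE expands to a
different IRI depending on which prefix mapping happens to be in scope.
The second namespace below is a placeholder I made up for illustration,
not the real address of any vocabulary, and expand_curie is a simplified
stand-in for what an RDFa processor does.

```python
def expand_curie(curie, prefixes):
    """Expand 'prefix:reference' using the prefix mappings in scope."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError("no mapping in scope for prefix %r" % prefix)
    return prefixes[prefix] + reference


# Mapping declared on the page the markup was copied from.
source_page = {"media": "http://search.yahoo.com/searchmonkey/media/"}

# Mapping declared on the page it was pasted into (placeholder IRI).
target_page = {"media": "http://example.org/another/media#"}

before = expand_curie("media:duration", source_page)
after = expand_curie("media:duration", target_page)
```

The two results differ, so the pasted markup silently changes meaning.
Any fallback that guesses a well-known namespace for the "media" prefix
has to pick one of the two and will be wrong for the other.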

I'm afraid that since Yahoo is the 800lb gorilla in the room, there will
be no place for good vocabulary designs.

These vocabularies will hurt RDFa adoption in the long run
----------------------------------------------------------

My real fear is that while Yahoo adopting RDFa will help in the short
term, these badly designed vocabularies will hurt RDFa adoption in the
long run.

The worst-case scenario is seeing wide adoption of Yahoo's media
vocabulary as it currently stands, which will eventually come under much
harsher and less constructive criticism than I've outlined above.

As I stated earlier, there are many more issues with what Yahoo has done
with their SearchMonkey vocabularies that should be fixed for the
benefit of this community. We are more than glad to help them work
through the issues, as long as Yahoo is willing to have an open dialog
with the RDF vocabulary creation community.

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: Absorbing Costs Considered Harmful
http://blog.digitalbazaar.com/2009/02/27/absorbing-costs-harmful

Received on Wednesday, 18 March 2009 06:32:47 UTC