Re: Yahoo's RDF vocabularies

Hi Manu,

I would like to start by saying that we are always open to dialog!

I would also like to add, simply because it seems to be a common 
misunderstanding, that you can use any and all vocabularies in 
conjunction with SearchMonkey, including your own media vocabularies. 
See, for example, what Myspace has done in creating their own vocabulary [1].

By way of explanation, you should also know that when creating the 
initial SearchMonkey documentation, we included vocabularies that we 
felt were stable enough, covered major interests, and were widespread, 
including Dublin Core, FOAF and others. However, we wanted the 
documentation to be complete and to provide simple vocabularies in areas 
where none existed or where, on the contrary, too many existed. The 
engineer who created the media vocabulary followed the best practice of 
taking an existing format (MediaRSS) and translating it into RDF. That 
explains where the terms come from and also shows that the vocabulary is 
backed up by usage. We do publish OWL definitions for the vocabularies 
at [2].

Obviously, this is not to say that the vocabularies are 'perfect' by 
whatever measure of aesthetics (and yes, I consider ontology engineering 
more of an art than a science). All your suggestions to improve this 
vocabulary or any other are very welcome. There are only two 
requirements: they have to be specific enough to implement, and they 
have to be backwards compatible. (As we are painfully aware, schema 
versioning is an unsolved problem on the Semantic Web.) The only 
non-issue I see in your list of comments is the issue of prefixes: we 
have URIs in RDF(a), and there are already plenty of namespace clashes 
in the sense you describe: RDF Calendar and Dublin Core both have at 
least two namespaces.
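
To illustrate the point about prefixes, here is a minimal sketch in 
Python of how a prefixed name expands to a full URI. It is only an 
illustration, not SearchMonkey's actual parser: the expand_curie helper 
and the example.org namespace are made up for this example, while the 
Yahoo namespace is the one quoted in your message.

```python
def expand_curie(curie, prefixes):
    """Expand a prefixed name such as 'media:duration' into a full URI.

    `prefixes` maps each prefix declared on a page to its namespace URI.
    This helper is hypothetical and only illustrates the idea.
    """
    prefix, _, local_name = curie.partition(":")
    return prefixes[prefix] + local_name


# Two pages that both declare the prefix "media", bound to different namespaces.
yahoo_page = {"media": "http://search.yahoo.com/searchmonkey/media/"}
other_page = {"media": "http://example.org/media#"}  # assumed placeholder URI

yahoo_term = expand_curie("media:duration", yahoo_page)
other_term = expand_curie("media:duration", other_page)

print(yahoo_term)
print(other_term)
print(yahoo_term == other_term)  # the two terms are distinct URIs
```

The same prefix on two pages yields two different URIs, so the terms 
stay distinct as long as each page declares its own prefix mapping.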

Best,
Peter

[1] http://www.myspace.com/parishilton
[2] http://developer.yahoo.com/searchmonkey/smguide/owl_defs.html


Manu Sporny wrote:
> Ben Adida wrote:
>   
>> Yahoo has launched even more RDFa coolness: embed RDFa on your site to
>> describe your flash games and videos, and they show up embedded in Yahoo
>> search results *for everyone*, *by default*.
>>     
>
> Overall, this is great news. Very nice to see Yahoo! adopting RDFa this
> deeply into their search service... I do have some gripes about
> SearchMonkey Vocabularies, however...
>
>   
>> PS: the only thing that's a bit unfortunate is that they didn't reuse
>> Digital Bazaar's media vocabulary. I hope we can find a way to create
>> equivalences at some point... that's the goal of RDF, after all.
>>     
>
> I've had a bit of time to look at Yahoo's published vocabularies and I'm
> quite concerned by them and Yahoo!'s general direction with vocabulary
> design.
>
> Here's a list of issues that I was able to find... there are
> many more issues that I found than are outlined here. It would be good
> to talk with whoever designed their vocabularies. You can find an
> overview of Yahoo!'s vocabularies here:
>
> http://developer.yahoo.com/searchmonkey/smguide/profile_vocab.html
>
> Issues specific to Yahoo's Media vocabulary:
>
> Vocabulary is not machine-readable, not validate-able
> -----------------------------------------------------
>
> Yahoo's searchmonkey media vocabulary defined here:
>
> http://search.yahoo.com/searchmonkey/media/
>
> is not machine-readable. There are no RDF ranges, subClassOf, comments,
> or types specified. New RDF vocabularies, especially ones from large
> companies like Yahoo, should be machine readable otherwise it's going to
> be nearly impossible to validate against them.
>
> Monolithic Vocabulary Design
> ----------------------------
>
> Rather than break Media out into multiple different vocabularies, Yahoo
> has shoved audio, video, text, photos, thumbnails, and re-invented sets
> into one monolithic vocabulary, which will surely get more and more
> bloated as the years go by.
>
> Rather than create a nice vocabulary stack (like what we've been doing
> for the past several years):
>
> +--------------+
> |Music Ontology|
> +--------------+-------+
> |     Audio    | Video |
> +--------------+-------+
> |        Media         |
> +----------------------+
>
> They've instead created a mega vocabulary that doesn't seem to be backed
> up by any usage data... or rather, it certainly isn't backed up by the
> data we collected on the subjects of audio and video. Perhaps I'm
> missing some sort of grand architecture, but when you have media:Article
> and media:Text (neither of which subclasses the other), then it shows
> that not a great deal of design work went into your vocabularies.
>
> Confounding Media with Media Format
> -----------------------------------
>
> Yahoo defines the following properties in media:
>
> * media:bitrate
> * media:channels
> * media:duration
> * media:fileSize
> * media:framerate
> * media:height
> * media:samplingrate
> * media:type
> * media:width
>
> Most of these are quite specific to web-based media formats and have
> nothing to do with media in the physical world (not the Web). Many of
> these can't be used to describe media:Text or media:Article. These
> attributes really have nothing to do with media and should be separated
> out into a different media format vocabulary.
>
> * media:views
>
> This one has more to do with social news sites than media.
>
> Specification of medium using both class and property
> -----------------------------------------------------
>
> Yahoo defines both this:
> 	
> media:Image, media:Audio, media:Video
>
> and this:
>
> media:medium - The type of object: image | audio | video | document |
> executable.
>
> What's the point of having both a 'medium' property and classes that
> define the medium? media:medium shouldn't exist at all - use one or the
> other, not both. Using both is confusing and will inevitably lead to
> more pain for Yahoo down the line when you have to look at not only
> @typeof information, but also medium information.
>
> Naming conflict, right off of the bat
> -------------------------------------
>
> Yahoo has defined the following prefixes: commerce, media
>
> These conflict directly with ones that we've already created, which
> isn't that big of a deal - in fact, it shows that RDFa is resilient even
> in these scenarios. However, it also means that almost all of the
> solutions that have been proposed for addressing the "cut-paste
> fragility" issue that the WHATWG has raised are now much more difficult
> to implement correctly. Which commerce and which media vocabularies do
> we resolve to?
>
> I'm afraid that since Yahoo is the 300lb gorilla in the room, there will
> be no place for good vocabulary designs.
>
> These vocabularies will hurt RDFa adoption in the long run
> ----------------------------------------------------------
>
> My real fear is that while Yahoo adopting RDFa will help in the short
> term, these badly designed vocabularies will hurt RDFa adoption in the
> long run.
>
> The worst-case scenario is seeing wide adoption of Yahoo's media
> vocabulary as it currently stands, which will eventually come under much
> harsher and less constructive criticism than I've outlined above.
>
> As I stated earlier, there are many more issues with what Yahoo has done
> with their SearchMonkey vocabularies that should be fixed for the
> benefit of this community. We are more than glad to help them work
> through the issues, as long as Yahoo is willing to have an open dialog
> with the RDF vocabulary creation community.
>
> -- manu
>
>   

Received on Wednesday, 18 March 2009 11:45:25 UTC