[MMSEM-UC] review of Music Use Case

From: Raphaël Troncy <Raphael.Troncy@cwi.nl>
Date: Wed, 15 Nov 2006 14:30:01 +0100
Message-ID: <455B1659.2F564AA1@cwi.nl>
To: MMSem-XG Public List <public-xg-mmsem@w3.org>, Giovanni Tummarello <g.tummarello@gmail.com>, Oscar Celma <oscar.celma@iua.upf.edu>

Dear XG members,

Oscar Celma, a future new member of the XG from UPF, has written some
very interesting thoughts about the Music Use Case, which I reproduce
below. You can also see his wiki page:
http://www.w3.org/2005/Incubator/mmsem/wiki/OscarCelma
Best regards.

    Raphaël

---------------------

Now, regarding the Music Use Case. I've been thinking about it... My
main concern is that it tries to cover a wide range of topics in the
Music Information Retrieval (MIR) and Semantic Web fields. For
instance, it includes:
- audio fingerprinting
- metadata aggregation
- playlist generation
- ...

I think that a feasible music use case should focus on one, or maybe a
couple, of these ideas. Otherwise it is too much!
Moreover, there is a lot of ongoing work in most of these fields: I'm
thinking of the MusicDNS (+MusicBrainz) audio identification service,
Last.fm, Pandora (and a looong etcetera) for playlist generation, etc.

Therefore, the use case could be "misunderstood", in the sense that
*people* might not get the point of adding explicit semantics to the
ideas presented in it... That said, I think it would be fantastic to
cover the whole set of music use case ideas, but it is too big right
now!

So, I'll sketch a couple of ideas here:
* The first proposal is to exploit the propagation of (semantic) music
annotations.

This idea includes the following tasks:
1- Extract mid-level features from the audio (e.g. beats per minute
(BPM), tonality (key and mode), timbre characteristics, etc.).
2- User interaction with the music collection: once the audio files have
been analysed, the user can tag some songs according to their own
criteria. The user could create concepts such as Mood (with categories
like happy, sad, mysterious, etc.) and attach some examples (i.e. songs)
to each class.
3- The system could propose a set of tags (a category value for a
concept) for newly incoming songs, based on audio similarity metrics.
That is, when a new song is added to the music collection and analysed,
the system can take the category values from the most similar songs and
propagate these annotations to the new song (see the sketch after this
list).
4- (Relevance feedback step.) The user can accept or reject the tag
proposals made by the system.
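
To make this concrete, here is a minimal Python sketch of the
propagation step (task 3), assuming the mid-level features of task 1
have already been extracted into plain numeric vectors. The feature
values, the Mood concept and the nearest-neighbour/Euclidean-distance
choices are illustrative assumptions on my part, not part of the
proposal:

import numpy as np
from collections import Counter

# Toy annotated collection: one feature vector per song (e.g. BPM plus
# two timbre descriptors) and the tags the user has already assigned.
collection = {
    "song_a": {"features": np.array([128.0, 0.62, 0.31]), "tags": {"mood": "happy"}},
    "song_b": {"features": np.array([70.0, 0.15, 0.80]), "tags": {"mood": "sad"}},
    "song_c": {"features": np.array([125.0, 0.58, 0.35]), "tags": {"mood": "happy"}},
}

def propose_tag(new_features, collection, concept="mood", k=2):
    """Propose the majority value of `concept` among the k most similar songs."""
    ranked = sorted(
        ((np.linalg.norm(new_features - s["features"]), s["tags"].get(concept))
         for s in collection.values()),
        key=lambda pair: pair[0],
    )
    votes = Counter(tag for _, tag in ranked[:k] if tag is not None)
    return votes.most_common(1)[0][0] if votes else None

# A newly analysed song whose features are close to the two "happy" examples.
print(propose_tag(np.array([130.0, 0.60, 0.30]), collection))  # -> happy

The user then accepts or rejects the proposal (task 4), and an accepted
tag is stored with the song so that it can in turn vote on future songs.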

This process shows how to *easily* annotate music collections by
expanding the annotations of music titles. Think of Pandora, which
currently does all its music description manually (more than 400
attributes!). This use case would help them speed up the process of
annotating big collections.

That's a rough idea; instead of free-form tags (linked somehow with the
WordNet RDF/OWL representation [8]?), the annotations could be
normalized against a predefined ontology. Finally, this use case is
clearly related to the "Tagging Use Case" proposal.

* The second idea is to add semantics to a podcast session.
Nowadays, there is no metadata about a podcast session. The most useful
thing one can find is some sort of HTML table in the RSS feed entry that
lists the songs and artists appearing in the podcast session. So a nice
use case would be to add explicit metadata that describes the contents
of the session.

More concretely, I'm thinking of:
1- Speech/music recognition.
Detect, from the MP3 file, the parts where a person is speaking and the
parts where there is music (there's a lot of work in this area; probably
the best option would be to use the state-of-the-art algorithm that
solves this problem with the highest accuracy).
2- Once we have detected which parts include music, the next step is to
create a temporal structural decomposition of the podcast session (we
could use parts of some MPEG-7/OWL ontology, the MDS part, to describe
it).
E.g.:
00:04:02 - 00:06:22  :: Arctic Monkeys - A certain romance
00:06:35 - 00:08:04  :: The Killers  - Somebody told me
etc.
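
Just to illustrate, the decomposition above could be expressed in RDF
along these lines (a rough sketch with rdflib; the ex: vocabulary is a
placeholder of mine, *not* the actual MPEG-7/OWL MDS terms, which would
replace it in practice):

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

EX = Namespace("http://example.org/podcast#")
g = Graph()
podcast = URIRef("http://example.org/podcasts/2006-11-15")

segments = [
    ("00:04:02", "00:06:22", "Arctic Monkeys", "A certain romance"),
    ("00:06:35", "00:08:04", "The Killers", "Somebody told me"),
]

for i, (start, end, artist, title) in enumerate(segments):
    seg = URIRef(f"{podcast}#segment{i}")
    g.add((podcast, EX.hasSegment, seg))
    g.add((seg, RDF.type, EX.MusicSegment))
    g.add((seg, EX.start, Literal(start)))   # placeholder for a media timepoint
    g.add((seg, EX.end, Literal(end)))
    g.add((seg, EX.artist, Literal(artist)))
    g.add((seg, EX.track, Literal(title)))

print(g.serialize(format="turtle"))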

The main drawback here is detecting the music (audio identification).
One option is to use audio fingerprinting; the other is to analyse the
text from the RSS entry and try to derive the artists and songs from it
(hmmm... not very nice, though!). A rough sketch of that second option
follows.
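
As a naive sketch, assuming the RSS entry text lists tracks as
":: Artist - Title" lines (a big assumption; real feeds vary a lot):

import re

entry_text = """
00:04:02 - 00:06:22  :: Arctic Monkeys - A certain romance
00:06:35 - 00:08:04  :: The Killers  - Somebody told me
"""

# Assumes the tracklist uses ":: Artist - Title" lines; anything else is ignored.
pattern = re.compile(r"::\s*(?P<artist>.+?)\s+-\s+(?P<title>.+)$", re.MULTILINE)
tracks = [(m.group("artist").strip(), m.group("title").strip())
          for m in pattern.finditer(entry_text)]
print(tracks)
# [('Arctic Monkeys', 'A certain romance'), ('The Killers', 'Somebody told me')]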

3- After this, the temporal decomposition could be embedded into the RSS
feed (think of RSS 1.0 or the Atom/OWL proposal by Henry Story).

4- Finally, we would get a nice description of the podcast session.
After that, a nice SPARQL query for retrieving podcasts that include
songs by the user's favourite artists would be the killer app! :-) A
sketch of such a query follows. Related to this use case, see, for
instance, the work done at DERI [9].
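
As a sketch of that query (again over my placeholder ex: vocabulary from
the RDF snippet above; the real query would target whatever ontology the
decomposition is finally published in):

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

EX = Namespace("http://example.org/podcast#")
g = Graph()
podcast = URIRef("http://example.org/podcasts/2006-11-15")
seg = URIRef("http://example.org/podcasts/2006-11-15#segment0")
g.add((podcast, EX.hasSegment, seg))
g.add((seg, RDF.type, EX.MusicSegment))
g.add((seg, EX.artist, Literal("Arctic Monkeys")))

favourites = ["Arctic Monkeys", "Radiohead"]  # the user's favourite artists
values = " ".join(f'"{a}"' for a in favourites)

# Select every podcast that has at least one segment by a favourite artist.
query = f"""
PREFIX ex: <http://example.org/podcast#>
SELECT DISTINCT ?podcast WHERE {{
    ?podcast ex:hasSegment ?segment .
    ?segment ex:artist ?artist .
    VALUES ?artist {{ {values} }}
}}
"""
for row in g.query(query):
    print(row.podcast)   # -> http://example.org/podcasts/2006-11-15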

That's all for now!
Oscar Celma.

[8] http://www.w3.org/2001/sw/BestPractices/WNET/wn-conversion.html
[9] http://sw.deri.org/2005/07/podcast/doc/podcast.pdf

--
Raphaël Troncy
CWI (Centre for Mathematics and Computer Science),
Kruislaan 413, 1098 SJ Amsterdam, The Netherlands
e-mail: raphael.troncy@cwi.nl & raphael.troncy@gmail.com
Tel: +31 (0)20 - 592 4093
Fax: +31 (0)20 - 592 4312
Web: http://www.cwi.nl/~troncy/