- From: Raphaël Troncy <Raphael.Troncy@cwi.nl>
- Date: Wed, 15 Nov 2006 14:30:01 +0100
- To: MMSem-XG Public List <public-xg-mmsem@w3.org>, Giovanni Tummarello <g.tummarello@gmail.com>, Oscar Celma <oscar.celma@iua.upf.edu>
Dear XG members,

Oscar Celma, a future new member of the XG from UPF, has written some very interesting thoughts about the Music Use Case, which I reproduce below. You can also have a look at his wiki page: http://www.w3.org/2005/Incubator/mmsem/wiki/OscarCelma

Best regards.

Raphaël

---------------------

Now, regarding the Music Use Case. I've been thinking about it... My main concern is that it tries to cover a wide range of topics from the Music Information Retrieval (MIR) and Semantic Web fields. For instance, it includes:

- audio fingerprinting
- metadata aggregation
- playlist generation
- ...

I think that a feasible music use case should focus on one, or maybe a couple, of these ideas. Otherwise it is too much! Moreover, there is a lot of ongoing work in most of these fields: I'm thinking of the MusicDNS (+MusicBrainz) audio identification service, Last.fm, Pandora (and a looong etcetera) for playlist generation, etc. The use case could therefore be "misunderstood", in the sense that *people* might not see the point of adding explicit semantics to the ideas presented in it...

That said, I think it would be fantastic to cover all the ideas of the music use case, but right now it is too big! So, I'll quickly sketch a couple of ideas here:

* The first proposal is to exploit the propagation of (semantic) music annotations. This idea includes the following tasks:

1- Extract mid-level features from the audio (e.g. beats per minute (BPM), tonality (key and mode), timbre characteristics, etc.).

2- User interaction with her music collection: once the audio files have been analysed, the user can tag some songs according to her own criteria. The user could create concepts such as Mood (with the following categories: happy, sad, mysterious, etc.) and attach some examples (i.e. songs) to each category.

3- The system could propose a set of tags (a category value for a concept) for newly incoming songs, based on audio similarity metrics. That is, when a new song is added to the music collection and analysed, the system can take the category values from the most similar songs and propagate these annotations to the new song (see the first sketch further below).

4- (Relevance feedback step.) The user can accept or reject the tag proposals made by the system.

This process shows how to *easily* annotate music collections by expanding the annotations of music titles. Think of Pandora, which currently does all of its music description manually (more than 400 attributes!). This use case would help them speed up the annotation of big collections. That's a rough idea; instead of free tags (linked somehow with the WordNet RDF/OWL representation [8]?), the annotations could be normalised against a predefined ontology. Finally, this use case is clearly related to the "Tagging Use Case" proposal.

* The second idea is to add semantics to a podcast session. Nowadays, there is no metadata about a podcast session. The most useful thing one can find is some sort of HTML table in the RSS feed entry, listing the songs and artists appearing in the session. A nice use case would then be to add explicit metadata that describes the contents of the session. More concretely, I'm thinking of:

1- Speech/music recognition. Detect from the MP3 file the parts where a person is speaking and the parts where there is music (there is a lot of work in this area; probably the best option would be to use the state-of-the-art algorithm that solves this problem with the highest accuracy).

2- Once we have detected which parts contain music, the next step is to create a temporal structural decomposition of the podcast session (we could use parts of an MPEG-7/OWL ontology, the MDS part, to describe it). E.g.:

00:04:02 - 00:06:22 :: Arctic Monkeys - A Certain Romance
00:06:35 - 00:08:04 :: The Killers - Somebody Told Me
etc.

The main difficulty here is detecting the music (audio identification). One option is to use fingerprinting; the other is to analyse the text from the RSS entry and try to derive the artists and songs from it (hmmm... not very nice, though!).

3- After this, the temporal decomposition could be embedded into the RSS feed (think of RSS 1.0, or the Atom/OWL proposal by Henry Story).

4- Finally, we would get a nice description of the podcast session. After that, a nice SPARQL query to retrieve podcasts that include songs by the user's favourite artists would be the killer app! :-) (A rough sketch of such a query appears in the second sketch below.)

Related to this use case, see, for instance, the work done at DERI [9].
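To make the propagation step of the first proposal (item 3) a bit more concrete, here is a minimal sketch in Python. The feature values, the Euclidean distance and the k-nearest-neighbour majority vote are only illustrative assumptions; any audio similarity metric could play the same role:

from collections import Counter
from math import sqrt

def distance(a, b):
    """Euclidean distance between two feature vectors (BPM, mode, timbre, ...)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def propose_tags(new_features, annotated_songs, k=3):
    """Propose tags for a new song by majority vote over its k most similar annotated songs.

    annotated_songs: list of (features, tags) pairs already labelled by the user.
    The returned proposals are then accepted or rejected by the user
    (the relevance feedback step, item 4).
    """
    neighbours = sorted(annotated_songs, key=lambda song: distance(new_features, song[0]))[:k]
    votes = Counter(tag for _, tags in neighbours for tag in tags)
    # Keep every tag supported by a majority of the neighbours.
    return [tag for tag, count in votes.items() if count > k / 2]

# Toy example: features are (BPM, mode) pairs; tags come from a user-defined "Mood" concept.
collection = [
    ((120, 1), ["happy"]),
    ((118, 1), ["happy"]),
    ((60, 0), ["sad"]),
]
print(propose_tags((122, 1), collection))   # -> ['happy']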
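And for the second proposal, here is a rough sketch of the "killer app" query using rdflib. The ex: vocabulary, the session URI and the hard-coded favourite artists are invented for the example; a real description would use an MPEG-7/OWL (MDS) or Atom/OWL vocabulary instead:

from rdflib import Graph

# A hand-written description of one podcast session, using an invented ex: vocabulary
# as a stand-in for a real MPEG-7/OWL temporal decomposition.
data = """
@prefix ex: <http://example.org/podcast#> .

ex:session42 a ex:PodcastSession ;
    ex:segment [ ex:start "00:04:02" ; ex:end "00:06:22" ;
                 ex:artist "Arctic Monkeys" ; ex:track "A Certain Romance" ] ,
               [ ex:start "00:06:35" ; ex:end "00:08:04" ;
                 ex:artist "The Killers" ; ex:track "Somebody Told Me" ] .
"""

g = Graph()
g.parse(data=data, format="turtle")

# "Retrieve the podcast sessions that include songs by the user's favourite artists."
query = """
PREFIX ex: <http://example.org/podcast#>
SELECT DISTINCT ?session ?artist ?track WHERE {
    ?session a ex:PodcastSession ;
             ex:segment ?seg .
    ?seg ex:artist ?artist ;
         ex:track ?track .
    FILTER (?artist IN ("Arctic Monkeys", "The Killers"))
}
"""

for row in g.query(query):
    print(row.session, row.artist, row.track)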
That's all for now!

Oscar Celma.

[8] http://www.w3.org/2001/sw/BestPractices/WNET/wn-conversion.html
[9] http://sw.deri.org/2005/07/podcast/doc/podcast.pdf

--
Raphaël Troncy
CWI (Centre for Mathematics and Computer Science)
Kruislaan 413, 1098 SJ Amsterdam, The Netherlands
e-mail: raphael.troncy@cwi.nl & raphael.troncy@gmail.com
Tel: +31 (0)20 - 592 4093
Fax: +31 (0)20 - 592 4312
Web: http://www.cwi.nl/~troncy/