Re: my token about the "3 or more layer" structure for the ontology

From: Felix Sasaki <fsasaki@w3.org>
Date: Sat, 22 Nov 2008 10:57:16 +0900
Message-ID: <492766FC.9060906@w3.org>
To: Pierre-Antoine Champin <pchampin@liris.cnrs.fr>
CC: public-media-annotation@w3.org

Hello Pierre-Antoine,

Pierre-Antoine Champin wrote:
> Felix,
> although I participated in putting the debate in terms of "XML vs. RDF",
> my concern was not about a precise syntax or format, and I agree with
> you that it should not be.

Just for clarification: "agree that it should not be" means "we do not 
need to define a syntax" or "we should not discuss XML vs. RDF, but need 
to decide on a syntax"? If the latter, which syntax do you propose?

> However in my view the question is more fundamental. Let me reword it.
> Designing an ontology involves, IMHO, a trade-off between faithfully
> representing the domain of interest, and projecting it in a practical
> data structure.

Maybe here we already have different opinions: I think we can design an 
ontology without a practical data structure. The current API / ontology 
proposal does just that: defining a list of terms *as prose*. The data 
structure related parts are only in the API, and in the prose mapping 

> Faithful in our context means:
> - able to cover a large part of legacy metadata
> - able to satisfy most of the requirements of our use cases
> Practical, in our context, means that the ontology should be:
> - easy to use by media publishers
> - easy to implement in browsers
> A very easy to use and implement data structure is a list of
> (attribute,value) pairs -- the so-called "flat" structure.
> By the way, even easier is a list of simple tags -- which can be tweaked
> into (attribute,value) pairs anyway, as pointed out by your previous
> mail about flickr.
> However, I think that this is too much of a simplification:
> - it does not satisfy some requirements (like the multi-level or
> collection) -- though we might decide that those ones are too complex
> - my intuition is that more structure would make "impedance mismatch"
> between legacy vocabularies easier to point out and solve
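
As a rough illustration of the "flat" structure discussed above, the following minimal Python sketch turns simple tags into (attribute, value) pairs. The `attribute:value` colon convention, the `tag` fallback attribute, and the function name `tags_to_pairs` are assumptions made for this example only; they are not taken from the thread or from any particular tagging service.

```python
from typing import List, Tuple

Pair = Tuple[str, str]


def tags_to_pairs(tags: List[str], default_attribute: str = "tag") -> List[Pair]:
    """Turn simple tags into (attribute, value) pairs.

    A tag written as "attribute:value" is split on its first colon
    (an assumed convention); any other tag is stored under
    default_attribute.
    """
    pairs = []
    for tag in tags:
        attribute, sep, value = tag.partition(":")
        if sep and attribute and value:
            pairs.append((attribute.strip(), value.strip()))
        else:
            pairs.append((default_attribute, tag.strip()))
    return pairs


# A flat annotation is then just a list of pairs.
flat_annotation = tags_to_pairs(
    ["title:Spacewalk", "creator:John Smith", "astronaut"]
)
```

Such a list has no way to say that two pairs belong to the same nested entity, which is the limitation raised in the points above about multi-level descriptions and collections.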

I agree that this simplification does not cover many use cases like 
"multi-level or collection". But I also think that for this version (1.0 
of the ontology / API) we should concentrate on the simple approach, 
which is important for all use cases and application scenarios. If that 
finds adoption, we can shoot for 2.0 and a more complex approach.

Btw., of course we have not described all application scenarios, use 
cases and requirements yet. Nevertheless I think that the requirement to 
get information across heterogeneous formats is central to our WG.

I don't think that more or less structure is related to the quality of 
mapping between different vocabularies. For this mapping, the detailed 
knowledge that WG participants bring in about these vocabularies is what 
matters most. Whether the mapping is then represented in prose, or as 
more or less structured XML or RDF, is not important IMO. However, I do 
think that a detailed prose description is important for the API, and it 
can also help in understanding a structured representation, if we decide 
to do that.
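
To make the idea of a format-neutral mapping concrete, here is a minimal Python sketch of a lookup table from native metadata fields to a shared list of terms. The common term names (`title`, `creator`), the table `MAPPING`, and the function `to_common_terms` are hypothetical and do not reflect any mapping agreed by the WG. The native field names are ones commonly used by ID3v2, Vorbis comments and XMP, and treating the ID3 and Vorbis artist fields as `creator` is only an approximation of the kind the WG's vocabulary experts would have to confirm.

```python
# Hypothetical mapping table: native field name -> common term.
MAPPING = {
    "ID3": {"TIT2": "title", "TPE1": "creator"},
    "VorbisComment": {"TITLE": "title", "ARTIST": "creator"},
    "XMP": {"dc:title": "title", "dc:creator": "creator"},
}


def to_common_terms(source_format, native_metadata):
    """Translate native metadata into (common term, value) pairs.

    Fields without an entry in the mapping table are skipped.
    """
    table = MAPPING[source_format]
    return [
        (table[field], value)
        for field, value in native_metadata.items()
        if field in table
    ]


id3_view = to_common_terms("ID3", {"TIT2": "Spacewalk", "TPE1": "John Smith"})
xmp_view = to_common_terms("XMP", {"dc:title": "Spacewalk"})
```

The same table could equally be written down as prose, XML or RDF; the sketch only shows that the mapping itself is independent of the representation chosen.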


>   pa
> Felix Sasaki wrote:
>> Ruben Tous (UPC) wrote:
>>> Hi Pierre-Antoine, Silvia, all,
>>> I think that normalisation/denormalisation is related to the more
>>> general discussion about structured*/flat annotations (handling
>>> events, agents, etc. as separate structures). The multi-level
>>> description discussion is probably a sub-topic within that general
>>> one, and refers only (as I've understood till now) to splitting
>>> (normalising) the main structure (the one describing the digital
>>> object) into several entities but only regarding different abstraction
>>> levels (e.g. document and instance).
>>> So, probably we should decide first about the structured*/flat
>>> question. If we choose "flat", then we could maybe discard also the
>>> multi-level description.
>>> Probably, there's a latent high-level question behind this discussion:
>>> will the ontology model the way annotations are interchanged, or will
>>> it model their underlying semantic grounding?
>>> Best regards,
>>> Ruben
>>> *When talking about structured annotations I'm not just referring to
>>> hierarchical ones (XML), I refer to annotations with ObjectProperties
>>> (inlined or linked within the same annotation) (e.g. RDF).
>> Reading this discussion and the "features" wiki page, the "data model
>> rows", I have the impression that there is some tension between using
>> XML and RDF. I can understand that tension, but I think we should not
>> spend time on discussing it in this group. Nevertheless, it makes me think
>> more and more that we should not be format specific in our ontology,
>> but use just a prose description as the normative outcome, that is in
>> the "Ontology 1.0" Recommendation. If people want to write non-normative
>> RDF- and XML-formats, they are free to do so. I think we should focus on
>> formulating the terminology in the prose in a way that makes a
>> formalization in whatever format straightforward.
>> Felix
>>> ----- Original Message ----- From: "Silvia Pfeiffer"
>>> <silviapfeiffer1@gmail.com>
>>> To: "Ruben Tous (UPC)" <rtous@ac.upc.edu>
>>> Cc: <public-media-annotation@w3.org>
>>> Sent: Wednesday, November 19, 2008 10:10 PM
>>> Subject: Re: my token about the "3 or more layer" structure for the
>>> ontology
>>> Hi Ruben,
>>> It is always a matter of use cases.
>>> When we talk about management of collections, there will be overlap
>>> between the annotations of different files, which can be handled more
>>> efficiently (in a database sense: normalise your schema).
>>> However, if you receive an individual media resource, you want all of
>>> its annotations to be available with the media resource, i.e. you want
>>> an "intelligent" media object that can tell you things about itself.
>>> I don't see these things as separate. Let's take a real-world example.
>>> Let's assume I have a Web server with thousands of videos. They fall
>>> into categories and within categories into events, where each video
>>> within an event has the same metadata about the event. On the server,
>>> I would store the metadata in a database. I would do normalisation of
>>> the data and just store the data for each event once, but have a
>>> relationship table for video-event-relationships. Now, a Web Browser
>>> requests one of the videos for playback (or a search engine comes
>>> along and asks about the metadata for a video). Of course, I go ahead
>>> and extract all related metadata about that video from the database
>>> and send it with the video (or in the case of the search engine:
>>> without the video). I further have two ways of sending the metadata: I
>>> can send it in a text file (which is probably all the search engine
>>> needs), or I can send it multiplexed into the video file, e.g. as a
>>> metadata header (e.g. MP3 has ID3 for this, Ogg has vorbiscomment,
>>> other file formats have different metadata headers).
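
The server-side scenario above can be sketched with Python's built-in `sqlite3` module: event metadata is stored once, a relationship table links videos to events, and a join rebuilds the complete record for one video on request. The table names, column names and sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE event (id INTEGER PRIMARY KEY, name TEXT, location TEXT);
    CREATE TABLE video (id INTEGER PRIMARY KEY, title TEXT);
    -- Relationship table: the event data itself is stored only once.
    CREATE TABLE video_event (
        video_id INTEGER REFERENCES video(id),
        event_id INTEGER REFERENCES event(id)
    );
    INSERT INTO event VALUES (1, 'Spacewalk', 'ISS');
    INSERT INTO video VALUES (1, 'Tool bag lost');
    INSERT INTO video VALUES (2, 'Return to airlock');
    INSERT INTO video_event VALUES (1, 1);
    INSERT INTO video_event VALUES (2, 1);
    """
)


def metadata_for_video(video_id):
    """Join the normalised tables back into one flat record for a video."""
    row = conn.execute(
        """
        SELECT v.title, e.name, e.location
        FROM video v
        JOIN video_event ve ON ve.video_id = v.id
        JOIN event e ON e.id = ve.event_id
        WHERE v.id = ?
        """,
        (video_id,),
    ).fetchone()
    if row is None:
        return None
    title, event_name, event_location = row
    return {"title": title, "event": event_name, "location": event_location}
```

The dictionary returned by `metadata_for_video` is what would be sent as a text file or multiplexed into the media file; the normalisation stays an internal detail of the server.
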
>>> I don't think we need to overly concern ourselves with whether we
>>> normalise our data structure. This is an "implementation" issue. We
>>> should understand the general way in which metadata is being handled
>>> as in the example above and not create schemas that won't work in this
>>> and other scenarios. But we should focus on identifying which
>>> information is important to keep about a video or audio file.
>>> Cheers,
>>> Silvia.
>>> On Thu, Nov 20, 2008 at 12:01 AM, Ruben Tous (UPC) <rtous@ac.upc.edu>
>>> wrote:
>>>> Dear Véronique, Silvia, all,
>>>> I agree with both of you in that the need for multiple description
>>>> levels is only related to a small subset of use cases, basically to
>>>> those related to the management of groups of resources (e.g. digital
>>>> asset management systems, user media collections, etc.). Instead, we
>>>> are (I guess) focused on annotations embedded in individual resources.
>>>> However, I think that there are solutions which cover both cases, the
>>>> simple and the complex one. For instance, we could embed the following
>>>> annotation within an MPEG video:
>>>> <mawg:Video rdf:ID="http://example.org/video/01">
>>>> <mawg:title>astronaut loses tool bag during spacewalk</mawg:title>
>>>> <mawg:creator>John Smith</mawg:creator>
>>>> </mawg:Video>
>>>> <mawg:Resource rdf:ID="http://example.org/resource/01">
>>>> <mawg:format>FLV</mawg:format>
>>>> <mawg:filesize>21342342</mawg:filesize>
>>>> <mawg:duration>PT1004199059S</mawg:duration>
>>>> <mawg:videoID rdf:resource="http://example.org/video/01"/>
>>>> </mawg:Resource>
>>>> It is structured and it offers 2 abstraction levels, but it can be
>>>> serialized like a plain record. When appearing in isolated resources,
>>>> the high-level annotation ("Video" in this case) would be repeated.
>>>> When appearing within a collection's annotation, the "Video" annotation
>>>> would appear just once.
>>>> It is not so different from XMP. Take the following XMP example...
>>>> http://www.w3.org/2008/WebVideo/Annotations/wiki/images/8/8a/Xmp_example.xml
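
The two serialisations described above (the "Video" level repeated for an isolated resource, but emitted once for a collection) can be sketched in Python as follows. The dictionaries mirror the values in the example annotation; the second resource, the `video.` / `resource.` key prefixes and both function names are assumptions added for illustration.

```python
video = {
    "id": "http://example.org/video/01",
    "title": "astronaut loses tool bag during spacewalk",
    "creator": "John Smith",
}
resources = [
    {"id": "http://example.org/resource/01", "format": "FLV",
     "filesize": 21342342, "videoID": video["id"]},
    # Hypothetical second encoding of the same video.
    {"id": "http://example.org/resource/02", "format": "OGG",
     "filesize": 19000000, "videoID": video["id"]},
]
videos_by_id = {video["id"]: video}


def standalone_record(resource, videos_by_id):
    """Flatten one resource plus its parent video into a single plain record."""
    parent = videos_by_id[resource["videoID"]]
    record = {f"video.{key}": value for key, value in parent.items()}
    record.update({f"resource.{key}": value for key, value in resource.items()})
    return record


def collection_records(resources, videos_by_id):
    """Emit each referenced video once, followed by all of the resources."""
    used_ids = sorted({resource["videoID"] for resource in resources})
    return {
        "videos": [videos_by_id[video_id] for video_id in used_ids],
        "resources": list(resources),
    }
```

Calling `standalone_record` on each resource repeats the video-level fields in every record, while `collection_records` keeps them in one place and lets each resource point to them through `videoID`.
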
>>>> Best regards,
>>>> Ruben
>>>> ----- Original Message ----- From: <vmalaise@few.vu.nl>
>>>> To: <public-media-annotation@w3.org>
>>>> Sent: Wednesday, November 19, 2008 11:27 AM
>>>> Subject: my token about the "3 or more layer" structure for the ontology
>>>>> Hi everyone,
>>>>> I was at first very much in favor of an ontology that would distinguish
>>>>> different levels of media documents, like
>>>>> "work-manifestation-instance-item",
>>>>> but after reading this email from the list:
>>>>> http://lists.w3.org/Archives/Public/public-media-annotation/2008Nov/0076.html
>>>>> I agreed with the fact that we would probably only need a simple
>>>>> structure in our case, that multi-level structures were meant for
>>>>> linking different entities that have different status together: if we
>>>>> aim for linking the descriptions of a single item between different
>>>>> vocabularies, we need to specify if the single item is a
>>>>> work_in_XX_vocabulary, or more likely a manifestation_in_XX_vocabulary
>>>>> (see note 1 below), to give its "type", and if people/use cases want
>>>>> to link this single item to other related works, manifestations,
>>>>> instances or items, they can use the framework defined in the schemas
>>>>> reviewed in
>>>>> http://www.w3.org/2008/WebVideo/Annotations/wiki/MultilevelDescriptionReview
>>>>> and use these properties for completing their description.
>>>>> So we would need a property like "has_type" to link a single
>>>>> description's identifier to the correct level of multilevel description
>>>>> schemes.
>>>>> I changed my mind and now think that only one "family" of use cases
>>>>> would need more levels, that they are somehow context dependent (and
>>>>> could thus be considered as requirements for a family of use cases),
>>>>> but of course if it turns out that more than one family of use cases
>>>>> needs this distinction, then we should consider going for a multilevel
>>>>> structure. Anyway, we would need to map informally the way these levels
>>>>> are expressed, in order to provide possible relevant "types" for the
>>>>> description of each single element.
>>>>> note 1: by specifying the different names of the relevant
>>>>> Concepts/terms in schemes like VRA, XMP etc., we would informally
>>>>> define a semantic equivalence between the ways these schema express
>>>>> these levels of description. It would look like:
>>>>> <metadataFile>
>>>>> <id>identifier</id>
>>>>> <hasType>xmpMM:InstanceID, vra:image, frbr:item</hasType>
>>>>> </metadataFile>
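
The informal equivalence in note 1 can also be pictured as a small lookup table. In this Python sketch the three vocabulary terms come from the example above, while the level name `item`, the table `LEVEL_EQUIVALENTS` and both helper functions are hypothetical and serve only to show the idea.

```python
# Hypothetical table: one description level -> the terms that express it
# in different vocabularies (terms taken from the example in note 1).
LEVEL_EQUIVALENTS = {
    "item": {"xmpMM:InstanceID", "vra:image", "frbr:item"},
}


def has_type(identifier, level):
    """Build a flat description that types an identifier with one level."""
    return {"id": identifier, "hasType": sorted(LEVEL_EQUIVALENTS[level])}


def same_level(term_a, term_b):
    """Return True if both terms are listed under the same description level."""
    return any(
        term_a in terms and term_b in terms
        for terms in LEVEL_EQUIVALENTS.values()
    )


description = has_type("identifier", "item")
```

A single `hasType` value like this keeps the description flat while still recording which level of a multilevel scheme the described item corresponds to.
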
>>>>> I think that the table
>>>>> http://www.w3.org/2008/WebVideo/Annotations/wiki/FeaturesTable
>>>>> is a very valuable tool for people to express their ideas about it,
>>>>> thank you very much Ruben for designing it!
>>>>> Best regards,
>>>>> Véronique
Received on Saturday, 22 November 2008 01:58:01 UTC
