Re: [MM] Image Annotation for the Semantic web - Review from Raphaël Troncy on 2006-03-06 (public-swbp-wg@w3.org from March 2006)

From: Raphaël Troncy <Raphael.Troncy@cwi.nl>
Date: Mon, 06 Mar 2006 17:07:39 +0100
To: "Uschold, Michael F" <michael.f.uschold@boeing.com>
CC: public-swbp-wg@w3.org
Message-ID: <440C5E4B.EE58F6D1@cwi.nl>

Dear Mike,

I am replying to your second review in this mail. Note that the latest version of the "Image annotation on the Semantic Web" editor's draft is available at [1] (Editors' Draft $Date: 2006/03/06 16:05:33 $ $Revision: 1.148 $).

> "At the time of writing, most work done in this area does not use semantic-based technologies mainly because of the differences between the multimedia and the web communities and their underlying standardization organizations."
>
> I would agree that this is an important reason, but it may be too strong to say it is the main one.  It suggests that if the web communities and mm communities were more aligned, then semantics would be in use. This seems unlikely to be true, as there is hardly any adoption now of semantic web compared to regular web.

I understand your concern. However, we still think that the web and mm communities having worked in different directions, even though they shared common problems, explains why an important effort is now needed to reconcile both views. We have therefore rephrased it as:

"At the time of writing, most work done in this area does not use semantic-based technologies PARTLY because of the differences between the multimedia and the web communities and their underlying standardization organizations."

> The main point of this issue:
> "Production versus post-production annotation" should be stated up front. I.e. that it is much easier to annotate earlier rather than later. This only comes out at the end, after a lot of other detail that one has to wade through. Post production is only mentioned at the very end.

Agree. The first sentence of the paragraph is now:

"A general rule is that it is much easier to annotate earlier rather than later. Typically, most of the information that is needed for making the annotations is available during production time. [...]"

> Insert 'the' before 'semantics' in: "Annotations alone do not establish semantics of what is being marked-up."

done

> "A good starting point for having more information on RDF is the RDF Primer <http://www.w3.org/2001/sw/BestPractices/MM/image_annotation.html> ."
> Another excellent recommendation is the book co-authored by Frank van Harmelen called something like Semantic Web Primer.

I agree with you that it is an excellent book :-) However, we have decided to cite other W3C documents when they exist and are relevant rather than promoting a specific author/publication (especially when we work closely with such a third-party author :-)

> SOMEWHAT-SERIOUS
> "The limitation of the "external agreement" approach is its inflexibility, i.e., only a limited range of pre-defined information properties can be expressed."
>
> I don't get this, what limits how big a vocabulary that is agreed on by a user community? It can be as big as the community wants it to be. It can be arbitrarily big and complex. Of course it may be more work to get there, but you should say that, if that is the point.

This sentence has been removed.

> There is a bigger issue here. You are distinguishing an 'external agreement' approach from an 'ontology' approach, as if the ontology approach did not require any agreement. Oh contraire. I see both approaches as being identical from the point of the need for agreement. All have to agree to use the ontology. The main difference is the format in which the agreed-on-terms-and-definitions are specified. In one case, it is natural language, in the other case, it is formal axioms in a logic language. However, even in the latter case, you still need to bottom out in agreement. Defining 'author' in an ontology might not specify any more meaning than in the natural language vocabulary. It depends on how many axioms you use. In fact, if you look at a lot of ontologies, there is not a lot of meaning of many of the terms, you get some domain and range constraints, you get a few axioms here and there, but in general, they are highly ambiguous. For a human to know exactly what is meant by a given term in an ontology, they usually have to rely to some extent on what they think the ontology author intended. The axioms will never be adequate, in most cases.
>
> So I would say there is ONE way to establish meaning: get agreement on what a set of concepts are, what terms to use for them, and get them carefully defined. Period. You can do that using natural language, or you can use formal ontologies with good natural language comments. If you must have two ways, then maybe call them: agreement-informal and agreement-formal, or something that means the same thing.
>
> In both cases, (because axioms never tell the whole story) the programmer writing applications to interpret the semantics will have to rely on more than the axioms to make sure the application does the right thing when encountering a term or expression (which is an operational definition of 'understands').

I fully agree with you on this point. Sorry, we missed that before (this comes from the fact that multiple authors have edited the document).
The next four paragraphs have been fully rephrased and should now closely follow your view. We re-use the terminology "informal-agreement" versus "formal-agreement", emphasizing that, ideally, an ontology should contain both (i.e. formal axioms *and* descriptive information in natural language).

> Although technically correct, this text may be hard to follow for a beginner:
> "We specified (elsewhere) hasSize to be a functional property, which is the same as saying that every big image has at most one size. The application of that property to BigImage using the cardinality restriction asserts something stronger, that every BigImage has exactly one size. Furthermore, the size of BigImage must be big. The allValuesFrom restriction is on the hasSize property of this BigImage class only. Sizes of Image are not constrained by this local restriction. For more information, the OWL Guide <http://www.w3.org/2001/sw/BestPractices/MM/image_annotation.html>  provides a good overview of the OWL language."
>
> 'every big image has at most one size' may be technically true but may be quite confusing.
>
> Try something simpler like:
>
> "This code defines a new concept called BigImage as the set of all members of the class Image such that the size of the image is equal to 'big'."
>
> Much more may not be much help

Rephrased as suggested.
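
As a rough illustration of the simpler reading (a sketch only, not taken from the draft), the definition can be mimicked in plain Python: BigImage is just the set of images whose size is 'big'. The names Image, has_size and is_big_image, as well as the sample file names, are hypothetical. Storing the size in a single optional field stands in for hasSize being functional. Note that this is a closed-world approximation; OWL itself reasons under an open-world assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Image:
    name: str
    # A single optional field mirrors a functional property:
    # an image carries at most one size value.
    has_size: Optional[str] = None


def is_big_image(image: Image) -> bool:
    """Membership test for BigImage: an Image whose size is 'big'."""
    return image.has_size == "big"


# Hypothetical sample data, for illustration only.
images = [
    Image("sunset.jpg", "big"),
    Image("icon.png", "small"),
    Image("scan.tif"),  # size unknown, so not classified as big here
]

big_images = [img.name for img in images if is_big_image(img)]
```

Sizes of other images are left unconstrained, which matches the remark that the restriction is local to BigImage.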

> Use Case: Personal Photo Collections
> There is tension here between who is the user vs. the beneficiary of the semantic web technology. Ordinary users may benefit from semantic technologies, but if it is done well, most users will not notice the difference other than it may offer more features than other non-semantic approaches using sophisticated metadata search mechanisms.  This note is presumably not aimed at the ordinary person with a photo collection, but rather those developers of organizer tools like Photoshop file browser, or Iview Media, or dozens of others. It might help to make this clear what you mean by this use case. For example, an opening line like the one for media would be good, altered to match your intent, I'm just guessing:
> E.g.: The use case developed in this section is mainly targeted at photo organizer software developers, and less to the general public.

The description of this use case in section 2.1 is purposely pretty general. It is certainly true that the use of semantic web technologies by the general public for organizing their personal photos will take some time, and we do not even know whether it will happen at all. However, we think that some of the practices reported in the (possible) solution could be used immediately by an audience larger than the photo organizer software developers.

> The Press Photo Bank and Bio-Medical Images entries are EMPTY!
> I'm reading the editors draft: http://www.w3.org/2001/sw/BestPractices/MM/image_annotation.html

They are still empty but should be completed soon.

> Quibble: some of the sections/categories have just one member. This seems categorically odd (so to speak ;-). If the intent is that more use cases in these categories are out there, then say so, that makes sense.

We still wonder if this "tentative" categorization of the various use cases presented in the document is useful at all. There are pros and cons in the TF, and we will be happy to receive more comments on that either from the SWBPD group or from the general public ...

> Developped is misspelled.

Replaced by "developed" :-)

> Re: the phrase: "Vocabularies Overview <http://www.w3.org/2001/sw/BestPractices/MM/resources/Vocabularies.html>  discusses a number of". Say what kind of thing "Vocabularies overview" is when introducing it, a separate document? A section of this document? A document that you wrote, or one that others wrote? That helps reader know when/whether to click on the link.  This may happen in other places too, but most other occurrences of links in text seem fine.

Rephrased as: "A separate document named Vocabularies Overview discusses a number of individual vocabularies that are relevant for images annotation." (with appropriate links).

> Shouldn't section 4 have 'semantic' in the title? The opening text seems to be discussing semantic annotations, not just any annotation.

Correct. The title is now: "4. Available Tools for Semantic Image Annotation"

> Section 4: You say the list below is a set of "characteristics of semantic image annotation tools", but they don't all quite seem to be that.  Type of content  and granularity are not characteristics of a tool, they are characteristics of images. Perhaps what you mean is tools vary in their ability to handle what kinds of content, or what levels or granularity?  It seems to me that all the characteristics are really nothing to do with the tool, but the images and what users need to do with them. These are not so much characteristics, as dimensions of variation for user requirements, and tools can handle them or not to varying extents. A few really are about tools, like license conditions and collaborative vs. individual. This might just be a 'semantics' quibble, not sure.

Indeed, we would like to discuss the abilities of the tools: can they handle different types of content? Do they allow fine-grained annotations? Etc. Obviously, some of these characteristics are intrinsically related to the images themselves, or to what users need to do with them, but in the end, we should emphasize that the main bottleneck will be what the tools can actually do (not much currently :-( )!

> Style (minor quibble, personal preference). These characteristics are presented in inconsistent ways. Many start off as: "This characteristic ...". Others start by defining the characteristic. I prefer the latter. Let the boldface just be a header for the paragraph, and let the text stand as if that label was not there.  Granularity is handled this way.
> This is a very minor quibble, don't spend too much time on it, unless you feel like it.--

Most of the characteristics have been rephrased (but we have not spent much time on it :-)

> Type of content is defined circularly. Do you mean the format of the digital asset (.jpg vs. .png)?

Yes, the MIME type.

> Section 5:
> Number disagreement: "as an illustrative example" --> "as illustrative examples"

Replaced as such.

> SERIOUS: so far, this next comment raises the only issue that seems serious enough to consider addressing right away.  I think it may confuse readers quite a lot.
>
> 5.1 I was rather surprised to see a discussion of manual, semi automatic vs. automatic in this use case. It does not seem at all what you advertise it as: "possible semantic web based solution".  That issue is completely independent of this use case and is completely independent of whether you use semantic web technologies for annotation and organization.  It is a very important issue, however and might be included somewhere else in the document. I was expecting to see something about the ontology for annotating the images, some example annotations and maybe what end-user benefit can be had from using a semantic web solution (e.g. one can find things more easily thanks to simple inference: by concluding that one geographic region is contained in another, you can return images about Milan if someone asks for images about Italy).
>
> This is in stark contrast to section 5.2 which really is a "possible semantic web based solution". It would be great to have a (possibly much smaller) example for the personal photos use case. There must be lots of stuff out there to work with, e.g.  PhotoStuff, and the ISWC Semantic web challenge prize winner for 2005.

Fully agree. This work is part of the harmonization of the various use case solutions that we provide.
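
To make the kind of inference mentioned in the comment concrete, here is a minimal sketch in plain Python (not taken from the draft, and not how any particular tool does it). It assumes a hypothetical table of "contained in" facts and a hypothetical mapping from photos to the place they depict; a query for a region then also returns photos annotated with any place contained in that region.

```python
# Hypothetical containment facts: place -> the region that directly contains it.
CONTAINED_IN = {
    "Milan": "Lombardy",
    "Lombardy": "Italy",
    "Amsterdam": "Netherlands",
}

# Hypothetical annotations: photo -> the place it depicts.
PHOTOS = {
    "duomo.jpg": "Milan",
    "canal.jpg": "Amsterdam",
}


def regions_of(place):
    """Return the place itself plus every region that transitively contains it."""
    regions = [place]
    while place in CONTAINED_IN:
        place = CONTAINED_IN[place]
        regions.append(place)
    return regions


def photos_about(region):
    """Return photos depicting the region or any place contained in it."""
    return sorted(
        name for name, place in PHOTOS.items() if region in regions_of(place)
    )
```

With these assumed facts, asking for photos about Italy returns the photo annotated only with Milan, because Milan is contained in Lombardy, which is contained in Italy. In a Semantic Web setting the same effect would come from a transitive property in the ontology rather than from hand-written code.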

> The document ends abruptly, perhaps add some summarizing/concluding remarks?

Again, agreed. We will talk in the next MM telecon about adding a section "6. Conclusion" to end the document. The conclusion could highlight what is currently possible for annotating images with Semantic Web technologies (based on the various use cases), but also warn the reader of the long way to go before SW technologies are smoothly integrated into MM applications.

Sincerely.

    Raphaël

[1] http://www.w3.org/2001/sw/BestPractices/MM/image_annotation.html

--
Raphaël Troncy
CWI (Centre for Mathematics and Computer Science),
Kruislaan 413, 1098 SJ Amsterdam, The Netherlands
e-mail: raphael.troncy@cwi.nl & raphael.troncy@gmail.com
Tel: +31 (0)20 - 592 4093
Fax: +31 (0)20 - 592 4312
Web: http://www.cwi.nl/ins2/
Received on Monday, 6 March 2006 16:08:03 UTC