[MM] Image Annotation for the Semantic web - Review

Image Annotation for the semantic web - review of latest draft as of 02/9/2006 

OVERALL:
Excellent document, much much better than the last draft I read. I mostly have minor points to make on presentation and points of clarification.  I note two significant issues that would do well to be addressed sooner rather than later:
1.	external agreement approach vs. ontology approach: this is IMHO confused and misleading.
2.	personal photo collection example: it is not what it is advertised to be: "a possible semantic web solution". Instead, it is a discussion of an important issue that applies to all use cases and is unrelated to this particular use case. 

Even so, I am not going to recommend holding back publication of this note in its present form, if the authors wish to move it forward quickly to the next step in the process.

In general this is great work! This is a HUGE can of worms the MM group has opened, the likely subject of any number of future PhD theses and books.  I applaud them for rolling up their sleeves and getting on with it! You have barely scratched the surface, as you no doubt are well aware, but have taken big steps in moving the area forward.

Mike
------

DETAILED  COMMENTS:
"At the time of writing, most work done in this area does not use semantic-based technologies mainly because of the differences between the multimedia and the web communities and their underlying standardization organizations."

I would agree that this is an important reason, but it may be too strong to say it is the main one.  It suggests that if the web communities and mm communities were more aligned, then semantics would be in use. This seems unlikely to be true, as there is hardly any adoption now of semantic web compared to regular web.
--

The main point of this issue:
"Production versus post-production annotation" should be stated up front. I.e. that it is much easier to annotate earlier rather than later. This only comes out at the end, after a lot of other detail that one has to wade through. Post production is only mentioned at the very end.
--

Insert 'the' before 'semantics' in: "Annotations alone do not establish semantics of what is being marked-up."
--

"A good starting point for having more information on RDF is the RDF Primer <http://www.w3.org/2001/sw/BestPractices/MM/image_annotation.html> ."
Another excellent recommendation is the book co-authored by Frank van Harmelen called something like Semantic Web Primer.
--
SOMEWHAT-SERIOUS
"The limitation of the "external agreement" approach is its inflexibility, i.e., only a limited range of pre-defined information properties can be expressed."

I don't get this, what limits how big a vocabulary that is agreed on by a user community? It can be as big as the community wants it to be. It can be arbitrarily big and complex. Of course it may be more work to get there, but you should say that, if that is the point.

There is a bigger issue here. You are distinguishing an 'external agreement' approach from an 'ontology' approach, as if the ontology approach did not require any agreement. Au contraire. I see both approaches as being identical from the point of view of the need for agreement. All have to agree to use the ontology. The main difference is the format in which the agreed-on terms and definitions are specified. In one case, it is natural language; in the other case, it is formal axioms in a logic language. However, even in the latter case, you still need to bottom out in agreement. Defining 'author' in an ontology might not specify any more meaning than in the natural language vocabulary. It depends on how many axioms you use. In fact, if you look at a lot of ontologies, many of the terms carry little meaning: you get some domain and range constraints, you get a few axioms here and there, but in general, they are highly ambiguous. For a human to know exactly what is meant by a given term in an ontology, they usually have to rely to some extent on what they think the ontology author intended. The axioms will never be adequate, in most cases. 

So I would say there is ONE way to establish meaning: get agreement on what a set of concepts are, what terms to use for them, and get them carefully defined. Period. You can do that using natural language, or you can use formal ontologies with good natural language comments. If you must have two ways, then maybe call them: agreement-informal and agreement-formal, or something that means the same thing.

In both cases (because axioms never tell the whole story), the programmer writing applications to interpret the semantics will have to rely on more than the axioms to make sure the application does the right thing when encountering a term or expression (which is an operational definition of 'understands'). 
--

Although technically correct, this text may be hard to follow for a beginner:
"We specified (elsewhere) hasSize to be a functional property, which is the same as saying that every big image has at most one size. The application of that property to BigImage using the cardinality restriction asserts something stronger, that every BigImage has exactly one size. Furthermore, the size of BigImage must be big. The allValuesFrom restriction is on the hasSize property of this BigImage class only. Sizes of Image are not constrained by this local restriction. For more information, the OWL Guide <http://www.w3.org/2001/sw/BestPractices/MM/image_annotation.html>  provides a good overview of the OWL language."

'every big image has at most one size' may be technically true but may be quite confusing.

Try something simpler like:

"This code defines a new concept called BigImage as the set of all members of the class Image such that the size of the image is equal to 'big'." 

Much more detail than that may not be much help to a beginner.
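
To make the suggested simplification concrete, here is a minimal sketch in Python of the set-based reading of BigImage. It is only an illustration, not the draft's OWL: the Image record, its field names, and the sample photos are all assumptions, and it ignores OWL's open-world semantics.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the draft's Image class; field names are assumptions.
@dataclass(frozen=True)
class Image:
    name: str
    size: str  # a single value per image, mirroring "exactly one size"


def big_images(images):
    """Return the members of Image whose size equals 'big' (the BigImage set)."""
    return {img for img in images if img.size == "big"}


# Made-up sample data for illustration only.
photos = [Image("beach.jpg", "big"), Image("icon.png", "small")]
print(sorted(img.name for img in big_images(photos)))
```

Read this way, BigImage is simply a filter over Image, which is about as much as a beginner needs at this point.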
--

Use Case: Personal Photo Collections
There is tension here between who is the user vs. the beneficiary of the semantic web technology. Ordinary users may benefit from semantic technologies, but if it is done well, most users will not notice the difference, other than that it may offer more features than non-semantic approaches using sophisticated metadata search mechanisms.  This note is presumably not aimed at the ordinary person with a photo collection, but rather at the developers of organizer tools like Photoshop file browser, or Iview Media, or dozens of others. It might help to make clear what you mean by this use case. For example, an opening line like the one for media would be good, altered to match your intent (I'm just guessing):
E.g.: The use case developed in this section is mainly targeted at photo organizer software developers, and less to the general public.
 --

The Press Photo Bank and Bio-Medical Images entries are EMPTY!
I'm reading the editors draft: http://www.w3.org/2001/sw/BestPractices/MM/image_annotation.html
--

Quibble: some of the sections/categories have just one member. This seems categorically odd (so to speak ;-). If the intent is that more use cases in these categories are out there, then say so, that makes sense.
--

Developped is misspelled.
--

Re: the phrase: "Vocabularies Overview <http://www.w3.org/2001/sw/BestPractices/MM/resources/Vocabularies.html>  discusses a number of". Say what kind of thing "Vocabularies Overview" is when introducing it: a separate document? A section of this document? A document that you wrote, or one that others wrote? That helps the reader know when/whether to click on the link.  This may happen in other places too, but most other occurrences of links in text seem fine.
--

Section 3 on vocabularies ended suddenly. Perhaps add some summarizing/concluding remarks?
--

Shouldn't section 4 have 'semantic' in the title? The opening text seems to be discussing semantic annotations, not just any annotation.
--

Section 4: You say the list below is a set of "characteristics of semantic image annotation tools", but they don't all quite seem to be that.  Type of content and granularity are not characteristics of a tool, they are characteristics of images. Perhaps what you mean is that tools vary in what kinds of content, or what levels of granularity, they are able to handle?  It seems to me that most of the characteristics really have nothing to do with the tool, but with the images and what users need to do with them. These are not so much characteristics as dimensions of variation for user requirements, and tools can handle them or not to varying extents. A few really are about tools, like license conditions and collaborative vs. individual. This might just be a 'semantics' quibble, not sure.
--



Style (minor quibble, personal preference). These characteristics are presented in inconsistent ways. Many start off as: "This characteristic ...". Others start by defining the characteristic. I prefer the latter. Let the boldface just be a header for the paragraph, and let the text stand as if that label was not there.  Granularity is handled this way. 
This is a very minor quibble, don't spend too much time on it, unless you feel like it.
--

Type of content is defined circularly. Do you mean the format of the digital asset (.jpg vs. .png)?
--

Section 5:
Number disagreement: "as an illustrative example" --> "as illustrative examples"
--

SERIOUS: so far, this next comment raises the only issue that seems serious enough to consider addressing right away.  I think it may confuse readers quite a lot.

5.1 I was rather surprised to see a discussion of manual vs. semi-automatic vs. automatic annotation in this use case. It does not seem at all to be what you advertise it as: a "possible semantic web based solution".  That issue is completely independent of this use case and of whether you use semantic web technologies for annotation and organization.  It is a very important issue, however, and might be included somewhere else in the document. I was expecting to see something about the ontology for annotating the images, some example annotations, and maybe what end-user benefit can be had from using a semantic web solution (e.g. finding things more easily through simple inference: by concluding that one geographic region is contained in another, you can return images about Milan when someone asks for images about Italy).

This is in stark contrast to section 5.2 which really is a "possible semantic web based solution". It would be great to have a (possibly much smaller) example for the personal photos use case. There must be lots of stuff out there to work with, e.g.  PhotoStuff, and the ISWC Semantic web challenge prize winner for 2005.
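
As a rough idea of how small such an example could be, here is a sketch in Python of the Milan/Italy containment inference mentioned above. Everything in it is hypothetical (the region facts, the file names, the function names), and a real solution would express this in an ontology with a reasoner rather than a dictionary.

```python
# Hypothetical containment facts: region -> the region that directly contains it.
CONTAINED_IN = {"Milan": "Lombardy", "Lombardy": "Italy", "Italy": "Europe"}


def regions_for(place):
    """Return the place plus every region that transitively contains it."""
    found = [place]
    while place in CONTAINED_IN:
        place = CONTAINED_IN[place]
        found.append(place)
    return found


def find_images(annotations, query_region):
    """Return image names whose annotated place lies within query_region."""
    return sorted(
        name
        for name, place in annotations.items()
        if query_region in regions_for(place)
    )


# Made-up annotations: image name -> place depicted.
annotations = {"duomo.jpg": "Milan", "eiffel.jpg": "Paris"}
print(find_images(annotations, "Italy"))
```

The photo is annotated only with "Milan", yet a query for "Italy" still finds it; that is the kind of end-user benefit the section could show.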
--

The document ends abruptly, perhaps add some summarizing/concluding remarks?

Received on Saturday, 11 February 2006 01:56:32 UTC