[Bug 13468] Support Microdata values that are HTML snippets

http://www.w3.org/Bugs/Public/show_bug.cgi?id=13468

--- Comment #15 from Paolo Ciccarese <paolo.ciccarese@gmail.com> 2011-08-05 15:37:51 UTC ---
(In reply to comment #14)
> Could you elaborate on what your application does? I'm trying to learn more
> about your use case to better understand the need here.

Sure... and I apologize in advance for the size of this message.

The application is called DOMEO (Document Metadata Exchange) [1]. The idea is
simple: scientists read online documents and want to create annotations on
them. These are visually created through the application - and specifically
through a GWT component - and stored in a separate store with access control.
In this first phase, I can trigger pipelines for document analysis. In other
words, when a scientist opens a document I can do a bunch of things for her to
save her time. Examples are the extraction of bibliographic citations and
biological entities (genes, proteins, antibodies).

Most of the documents we deal with are out of our control, so we cannot insert
any markup or Microdata back into them. However, our group also builds online
portals for scientific communities working on a specific disease or area
(examples: Pain, Parkinson's Disease, MS...). In this last case we have control
over the documents and we can, after a moderation process, re-publish the
comments/notes of our users in the document. So if another user opens that
document with DOMEO - or with a text mining algorithm/tool - she will
automatically get pieces of knowledge that she can simply look at - possibly
with additional data resulting from mashups with external sources - or reuse
and organize in her private knowledge management space.

One way of embedding those notes back into the document is to use Microdata. 

One example is the portal for the Harvard Stem Cell Institute,
http://www.stembook.org/ . We have control over these peer-reviewed articles
(e.g. http://www.stembook.org/node/471 ). We can therefore think of embedding
valuable notes back into the document with some Microdata that allows our
applications - which run outside that specific environment - and all the text
mining applications of other research groups to better understand what to look
for and how to parse it for knowledge extraction. In the case of a comment we
can think of embedding a snippet such as the one in my previous email. But we
have many other forms of annotation that are more specific to science:
hypotheses, claims. And these are even more powerful. If we publish an article
in our portal we want to be able to use Microdata to isolate important
scientific claims in the text. Such claims, though, include references and
other entities (such as proteins, as I was showing in my previous email) that
are ambiguous if you cannot follow the provided links. Extracting Microdata
from those documents would make it possible to extract their scientific
discourse automatically.
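
To make the point about links concrete, here is a minimal Python sketch (not
DOMEO code, and not the Microdata extraction algorithm from the spec). It
compares what a consumer gets from a property's text content with what it gets
if the inner HTML snippet is kept. The property name "claim", the example.org
URLs and the function name extract_itemprops are made up for illustration. It
assumes no nested itemprop elements and skips properties on void elements such
as <meta>.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not change nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ItempropSnippets(HTMLParser):
    """For each itemprop element, collect (name, plain text, inner HTML)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.results = []
        self._name = None   # itemprop currently being captured, if any
        self._depth = 0     # open non-void elements inside the capture
        self._text = []
        self._html = []

    def handle_starttag(self, tag, attrs):
        if self._name is None:
            name = dict(attrs).get("itemprop")
            if name and tag not in VOID:
                self._name, self._depth = name, 1
                self._text, self._html = [], []
            return
        # Inside a captured property: keep the child's markup verbatim.
        self._html.append(self.get_starttag_text())
        if tag not in VOID:
            self._depth += 1

    def handle_endtag(self, tag):
        if self._name is None or tag in VOID:
            return
        self._depth -= 1
        if self._depth == 0:
            self.results.append(
                (self._name, "".join(self._text), "".join(self._html)))
            self._name = None
        else:
            self._html.append(f"</{tag}>")

    def handle_data(self, data):
        if self._name is not None:
            self._text.append(data)
            # Character references were decoded; re-escape for the snippet.
            self._html.append(escape(data, quote=False))


def extract_itemprops(document):
    """Hypothetical helper: parse a document, return the collected tuples."""
    parser = ItempropSnippets()
    parser.feed(document)
    parser.close()
    return parser.results


# Made-up claim with two links, in the spirit of the annotations above.
SAMPLE = (
    '<div itemscope>'
    '<p itemprop="claim">Protein <a href="http://example.org/protein/X">X</a> '
    'is discussed in <a href="http://example.org/ref/1">[1]</a>.</p>'
    '</div>'
)

for name, text, snippet in extract_itemprops(SAMPLE):
    print(name)
    print("  text only:", text)
    print("  with HTML:", snippet)
```

In the text-only value the protein and the reference are reduced to "X" and
"[1]", which is exactly the ambiguity described above; the snippet keeps the
links that a text mining tool could follow.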

You can get a flavor of what scientific discourse is by looking at this example
- http://tinyurl.com/3pvvjsc - from another application I developed for
Alzheimer's Disease researchers. The list of statements you see there is
incredibly structured. It is actually a very detailed graph that you can see
here: http://tinyurl.com/3hrraje . As we have structured data, we can embed
powerful and very detailed Microdata back in the original document. This will
allow better knowledge discovery and also the generation of multiple views of
the classic document, which is still linear and very poor for today's
technologies.

I truly believe that a lot of the knowledge our scientists encode in their
annotations is carried by links and other markup that took them a long time to
master. Bringing those annotations back to plain text would probably be a big
step backward.

Let me know if you want to know more about the topic, and thank you for
following up and trying to better understand our needs.
Paolo

[1] If you want to see the application live, here is a screencast of one of my
presentations: http://www.bioontology.org/annotation-ontology .
At minute 11:30 I explain the goals. At minute 28:55 I show the annotation
process live.


Received on Friday, 5 August 2011 15:37:59 UTC