- From: <bugzilla@jessica.w3.org>
- Date: Fri, 05 Aug 2011 15:37:53 +0000
- To: public-html-bugzilla@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=13468

--- Comment #15 from Paolo Ciccarese <paolo.ciccarese@gmail.com> 2011-08-05 15:37:51 UTC ---
(In reply to comment #14)
> Could you elaborate on what your application does? I'm trying to learn more
> about your use case to better understand the need here.

Sure... and I apologize in advance for the size of this message.

The application is called DOMEO (Document Metadata Exchange) [1]. The idea is simple: scientists read online documents and want to create annotations on them. These are created visually through the application - specifically through a GWT component - and stored in a separate store with access control.

In this first phase, I can trigger pipelines for document analysis. In other words, when a scientist opens a document I can do a bunch of things for her to save her time. Examples are the extraction of bibliographic citations and of biological entities (genes, proteins, antibodies).

Most of the documents we deal with are out of our control, so we cannot insert any markup or Microdata back into them. However, our group also builds online portals for scientific communities working on a specific disease or area (examples: Pain, Parkinson Disease, MS...). In this last case we have control over the documents and we can, after a moderation process, re-publish the comments/notes of our users in the document. So if another user opens that document with DOMEO - or with a text mining algorithm/tool - she will automatically get pieces of knowledge that she can simply look at - possibly with additional data resulting from mashups with external sources - or reuse and organize in her private knowledge management space.

One way of embedding those notes back into the document is to use Microdata. One example is the portal for the Harvard Stem Cell Institute, http://www.stembook.org/ . We have control over these peer-reviewed articles (ex: http://www.stembook.org/node/471).
We can therefore think of embedding valuable notes back into the document with Microdata that allows our applications - which run outside that specific environment - and the text mining applications of other research groups to better understand what to look for and how to parse it for knowledge extraction.

In the case of a comment we can think of embedding a snippet such as the one in my previous email. But we have many other forms of annotation that are more specific to science: hypotheses, claims. And these are even more powerful. If we publish an article in our portal we want to be able to use Microdata to isolate important scientific claims in the text. Such claims, though, include references and other entities (such as proteins, as I was showing in my previous email) that are ambiguous if you cannot follow the provided links. Extracting Microdata from those documents will make it possible to automatically extract their scientific discourse.

You can get a flavor of what scientific discourse is by looking at this example - http://tinyurl.com/3pvvjsc - from another application I developed for Alzheimer Disease researchers. The list of statements you see there is incredibly structured. It is actually a very detailed graph that you can see here: http://tinyurl.com/3hrraje . As we have structured data, we can embed powerful and very detailed Microdata back in the original document. This will allow better knowledge discovery and also make it possible to generate multiple views of the classic document, which is still linear and very poor for today's technologies.

I truly believe lots of the knowledge our scientists encode in their annotations is related to links and other markup that took them a long time to master. To bring them back to plain text would probably be a big step backward.

Let me know if you want to know more on the topic, and thank you for following up and trying to better understand our needs.

Paolo

[1] If you want to see the application live, here is a screencast of one of my presentations: http://www.bioontology.org/annotation-ontology . At minute 11.30 I explain the goals. At minute 28.55 I show the annotation process live.

-- 
Configure bugmail: http://www.w3.org/Bugs/Public/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.
Received on Friday, 5 August 2011 15:37:59 UTC