Re: Unstructured vs. Structured (was: HL7 and patient records in RDF/OWL?)

From: John Madden <john.madden@duke.edu>
Date: Fri, 10 Feb 2006 16:26:23 -0500
Message-Id: <52135CDC-531A-4805-AF1D-E0FC480A5DC7@duke.edu>
To: public-semweb-lifesci@w3.org

This discussion is fine, but let me bring this back to GRDDL.

GRDDL could be used to specify ways of generating RDF from many kinds
of XML documents. The documents could be data-oriented XML, text-oriented
XML, or even (I think this is accurate, Eric?) documents with very sparse
XML markup, where the extraction transform operates exclusively on CDATA.
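
As a rough illustration of the sparse-markup case, here is a minimal Python
sketch, not a real GRDDL implementation. It assumes the document names its
transformation with a grddl:transformation attribute on the root element.
A real GRDDL agent would fetch and run the XSLT at that URI; here a plain
Python function stands in for the transform. The <note> element, the
example.org URIs, the "Dx:" text rule, and all function names are
hypothetical.

```python
import xml.etree.ElementTree as ET

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"

# Hypothetical sparse clinical note: one element, everything else is text.
DOC = """<note xmlns:grddl="http://www.w3.org/2003/g/data-view#"
      grddl:transformation="http://example.org/note2rdf.xsl"
      id="n1">Patient reports chest pain. Dx: angina.</note>"""


def declared_transformations(root):
    """Return the transformation URIs named on the root element."""
    value = root.get("{%s}transformation" % GRDDL_NS, "")
    return value.split()


def note_to_triples(root):
    """Stand-in for the XSLT: it looks only at the element's text content."""
    subject = "http://example.org/notes/" + root.get("id")
    text = (root.text or "").strip()
    triples = [(subject, "http://example.org/vocab#text", text)]
    # Naive text rule (an assumption): "Dx:" up to the next period is a diagnosis.
    if "Dx:" in text:
        dx = text.split("Dx:", 1)[1].split(".", 1)[0].strip()
        triples.append((subject, "http://example.org/vocab#diagnosis", dx))
    return triples


# Hypothetical registry mapping transformation URIs to local stand-ins.
TRANSFORMS = {"http://example.org/note2rdf.xsl": note_to_triples}


def extract(xml_text):
    """Apply every declared transformation and collect the resulting triples."""
    root = ET.fromstring(xml_text)
    triples = []
    for uri in declared_transformations(root):
        triples.extend(TRANSFORMS[uri](root))
    return triples


if __name__ == "__main__":
    for triple in extract(DOC):
        print(triple)
```

The point of the sketch is only that the markup contributes almost nothing
here (an id and a pointer to the transform); all of the extracted RDF comes
from rules applied to the text itself.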

In any case, I'd like more discussion of this point: using GRDDL to
extract RDF from healthcare-oriented documents, especially non-HTML
documents, is an important focus for HCLS, and one of the workgroups
should take ownership of it.


On Feb 10, 2006, at 1:01 PM, Gao, Yong wrote:

> Having trained as a computational linguist, one thing I remember vividly
> is the debate among linguists over semantics vs. syntax. One piece of
> wisdom I gained from that experience is the saying "One man's semantics
> is another man's syntax." (I'll need to dig deeper to find its origin.)
> Having worked on building practical tools for data extraction and
> integration, I've learned the importance of NOT getting too bogged down
> in labeling what's "structured" and what's not. Here I quote another
> saying: "One Man's Ceiling Is Another Man's Floor."
> The point I'm trying to make is this: the concept of "structuredness" is
> relative and context-sensitive. For example, natural language texts are
> highly structured; we just still have a long way to go to fully discover
> and understand their structures and use them to find meanings mechanically.
> As another example, HTML pages are structured so that web browsers can
> display them properly. XML and RDF data can just as well be "unstructured"
> if you put a blob of text, say an abstract, between a pair of tags.
> I would almost suggest that the term "non-RDF", rather than
> "unstructured", be used in the context of transforming data into RDF.
> ---
> Yong Gao, PH.D.
> MassGeneral Institute for Neurodegenerative Disease (MIND)
Received on Friday, 10 February 2006 21:26:52 UTC