Re: Unstructured vs. Structured from Internet Business Logic on 2006-02-13 (public-semweb-lifesci@w3.org from February 2006)

From: Internet Business Logic <ibl@snet.net>
Date: Mon, 13 Feb 2006 13:33:39 -0500
To: "Cutler, Roger (RogerCutler)" <RogerCutler@chevron.com>
CC: public-semweb-lifesci@w3.org
Message-ID: <43F0D103.4020201@snet.net>
Roger --

You wrote (below)....

>We are pretty interested in the "semi-structured" realm, as defined
>above, particularly because we have a lot of business critical
>information in spreadsheets, and I noted at the F2F that a number of
>other representatives were, too.


Our approach to this is to add "application semantics" in the form of 
rules in open vocabulary, executable English. The rules basically take 
over where the semi-structuredness stops, and the rules also define 
applications. An advantage of this approach is that one can not only get 
answers to questions, but also English explanations of the answers, at 
the business or scientific level.

For example, [1] takes some CIA World Factbook data from a spreadsheet, 
and adds rules that figure out the per capita use of oil in each 
country. Explanations show that the results combine consumption and 
population figures from different years, which both supports the 
results and qualifies that support somewhat.
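
As a rough illustration only (this is not the Internet Business Logic 
system or its rule syntax), the kind of computation described above 
could be sketched in Python. The country names, the figures, and the 
names Figure, ROWS and per_capita_oil are hypothetical placeholders, 
not actual CIA World Factbook data or anything taken from [1].

```python
from dataclasses import dataclass


@dataclass
class Figure:
    """A spreadsheet cell value together with the year it refers to."""
    value: float
    year: int


# Made-up placeholder rows, NOT real CIA World Factbook figures.
ROWS = {
    "Examplia": {
        "oil_bbl_per_day": Figure(1_000_000.0, 2001),
        "population": Figure(50_000_000.0, 2005),
    },
    "Samplestan": {
        "oil_bbl_per_day": Figure(200_000.0, 2004),
        "population": Figure(4_000_000.0, 2004),
    },
}


def per_capita_oil(country, rows=ROWS):
    """Return (barrels per day per person, list of explanation lines)."""
    oil = rows[country]["oil_bbl_per_day"]
    pop = rows[country]["population"]
    result = oil.value / pop.value
    explanation = [
        f"{country} consumed {oil.value:,.0f} barrels of oil per day in {oil.year}",
        f"{country} had a population of {pop.value:,.0f} in {pop.year}",
    ]
    # Surface the caveat: the inputs may come from different years.
    if oil.year != pop.year:
        explanation.append(
            f"the figures are from different years ({oil.year} and {pop.year}), "
            "so the result is approximate"
        )
    return result, explanation
```

Calling per_capita_oil("Examplia") returns the ratio together with 
three explanation lines, the last of which flags the year mismatch; 
for "Samplestan" both figures share a year, so no caveat is added.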

Other examples are [2,3,4].

Thanks in advance for comments.

Adrian Walker

[1] http://www.reengineeringllc.com/demo_agents/CiaWorldFacts1.agent

[2] http://www.reengineeringllc.com/demo_agents/MedMine2.agent

[3] http://www.reengineeringllc.com/demo_agents/RelBioOntDefn3.agent

[4] http://www.reengineeringllc.com/Oil_Industry_Supply_Chain_by_Kowalski_and_Walker.pdf
-- 

Internet Business Logic (R)
Executable open vocabulary English
Online at www.reengineeringllc.com
Shared use is free

Reengineering,  PO Box 1412,  Bristol,  CT 06011-1412,  USA

Phone 860 583 9677     Mobile 860 830 2085     Fax 860 314 1029




Cutler, Roger (RogerCutler) wrote:

>Well ... Maybe.  I see your point, but I think nonetheless that there
>are some important distinctions to be made within what you are calling
>non-RDF.  On one extreme one has highly structured data in relational
>databases.  One key here is that the data definitions are contained in
>machine readable, standardized schemas.  Another is that at least some
>of the relationships and keying of the data are explicit.  Slightly less
>structured are XML documents that have schemas. Intermediate are data
>that have internal structure but the definition of that structure is not
>easily determined by a machine.  XML documents sans schema, HTML
>documents and spreadsheets come to mind, probably in decreasing order of
>"structuredness".  We in CVX call these "semi-structured data", but I'm
>not sure whether this usage is widespread.  Then on the other end of the
>spectrum is text, in which, as you point out, a structure certainly
>exists, but even a human being may find it really hard to figure out and
>formalize that structure.
>
>We are pretty interested in the "semi-structured" realm, as defined
>above, particularly because we have a lot of business critical
>information in spreadsheets, and I noted at the F2F that a number of
>other representatives were, too.  
>
>-----Original Message-----
>From: public-semweb-lifesci-request@w3.org
>[mailto:public-semweb-lifesci-request@w3.org] On Behalf Of Gao, Yong
>Sent: Friday, February 10, 2006 12:02 PM
>To: public-semweb-lifesci@w3.org
>Subject: Unstructured vs. Structured (was: HL7 and patient records in
>RDF/OWL?)
>
>
>Having trained as a computational linguist, one thing I remember vividly
>is the debate among linguists on the issue of semantics vs. syntax. One
>of the wisdoms I gained from that experience is the saying "One man's
>semantics is another man's syntax." (I'll need to dig deeper to find its
>origin.)
>
>Having worked on building practical tools for data extraction and
>integration, I've learned the importance of NOT getting too bogged
>down in labeling what's "structured" and what's not. Here I quote
>another saying: "One Man's Ceiling is Another Man's Floor."
>
>
>The point I'm trying to make is this: The concept of "structuredness" is
>relative and context-sensitive. For example, natural language texts are
>highly structured; it's just that we still have a long way to go to
>fully discover and understand their structures and use them to find
>meanings mechanically.
>As another example, HTML pages are structured so that web browsers can
>display them properly. XML and RDF data can just as well be
>"unstructured" if you put a blob of text, say an abstract, between a
>pair of tags.
>
>I would almost suggest the term "non-RDF", rather than "unstructured",
>be used in the context of transforming some data into RDF format.
>
>---
>Yong Gao, PH.D.
>MassGeneral Institute for Neurodegenerative Disease (MIND)
>
Received on Monday, 13 February 2006 18:28:53 UTC