- From: Internet Business Logic <ibl@snet.net>
- Date: Mon, 13 Feb 2006 13:33:39 -0500
- To: "Cutler, Roger (RogerCutler)" <RogerCutler@chevron.com>
- CC: public-semweb-lifesci@w3.org
Roger --

You wrote (below)....

  We are pretty interested in the "semi-structured" realm, as defined above, particularly because we have a lot of business critical information in spreadsheets, and I noted at the F2F that a number of other representatives were, too.

Our approach to this is to add "application semantics" in the form of rules in open vocabulary, executable English. The rules basically take over where the semi-structuredness stops, and the rules also define applications.

An advantage of this approach is that one can not only get answers to questions, but also English explanations of the answers, at the business or scientific level. For example, [1] takes some CIA World Factbook data from a spreadsheet, and adds rules that figure out the per capita use of oil in each country. Explanations show that the results are based on combining consumption and population figures from different years, which supports the results while also qualifying that support a bit.

Other examples are [2,3,4].

Thanks in advance for comments.

Adrian Walker

[1] http://www.reengineeringllc.com/demo_agents/CiaWorldFacts1.agent
[2] http://www.reengineeringllc.com/demo_agents/MedMine2.agent
[3] http://www.reengineeringllc.com/demo_agents/RelBioOntDefn3.agent
[4] http://www.reengineeringllc.com/Oil_Industry_Supply_Chain_by_Kowalski_and_Walker.pdf

--
Internet Business Logic (R)
Executable open vocabulary English
Online at www.reengineeringllc.com
Shared use is free

Reengineering, PO Box 1412, Bristol, CT 06011-1412, USA

Phone 860 583 9677
Mobile 860 830 2085
Fax 860 314 1029

Cutler, Roger (RogerCutler) wrote:

>Well ... Maybe. I see your point, but I think nonetheless that there
>are some important distinctions to be made within what you are calling
>non-RDF. On one extreme one has highly structured data in relational
>databases. One key here is that the data definitions are contained in
>machine readable, standardized schemas.
>Another is that at least some
>of the relationships and keying of the data are explicit. Slightly less
>structured are XML documents that have schemas. Intermediate are data
>that have internal structure but the definition of that structure is not
>easily determined by a machine. XML documents sans schema, HTML
>documents and spreadsheets come to mind, probably in decreasing order of
>"structuredness". We in CVX call these "semi-structured data", but I'm
>not sure whether this usage is widespread. Then on the other end of the
>spectrum is text, in which, as you point out, a structure certainly
>exists, but even a human being may find it really hard to figure out and
>formalize that structure.
>
>We are pretty interested in the "semi-structured" realm, as defined
>above, particularly because we have a lot of business critical
>information in spreadsheets, and I noted at the F2F that a number of
>other representatives were, too.
>
>-----Original Message-----
>From: public-semweb-lifesci-request@w3.org
>[mailto:public-semweb-lifesci-request@w3.org] On Behalf Of Gao, Yong
>Sent: Friday, February 10, 2006 12:02 PM
>To: public-semweb-lifesci@w3.org
>Subject: Unstructured vs. Structured (was: HL7 and patient records in
>RDF/OWL?)
>
>
>Having trained as a computational linguist, one thing I remember vividly
>is the debate among linguists on the issue of semantics vs. syntax. One
>of the wisdoms I gained from that experience is the saying "One man's
>semantics is another man's syntax." (I'll need to dig deeper to find its
>origin.)
>
>Having worked on building practical tools for data extraction and
>integration, I've learned the lesson on the importance of NOT getting
>too bogged down in labeling what's "structured" and what's not. Here I
>quote another saying: "One Man's Ceiling is Another Man's Floor."
>
>
>The point I'm trying to make is this: The concept of "structuredness" is
>relative and context-sensitive.
>For example, natural language texts are
>highly structured; it's just that we still have a long way to go to fully
>discover and understand their structures and use them to find meanings
>mechanically.
>Another example: HTML pages are structured so that web browsers can
>display them properly. XML and RDF data can as well be "unstructured" if
>you put a blob of text, say an abstract, between a pair of tags.
>
>I would almost suggest the term "non-RDF", rather than "unstructured",
>be used in the context of transforming some data into RDF format.
>
>---
>Yong Gao, PH.D.
>MassGeneral Institute for Neurodegenerative Disease (MIND)
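
The remark above, that XML can be as "unstructured" as plain text when a blob of prose sits between a pair of tags, can be illustrated with a minimal sketch. The two records and the helper `leaf_fields` are hypothetical, invented here purely for illustration; they do not come from any system mentioned in this thread. Both documents are well-formed XML, but only the second exposes its facts as separately addressable fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical records, invented for illustration only.
BLOB = (
    "<record><abstract>Patient aged 54 presented with tremor in 2005."
    "</abstract></record>"
)
FIELDED = (
    "<record><age>54</age><symptom>tremor</symptom><year>2005</year></record>"
)


def leaf_fields(xml_text):
    """Return {tag: text} for every leaf element (element with no children)."""
    root = ET.fromstring(xml_text)
    return {
        el.tag: (el.text or "").strip()
        for el in root.iter()
        if len(el) == 0  # keep only elements that have no child elements
    }


blob_fields = leaf_fields(BLOB)
fielded_fields = leaf_fields(FIELDED)

# The blob record yields a single opaque string; the age, symptom and year
# are still locked inside natural-language text.
print(blob_fields)
# The fielded record yields one entry per fact, directly usable by a machine.
print(fielded_fields)
```

In this sketch the parser handles both inputs equally well, which is the point: well-formedness says little about how much of the meaning a machine can reach without further text analysis.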
Received on Monday, 13 February 2006 18:28:53 UTC