- From: Cutler, Roger (RogerCutler) <RogerCutler@chevron.com>
- Date: Mon, 13 Feb 2006 10:23:45 -0600
- To: "Gao, Yong" <YGAO@PARTNERS.ORG>, public-semweb-lifesci@w3.org
Well ... Maybe. I see your point, but I think nonetheless that there are some important distinctions to be made within what you are calling non-RDF.

On one extreme one has highly structured data in relational databases. One key here is that the data definitions are contained in machine-readable, standardized schemas. Another is that at least some of the relationships and keying of the data are explicit. Slightly less structured are XML documents that have schemas.

Intermediate are data that have internal structure, but where the definition of that structure is not easily determined by a machine. XML documents sans schema, HTML documents and spreadsheets come to mind, probably in decreasing order of "structuredness". We in CVX call these "semi-structured data", but I'm not sure whether this usage is widespread.

Then on the other end of the spectrum is text, in which, as you point out, a structure certainly exists, but even a human being may find it really hard to figure out and formalize that structure.

We are pretty interested in the "semi-structured" realm, as defined above, particularly because we have a lot of business-critical information in spreadsheets, and I noted at the F2F that a number of other representatives were, too.

-----Original Message-----
From: public-semweb-lifesci-request@w3.org [mailto:public-semweb-lifesci-request@w3.org] On Behalf Of Gao, Yong
Sent: Friday, February 10, 2006 12:02 PM
To: public-semweb-lifesci@w3.org
Subject: Unstructured vs. Structured (was: HL7 and patient records in RDF/OWL?)

Having trained as a computational linguist, one thing I remember vividly is the debate among linguists on the issue of semantics vs. syntax. One piece of wisdom I gained from that experience is the saying "One man's semantics is another man's syntax." (I'll need to dig deeper to find its origin.)
Having worked on building practical tools for data extraction and integration, I've learned the importance of NOT getting too bogged down in labeling what's "structured" and what's not. Here I quote another saying: "One Man's Ceiling is Another Man's Floor."

The point I'm trying to make is this: the concept of "structuredness" is relative and context-sensitive. For example, natural language texts are highly structured; it's just that we still have a long way to go to fully discover and understand their structures and use them to find meanings mechanically. As another example, HTML pages are structured so that web browsers can display them properly. XML and RDF data can just as well be "unstructured" if you put a blob of text, say an abstract, between a pair of tags.

I would almost suggest that the term "non-RDF", rather than "unstructured", be used in the context of transforming some data into RDF format.

---
Yong Gao, PH.D.
MassGeneral Institute for Neurodegenerative Disease (MIND)
Received on Monday, 13 February 2006 16:24:24 UTC