Re: Unstructured vs. Structured (was: HL7 and patient records in RDF/OWL?)

From: Christopher Cavnor <ccavnor@systemsbiology.org>
Date: Tue, 14 Feb 2006 13:53:50 -0800
To: <public-semweb-lifesci@w3.org>
Message-ID: <C017916E.23FF9%ccavnor@systemsbiology.org>

I'd argue that most information resources are indeed semi-structured. The
human brain can only meta-categorize a resource based on its structured
aspects (markup and structural metadata), its informational content (its
aboutness), and its context (environmental metadata).

"Structured" data is only structured once we have a common understanding of
its meaning. In this regard, data is never "raw" (except for randomly
generated data) - as even structured database tables have metadata to add
meaning. So the term "semi-structured" is always adequate as far as I am
concerned. You'd have to prove to me that there is any other type of data ;)

Christopher Cavnor

On 2/14/06 10:54 AM, "Cutler, Roger (RogerCutler)" <RogerCutler@chevron.com>
wrote:

> OK, then is there a preferred term for what we call "semi-structured
> data"?  That is, information that is structured but where the structure
> is not easily determined and perhaps has not been formalized at all, but
> for which a formalized structure could be defined?  For example, tables
> in a spreadsheet?  We really care about this kind of thing, but I don't
> want to confuse the issue by using terms that most people understand
> differently.
> Incidentally, from my personal experience the usage of the term
> semi-structured, that is, binary blobs in structured databases, is not
> very common.  Frankly, this is the first I have heard the term used in
> that sense, but maybe I just don't run in the right circles.
> -----Original Message-----
> From: public-semweb-lifesci-request@w3.org
> [mailto:public-semweb-lifesci-request@w3.org] On Behalf Of Jim Hendler
> Sent: Monday, February 13, 2006 3:43 PM
> To: Pat Hayes; Gao, Yong
> Cc: public-semweb-lifesci@w3.org
> Subject: Re: Unstructured vs. Structured (was: HL7 and patient records
> in RDF/OWL?)
> At 14:46 -0600 2/13/06, Pat Hayes wrote:
>>> The point I'm trying to make is this: The concept of "structuredness"
>>> is relative and context-sensitive.
>> Hear, hear. Well said.
>> Pat Hayes
> FWIW, structured, unstructured and semi-structured, although imprecise
> concepts in common language and (esp.) philosophy, have well-defined and
> precise meanings in database jargon -- most database books have decent
> definitions that are consistent with:
>   unstructured - NL text
>   semi-structured - unstructured fields within a structured DB context
>   structured - relational model (or similar)
> (Those papers with technical definitions tend to get ugly and resort to
> relational calculus, so these overly simplified definitions should
> suffice for now.)
> That said, in the spirit of this particular thread, I think we should be
> careful and, if we mean to use it in a DB context, make it clear in any
> document that uses the term (i.e. "structured database" vs.
> "structured data", which are very different in some contexts).
>     -JH
Received on Wednesday, 15 February 2006 05:26:21 UTC
