
Re: Unstructured vs. Structured (was: HL7 and patient records in RDF/OWL?)

From: Matthew Cockerill <matt@biomedcentral.com>
Date: Wed, 15 Feb 2006 06:33:09 +0000
Message-Id: <01C9890D-D52F-4B56-8183-1DC7AF7092D9@biomedcentral.com>
Cc: "Christopher Cavnor" <ccavnor@systemsbiology.org>, public-semweb-lifesci@w3.org
To: "Cutler, Roger (RogerCutler)" <RogerCutler@chevron.com>

I couldn't agree more.

Spreadsheets (and equivalently, CSV files) are a large fraction of  
the 'additional datafiles' that BioMed Central receives from authors.

It would be great to define some simple standards and/or templates
that authors could follow in their spreadsheets, allowing automatic
recognition of key life science identifiers and quantitative
attributes, and hence the generation of RDF.

From my point of view, that's the most basic, practical and
prevalent example of semi-structured data, and so seems like a good
starting point.
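
As a rough illustration of the template idea, here is a minimal sketch
in Python (standard library only) that turns a CSV export into
N-Triples. Everything about the template is an assumption made up for
this example, not an existing standard: the first column header is
taken to name the identifier scheme in the form "prefix:id", every
other header is treated as a plain attribute name, plain decimal values
are emitted as xsd:decimal literals, and the example.org namespaces are
placeholders.

```python
import csv
import io
import re
from urllib.parse import quote

# Hypothetical template conventions (assumptions for this sketch only):
#  - header of column 1 is "<prefix>:id" and names the identifier scheme
#  - all other headers are plain attribute names
PREFIXES = {"uniprot": "http://example.org/uniprot/"}  # placeholder namespace
ATTR_NS = "http://example.org/attr/"                   # placeholder namespace
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

# Plain decimals only, e.g. "2.5" or "-3"; no exponents, NaN or infinity.
DECIMAL = re.compile(r"[+-]?\d+(\.\d+)?")


def escape_literal(value):
    """Escape the characters that N-Triples does not allow raw in a literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(text):
    """Return one N-Triples line per non-empty attribute cell."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    prefix = header[0].partition(":")[0].strip()
    base = PREFIXES[prefix]  # KeyError if the identifier scheme is unknown
    triples = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        subject = f"<{base}{quote(row[0].strip())}>"
        for name, value in zip(header[1:], row[1:]):
            value = value.strip()
            if not value:
                continue  # an empty cell produces no triple
            predicate = f"<{ATTR_NS}{quote(name.strip())}>"
            if DECIMAL.fullmatch(value):
                obj = f'"{value}"^^<{XSD_DECIMAL}>'
            else:
                obj = f'"{escape_literal(value)}"'
            triples.append(f"{subject} {predicate} {obj} .")
    return triples
```

Calling csv_to_ntriples() on the text of a sheet whose header row is
"uniprot:id,gene,expression_level" yields one triple per filled-in
cell. A real template would also need to cover units, multiple
identifier columns and the mapping of attribute names onto an agreed
vocabulary, none of which this sketch attempts.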

Matt

On 15 Feb 2006, at 5:42, Cutler, Roger (RogerCutler) wrote:

>
> That's too deep for me.  I'll be satisfied, at least in an immediate
> sense, with a demonstration of how to generate RDF from an Excel
> spreadsheet.  I think I'll just start saying "Excel spreadsheet" and
> forget about the term that we use internally to categorize the kinds of
> problems we have.  Spreadsheets are pretty much the 80-20 of that
> problem, so why not call a spade a spade.  I'm really not very good at
> generalizing and categorizing.
>
> -----Original Message-----
> From: public-semweb-lifesci-request@w3.org
> [mailto:public-semweb-lifesci-request@w3.org] On Behalf Of Christopher
> Cavnor
> Sent: Tuesday, February 14, 2006 3:54 PM
> To: public-semweb-lifesci@w3.org
> Subject: Re: Unstructured vs. Structured (was: HL7 and patient records
> in RDF/OWL?)
>
>
> I'd argue that most information resources are indeed semi-structured.
> The human brain is only able to meta-categorize resources based on their
> structured aspects (markup and structural metadata), their informational
> content (their aboutness), and context (environmental metadata).
>
> "Structured" data is only structured once we have a common understanding
> of its meaning. In this regard, data is never "raw" (except for randomly
> generated data) - as even structured database tables have metadata to
> add meaning. So the term "semi-structured" is always adequate as far as
> I am concerned. You'd have to prove that there is any other type of data
> to me ;)
>
>
> --
> Christopher Cavnor
>
>
> On 2/14/06 10:54 AM, "Cutler, Roger (RogerCutler)"
> <RogerCutler@chevron.com> wrote:
>
>>
>> OK, then is there a preferred term for what we call "semi-structured
>> data"?  That is, information that is structured but where the structure
>> is not easily determined and perhaps has not been formalized at all, but
>> for which a formalized structure could be defined?  For example, tables
>> in a spreadsheet?  We really care about this kind of thing, but I don't
>> want to confuse the issue by using terms that most people understand
>> differently.
>>
>> Incidentally, from my personal experience the usage of the term
>> semi-structured, that is, binary blobs in structured databases, is not
>> very common.  Frankly, this is the first I have heard the term used in
>> that sense, but maybe I just don't run in the right circles.
>>
>> -----Original Message-----
>> From: public-semweb-lifesci-request@w3.org
>> [mailto:public-semweb-lifesci-request@w3.org] On Behalf Of Jim Hendler
>> Sent: Monday, February 13, 2006 3:43 PM
>> To: Pat Hayes; Gao, Yong
>> Cc: public-semweb-lifesci@w3.org
>> Subject: Re: Unstructured vs. Structured (was: HL7 and patient records
>> in RDF/OWL?)
>>
>>
>> At 14:46 -0600 2/13/06, Pat Hayes wrote:
>>>>
>>>> The point I'm trying to make is this: The concept of "structuredness"
>>>> is relative and context-sensitive.
>>>
>>> Hear, hear. Well said.
>>>
>>> Pat Hayes
>>>
>>
>>
>> FWIW, Structured, unstructured and semi-structured, although non-precise
>> concepts in common language and (esp) philosophy, have well-defined and
>> precise meanings in database jargon -- most database books have decent
>> definitions that are consistent with:
>>   unstructured - NL text
>>   semi-structured - unstructured fields within a structured DB context
>>   structured - relational model (or similar)
>> (those papers with technical definitions tend to get ugly and recourse
>> to relational calculus, so these overly simplified definitions should
>> suffice for now)
>> that said, in the spirit of this particular thread, I think we should be
>> careful and, if we mean to use it in a DB context, make it clear in any
>> document that uses the term (i.e. "structured database" v.
>> "structured data" which are very different in some contexts)
>>     -JH
Received on Wednesday, 15 February 2006 06:34:23 GMT
