Re: [BIORDF] Re: Unstructured vs. Structured (was: HL7 and patient records in RDF/OWL?) from Alf Eaton on 2006-02-23 (public-semweb-lifesci@w3.org from February 2006)

From: Alf Eaton <lists@hubmed.org>
Date: Thu, 23 Feb 2006 13:00:46 -0500
To: public-semweb-lifesci@w3.org
Message-Id: <0D7BB02D-380F-4627-A667-4237BF6200FC@hubmed.org>

To follow up on this, do you think it would be possible to create a  
generic GRDDL transformation that would extract information from any  
well-structured XHTML table, using the scoped <th> row and column  
headers?
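
To make the idea concrete, here is a rough sketch in Python rather than the XSLT that a real GRDDL transformation would normally be written in. It assumes a simple table with no colspan/rowspan, where each data row starts with a <th scope="row"> and the header row uses <th scope="col">. The base URI http://example.org/table#, the function name table_to_triples and the sample table values are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

XHTML = "{http://www.w3.org/1999/xhtml}"
BASE = "http://example.org/table#"  # hypothetical base URI, illustration only


def table_to_triples(xhtml_source, base=BASE):
    """Return one (subject, predicate, literal) triple per data cell.

    Subject comes from the row header, predicate from the column header.
    Assumes no colspan/rowspan and that the row header precedes the cells.
    """
    root = ET.fromstring(xhtml_source)
    if root.tag == XHTML + "table":
        table = root
    else:
        table = root.find(".//" + XHTML + "table")
    if table is None:
        return []

    col_headers = {}  # column index -> header text
    triples = []
    for tr in table.iter(XHTML + "tr"):
        row_header = None
        for index, cell in enumerate(tr):
            text = "".join(cell.itertext()).strip()
            scope = cell.get("scope")
            if cell.tag == XHTML + "th" and scope == "col":
                col_headers[index] = text
            elif cell.tag == XHTML + "th" and scope == "row":
                row_header = text
            elif (cell.tag == XHTML + "td" and row_header
                  and index in col_headers):
                subject = base + quote(row_header)
                predicate = base + quote(col_headers[index])
                triples.append((subject, predicate, text))
    return triples


# Invented sample table, not real expression data.
SAMPLE = """<table xmlns="http://www.w3.org/1999/xhtml">
  <tr><td/><th scope="col">liver</th><th scope="col">kidney</th></tr>
  <tr><th scope="row">probe1</th><td>1.5</td><td>0.7</td></tr>
  <tr><th scope="row">probe2</th><td>2.1</td><td>3.4</td></tr>
</table>"""

for s, p, o in table_to_triples(SAMPLE):
    print(f'<{s}> <{p}> "{o}" .')
```

Each data cell becomes one statement whose subject and predicate are minted from the header text. A real transformation would presumably map headers to existing vocabulary terms rather than minting URIs like this.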

alf.

On 19 Feb 2006, at 15:07, Alf Eaton wrote:

>
> I've been trying to decide on a good way to provide tabular data in  
> papers using XHTML, for presentation online. The best options seem  
> to be either just embedding the data as an array using JSON, or  
> using tables with class and id markup and allowing them to be  
> processed with GRDDL or JavaScript to transform the data. Has there  
> been any work on presenting spreadsheets in XHTML?
>
> alf.
>
> On 19 Feb 2006, at 12:17, Eric Neumann wrote:
>
>>
>> Matt,
>>
>> Spreadsheets are indeed useful as formatted sources that can be  
>> readily converted into RDF. We've used them as the primary source  
>> of expression data for BioDash (see attached averages; full
>> GeneLogic data at
>> http://www.samsi.info/200304/dmml/web-internal/bio/data/data_rsvd.xls ).
>> It almost seems a mapping tool could be written that takes any
>> Excel file and applies a GRDDL-like conversion of column headers,
>> row headers, and cells to produce RDF (see the example).
>>
>> In our example, we wrote the conversion scripts directly into the
>> Excel file. The resulting (adenine/N3) file is shown as well, with
>> symbol strings mapped to URIs. The cool thing here is that if
>> you add a DB query using the symbol strings (we did this within
>> BioDash), you can take the returned gene information, convert it
>> to RDF, and connect it to the expression graph through the probes
>> for each row (see resulting adenine file).
>>
>> Perhaps the BIORDF group should include using sdf sources as part  
>> of their overall strategy for producing RDF from current  
>> structured files (e.g.,  gene expression, screening, and clinical  
>> data in sdf). Many published papers have data tables, and this  
>> would be a great way to auto convert them to RDF!
>>
>> Eric
>>
>> --- Matthew Cockerill <matt@biomedcentral.com> wrote:
>>
>>>
>>> I couldn't agree more.
>>>
>>> Spreadsheets (and equivalently, CSV files) are a large fraction of
>>> the 'additional datafiles' that BioMed Central receives from authors.
>>>
>>> What would be great would be to be able to define some simple
>>> standards and/or templates which authors could follow in their
>>> spreadsheets, to allow the automatic recognition of key life science
>>> identifiers, and quantitative attributes, and so the generation of RDF.
>>>
>>> From my point of view, that's the most basic, practical and
>>> prevalent example of the whole semi-structured data, and so seems
>>> like a good starting point.
>>>
>>> Matt
>>>
>>> On 15 Feb 2006, at 5:42, Cutler, Roger (RogerCutler) wrote:
>>>
>>>>
>>>> That's too deep for me.  I'll be satisfied, at least in an immediate
>>>> sense, with a demonstration of how to generate RDF from an Excel
>>>> spreadsheet.  I think I'll just start saying "Excel spreadsheet" and
>>>> forget about the term that we use internally to categorize the
>>>> kinds of problems we have.  Spreadsheets are pretty much the 80-20
>>>> of that problem, so why not call a spade a spade.  I'm really not
>>>> very good at generalizing and categorizing.
>

Received on Thursday, 23 February 2006 18:01:30 UTC