RE: [BIORDF] Re: Unstructured vs. Structured (was: HL7 and patient records in RDF/OWL?)

Not sure if this is the same question, but I wonder what the most appropriate way would be to express, within an XHTML TH header (and the equivalent within a journal XML DTD), the URI of the datatype that the content values in that table column represent.
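
To make the question concrete, here is a rough Python sketch of what a consumer of such markup might do. It assumes a hypothetical `datatype` attribute on each TH element holding the datatype URI; that attribute name, the XML Schema datatype URIs, and the sample gene values are all made up for illustration and are not an agreed convention.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Illustrative table only: the "datatype" attribute on <th> is a
# hypothetical convention, and the cell values are invented.
SAMPLE = """<table xmlns="http://www.w3.org/1999/xhtml">
  <tr>
    <th scope="col" datatype="http://www.w3.org/2001/XMLSchema#string">Gene</th>
    <th scope="col" datatype="http://www.w3.org/2001/XMLSchema#decimal">Expression</th>
  </tr>
  <tr><td>BRCA1</td><td>2.5</td></tr>
  <tr><td>TP53</td><td>0.8</td></tr>
</table>"""


def typed_cells(xhtml):
    """Return one list per data row of (column name, value, datatype URI)."""
    table = ET.fromstring(xhtml)
    rows = table.findall(XHTML_NS + "tr")
    # The first row is assumed to hold the column headers.
    headers = [(th.text, th.get("datatype"))
               for th in rows[0].findall(XHTML_NS + "th")]
    result = []
    for row in rows[1:]:
        values = [td.text for td in row.findall(XHTML_NS + "td")]
        result.append([(name, value, dtype)
                       for (name, dtype), value in zip(headers, values)])
    return result


for row in typed_cells(SAMPLE):
    print(row)
```

The sketch only handles column headers in the first row; row headers and spanning cells would need more work.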

This could also be used to render CSV data in a form suitable for easy unambiguous scraping, with the datatypes identified.
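
As a rough illustration of the CSV case, the Python sketch below assumes a made-up convention in which the second row of the CSV file carries one datatype URI per column, and writes each cell out as a typed literal in N-Triples syntax. The `http://example.org/` namespace, the row and column URI patterns, and the sample values are placeholders, not a proposed standard.

```python
import csv
import io

# Illustrative data only: the second row holding datatype URIs is an
# assumed convention, and the values are invented.
SAMPLE_CSV = """gene,expression
http://www.w3.org/2001/XMLSchema#string,http://www.w3.org/2001/XMLSchema#decimal
BRCA1,2.5
TP53,0.8
"""

BASE = "http://example.org/"  # hypothetical namespace for rows and columns


def csv_to_ntriples(text, base=BASE):
    """Turn each data cell into one N-Triples line with a typed literal."""
    reader = csv.reader(io.StringIO(text))
    names = next(reader)       # row 1: column names
    datatypes = next(reader)   # row 2: datatype URI for each column
    lines = []
    for i, row in enumerate(reader, start=1):
        subject = f"<{base}row/{i}>"
        for name, dtype, value in zip(names, datatypes, row):
            # Escape backslashes and double quotes inside the literal.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(
                f'{subject} <{base}column/{name}> "{escaped}"^^<{dtype}> .')
    return lines


for line in csv_to_ntriples(SAMPLE_CSV):
    print(line)
```

Column names are dropped into the predicate URI unescaped here, so real headers containing spaces or punctuation would need URI encoding first.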

Matt



> -----Original Message-----
> From: public-semweb-lifesci-request@w3.org
> [mailto:public-semweb-lifesci-request@w3.org]On Behalf Of Alf Eaton
> Sent: 23 February 2006 18:01
> To: public-semweb-lifesci@w3.org
> Subject: Re: [BIORDF] Re: Unstructured vs. Structured (was: HL7 and
> patient records in RDF/OWL?)
> 
> 
> 
> To follow up on this, do you think it would be possible to create a  
> generic GRDDL transformation that would extract information from any  
> well-structured XHTML table, using the scoped <th> row and column  
> headers?
> 
> alf.
> 
> On 19 Feb 2006, at 15:07, Alf Eaton wrote:
> 
> >
> > I've been trying to decide on a good way to provide tabular data in
> > papers using XHTML, for presentation online. The best options seem  
> > to be either just embedding the data as an array using JSON, or  
> > using tables with class and id markup and allowing them to be  
> > processed with GRDDL or Javascript to transform the data. Has there
> > been any work on presenting spreadsheets in XHTML?
> >
> > alf.
> >
> > On 19 Feb 2006, at 12:17, Eric Neumann wrote:
> >
> >>
> >> Matt,
> >>
> >> Spreadsheets are indeed useful as formatted sources that can be  
> >> readily converted into RDF. We've used them as the primary source  
> >> of expression data for BioDash (see attached averages; full  
> >> GeneLogic data at
> >> http://www.samsi.info/200304/dmml/web-internal/bio/data/data_rsvd.xls ).
> >> It almost seems a mapping tool could be written to take any Excel
> >> file and apply a GRDDL-like conversion of column headers, row
> >> headers, and cells to produce RDF from them (see the example).
> >>
> >> In our example, we wrote the conversion scripts directly into the
> >> Excel file. The resulting (adenine/N3) file is shown as well, with
> >> symbol strings mapped to URIs. The cool thing here is that if
> >> you add a DB query using the symbol strings (we did this within
> >> BioDash), you can take the returned gene information, convert it
> >> to RDF, and connect it to the expression graph through the probes
> >> for each row (see resulting adenine file).
> >>
> >> Perhaps the BIORDF group should include using sdf sources as part  
> >> of their overall strategy for producing RDF from current  
> >> structured files (e.g.,  gene expression, screening, and clinical  
> >> data in sdf). Many published papers have data tables, and this  
> >> would be a great way to auto convert them to RDF!
> >>
> >> Eric
> >>
> >> --- Matthew Cockerill <matt@biomedcentral.com> wrote:
> >>
> >>>
> >>> I couldn't agree more.
> >>>
> >>> Spreadsheets (and equivalently, CSV files) are a
> >>> large fraction of
> >>> the 'additional datafiles' that BioMed Central
> >>> receives from authors.
> >>>
> >>> What would be great would be to be able to define
> >>> some simple
> >>> standards and/or templates which authors could
> >>> follow in their
> >>> spreadsheets, to allow the automatic recognition of
> >>> key life science
> >>> identifiers, and quantitative attributes,  and so
> >>> the generation of RDF.
> >>>
> >>>  From my point of view, that's the most basic,
> >>> practical and
> >>> prevalent example of the whole semi-structured data,
> >>> and so seems
> >>> like a good starting point.
> >>>
> >>> Matt
> >>>
> >>> On 15 Feb 2006, at 5:42, Cutler, Roger (RogerCutler) wrote:
> >>>
> >>>>
> >>>> That's too deep for me.  I'll be satisfied, at least in an immediate
> >>>> sense, with a demonstration of how to generate RDF from an Excel
> >>>> spreadsheet.  I think I'll just start saying "Excel spreadsheet" and
> >>>> forget about the term that we use internally to categorize the kinds of
> >>>> problems we have.  Spreadsheets are pretty much the 80-20 of that
> >>>> problem, so why not call a spade a spade.  I'm really not very good at
> >>>> generalizing and categorizing.
> >
> 
> 
> 


Received on Thursday, 23 February 2006 18:12:38 UTC