Re: Nutrition / Linked data

Hi Daniel,

Just a quick comment regarding design patterns. In the FlyWeb project
we wanted to build some simple data search and visualisation UIs for
fruit fly genomic data, e.g.

http://openflydata.org/search/gene-expression

We had to deal with data from four different sources in a variety of
formats/protocols, e.g. CSV files, SQL dumps, SQL endpoints, XML
protocols.

We found that the quickest way to get something up and running was to
convert each data source to RDF, i.e. generate an RDF dump for each
data source, load the dump into an off-the-shelf RDF store (we use
Jena TDB [1]), then expose to the Web via an off-the-shelf SPARQL
protocol implementation (e.g. Joseki [2], although we've baked our own
[3] for reasons I won't go into here).
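
To make the "convert each data source to RDF" step a bit more concrete, here
is a minimal sketch in Python of turning a CSV export into an N-Triples dump
that an RDF store could load. It is not the FlyWeb code: the namespaces, the
column names and the sample row are all made up for illustration, and every
cell is emitted as a plain string literal.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespaces, chosen only for this sketch.
BASE = "http://example.org/gene/"
VOCAB = "http://example.org/vocab#"


def escape_literal(value):
    """Escape the characters N-Triples requires inside a quoted literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text, id_column):
    """Yield one N-Triples line per non-empty cell, keyed on id_column."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = "<" + BASE + quote(row[id_column], safe="") + ">"
        for column, value in row.items():
            # Skip the key column, empty cells, and surplus unnamed cells.
            if column is None or column == id_column or not value:
                continue
            predicate = "<" + VOCAB + quote(column, safe="") + ">"
            yield '%s %s "%s" .' % (subject, predicate, escape_literal(value))


# Made-up sample data: one row with an id and two properties.
sample = "id,symbol,tissue\ng1,abc,testis\n"
for line in csv_to_ntriples(sample, "id"):
    print(line)
```

A real conversion would map columns onto properties from an agreed vocabulary
and give numeric values datatypes, rather than deriving predicate names from
the column headers as this sketch does.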

SPARQL then gives you a REST/JSON API you can use to build
browser-based JavaScript apps, using the normal dynamic HTML / AJAX
pattern [4].

I.e. you don't have to write a line of server-side code, if you don't
want to :)
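
As a rough illustration of what that REST/JSON API looks like from the client
side, here is a sketch in Python (our apps do the equivalent in the browser).
It assumes an endpoint that accepts the query as a "query" URL parameter, per
the SPARQL protocol, and returns the standard SPARQL JSON results format. The
helper names are invented for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_query_url(endpoint, query):
    """SPARQL protocol: the query text goes in the 'query' URL parameter."""
    return endpoint + "?" + urlencode({"query": query})


def bindings_to_rows(doc):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # Unbound variables are simply absent from a binding.
        rows.append({v: binding[v]["value"] if v in binding else None
                     for v in variables})
    return rows


def run_query(endpoint, query):
    """Send the query over HTTP GET and ask for JSON results."""
    request = Request(build_query_url(endpoint, query),
                      headers={"Accept": "application/sparql-results+json"})
    with urlopen(request) as response:
        return bindings_to_rows(json.load(response))
```

Calling run_query("http://example.org/sparql", "SELECT ?s WHERE { ?s ?p ?o }
LIMIT 5") against a real endpoint (that URL is a placeholder) would give back
a list of dicts, one per result row, ready to render in a UI.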

There are of course lots of other ways to go, but I just thought I'd
say, this has worked well for us.

Hth,

Alistair

[1] http://jena.sourceforge.net/TDB/
[2] http://www.joseki.org/
[3] http://sparqlite.googlecode.com/
[4] http://flyui.googlecode.com/

On Sat, Jan 10, 2009 at 08:44:28PM +1030, Daniel O'Connor wrote:
> Hey all,
> I'm Daniel O'Connor, a software engineer from Australia.
> 
> At the moment I'm trying to get a lot of food nutrition data together from a
> whole bunch of different sources and create a bit of an ontology; publish it
> as RDF; and make sure it's chock full of linked data goodness; and I could
> use your help, advice, pointers and encouragement.
> 
> Use cases include things like shopping, diet / fitness applications,
> cooking, and much more.
> 
>    - what did you eat today? -> hey, that's only 75% of your recommended
>    daily energy intake
>    - what is the approximate food energy in this recipe?
>    - tell me the fattiest food I'm eating and replace it with one with more
>    protein (but the same energy content)
> 
> 
> The data sources I've got on my list so far are:
> 
>    - USDA's SR21 food nutrients data (public domain)
>    - Australia's NUTTAB 06 data (not so public domain)
>    - Canada's CNF data (haven't delved into it in depth)
> 
> The typical format provided is CSV, so I'm going through and mapping those
> CSV exports back into an RDBMS (php + mysql / pgsql / etc), then providing
> tools to generate RDF out, and publishing the static results.
> 
> 
> You can see (and get) the code from:
> http://freebase-owl.googlecode.com/svn/trunk/nutrition/
> 
> and read a bit more about installing from:
> http://clockwerx.blogspot.com/2009/01/generating-nutritional-data-rdf-from.html
> 
> 
> and view samples of the output:
> USDA:
> http://lauken.com/doconnor/nutrition/usda/1006.rdf
> 
> NUTTAB:
> http://lauken.com/doconnor/nutrition/nuttab/01A10027.rdf
> 
> Ontology (draft!):
> http://www.lauken.com/doconnor/nutrition/0.1/schema.rdf
> 
> 
> 
> There's a lot of work for me here, and if anyone here has knowledge or a
> helping hand, I'd love to hear from you, especially regarding the ones in
> bold.
> 
>    - Resolve licensing agreements with Aust. government for rights to
>    reproduce data (in progress)
>    - Model Canadian data
>    - *Find or create a suitable ontology for Nutrition data* (I would have
>    expected some common terms from the bio-rdf community, but I don't have the
>    background to know what I'm looking for)
>    - Model the USDA, NUTTAB and Canadian extensions as appropriate
>    - Find or create (ick hope not) an ontology for measurements in relation
>    to typical nutrition measurements (again, there are no semantic web concepts
>    for milligrams, kilocalories, etc - not even in dbpedia. timbl did some very
>    high level concepts of what a Gram / etc is; but it's not quite the same)
>    - Find or create a list of terms used in nutrition data
>    (shorthand/abbreviations) - i.e. CBODF = "Carbohydrate by difference", but I
>    can't seem to find a good list of these outside of the USDA data itself.
>    - Find or create a *journal publications ontology* (dublincore might do
>    it though; or some other bibliographic ontology) - suggestions?
>    - Find or create *science terms ontology* (Paper, Subject, Experiment,
>    Samples, etc) - anyone?
>    - Create *owl:sameAs links to DBPedia* topics in some automated fashion -
>    this is tricky, because a lot of the data is written as "Cheese, blue" and
>    is much more granular than wikipedia articles about Cheese.
>    - Create *owl:sameAs links to Freebase* topics in some automated
>    fashion - ditto
>    - *Interlink Canadian, NUTTAB, USDA data* in some automated fashion -
>    similar - different naming schemes make using dc:title as an IFP a bit
>    annoying.
>    - Render full sets of RDF for each
>    - Publish these somewhere - http://lauken.com/doconnor/ is not suitable
>    for anything more than a sandbox
>    - Provide human interfaces as appropriate - if anyone wanted to create
>    *shiny XSLT -> XHTML* perhaps; or PHP glue...
>    - *Set up a SPARQL endpoint* (I have a hell of a time doing this in my
>    development environment, so this might not happen) - HELP!
>    - Provide unit test coverage for all generator tools
>    - Refactor lots

-- 
Alistair Miles
Senior Computing Officer
Image Bioinformatics Research Group
Department of Zoology
The Tinbergen Building
University of Oxford
South Parks Road
Oxford
OX1 3PS
United Kingdom
Web: http://purl.org/net/aliman
Email: alistair.miles@zoo.ox.ac.uk
Tel: +44 (0)1865 281993

Received on Monday, 12 January 2009 08:16:22 UTC