Re: Nutrition / Linked data from Jay Luker on 2009-01-11 (public-lod@w3.org from January 2009)

From: Jay Luker <lbjay@reallywow.com>
Date: Sun, 11 Jan 2009 08:14:09 -0500
To: public-lod@w3.org
Message-ID: <30292b940901110514u2234b6b6l90115fbecba26388@mail.gmail.com>
Really interesting, Daniel. This hits kind of a sweet spot for me in that it
intersects LOD & food. I've been toying with some ideas related more to
recipes and cooking, but also with the thought of using the USDA data.

For the SPARQL endpoint, since your code is PHP, I would think the ARC
modules would be a natural fit. There's a good example of how easy it is here:
http://inkdroid.org/journal/2008/07/07/lcshinfo-sparql-endpoint/.
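Whatever store ends up behind the endpoint (ARC or otherwise), clients talk to it through the SPARQL protocol: an HTTP GET request that carries the query in a `query` parameter. The Python sketch below only builds such a request with the standard library and does not send it. The endpoint URL is hypothetical (no endpoint existed at the time of this thread), and the query assumes food names are stored as Dublin Core `dc:title` literals in the `http://purl.org/dc/elements/1.1/` namespace, which may not match the draft schema.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL; no public endpoint existed when this was written.
ENDPOINT = "http://example.org/nutrition/sparql"

# Assumption: foods carry their names as dc:title literals.
QUERY = """PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?food ?title WHERE {
  ?food dc:title ?title .
  FILTER regex(?title, "^Cheese")
}
LIMIT 10"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL protocol GET request."""
    # The query text goes into the 'query' URL parameter, percent-encoded.
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_sparql_request(ENDPOINT, QUERY)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send it; keeping request construction separate makes it easy to check what goes over the wire before an endpoint exists.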

--jay

On Sat, Jan 10, 2009 at 5:14 AM, Daniel O'Connor
<daniel.oconnor@gmail.com> wrote:

> Hey all,
> I'm Daniel O'Connor, a software engineer from Australia.
>
> At the moment I'm trying to get a lot of food nutrition data together from
> a whole bunch of different sources, create a bit of an ontology, publish
> it as RDF, and make sure it's chock full of linked data goodness; I could
> use your help, advice, pointers and encouragement.
>
> Use cases include things like shopping, diet / fitness applications,
> cooking, and much more.
>
>    - what did you eat today? -> hey, that's only 75% of your recommended
>    daily energy intake
>    - what is the approximate food energy in this recipe?
>    - tell me the fattiest food I'm eating and replace it with one with
>    more protein (but the same energy content)
>
>
> The data sources I've got on my list so far are:
>
>    - USDA's SR21 food nutrients data (public domain)
>    - Australia's NUTTAB 06 data (not so public domain)
>    - Canada's CNF data (haven't delved into it in depth)
>
> The typical format provided is CSV, so I'm going through and mapping those
> CSV exports back into an RDBMS (PHP + MySQL / PostgreSQL / etc.), then
> providing tools to generate RDF out of it, and publishing the static results.
>
>
> You can see (and get) the code from:
> http://freebase-owl.googlecode.com/svn/trunk/nutrition/
>
> and read a bit more about installing from:
>
> http://clockwerx.blogspot.com/2009/01/generating-nutritional-data-rdf-from.html
>
>
> and view samples of the output:
> USDA:
> http://lauken.com/doconnor/nutrition/usda/1006.rdf
>
> NUTTAB:
> http://lauken.com/doconnor/nutrition/nuttab/01A10027.rdf
>
> Ontology (draft!):
> http://www.lauken.com/doconnor/nutrition/0.1/schema.rdf
>
> There's a lot of work for me here, and if anyone has knowledge or a
> helping hand, I'd love to hear from you, especially regarding the items in
> bold (marked with asterisks).
>
>    - Resolve licensing agreements with Aust. government for rights to
>    reproduce data (in progress)
>    - Model Canadian data
>    - *Find or create a suitable ontology for Nutrition data* (I would have
>    expected some common terms from the bio-rdf community, but I don't have the
>    background to know what I'm looking for)
>    - Model the USDA, NUTTAB and Canadian extensions as appropriate
>    - Find or create (ick, hope not) an ontology for measurements in
>    relation to typical nutrition measurements (again, there are no Semantic
>    Web concepts for milligrams, kilocalories, etc. - not even in DBpedia.
>    timbl did some very high-level concepts of what a Gram etc. is, but it's
>    not quite the same)
>    - Find or create a list of terms used in nutrition data
>    (shorthand/abbreviations), e.g. CBODF = "Carbohydrate by difference", but I
>    can't seem to find a good list of these outside of the USDA data itself.
>    - Find or create a *journal publications ontology* (dublincore might do
>    it though; or some other bibliographic ontology) - suggestions?
>    - Find or create *science terms ontology* (Paper, Subject, Experiment,
>    Samples, etc) - anyone?
>    - Create *owl:sameAs links to DBpedia* topics in some automated fashion.
>    This is tricky, because a lot of the data is written as "Cheese, blue" and
>    is much more granular than the Wikipedia articles about Cheese.
>    - Create *owl:sameAs links to Freebase* topics in some automated
>    fashion - ditto
>    - *Interlink Canadian, NUTTAB, USDA data* in some automated fashion -
>    a similar problem; different naming schemes make using dc:title as an IFP
>    (inverse functional property) a bit annoying.
>    - Render full sets of RDF for each
>    - Publish these somewhere - http://lauken.com/doconnor/ is not suitable
>    for anything more than a sandbox
>    - Provide human interfaces as appropriate - if anyone wanted to create
>    *shiny XSLT -> XHTML* perhaps, or PHP glue...
>    - *Setup a SPARQL endpoint* (I have a hell of a time doing this in my
>    development environment, so this might not happen) - HELP!
>    - Provide unit test coverage for all generator tools
>    - Refactor lots
>
>
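The CSV-to-RDF step described in the quoted message can be sketched in a few lines. This is a minimal Python illustration, not the code in the freebase-owl repository: it skips the RDBMS stage, assumes hypothetical `id` and `title` columns and a hypothetical `http://example.org/nutrition/usda/` base URI, and writes one `dc:title` triple per row as N-Triples.

```python
import csv
import io

# Hypothetical base URI for food resources (not the real publishing location).
BASE = "http://example.org/nutrition/usda/"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def rows_to_ntriples(csv_text: str) -> list[str]:
    """Turn rows with hypothetical 'id' and 'title' columns into N-Triples lines."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row['id']}>"
        literal = escape_literal(row["title"])
        triples.append(f'{subject} <{DC_TITLE}> "{literal}" .')
    return triples


# Hypothetical sample row; the comma inside the title is handled by CSV quoting.
sample = 'id,title\nexample-1,"Cheese, blue"\n'
for line in rows_to_ntriples(sample):
    print(line)
```

N-Triples keeps the generator line-oriented, so static dumps can be regenerated per food; producing RDF/XML like the linked samples would need a serializer on top of this.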
Received on Monday, 12 January 2009 09:25:59 UTC