Re: Best Practices for Converting CSV into LOD? from Wood, Jamey on 2010-08-13 (public-lod@w3.org from August 2010)

From: Wood, Jamey <Jamey.Wood@nrel.gov>
Date: Fri, 13 Aug 2010 13:46:48 -0600
To: "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <C88AFD48.810F%jamey.wood@nrel.gov>

Thanks to everyone who responded to my questions (both on this list and privately). One thing I realized is that sending out my example(s) as RDF snippets that lacked dereferenceable URIs probably wasn't a good idea (since one of my core goals is to produce not just good RDF, but good RDF which is LOD-friendly).

So I have fleshed out a couple of examples to incorporate some of the suggestions I've received and put them up as live LOD. (They're still very much works in progress, though, so I do expect they'll change or disappear soon.)

They're available at:

http://en.openei.org/lod/resource/datasets/43
http://en.openei.org/lod/resource/datasets/43b

I've put these two samples together to try to clarify my third question (about making LOD "browseable"), which is still the murkiest to me. In the "43" example, the data is crafted to have a hierarchical path through the data ("state" -> "state/year" -> "state/year/month" -> "state/year/month/type_of_producer" -> "state/year/month/type_of_producer/energy_source"). In the "43b" example, no such attempt is made. Instead, 43b links each leaf data node back to the "root" of the dataset ("/lod/resource/datasets/43b") via a "dcterms:isReferencedBy" predicate and to a URI for the associated state (e.g. "/lod/resource/datasets/43b/AK") via an "openei:datasets/43b/terms/state" predicate. (This state URI is then linked to DBpedia's state URI via a "skos:closeMatch" predicate.)
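
To make the "43" layout concrete, here is a rough Python sketch of how the chain of URIs for one data row might be generated. It is only an illustration under assumptions: the function names, the zero-padded month, the percent-encoding of segments, and the sample producer and energy source values are all hypothetical and not necessarily what the live data does. Only the base URI and the order of the levels come from the description above.

```python
from urllib.parse import quote

# Base URI of the hierarchical example described above.
BASE_43 = "http://en.openei.org/lod/resource/datasets/43"


def hierarchy_uris(state, year, month, producer, source, base=BASE_43):
    """Return one URI per level, from the state down to the leaf node.

    Assumption: months are zero-padded and each path segment is
    percent-encoded; the live data may format segments differently.
    """
    segments = [state, str(year), f"{int(month):02d}", producer, source]
    uris = []
    path = base
    for segment in segments:
        # safe="" also encodes "/" so a value can never add a path level.
        path = f"{path}/{quote(segment, safe='')}"
        uris.append(path)
    return uris


def parent_child_links(uris):
    """Pair each level with the next one down (the browsable links)."""
    return list(zip(uris, uris[1:]))


# Hypothetical sample values for a single row.
levels = hierarchy_uris("AK", 2001, 1, "Total Electric Power Industry", "Coal")
links = parent_child_links(levels)
```

Each (parent, child) pair in `links` is where a navigation triple would go, so every page lists only its immediate children rather than every leaf node in the dataset.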

Thus, the 43b example would seem to be less amenable to HTML-based browsing. For example, note how these pages end up being overwhelming (and truncated):

http://en.openei.org/lod/resource/datasets/43b
http://en.openei.org/lod/resource/datasets/43b/AK

So what I'm still wondering is whether striving for a non-overwhelming HTML browsing experience for a given set of LOD is a worthwhile goal. And, if so, is the "43" example taking a reasonable path to achieve that goal? Or is there some better way?

Thanks,
Jamey

On 8/9/10 10:37 AM, "Jamey Wood" <jamey.wood@nrel.gov> wrote:

Are there any established best practices for converting CSV data into LOD-friendly RDF? For example, I would like to produce an LOD-friendly RDF version of the "2001 - Present Net Generation by State by Type of Producer by Energy Source" CSV data at:

http://www.eia.doe.gov/cneaf/electricity/epa/epa_sprdshts_monthly.html

I'm attaching a sample of a first stab at this. Questions I'm running into include the following:

1. Should one try to convert primitive data types (particularly strings) into URI references? Or just leave them as primitives? Or perhaps provide both (with separate predicate names)? For example, the sample EIA data I reference has two-letter state abbreviations in one column. Should those be left alone or converted into URIs?
2. Should one merge separate columns from the original data in order to align to well-known RDF types? For example, the sample EIA data has separate "Year" and "Month" columns. Should those be merged in the RDF version so that an "xs:gYearMonth" type can be used?
3. Should one attempt to introduce some sort of hierarchical structure (to make the LOD more "browseable")? The "skos:related" triples in the attached sample are an initial attempt to do that. Is this a good idea? If so, is that a reasonable predicate to use? If it is a reasonable thing to do, we would presumably craft these triples so that one could navigate through the entire LOD (e.g. "state" -> "state/year" -> "state/year/month" -> "state/year/month/typeOfProducer" -> "state/year/month/typeOfProducer/energySource").
4. Any other considerations that I'm overlooking?
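
For questions 1 and 2, a minimal Python sketch of the "provide both" option and the year/month merge might look like the following. Everything here is an assumption for illustration: the column names (YEAR, MONTH, STATE), the example.org predicate and subject URIs, and the sample row are hypothetical. The state URI pattern follows the "/lod/resource/datasets/43b/AK" form mentioned in this thread, and the datatype is the standard XML Schema gYearMonth.

```python
import csv
import io

# Standard XML Schema datatype URI for a combined year and month.
XSD_GYEARMONTH = "http://www.w3.org/2001/XMLSchema#gYearMonth"
# State URIs follow the .../datasets/43b/AK pattern from this thread.
STATE_BASE = "http://en.openei.org/lod/resource/datasets/43b"
# Hypothetical vocabulary namespace, used only for this sketch.
TERMS = "http://example.org/terms/"


def g_year_month(year, month):
    """Merge separate year and month values into a gYearMonth lexical form."""
    return f"{int(year):04d}-{int(month):02d}"


def row_to_triples(subject, row):
    """Emit N-Triples lines for one CSV row (hypothetical column names)."""
    state = row["STATE"].strip().upper()
    period = g_year_month(row["YEAR"], row["MONTH"])
    return [
        # Keep the original two-letter code as a plain literal...
        f'<{subject}> <{TERMS}stateCode> "{state}" .',
        # ...and also link to a URI for the state.
        f"<{subject}> <{TERMS}state> <{STATE_BASE}/{state}> .",
        # Year and month merged into a single typed literal.
        f'<{subject}> <{TERMS}period> "{period}"^^<{XSD_GYEARMONTH}> .',
    ]


# Made-up sample row, not taken from the EIA file.
SAMPLE = "YEAR,MONTH,STATE\n2001,1,AK\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
triples = row_to_triples("http://example.org/obs/1", rows[0])
```

Emitting both the literal and the URI keeps the original value available to consumers that only want the code, while the URI gives linked-data clients something to follow.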

Thanks,
Jamey

Received on Friday, 13 August 2010 19:47:31 UTC