Best Practices for Converting CSV into LOD? from Wood, Jamey on 2010-08-09 (public-lod@w3.org from August 2010)

From: Wood, Jamey <Jamey.Wood@nrel.gov>
Date: Mon, 9 Aug 2010 10:37:01 -0600
To: "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <C8858ACE.7F47%jamey.wood@nrel.gov>

Are there any established best practices for converting CSV data into LOD-friendly RDF? For example, I would like to produce an LOD-friendly RDF version of the "2001 - Present Net Generation by State by Type of Producer by Energy Source" CSV data at:

http://www.eia.doe.gov/cneaf/electricity/epa/epa_sprdshts_monthly.html

I'm attaching a sample of a first stab at this. Questions I'm running into include the following:

1. Should one try to convert primitive data types (particularly strings) into URI references? Or just leave them as primitives? Or perhaps provide both (with separate predicate names)? For example, the sample EIA data I reference has two-letter state abbreviations in one column. Should those be left alone or converted into URIs?
2. Should one merge separate columns from the original data in order to align to well-known RDF types? For example, the sample EIA data has separate "Year" and "Month" columns. Should those be merged in the RDF version so that an "xs:gYearMonth" type can be used?
3. Should one attempt to introduce some sort of hierarchical structure (to make the LOD more "browseable")? The "skos:related" triples in the attached sample are an initial attempt at that. Is this a good idea, and if so, is "skos:related" a reasonable predicate to use? If it is, we would presumably craft these triples so that one could navigate through the entire dataset (e.g. "state" -> "state/year" -> "state/year/month" -> "state/year/month/typeOfProducer" -> "state/year/month/typeOfProducer/energySource").
4. Any other considerations that I'm overlooking?
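
To make questions 1-3 concrete, here is a rough Python sketch (standard library only) of one possible answer to each; it is not the approach used in the attached sample. The base URI "http://example.org/eia/", the predicate names under "def/", and the column headers "STATE", "YEAR" and "MONTH" are all assumptions made for illustration, and the single sample row is made up rather than taken from the EIA file. It emits both a literal state code and a state URI (question 1), merges year and month into one gYearMonth literal (question 2), and links each level of the hierarchy to the next with "skos:related" (question 3).

```python
import csv
import io

BASE = "http://example.org/eia/"  # hypothetical base URI, not from the EIA data
XSD = "http://www.w3.org/2001/XMLSchema#"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def state_uri(abbrev):
    """Question 1: mint a URI for a two-letter state abbreviation."""
    return f"{BASE}state/{abbrev.strip().upper()}"


def year_month_literal(year, month):
    """Question 2: merge the Year and Month columns into one gYearMonth literal."""
    return f'"{int(year):04d}-{int(month):02d}"^^<{XSD}gYearMonth>'


def hierarchy_uris(state, year, month):
    """Question 3: URIs for state -> state/year -> state/year/month."""
    s = state_uri(state)
    y = f"{s}/{int(year):04d}"
    m = f"{y}/{int(month):02d}"
    return [s, y, m]


def row_to_ntriples(row):
    """Turn one CSV row (a dict) into a list of N-Triples lines."""
    state, year, month = row["STATE"], row["YEAR"], row["MONTH"]
    levels = hierarchy_uris(state, year, month)
    leaf = levels[-1]
    triples = [
        # Keep the original string AND provide a URI, under separate predicates.
        f'<{leaf}> <{BASE}def/stateCode> "{state.strip().upper()}" .',
        f"<{leaf}> <{BASE}def/state> <{levels[0]}> .",
        f"<{leaf}> <{BASE}def/period> {year_month_literal(year, month)} .",
    ]
    # Link each level to the next so the data can be browsed top-down.
    for parent, child in zip(levels, levels[1:]):
        triples.append(f"<{parent}> <{SKOS}related> <{child}> .")
    return triples


def csv_to_ntriples(text):
    """Convert CSV text with STATE, YEAR, MONTH columns into N-Triples lines."""
    out = []
    for row in csv.DictReader(io.StringIO(text)):
        out.extend(row_to_ntriples(row))
    return out


# Made-up sample row; the column headers are assumed, not copied from the EIA file.
SAMPLE = "STATE,YEAR,MONTH\nCO,2001,1\n"

if __name__ == "__main__":
    print("\n".join(csv_to_ntriples(SAMPLE)))
```

The same pattern would extend to the "typeOfProducer" and "energySource" levels by appending further path segments in hierarchy_uris. Whether "skos:related" is the right predicate for those links is exactly what question 3 asks; the sketch only shows where such triples would be generated.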

Thanks,
Jamey

Attachments

application/octet-stream attachment: generation_state_mon.rdf

Received on Monday, 9 August 2010 16:37:41 UTC