danbri intro

I thought I'd send this around: these are the notes I presented last
week. It's mostly a neutral general intro, but also a bit about why I'm
here. I will work on some more specifically Google-oriented use cases
with colleagues, but in brief the Google interest here is in making it
easier for data consumers to understand CSV data, and in making it
easier for publishers to use a familiar format rather than having to
worry about converting their data files into some fancy new format.

Dan




I work on structured data at Google, particularly schema.org (a
partnership with Bing, Yahoo and Yandex) and various ways of getting
data into the Knowledge Graph. For a high-level schema.org perspective
on tabular data, see the position paper submitted to last year's
Open Data on the Web W3C workshop:
<http://www.w3.org/2013/04/odw/papers>
<http://www.w3.org/2013/04/odw/odw13_submission_53.pdf>


A few brief observations about our work here.


1. The "comma" in CSV

The core focus here is on tabular data for the Web: metadata about
tabular data. What shape it should take, what it should express, how
it can be found, packaged and interpreted.

In a way, naming this group after CSV is a little misleading; in
practice tab-separated is equally important. But it serves as a
marker: we are practically minded, and driven by a concern to know
more about the millions of real tabular files in circulation that are
most stereotypically expressed as CSV. But tab-separated is of course
also in scope.
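
As a small illustration of why the separator is a detail rather than
the point, here is a minimal Python sketch using the standard csv
module. The sample data and the helper name read_table are made up for
this example; it simply shows the same table yielding identical rows
whether it arrives comma-separated or tab-separated.

```python
import csv
import io

def read_table(text, delimiter):
    """Parse delimited text into a list of rows (each a list of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

# Hypothetical sample data: the same two-column table in both flavours.
CSV_TEXT = "city,population\nBristol,437000\nLondon,8300000\n"
TSV_TEXT = "city\tpopulation\nBristol\t437000\nLondon\t8300000\n"

csv_rows = read_table(CSV_TEXT, ",")
tsv_rows = read_table(TSV_TEXT, "\t")

# Once parsed, the tabular content is the same regardless of separator.
print(csv_rows == tsv_rows)
```

Metadata about such a file would presumably describe the rows and
columns, with the delimiter recorded as just one property of the
serialization.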


2. The relationship to RDF

There are two roles RDF might play:

 - as the data model (and syntax, e.g. JSON-LD) for the metadata about
   a table
 - as a target data model that tabular data might be mapped into, i.e.
   tables to graphs

It is far from a given that RDF is a perfect match for either. As a
co-chair I'll say that these decisions will need to be grounded in
practical use cases, and ideally running sample code. As a Google
engineer I'll say that we consider RDF's basic graph data model key in
both areas, but are wary of trying to squeeze too much into the graph
data model. Sometimes the best representation of a table is a table.
We'll come back to this in good time.
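
To make the "tables to graphs" idea above a little more concrete, here
is a rough Python sketch of one naive mapping: each row becomes a
subject, each column name a predicate, and each cell an object. This is
only an illustration under assumed conventions, not a proposal for what
the group should adopt. The function name table_to_triples, the
http://example.org/ base and the sample data are all hypothetical.

```python
import csv
import io

def table_to_triples(text, base, key_column):
    """Map each non-key cell to a (subject, predicate, object) triple.

    Assumed convention for this sketch only:
      subject   = base + value of the key column for that row
      predicate = base + column name
      object    = the cell value, kept as a plain string
    """
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = base + row[key_column]
        for column, value in row.items():
            if column != key_column:
                triples.append((subject, base + column, value))
    return triples

# Hypothetical sample table with an "id" column identifying each row.
SAMPLE = "id,name,country\n1,Bristol,UK\n2,Amsterdam,NL\n"

triples = table_to_triples(SAMPLE, "http://example.org/", "id")
for triple in triples:
    print(triple)
```

Even this toy version has to invent identifiers for rows and columns
and throws away row order and datatypes, which is the sort of thing
metadata about the table would need to pin down.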


3. This could be an endless task

Note that describing "tabular data on the Web" could be considered a
larger problem than describing the structure of relational databases.
We could be here forever. However, Jeni, Ivan and I have no such
intention. We need to do something useful quickly and pragmatically
that addresses real-world problems. To this end we will put a lot of
focus on use cases and scenarios that come with actual CSV data, and
then on analyzing those situations and datasets to pull out their
common features.

Received on Wednesday, 5 February 2014 11:57:37 UTC