Re: New metadata draft from Andy Seaborne on 2014-05-26 (public-csv-wg@w3.org from May 2014)

From: Andy Seaborne <andy@apache.org>
Date: Mon, 26 May 2014 12:26:32 +0100
To: Jeni Tennison <jeni@jenitennison.com>
CC: CSV on the Web Working Group <public-csv-wg@w3.org>
Message-ID: <538324E8.40300@apache.org>

On 21/05/14 18:02, Jeni Tennison wrote:
> Hi,
>
> I’ve done some fairly substantial work on the metadata draft [1] to
> get the structure and content more towards where I think we want it
> to head, including trying to map the existing data package structures
> into something that makes (more) sense if we’re viewing the metadata
> documents as JSON-LD structures with a metadata vocabulary.
>
> There’s still a lot of work to do (and loads of issues as you’ll
> see), but I think it’s a little more internally consistent now.
> Comments appreciated.
>
> Jeni
>
> [1] http://w3c.github.io/csvw/metadata/
> -- Jeni Tennison http://www.jenitennison.com/
>

The text about primaryKey in 3.4.2 triggered a thought.

CSV files are denormalised [*] tables.

Two rows can be "about" the same entity (subject in RDF speak).
In spreadsheets, this is a way to express multiple values for one entity.

Person  phone
fred    +44-020-555-1234
fred    +44-029-555-6789

or

Person  phone
fred    +44-020-555-1234
         +44-029-555-6789
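To make the two layouts concrete, here is a minimal Python sketch that reads each one as comma-separated CSV and collects all phone numbers per person. It assumes that a blank Person cell in the second layout means "same entity as the row above" (a fill-down convention, not something the metadata draft defines). The helper name `group_by_entity` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# First layout: the entity is repeated on every row.
REPEATED = """Person,phone
fred,+44-020-555-1234
fred,+44-029-555-6789
"""

# Second layout: a blank Person cell continues the entity from the row above.
# This fill-down reading is an assumption about the spreadsheet convention.
CONTINUATION = """Person,phone
fred,+44-020-555-1234
,+44-029-555-6789
"""

def group_by_entity(text, key="Person", value="phone"):
    """Hypothetical helper: gather every value for each entity,
    treating a blank key cell as a continuation of the previous entity."""
    grouped = defaultdict(list)
    current = None
    for row in csv.DictReader(io.StringIO(text)):
        if row[key]:
            current = row[key]
        grouped[current].append(row[value])
    return dict(grouped)

print(group_by_entity(REPEATED))
print(group_by_entity(CONTINUATION))
```

Both layouts describe the same single entity with two values; only the row-level presentation differs.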

There seem to be two uses of "primary key": its technical role as an
index into the table, and its conceptual role as the identifier of the
principal entity of the row.
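The two roles can come apart even in the small example above. This Python sketch checks whether a set of columns picks out at most one row. "Person" identifies the entity each row is about, but because it repeats, it cannot index the table; the pair (Person, phone) can. The function `is_unique` is a hypothetical illustration, not part of any CSV metadata vocabulary.

```python
import csv
import io

ROWS = list(csv.DictReader(io.StringIO(
    "Person,phone\n"
    "fred,+44-020-555-1234\n"
    "fred,+44-029-555-6789\n"
)))

def is_unique(rows, columns):
    """Hypothetical check: True if no two rows share the same values
    in the given columns, i.e. the columns could serve as an index."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True

# "Person" names the entity but repeats across rows, so it is not a row index.
print(is_unique(ROWS, ["Person"]))
# (Person, phone) picks out exactly one row: a candidate index key.
print(is_unique(ROWS, ["Person", "phone"]))
```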

In the database modelling world, where the table structure can be changed
so that uniqueness at the row level corresponds to uniqueness of the
conceptual entity, this is fine, indeed good modelling. CSV files
aren't so neat.

Suggestion: call the indexing aspect "indexKey"

 Andy

[*] Database sense of "denormalized".

Received on Monday, 26 May 2014 11:27:03 UTC