Scoping: "Tabular Data" from Jeni Tennison on 2014-02-23 (public-csv-wg@w3.org from February 2014)

From: Jeni Tennison <jeni@theodi.org>
Date: Sun, 23 Feb 2014 15:19:42 -0800
To: public-csv-wg@w3.org
Message-ID: <etPan.530a820f.41b71efb.10f@jenit.local>

Hi,

Another scoping question, prompted by Tim Finin’s example at:

  https://www.w3.org/2013/csvw/wiki/Use_Cases#Representing_entitles_and_facts_extracted_from_text

1> :e4 type         PER
2> :e4 mention      "Bart"  D00124 283-286
3> :e4 mention      "JoJo"  D00124 145-149 0.9
4> :e4 per:siblings :e7     D00124 283-286 173-179 274-281
5> :e4 per:age      "10"    D00124 180-181 173-179 182-191 0.9
6> :e4 per:parent   :e9     D00124 180-181 381-380 399-406 D00101 220-225 230-233 201-210
   ^   ^            ^       ^      ^       ^       ^       ^      ^       ^       ^
   1   2            3       4      5       6       7       8      9       10      11

(I’ve added numbers for the implied columns.)

To me, this looks like a text-based format in which each line has a defined structure, but without the commonality between values in a single column that I would normally expect of a tabular format.

For example, column 6 contains a certainty value on line 3 and an offset range on lines 4-6, while column 8 contains a certainty value on line 5 and a document ID on line 6.
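
To make the mismatch concrete, here is a rough Python sketch that splits each line on whitespace and records which kinds of value turn up in each implied column. The kind() patterns (a "D" followed by digits for a document ID, digits-hyphen-digits for an offset range, a decimal fraction for a certainty) are my assumptions about the format, not anything the use case defines.

```python
import re
from collections import defaultdict

# The six example lines, without the "1>" line labels.
LINES = [
    ':e4 type         PER',
    ':e4 mention      "Bart"  D00124 283-286',
    ':e4 mention      "JoJo"  D00124 145-149 0.9',
    ':e4 per:siblings :e7     D00124 283-286 173-179 274-281',
    ':e4 per:age      "10"    D00124 180-181 173-179 182-191 0.9',
    ':e4 per:parent   :e9     D00124 180-181 381-380 399-406 D00101 220-225 230-233 201-210',
]


def kind(token):
    """Guess what a token represents (assumed patterns, see above)."""
    if re.fullmatch(r"D\d+", token):
        return "document ID"
    if re.fullmatch(r"\d+-\d+", token):
        return "offset range"
    if re.fullmatch(r"[01]?\.\d+", token):
        return "certainty"
    return "other"


def kinds_by_column(lines):
    """Map each 1-based column position to the set of value kinds seen there."""
    kinds = defaultdict(set)
    for line in lines:
        for col, token in enumerate(line.split(), start=1):
            kinds[col].add(kind(token))
    return dict(kinds)


if __name__ == "__main__":
    for col, found in sorted(kinds_by_column(LINES).items()):
        print(col, sorted(found))
```

Any column that reports more than one kind (here columns 6 and 8) is one where the values do not share a common meaning.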

If the data looked like (comma separators added for clarity):

  :e4, type,         PER,    ,
  :e4, mention,      "Bart", D00124 283-286,
  :e4, mention,      "JoJo", D00124 145-149,                                               0.9
  :e4, per:siblings, :e7,    D00124 283-286 173-179 274-281,
  :e4, per:age,      "10",   D00124 180-181 173-179 182-191,                               0.9
  :e4, per:parent,   :e9,    D00124 180-181 381-380 399-406 D00101 220-225 230-233 201-210,
  ^    ^             ^       ^                                                             ^
  1    2             3       4                                                             5

then I would consider it tabular data and could add headers:

  1: subject
  2: predicate
  3: object
  4: location
  5: certainty
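
As a rough sketch of that regrouping in Python: the function below takes one whitespace-separated line and returns the five headed values. The to_row() name and the rule that a trailing decimal fraction is the certainty are my assumptions for illustration; everything between the object and the certainty is kept as a single location value.

```python
import re

HEADERS = ["subject", "predicate", "object", "location", "certainty"]


def to_row(line):
    """Regroup one whitespace-separated line into the five headed columns."""
    tokens = line.split()
    subject, predicate, obj = tokens[:3]
    rest = tokens[3:]
    certainty = ""
    # Assumption: a trailing decimal fraction such as 0.9 is the certainty.
    if rest and re.fullmatch(r"[01]?\.\d+", rest[-1]):
        certainty = rest.pop()
    # Whatever remains (document IDs and offset ranges) forms the location.
    location = " ".join(rest)
    return dict(zip(HEADERS, [subject, predicate, obj, location, certainty]))


if __name__ == "__main__":
    print(to_row(':e4 mention      "JoJo"  D00124 145-149 0.9'))
    print(to_row(':e4 type         PER'))
```

With this grouping every row has the same five columns, and empty strings stand in for a missing location or certainty.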

Can/should we define tabular data as data where all values in a given column have a common meaning?

Cheers,

Jeni
--  
Jeni Tennison, Technical Director theODI.org  
+44 (0) 7974 420 482 @JeniT

Received on Sunday, 23 February 2014 23:20:11 UTC