Re: Specifying the number of rows in a table from Gregg Kellogg on 2018-10-16 (public-csvw@w3.org from October 2018)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Tue, 16 Oct 2018 16:12:36 -0700
To: Clark Fitzgerald <clarkfitzg@gmail.com>
Cc: public-csvw@w3.org
Message-Id: <4A45796E-55B0-46D9-8FB9-9B607B8E00F0@greggkellogg.net>
> On Oct 11, 2018, at 6:10 PM, Clark Fitzgerald <clarkfitzg@gmail.com> wrote:
> 
> Thanks!
> 
> This is unfamiliar territory for me, so I want to make sure I understand what you're saying. Suppose statsmetadata.org defines the vocabulary. Then a fully compliant solution might look like:
> 
> {
> "url": "data.csv",
> "notes": [{"http://statsmetadata.org/terms/numberRows": 1e6, "http://statsmetadata.org/terms/randomized": true}]
> }

I think that should work.
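
A minimal Python sketch of that first form, for anyone generating these descriptions from a script. The `http://statsmetadata.org/terms/` namespace is the hypothetical one from the example above, and the helper names (`build_table_description`, `unresolved_note_keys`) are made up for illustration. The IRI check is a deliberate simplification: it only accepts absolute IRIs and does not recognize prefixed names from the CSVW namespaces, which would also be acceptable.

```python
import json
from urllib.parse import urlparse

# Hypothetical vocabulary namespace, taken from the example in this thread.
STATS = "http://statsmetadata.org/terms/"


def is_absolute_iri(key: str) -> bool:
    """Rough check: a key counts as an absolute IRI if it has a scheme and an authority."""
    parts = urlparse(key)
    return bool(parts.scheme) and bool(parts.netloc)


def build_table_description(url: str, number_rows: int, randomized: bool) -> dict:
    """Build a table description whose notes use absolute IRIs as property names."""
    return {
        "url": url,
        "notes": [
            {
                STATS + "numberRows": number_rows,
                STATS + "randomized": randomized,
            }
        ],
    }


def unresolved_note_keys(description: dict) -> list:
    """List note property names that are not absolute IRIs (simplified check)."""
    return [
        key
        for note in description.get("notes", [])
        for key in note
        if not is_absolute_iri(key)
    ]


description = build_table_description("data.csv", 1_000_000, True)
print(json.dumps(description, indent=2))
```

Calling `unresolved_note_keys` on the original description with bare `numberRows` and `randomized` keys would flag both names, which are the ones a CSV2RDF conversion would likely drop.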

> The second suggestion, which isn't currently supported, might look like:
> 
> {
> "url": "data.csv",
> "notes": [{"@context": "http://statsmetadata.org", "numberRows": 1e6, "randomized": true}]
> }

Right, this could be created as an issue on https://github.com/w3c/csvw for a hypothetical future group to consider, and is a great place to preserve such ideas. We did this for JSON-LD for several years before a CG took the ideas forward. There is a fairly inactive CG for CSVW (https://www.w3.org/community/csvw/). Presumably, they could do some work on a community draft that would at least provide some tooling to manage it, but it would take a big effort to get the group stirring. (There are some other issues which may prompt this eventually, but we’ll have to see.)

Gregg

> On Thu, Oct 11, 2018 at 3:09 PM Gregg Kellogg <gregg@greggkellogg.net> wrote:
>> On Oct 10, 2018, at 1:46 PM, Clark Fitzgerald <clarkfitzg@gmail.com> wrote:
>> 
>> Hello,
>> 
>> I would like to use the W3C's tabular data model to record metadata for local CSV files relevant for statistical analysis using the R language (or Python, Julia). For example, to indicate that the local file "data.csv" contains one million rows in randomized order I might use the following table description:
>> 
>> {
>> "url": "data.csv",
>> "notes": [{"numberRows": 1e6, "randomized": true}]
>> }
>> 
>> A couple questions:
>> 
>> 1. Is this reasonable/correct?
> 
> More or less. The metadata document can have a “notes” attribute with arbitrary content. This creates a notes annotation on the table [1]. To be properly treated as JSON-LD/RDF, both “numberRows” and “randomized” need to resolve to IRIs. They could be terms in one of the namespaces defined for CSVW [2]; otherwise, you can use an absolute IRI for the property. Without this, the data would be dropped when interpreted, at least by the CSV2RDF process.
> 
>> 2. Is there a better way to do it? Perhaps by linking to a document that defines new common properties like numberRows?
> 
> This is somewhat problematic, as CSVW doesn’t allow you to define arbitrary JSON-LD contexts; otherwise, you might define your namespace or term mappings in a context within “notes”. It’s not an unreasonable thing to do, though, IMHO, and tool creators may be convinced to support this as an extension. Actually, an update to this spec could remove the restriction on the value of @context, and/or allow CSVW to be used within other contexts, such as schema.org, but strictly speaking, you can’t do this now.
> 
> Gregg
> 
> [1] https://www.w3.org/TR/2015/REC-tabular-data-model-20151217/#dfn-table-notes
> [2] https://www.w3.org/ns/csvw#term-definitions
> 
>> Thanks,
>> Clark Fitzgerald
>
Received on Tuesday, 16 October 2018 23:13:02 UTC