dwbp-ISSUE-239 (Laufer): machine-readable standardized data formats - serialization data formats - dataset formats [Best practices document(s)] from Data on the Web Best Practices Working Group Issue Tracker on 2016-02-17 (public-dwbp-wg@w3.org from February 2016)

From: Data on the Web Best Practices Working Group Issue Tracker <sysbot+tracker@w3.org>
Date: Wed, 17 Feb 2016 18:22:11 +0000
To: public-dwbp-wg@w3.org
Message-Id: <E1aW6jr-0005Tq-V1@maia.w3.org>

dwbp-ISSUE-239 (Laufer): machine-readable standardized data formats - serialization data formats - dataset formats [Best practices document(s)]

http://www.w3.org/2013/dwbp/track/issues/239

Raised by: Carlos Laufer
On product: Best practices document(s)

In Best Practice 14, "Use machine-readable standardized data formats", the term data format is used to define the serialization format of a dataset distribution.

The example uses GTFS (https://developers.google.com/transit/gtfs/reference), a standard way of distributing timetables. We have two standards here: GTFS (structure and serialization) and CSV (serialization). A GTFS feed is distributed as a set of CSV files packaged in a single .zip file.

The previous BP examples use timetables, but it is not explicit whether they are GTFS feeds. The timetable could be in any format, and it seems to be a single file containing all the information, distributed in different formats such as CSV, JSON, TTL, etc. But GTFS is a standard that defines more than the serialization format (a set of CSV files). It also defines the structure and the meaning of the data (a set of specifically named files and a vocabulary).
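To make the two layers concrete, here is a minimal Python sketch. It is only an illustration, not part of the BP document: read_feed handles the serialization layer (ZIP plus CSV) and would work for any archive of CSV files, while missing_files checks one GTFS-specific structural rule. The function names and the sample rows are hypothetical, and EXPECTED_FILES is assumed to be an illustrative subset of the file names in the GTFS reference, not the full specification.

```python
import csv
import io
import zipfile

# Assumption: an illustrative subset of file names from the GTFS reference,
# not the complete list the specification defines.
EXPECTED_FILES = {"agency.txt", "routes.txt", "trips.txt", "stop_times.txt"}


def read_feed(zip_bytes):
    """Serialization layer: unpack the ZIP and parse every member as CSV."""
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # utf-8-sig tolerates an optional byte order mark at the start.
            text = archive.read(name).decode("utf-8-sig")
            tables[name] = list(csv.DictReader(io.StringIO(text)))
    return tables


def missing_files(tables):
    """Structure layer: list the expected file names that are absent."""
    return sorted(EXPECTED_FILES - set(tables))


def build_sample_feed():
    """Build a tiny in-memory ZIP with made-up rows, for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("routes.txt", "route_id,route_short_name\nR1,10\n")
        archive.writestr("trips.txt", "route_id,service_id,trip_id\nR1,WK,T1\n")
    return buffer.getvalue()


if __name__ == "__main__":
    tables = read_feed(build_sample_feed())
    print(tables["routes.txt"])
    print(missing_files(tables))
```

A generic CSV reader accepts the sample archive without complaint; only the GTFS-level check notices that it is not a complete feed. That is the distinction between a serialization format and a dataset format.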

Standardized serialization formats carry semantics about how a machine understands the meta-model of the different ways of distributing data; the data itself sits inside this package. That data could in turn follow a standard of its own: a vocabulary, or a more complex distribution structure such as GTFS.

I think this difference should be made clear in the document. It might be worthwhile to have a BP that covers things like GTFS. I cannot find a BP that addresses this: using standards for publishing datasets for specific domains or applications.

Received on Wednesday, 17 February 2016 18:22:17 UTC