- From: Phil Archer <phila@w3.org>
- Date: Fri, 15 Apr 2016 08:32:25 +0100
- To: Public DWBP WG <public-dwbp-wg@w3.org>
As flagged, I've been working on my native speaker review of the doc
and, in doing so, have been paying very close attention to the text.
This leads me to make a number of comments that go beyond simple native
speaker edits and are therefore ones that should be assessed like any
other comment.
My review begins at the Data Formats section.
#MachineReadableStandardizedFormat
===================================
There is no definition of 'machine readable', or of proprietary
software. "computational tools typically available in the relevant
domain" will surely include .docx and .xlsx, for example.
I looked at the Wikipedia page, which links to a doc from the US
government: https://en.wikipedia.org/wiki/Machine-readable_data. From
that I suggest the following:
<p>There is an important distinction between formats that can be read
and edited by humans using a computer and formats that are <em>machine
readable</em>. The latter term implies that the data is readily
extracted, transformed and processed by a computer. The following
definition of machine readable is based on that provided by the US
Office of Management and Budget's definition in their Preparation and
Submission of Strategic Plans, Annual Performance Plans, and Annual
Program Performance Reports [[OMB-A11]]</p>
<p><strong>Machine readable</strong>: A format in a standard computer
language (not natural language text) that can be read automatically by a
computer system. Traditional word processing documents and portable
document format (PDF) files are easily read by humans but typically are
difficult for machines to interpret. Formats such as XML, JSON, NetCDF,
RDF or spreadsheets with header columns that can be exported as CSV are
machine readable formats.</p>
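To illustrate the last sentence of that definition, here is a minimal Python sketch of what "read automatically by a computer system" can look like for a spreadsheet exported as CSV with a header row. The column names and values are invented for illustration only; they are not taken from the BP document.

```python
import csv
import io

# Hypothetical CSV export of a spreadsheet with a header row.
SAMPLE = (
    "station,date,temperature_c\n"
    "A1,2016-04-15,11.5\n"
    "B2,2016-04-15,9.0\n"
)


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header columns."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE)

# Because the header names the columns, values can be pulled out by name
# and processed without any human interpretation of the layout.
mean_temperature = sum(float(r["temperature_c"]) for r in rows) / len(rows)
```

The same data locked in a word processing document or PDF would need layout-specific extraction before a step like the last line could run.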
Biblio entry
"OMB-A11": {
"title": "Preparation and Submission of Strategic Plans, Annual Performance Plans, and Annual Program Performance Reports",
"href":"https://www.whitehouse.gov/sites/default/files/omb/assets/a11_current_year/s200.pdf",
"date": "2015",
"publisher":"Office of Management and Budget (OMB)",
"id":"OMB Circular A-11"
}
#MultipleFormats
================
Suggest that the intended outcome could be worded along the lines of:
"As many users as possible will be able to use the data without first
having to transform it into their preferred format."
I have many similar comments on intended outcomes. I think they should
be statements of the specific benefit that is gained, so "to enable X"
rather than "Doing X will enable Y."
I very much dislike the word 'intended' in the sentence: "Consider the
data formats most likely to be needed by intended users, and consider
alternatives that are likely to be useful in the future." The idea of
making data available on the Web is that it's up to the user to decide
what he/she intends to do with it, not the publisher.
Suggest simply making it "Consider the data formats most likely to be
needed and consider alternatives that are likely to be useful in the future."
#MetadataStandardized
=====================
Suggest rewording the intended outcome
Currently:
Standardized code lists and other commonly used terms will enhance
interoperability and consensus among data publishers and consumers.
Could be:
Enhanced interoperability and consensus among data publishers and consumers.
#ReuseVocabularies
==================
Again, the intended outcome could be worded more succinctly I think.
"Using the same vocabulary to describe metadata will make datasets and
metadata sets easier to be compared by humans or machines. When two
datasets or metadata sets use the same vocabulary, (automatic)
processing tools designed for one can be more easily applied to the
other. This greatly facilitates re-use of datasets"
could be simply
To make datasets and metadata easier to compare and integrate by humans
or machines.
(I added 'and integrate', which I personally think is important but this
is more than an editorial change).
#ChooseRightFormalizationLevel
==============================
I would word the intended outcome as:
The data supports a wide range of application cases but is not more
complex to produce and reuse than necessary, or, to paraphrase Albert
Einstein, "Everything should be made as simple as possible, but no simpler."
The Einstein line is often quoted but, like so many quotations, is
probably a misquote.
And I'd say that the how to test line would be improved by using the
word 'typical' rather than target:
For formal knowledge representation languages, applying an inference
engine on top of the data that uses a given vocabulary does not produce
too many statements that are unnecessary for typical applications.
#Sensitive
==========
I'd word the intended outcome as:
"To enable data consumers to know that data that is referred to from the
current dataset is unavailable or only available under different
conditions."
I changed the reference to HTTP status code 404 to 303 (see other) when
doing the native speaker review. I *really* don't want us to include
deliberate 404s as a Best Practice :-(
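As a rough sketch of the difference, assuming a hypothetical server-side lookup (the paths and the `respond` function are made up for illustration and are not from the BP document), a restricted resource would redirect to a page explaining the access conditions instead of pretending not to exist:

```python
from http import HTTPStatus

# Hypothetical mapping from restricted resources to pages that explain
# why the data is unavailable or under what conditions it can be obtained.
RESTRICTED = {"/data/patients.csv": "/about/patients-access"}

# Hypothetical set of openly available resources.
PUBLIC = {"/data/summary.csv"}


def respond(path):
    """Return (status, headers) for a request path."""
    if path in RESTRICTED:
        # 303 See Other: the resource exists, but the client is pointed
        # at an explanation rather than at the data itself.
        return HTTPStatus.SEE_OTHER, {"Location": RESTRICTED[path]}
    if path in PUBLIC:
        return HTTPStatus.OK, {}
    # 404 is kept for resources that genuinely do not exist.
    return HTTPStatus.NOT_FOUND, {}
```

The point of the design is that 404 stays truthful: it is only returned when there really is nothing at that URI.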
#BulkAccess
===========
I don't think this should only refer to cases where data is spread
across multiple locations. I think it should also cover the simple case
of making a file available, as opposed to only providing an API. This is
in addition to, not instead of, what is written about multiple locations
- which I think is very good.
I'd phrase the intended outcome as:
"Bulk download enables developers to access the complete dataset for
local processing without the need for further calls to the Web."
#ProvideSubsets
===============
The intended outcome section is too long IMO. All the content is valid,
I just think some of it could be moved to the Why section.
Really not sure about including an example of making a set of PDFs available.
#Conneg
=======
In tidying up the language of this BP I pretty much rewrote it. I hope
without changing your meaning significantly.
I suggest the intended outcome could be phrased as: "To enable different
representations of the same resource to be served from the same URI
according to the request made by the client."
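For readers less familiar with the mechanism, the selection step can be sketched roughly as below. This is a simplified illustration only: the `AVAILABLE` table and function names are hypothetical, and partial wildcards such as `text/*` and media type parameters other than `q` are deliberately not handled.

```python
# Hypothetical representations of one resource, keyed by media type.
AVAILABLE = {
    "text/csv": "data.csv",
    "application/json": "data.json",
    "text/turtle": "data.ttl",
}


def parse_accept(header):
    """Return acceptable media types from an Accept header, best first."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:  # q=0 means "not acceptable"
            ranked.append((-q, position, media_type))
    # Highest q first; ties keep the order given by the client.
    return [media_type for _, _, media_type in sorted(ranked)]


def negotiate(accept_header, available=AVAILABLE, default="text/csv"):
    """Pick the media type to serve, or None if nothing is acceptable."""
    for media_type in parse_accept(accept_header):
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return default
    return None  # a server would typically answer 406 Not Acceptable
```

So a client sending `Accept: text/turtle;q=0.8, application/json` to the one URI would be served the JSON representation, while another client asking only for `text/turtle` would get Turtle from the same URI.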
#AccessRealTime
===============
I would word the intended outcome as:
"To enable applications to access time-critical data in real time or
near real time, where real-time means a range from milliseconds to a few
seconds after the data creation, and near real time is a predetermined
delay for expected data delivery."
#AccessUptoDate
===============
I think this sentence: "The international date format is recommended to
avoid any ambiguity <a
href="https://www.w3.org/International/questions/qa-date-format">https://www.w3.org/International/questions/qa-date-format</a>."
would be better as:
"Datestamps should be formatted using the XML Schema <a
href="/TR/xmlschema11-2/#dateTimeStamp">dateTimeStamp</a> datatype
[[xmlschema11-2]]."
Although I note that the NOAA example uses the horrible "Mar, 3rd 2016
at 9:03:07 pm PST" format which breaks this advice :-(
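For comparison, this is roughly how that same instant could be emitted in the dateTimeStamp lexical form using Python's standard library. The helper name is my own, and I'm assuming PST means a fixed offset of UTC-08:00.

```python
from datetime import datetime, timedelta, timezone


def to_date_time_stamp(moment):
    """Format a datetime in the xsd:dateTimeStamp lexical form.

    dateTimeStamp is dateTime with a mandatory time zone, so naive
    datetimes are rejected rather than silently assumed to be UTC.
    """
    if moment.tzinfo is None or moment.utcoffset() is None:
        raise ValueError("dateTimeStamp requires an explicit time zone")
    return moment.isoformat(timespec="seconds")


# "Mar, 3rd 2016 at 9:03:07 pm PST", assuming PST is UTC-08:00.
PST = timezone(timedelta(hours=-8))
noaa_example = to_date_time_stamp(datetime(2016, 3, 3, 21, 3, 7, tzinfo=PST))
# noaa_example == "2016-03-03T21:03:07-08:00"
```

Unlike the original string, the result sorts correctly as text within a single offset and needs no locale knowledge to parse.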
#documentYourAPI
================
I'd write the intended outcome as:
"Developers can obtain detailed information about each call to the API,
including the parameters it takes and what it is expected to return."
#documentYourAPI
================
This is very spatial; ideally we should have some non-spatial examples
as well. I can tell this came from Linda and Jeremy et al :-)
#EvaluateCoverage
=================
I'd phrase the intended outcome as:
"To enable data consumers to appreciate the coverage and external
dependencies of a given dataset."
#Serialisation
==============
Intended outcome suggestion:
To enable machines to process a dataset even if the original software
that was used to create it is no longer available or supported.
More later
Phil.
--
Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/
http://philarcher.org
+44 (0)7887 767755
@philarcher1
Received on Friday, 15 April 2016 07:32:40 UTC