Re: Comments on the 9 April version of the BP doc from Antoine Isaac on 2016-04-21 (public-dwbp-wg@w3.org from April 2016)

From: Antoine Isaac <aisaac@few.vu.nl>
Date: Thu, 21 Apr 2016 10:26:04 +0200
To: <public-dwbp-wg@w3.org>
Message-ID: <57188E9C.2050707@few.vu.nl>
Hi Phil,

Regarding your suggestions about the BP vocabularies.
I'm OK with them, but I won't be able to handle the dependencies with Annette's comments on the same parts [1,2], or with the other changes to the intended outcome that we had suggested but that are not yet implemented [3]. This is quite a mess...

Antoine

[1] https://lists.w3.org/Archives/Public/public-dwbp-wg/2016Apr/0135.html
[2] https://lists.w3.org/Archives/Public/public-dwbp-wg/2016Apr/0136.html
[3] https://lists.w3.org/Archives/Public/public-dwbp-wg/2016Apr/0056.html

On 15/04/16 09:32, Phil Archer wrote:
> As flagged, I've been working on my native speaker review of the doc and, in doing so, have been paying very close attention to the text. This leads me to make a number of comments that go beyond simple native speaker edits and are therefore ones that should be assessed like any other comment.
>
> My review begins at the Data Formats section.
>
> #MachineReadableStandardizedFormat
> ===================================
>
> There is no definition of 'machine readable', or of proprietary software. "computational tools typically available in the relevant domain" will surely include .docx and .xlsx, for example.
>
> I looked at the Wikipedia page https://en.wikipedia.org/wiki/Machine-readable_data, which links to a doc from the US government. From that, I suggest the following:
>
>
> <p>There is an important distinction between formats that can be read and edited by humans using a computer and formats that are <em>machine readable</em>. The latter term implies that the data is readily extracted, transformed and processed by a computer. The following definition of machine readable is based on the one provided by the US Office of Management and Budget in its Preparation and Submission of Strategic Plans, Annual Performance Plans, and Annual Program Performance Reports [[OMB-A11]].</p>
> <p><strong>Machine readable</strong>: A format in a standard computer language (not natural language text) that can be read automatically by a computer system. Traditional word processing documents and portable document format (PDF) files are easily read by humans but typically are difficult for machines to interpret. Formats such as XML, JSON, NetCDF, RDF or spreadsheets with header columns that can be exported as CSV are machine readable formats.</p>
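To make the distinction concrete: a spreadsheet exported as CSV with a header row can be consumed by a program with no human interpretation at all. A minimal Python sketch, assuming a hypothetical export whose column names and values are made up for illustration:

```python
import csv
import io

# Hypothetical CSV export of a spreadsheet; the header row names each column.
CSV_TEXT = """station,date,temperature_c
A1,2016-04-01,12.5
A2,2016-04-01,9.0
"""

def read_records(text: str) -> list:
    """Parse CSV text into one dict per row, keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

records = read_records(CSV_TEXT)
# Because the header identifies each column, a program can compute directly.
mean_temp = sum(float(r["temperature_c"]) for r in records) / len(records)
print(mean_temp)  # 10.75
```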
>
>
> Biblio entry
>
>         "OMB-A11": {
>           "title": "Preparation and Submission of Strategic Plans, Annual Performance Plans, and Annual Program Performance Reports",
>           "href": "https://www.whitehouse.gov/sites/default/files/omb/assets/a11_current_year/s200.pdf",
>           "date": "2015",
>           "publisher": "Office of Management and Budget (OMB)",
>           "id": "OMB Circular A-11"
>         }
>
> #MultipleFormats
> ================
>
> Suggest that the intended outcome could be worded along the lines of:
>
> "As many users as possible will be able to use the data without first having to transform it into their preferred format."
>
> I have many similar comments on intended outcomes. I think they should be statements of the specific benefit that is gained, so "to enable X" rather than "Doing X will enable Y."
>
>
> I very much dislike the word 'intended' in the sentence: "Consider the data formats most likely to be needed by intended users, and consider alternatives that are likely to be useful in the future." The idea of making data available on the Web is that it's up to the user, not the publisher, to decide what he/she intends to do with it.
>
> Suggest simply making it: "Consider the data formats most likely to be needed and consider alternatives that are likely to be useful in the future."
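On the publisher side, offering more than one format need not mean maintaining separate datasets. A minimal sketch, assuming hypothetical records held in memory, that serializes the same data as both CSV and JSON using only the Python standard library:

```python
import csv
import io
import json

# Hypothetical records; the same data is published in two formats.
RECORDS = [
    {"station": "A1", "temperature_c": 12.5},
    {"station": "A2", "temperature_c": 9.0},
]

def to_csv(records: list) -> str:
    """Serialize records as CSV, using the first record's keys as the header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

def to_json(records: list) -> str:
    """Serialize the same records as a JSON array of objects."""
    return json.dumps(records, indent=2)

print(to_csv(RECORDS))
print(to_json(RECORDS))
```

Generating every format from one source keeps the representations consistent with each other.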
>
>
> #MetadataStandardized
> =====================
>
> Suggest rewording the intended outcome
>
> Currently:
> Standardized code lists and other commonly used terms will enhance interoperability and consensus among data publishers and consumers.
>
> Could be:
> Enhanced interoperability and consensus among data publishers and consumers.
>
> #ReuseVocabularies
> ==================
> Again, the intended outcome could be worded more succinctly I think.
>
> "Using the same vocabulary to describe metadata will make datasets and metadata sets easier to be compared by humans or machines. When two datasets or metadata sets use the same vocabulary, (automatic) processing tools designed for one can be more easily applied to the other. This greatly facilitates re-use of datasets"
>
> could be simply
>
> To make datasets and metadata easier to compare and integrate by humans or machines.
>
> (I added 'and integrate', which I personally think is important but this is more than an editorial change).
>
>
> #ChooseRightFormalizationLevel
> ==============================
>
> I would word the intended outcome as:
>
> The data supports a wide range of application cases but is not more complex to produce and reuse than necessary, or, to paraphrase Albert Einstein, "Everything should be made as simple as possible, but no simpler."
>
> The Einstein line is often quoted but, like so many quotations, is probably a misquote.
>
> And I'd say that the 'How to Test' line would be improved by using the word 'typical' rather than 'target':
>
> For formal knowledge representation languages, applying an inference engine on top of the data that uses a given vocabulary does not produce too many statements that are unnecessary for typical applications.
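One way to apply this test is to count how many statements an inference step adds. The sketch below is a toy forward-chaining loop that applies only two RDFS rules (subClassOf transitivity and rdf:type inheritance); it is not a full RDFS reasoner, and the class hierarchy is hypothetical:

```python
from itertools import product

TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

def rdfs_closure(triples: set) -> set:
    """Apply rules rdfs11 (subClassOf transitivity) and rdfs9 (type inheritance) to a fixpoint."""
    closed = set(triples)
    while True:
        subs = [(s, o) for s, p, o in closed if p == SUBCLASS]
        types = [(s, o) for s, p, o in closed if p == TYPE]
        new = set()
        for (a, b), (c, d) in product(subs, subs):
            if b == c:
                new.add((a, SUBCLASS, d))  # a ⊑ b and b ⊑ d  =>  a ⊑ d
        for (x, c), (a, b) in product(types, subs):
            if c == a:
                new.add((x, TYPE, b))  # x a c and c ⊑ b  =>  x a b
        new -= closed
        if not new:
            return closed
        closed |= new

# Hypothetical data: one typed resource and a three-level class hierarchy.
asserted = {
    ("ex:station1", TYPE, "ex:WeatherStation"),
    ("ex:WeatherStation", SUBCLASS, "ex:Sensor"),
    ("ex:Sensor", SUBCLASS, "ex:Device"),
    ("ex:Device", SUBCLASS, "ex:Thing"),
}
inferred = rdfs_closure(asserted) - asserted
print(len(asserted), len(inferred))  # 4 6
```

Here four asserted triples yield six inferred ones; whether that is "too many" depends on what typical applications actually need.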
>
> #Sensitive
> ==========
>
> I'd word the intended outcome as:
>
> "To enable data consumers to know that data that is referred to from the current dataset is unavailable or only available under different conditions."
>
> I changed the reference to HTTP status code 404 to 303 (See Other) when doing the native speaker review. I *really* don't want us to include deliberate 404s as a Best Practice :-(
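For illustration, a minimal WSGI sketch (Python standard library only) of answering a request for withheld data with 303 See Other and a Location header pointing to a page that explains the access conditions. The paths and the RESTRICTED mapping are hypothetical, and every other path simply gets 200 OK:

```python
# Hypothetical mapping from withheld data paths to pages that explain
# under which conditions the data can be obtained.
RESTRICTED = {
    "/dataset/patients/42": "/access-conditions/patients",
}

def app(environ, start_response):
    """Minimal WSGI app: 303 See Other for withheld items, 200 OK otherwise."""
    path = environ.get("PATH_INFO", "/")
    if path in RESTRICTED:
        start_response("303 See Other", [
            ("Location", RESTRICTED[path]),
            ("Content-Type", "text/plain; charset=utf-8"),
        ])
        return [b"This data is available only under different conditions.\n"]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Public data.\n"]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves the app on port 8000.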
>
> #BulkAccess
> ===========
>
> I don't think this should only refer to cases where data is spread across multiple locations. I think it should also cover the simple case of making a file available, as opposed to only providing an API. This is in addition to, not instead of, what is written about multiple locations, which I think is very good.
>
> I'd phrase the intended outcome as:
>
> "Bulk download enables developers to access the complete dataset for local processing without the need for further calls to the Web."
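A minimal publisher-side sketch of the multiple-locations case, assuming the dataset is split into hypothetical per-year CSV files: bundling them into one ZIP archive lets a single request retrieve everything, after which all processing is local.

```python
import io
import zipfile

def bundle(files: dict) -> bytes:
    """Pack several dataset files (name -> bytes) into one ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in sorted(files.items()):
            zf.writestr(name, data)
    return buf.getvalue()

# Hypothetical parts of one dataset that would otherwise be fetched separately.
parts = {
    "observations-2015.csv": b"date,value\n2015-01-01,1\n",
    "observations-2016.csv": b"date,value\n2016-01-01,2\n",
}
archive = bundle(parts)

# A consumer opens the downloaded archive with no further calls to the Web.
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    names = zf.namelist()
    print(names)  # ['observations-2015.csv', 'observations-2016.csv']
```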
>
> #ProvideSubsets
> ===============
>
> The intended outcome section is too long IMO. All the content is valid, I just think some of it could be moved to the Why section.
>
> I'm really not sure about including an example of making a set of PDFs available.
>
>
> #Conneg
> =======
>
> In tidying up the language of this BP I pretty much rewrote it. I hope without changing your meaning significantly.
>
> I suggest the intended outcome could be phrased as: "To enable different representations of the same resource to be served from the same URI according to the request made by the client."
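For illustration only, a simplified sketch of server-side selection driven by the HTTP Accept header, with a hypothetical list of available representations. It handles q-values and wildcards but ignores the finer precedence rules of HTTP (for example, a more specific media range overriding a wildcard), so it is not a complete implementation:

```python
from typing import List, Optional, Tuple

# Hypothetical representations held for one resource URI, in server preference order.
AVAILABLE = ["text/csv", "application/json", "text/turtle"]

def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((fields[0].lower(), q))
    # sorted() is stable, so equal q-values keep the client's order.
    return sorted(ranges, key=lambda r: r[1], reverse=True)

def matches(media_range: str, media_type: str) -> bool:
    """True if a media range such as */*, text/* or text/csv covers media_type."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.split("/")[0] == media_range[:-2]
    return media_range == media_type

def choose_representation(accept: str, available: List[str] = AVAILABLE) -> Optional[str]:
    """Pick the representation to serve; None means nothing acceptable (e.g. reply 406)."""
    if not accept.strip():
        return available[0]  # no preference expressed: use the server default
    for media_range, q in parse_accept(accept):
        if q <= 0:
            continue
        for media_type in available:
            if matches(media_range, media_type):
                return media_type
    return None

print(choose_representation("text/turtle, application/json;q=0.5"))  # text/turtle
```

Since the choice depends on a request header, such responses should also carry `Vary: Accept` so that caches keep the representations apart.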
>
> #AccessRealTime
> ===============
>
> I would word the intended outcome as:
>
> "To enable applications to access time-critical data in real time or near real time, where real-time means a range from milliseconds to a few seconds after the data creation, and near real time is a predetermined delay for expected data delivery."
>
> #AccessUptoDate
> ===============
>
> I think this sentence: "The international date format is recommended to avoid any ambiguity <a href="https://www.w3.org/International/questions/qa-date-format">https://www.w3.org/International/questions/qa-date-format</a>."
>
> Would be better as:
>
> "Datestamps should be formatted using the XML Schema <a href="/TR/xmlschema11-2/#dateTimeStamp">dateTimeStamp</a> datatype [[xmlschema11-2]]."
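As a sketch of what this wording implies: xsd:dateTimeStamp is xsd:dateTime with a mandatory time zone. The Python below re-expresses the NOAA-style stamp with an explicit offset, assuming PST means UTC-8; the regular expression is a simplified lexical check, not a full validator.

```python
import re
from datetime import datetime, timedelta, timezone

# Simplified lexical check for xsd:dateTimeStamp: a dateTime whose time zone
# (Z or +hh:mm / -hh:mm) is mandatory. It does not reject impossible dates such
# as 2016-02-30, nor does it accept the special 24:00:00 form.
DATETIMESTAMP = re.compile(
    r"-?\d{4,}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])"
    r"T([01]\d|2[0-3]):[0-5]\d:[0-5]\d(\.\d+)?"
    r"(Z|[+-]((0\d|1[0-3]):[0-5]\d|14:00))"
)

def is_datetimestamp(value: str) -> bool:
    return DATETIMESTAMP.fullmatch(value) is not None

# "Mar, 3rd 2016 at 9:03:07 pm PST", assuming PST = UTC-8.
pst = timezone(timedelta(hours=-8))
local = datetime(2016, 3, 3, 21, 3, 7, tzinfo=pst)
stamp = local.isoformat()
utc_stamp = local.astimezone(timezone.utc).isoformat()
print(stamp)      # 2016-03-03T21:03:07-08:00
print(utc_stamp)  # 2016-03-04T05:03:07+00:00
```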
>
> Although I note that the NOAA example uses the horrible "Mar, 3rd 2016 at 9:03:07 pm PST" format which breaks this advice :-(
>
> #documentYourAPI
> ================
>
> I'd write the intended outcome as:
>
> "Developers can obtain detailed information about each call to the API, including the parameters it takes and what it is expected to return."
>
> #documentYourAPI
> ================
>
> This is very spatial; ideally we should have some non-spatial examples as well. I can tell this came from Linda and Jeremy et al :-)
>
> #EvaluateCoverage
> =================
>
> I'd phrase the intended outcome as:
>
> "To enable data consumers to appreciate the coverage and external dependencies of a given dataset."
>
> #Serialisation
> ==============
>
> Intended outcome suggestion:
>
> To enable machines to process a dataset even if the original software that was used to create it is no longer available or supported.
>
> More later
>
> Phil.
Received on Thursday, 21 April 2016 08:26:37 UTC