Re: Comments on the 9 April version of the BP doc from Bernadette Farias Lóscio on 2016-04-16 (public-dwbp-wg@w3.org from April 2016)

From: Bernadette Farias Lóscio <bfl@cin.ufpe.br>
Date: Sat, 16 Apr 2016 13:03:54 -0300
To: Phil Archer <phila@w3.org>
Cc: Public DWBP WG <public-dwbp-wg@w3.org>
Message-ID: <CANx1PzzhxOQe7vgDnOtM3L7drHjeoB=eYvGJ1DwR0bqxHn6vXA@mail.gmail.com>
Hi Phil,

Thanks a lot for your detailed review! Your comments and suggestions are
really important for improving the document. Please find some comments below.

2016-04-15 4:32 GMT-03:00 Phil Archer <phila@w3.org>:

> As flagged, I've been working on my native speaker review of the doc and,
> in doing so, have been paying very close attention to the text. This leads
> me to make a number of comments that go beyond simple native speaker edits
> and are therefore ones that should be assessed like any other comment.
>
> My review begins at the Data Formats section.
>
> #MachineReadableStandardizedFormat
> ===================================
>
> There is no definition of 'machine readable', or of proprietary software.
> "computational tools typically available in the relevant domain" will
> surely include .docx and .xlsx, for example.
>
> I looked at the Wikipedia page which links to a doc from the US government
> https://en.wikipedia.org/wiki/Machine-readable_data. From that I suggest
> the following:
>
>
> <p>There is an important distinction between formats that can be read and
> edited by humans using a computer and formats that are <em>machine
> readable</em>. The latter term implies that the data is readily extracted,
> transformed and processed by a computer. The following definition of
> machine readable is based on that provided by the US Office of Management
> and Budget's definition in their Preparation and Submission of Strategic
> Plans, Annual Performance Plans, and Annual Program Performance Reports
> [[OMB-A11]]</p>
> <p><strong>Machine readable</strong>: A format in a standard computer
> language (not natural language text) that can be read automatically by a
> computer system. Traditional word processing documents and portable
> document format (PDF) files are easily read by humans but typically are
> difficult for machines to interpret. Formats such as XML, JSON, NetCDF, RDF
> or spreadsheets with header columns that can be exported as CSV are machine
> readable formats.</p>
>
>
> Biblio entry
>
>        "OMB-A11": {
>            "title": "Preparation and Submission of Strategic Plans, Annual Performance Plans, and Annual Program Performance Reports",
>            "href": "https://www.whitehouse.gov/sites/default/files/omb/assets/a11_current_year/s200.pdf",
>            "date": "2015",
>            "publisher": "Office of Management and Budget (OMB)",
>            "id": "OMB Circular A-11"
>        }
>

I suggest including the first paragraph in the Why section of the BP and
the second one in the glossary.
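
As a rough illustration of what "read automatically by a computer system"
means in the proposed definition, here is a minimal Python sketch that
processes a CSV export of a spreadsheet with a header row. The column names
and values are invented for the example and are not taken from the BP
document.

```python
import csv
import io

# Hypothetical CSV export of a spreadsheet with a header row (made-up data).
CSV_TEXT = "station,temperature_c\nA,21.5\nB,19.0\n"


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def mean_temperature(rows):
    """Average a numeric column with no manual extraction step."""
    values = [float(row["temperature_c"]) for row in rows]
    return sum(values) / len(values)


rows = read_rows(CSV_TEXT)
print(rows[0]["station"], mean_temperature(rows))
```

The same values inside a word processing document or a PDF would first need
layout-dependent extraction before any such calculation could run.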



> #MultipleFormats
> ================
>
> Suggest that the intended outcome could be worded along the lines of:
>
> "As many users as possible will be able to use the data without first
> having to transform it into their preferred format."
>
> I have many similar comments on intended outcomes. I think they should be
> statements of the specific benefit that is gained, so "to enable X" rather
> than "Doing X will enable Y."
>

I agree! We were not sure about the best way to present the intended
outcomes. We're gonna review the other BPs considering your proposal.


>
> I very much dislike the word 'intended' in the sentence: "Consider the
> data formats most likely to be needed by intended users, and consider
> alternatives that are likely to be useful in the future." The idea of
> making data on the Web is that it's up to the user to decide what he/she
> intends to do with it, not the publisher.
>
> Suggest simply making it "Consider the data formats most likely to be
> needed and consider alternatives that are likely to be useful in the future."
>

Yes, when publishing data on the Web it can be difficult to know the
"intended users".


>
> #MetadataStandardized
> =====================
>
> Suggest rewording the intended outcome
>
> Currently:
> Standardized code lists and other commonly used terms will enhance
> interoperability and consensus among data publishers and consumers.
>
> Could be:
> Enhanced interoperability and consensus among data publishers and
> consumers.
>

I agree, but before making the change I think we should discuss this
proposal with Antoine.


> #ReuseVocabularies
> ==================
> Again, the intended outcome could be worded more succinctly I think.
>
> "Using the same vocabulary to describe metadata will make datasets and
> metadata sets easier to be compared by humans or machines. When two
> datasets or metadata sets use the same vocabulary, (automatic) processing
> tools designed for one can be more easily applied to the other. This
> greatly facilitates re-use of datasets"
>
> could be simply
>
> To make datasets and metadata easier to compare and integrate by humans or
> machines.
>
> (I added 'and integrate', which I personally think is important but this
> is more than an editorial change).
>

I agree, but before making the change I think we should discuss this
proposal with Antoine.


>
> #ChooseRightFormalizationLevel
> ==============================
>
> I would word the intended outcome as:
>
> The data supports a wide range of application cases but is not more
> complex to produce and reuse than necessary, or, to paraphrase Albert
> Einstein, "Everything should be made as simple as possible, but no simpler."
>
> The Einstein line is often quoted but, like so many quotations, is
> probably a misquote.
>
> And I'd say that the how to test line would be improved by using the word
> 'typical' rather than target:
>
> For formal knowledge representation languages, applying an inference
> engine on top of the data that uses a given vocabulary does not produce too
> many statements that are unnecessary for typical applications.
>

I'm gonna send a specific message to Antoine asking for feedback about your
proposed changes.


> #Sensitive
> ==========
>
> I'd word the intended outcome as:
>
> "To enable data consumers to know that data that is referred to from the
> current dataset is unavailable or only available under different
> conditions."
>
> I changed the reference to HTTP status code 404 to 303 (see other) when
> doing the native speaker review. I *really* don't want us to include
> deliberate 404s as a Best Practice :-(
>

ok!


> #BulkAccess
> ===========
>
> I don't think this should only refer to cases where data is spread across
> multiple locations. I think it should also cover the simple case of making
> a file available, as opposed to only providing an API. This is in addition
> to, not instead of what is written about multiple locations - which I think
> is very good.
>
> I'd phrase the intended outcome as:
>
> "Bulk download enables developers to access the complete dataset for local
> processing without the need for further calls to the Web."
>

I propose to complement the Why section to include "the simple case of
making a file available". For the intended outcome I propose:

"To enable developers to access the complete dataset for local processing
without the need for further calls to the Web."


> #ProvideSubsets
> ===============
>
> The intended outcome section is too long IMO. All the content is valid, I
> just think some of it could be moved to the Why section.
>
> Really not sure about including an example of making a set of PDFs available.
>

I agree with you! I've already raised the PDF point with Annette. Let's
discuss this issue further with her.


>
> #Conneg
> =======
>
> In tidying up the language of this BP I pretty much rewrote it. I hope
> without changing your meaning significantly.
>
> I suggest the intended outcome could be phrased as: "To enable different
> representations of the same resource to be served from the same URI
> according to the request made by the client."
>

I'm gonna check with Newton if he is ok with your proposal.
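
To make the proposed outcome a bit more concrete, here is a minimal Python
sketch of how a server might choose a representation for a single URI from
the client's Accept header. It is only an illustration: the media types,
the sample bodies and the function names are assumptions, and it
deliberately ignores partial wildcards such as "text/*" and other details
of the HTTP specification.

```python
# Hypothetical representations of one resource, keyed by media type.
REPRESENTATIONS = {
    "text/csv": "id,name\n1,example\n",
    "application/json": '[{"id": 1, "name": "example"}]',
}
DEFAULT_TYPE = "text/csv"  # assumed fallback when the client accepts anything


def parse_accept(header):
    """Return the media types in an Accept header, highest q-value first."""
    entries = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        quality = 1.0  # q defaults to 1 when it is not given
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        entries.append((quality, media_type))
    # The sort is stable, so equal q-values keep the client's own order.
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return [media_type for quality, media_type in entries if quality > 0]


def negotiate(accept_header):
    """Pick the (media_type, body) pair to serve, or None if nothing matches."""
    for media_type in parse_accept(accept_header):
        if media_type in REPRESENTATIONS:
            return media_type, REPRESENTATIONS[media_type]
        if media_type == "*/*":
            return DEFAULT_TYPE, REPRESENTATIONS[DEFAULT_TYPE]
    return None  # a server would typically answer 406 Not Acceptable here


print(negotiate("text/html;q=0.9, application/json;q=0.8")[0])
```

The point of the sketch is that the URI stays the same and only the request
headers decide which serialization comes back.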


> #AccessRealTime
> ===============
>
> I would word the intended outcome as:
>
> "To enable applications to access time-critical data in real time or near
> real time, where real-time means a range from milliseconds to a few seconds
> after the data creation, and near real time is a predetermined delay for
> expected data delivery."
>

I agree!


> #AccessUptoDate
> ===============
>
> I think this sentence: "The international date format is recommended to
> avoid any ambiguity <a href="
> https://www.w3.org/International/questions/qa-date-format">
> https://www.w3.org/International/questions/qa-date-format</a>."
>
> Would be better as:
>
> "Datestamps should be formatted using the XML Schema <a
> href="/TR/xmlschema11-2/#dateTimeStamp">dateTimeStamp</a> datatype
> [[xmlschema11-2]]."
>
> Although I note that the NOAA example uses the horrible "Mar, 3rd 2016 at
> 9:03:07 pm PST" format which breaks this advice :-(
>

I was discussing this BP with Annette and I think we should make more
updates. I'm gonna try to rewrite a proposal.
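
For reference while we rewrite it, here is a minimal Python sketch of what
Phil's suggestion could look like in practice: the instant from the NOAA
example rendered as an XML Schema dateTimeStamp value. The helper name is
made up, and PST is assumed to be a fixed offset of UTC-08:00.

```python
from datetime import datetime, timedelta, timezone


def to_date_time_stamp(moment):
    """Format a time-zone-aware datetime as a dateTimeStamp lexical value."""
    if moment.tzinfo is None or moment.utcoffset() is None:
        # dateTimeStamp differs from dateTime in requiring a time zone offset.
        raise ValueError("dateTimeStamp requires an explicit time zone offset")
    return moment.isoformat()


# Assumption: PST treated as a fixed offset of eight hours behind UTC.
PST = timezone(timedelta(hours=-8))

# "Mar, 3rd 2016 at 9:03:07 pm PST" from the NOAA example.
stamp = to_date_time_stamp(datetime(2016, 3, 3, 21, 3, 7, tzinfo=PST))
print(stamp)
```

Under those assumptions the printed value is 2016-03-03T21:03:07-08:00,
which is unambiguous about both the date order and the time zone.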


> #documentYourAPI
> ================
>
> I'd write the intended outcome as:
>
> "Developers can obtain detailed information about each call to the API,
> including the parameters it takes and what it is expected to return."
>

I like it! I think the current version is too long.


>
> #documentYourAPI
> ================
>
> This is very spatial, ideally we should have some non-spatial examples as
> well. I can tell this came from Linda and Jeremy et al :-)
>

I think the example section is a mixture of approach to implementation and
examples. We're gonna review this and make a proposal.


>
> #EvaluateCoverage
> =================
>
> I'd phrase the intended outcome as
>
> "To enable data consumers to appreciate the coverage and external
> dependencies of a given dataset."
>

I agree!


>
> #Serialisation
> ==============
>
> Intended outcome suggestion:
>
> To enable machines to process a dataset even if the original software that
> was used to create it is no longer available or supported.
>

I agree!



> More later
>

Looking forward to your comments!

We're gonna wait until we have more feedback from the group to see if we
have contradictory comments or proposals. Then we're gonna present to the
group the proposed updates based on the members' feedback.

Thanks a lot!

Berna

>
> Phil.
>
>
>
>
> --
>
>
> Phil Archer
> W3C Data Activity Lead
> http://www.w3.org/2013/data/
>
> http://philarcher.org
> +44 (0)7887 767755
> @philarcher1
>
>


-- 
Bernadette Farias Lóscio
Centro de Informática
Universidade Federal de Pernambuco - UFPE, Brazil
----------------------------------------------------------------------------
Received on Saturday, 16 April 2016 16:04:44 UTC