Re: Comments on the 9 April version of the BP doc

Thanks Antoine,

I see that you and Annette have been very active in making comments. 
Having spent a lot of time on the doc in the last week I must focus on 
other stuff today. The editors are working on a list of issues arising 
from all the comments and Dee will guide us through all that tomorrow.

Cheers for now

Phil.

On 21/04/2016 09:18, Antoine Isaac wrote:
> Hi Phil,
>
> Regarding your suggestions about the BP vocabularies.
> I'm ok with them, but I won't be able to handle the dependencies with
> Annette's comments on the same parts [1,2] and other changes on the
> intended outcome that we had suggested but are not yet implemented [3].
> This is quite a mess...
>
> Antoine
>
> [1] https://lists.w3.org/Archives/Public/public-dwbp-wg/2016Apr/0135.html
> [2] https://lists.w3.org/Archives/Public/public-dwbp-wg/2016Apr/0136.html
> [3] https://lists.w3.org/Archives/Public/public-dwbp-wg/2016Apr/0056.html
>
> On 15/04/16 09:32, Phil Archer wrote:
>> As flagged, I've been working on my native speaker review of the doc
>> and, in doing so, have been paying very close attention to the text.
>> This leads me to make a number of comments that go beyond simple
>> native speaker edits and are therefore ones that should be assessed like
>> any other comment.
>>
>> My review begins at the Data Formats section.
>>
>> #MachineReadableStandardizedFormat
>> ===================================
>>
>> There is no definition of 'machine readable', or of proprietary
>> software. "computational tools typically available in the relevant
>> domain" will surely include .docx and .xlsx, for example.
>>
>> I looked at the Wikipedia page, which links to a doc from the US
>> government: https://en.wikipedia.org/wiki/Machine-readable_data. From
>> that I suggest the following:
>>
>>
>> <p>There is an important distinction between formats that can be read
>> and edited by humans using a computer and formats that are <em>machine
>> readable</em>. The latter term implies that the data is readily
>> extracted, transformed and processed by a computer. The following
>> definition of machine readable is based on the one given by the US
>> Office of Management and Budget in its Preparation and
>> Submission of Strategic Plans, Annual Performance Plans, and Annual
>> Program Performance Reports [[OMB-A11]].</p>
>> <p><strong>Machine readable</strong>: A format in a standard computer
>> language (not natural language text) that can be read automatically by
>> a computer system. Traditional word processing documents and portable
>> document format (PDF) files are easily read by humans but typically
>> are difficult for machines to interpret. Formats such as XML, JSON,
>> NetCDF, RDF or spreadsheets with header columns that can be exported
>> as CSV are machine readable formats.</p>
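
To illustrate what "read automatically by a computer system" means for the CSV case, here is a minimal Python sketch. The sample data and column names are hypothetical, made up purely for illustration; the point is only that a header row lets a program address values by name with no human interpretation.

```python
import csv
import io

# Hypothetical sample data; the column names are illustrative only.
CSV_TEXT = (
    "station,date,temperature_c\n"
    "A1,2016-03-03,11.5\n"
    "B2,2016-03-03,9.0\n"
)


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(CSV_TEXT)
# Values can now be addressed by column name and processed directly.
mean_temp = sum(float(r["temperature_c"]) for r in rows) / len(rows)
print(mean_temp)
```

The same figures in a word-processing document or a PDF would need format-specific extraction before any such calculation could run.
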
>>
>>
>> Biblio entry
>>
>>         "OMB-A11": {
>>            "title": "Preparation and Submission of Strategic Plans, Annual Performance Plans, and Annual Program Performance Reports",
>>            "href": "https://www.whitehouse.gov/sites/default/files/omb/assets/a11_current_year/s200.pdf",
>>            "date": "2015",
>>            "publisher": "Office of Management and Budget (OMB)",
>>            "id": "OMB Circular A-11"
>>         }
>>
>> #MultipleFormats
>> ================
>>
>> Suggest that the intended outcome could be worded along the lines of:
>>
>> "As many users as possible will be able to use the data without first
>> having to transform it into their preferred format."
>>
>> I have many similar comments on intended outcomes. I think they
>> should be statements of the specific benefit that is gained, so "to
>> enable X" rather than "Doing X will enable Y."
>>
>>
>> I very much dislike the word 'intended' in the sentence: "Consider the
>> data formats most likely to be needed by intended users, and consider
>> alternatives that are likely to be useful in the future." The idea of
>> making data available on the Web is that it's up to the user to decide
>> what he/she intends to do with it, not the publisher.
>>
>> Suggest simply making it "Consider the data formats most likely to be
>> needed and consider alternatives that are likely to be useful in the
>> future."
>>
>>
>> #MetadataStandardized
>> =====================
>>
>> Suggest rewording the intended outcome
>>
>> Currently:
>> Standardized code lists and other commonly used terms will enhance
>> interoperability and consensus among data publishers and consumers.
>>
>> Could be:
>> Enhanced interoperability and consensus among data publishers and
>> consumers.
>>
>> #ReuseVocabularies
>> ==================
>> Again, the intended outcome could be worded more succinctly I think.
>>
>> "Using the same vocabulary to describe metadata will make datasets and
>> metadata sets easier to be compared by humans or machines. When two
>> datasets or metadata sets use the same vocabulary, (automatic)
>> processing tools designed for one can be more easily applied to the
>> other. This greatly facilitates re-use of datasets"
>>
>> could be simply
>>
>> To make datasets and metadata easier to compare and integrate by
>> humans or machines.
>>
>> (I added 'and integrate', which I personally think is important but
>> this is more than an editorial change).
>>
>>
>> #ChooseRightFormalizationLevel
>> ==============================
>>
>> I would word the intended outcome as:
>>
>> The data supports a wide range of application cases but is not more
>> complex to produce and reuse than necessary, or, to paraphrase Albert
>> Einstein, "Everything should be made as simple as possible, but no
>> simpler."
>>
>> The Einstein line is often quoted but, like so many quotations, is
>> probably a misquote.
>>
>> And I'd say that the how to test line would be improved by using the
>> word 'typical' rather than target:
>>
>> For formal knowledge representation languages, applying an inference
>> engine on top of the data that uses a given vocabulary does not
>> produce too many statements that are unnecessary for typical
>> applications.
>>
>> #Sensitive
>> ==========
>>
>> I'd word the intended outcome as:
>>
>> "To enable data consumers to know that data that is referred to from
>> the current dataset is unavailable or only available under different
>> conditions."
>>
>> I changed the reference to HTTP status code 404 to 303 (see other)
>> when doing the native speaker review. I *really* don't want us to
>> include deliberate 404s as a Best Practice :-(
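
For what it's worth, the 303 behaviour can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions: the function name, the paths and the mapping of restricted resources to explanation pages are all hypothetical and are not taken from the BP doc.

```python
from http import HTTPStatus


def respond_to_restricted(resource_path, info_pages):
    """Return (status, headers) for a request to a possibly restricted resource.

    info_pages is a hypothetical mapping from the path of a restricted
    resource to a page explaining why it is unavailable or how to get access.
    """
    target = info_pages.get(resource_path)
    if target is None:
        # Genuinely unknown resource: an ordinary, non-deliberate 404.
        return HTTPStatus.NOT_FOUND, {}
    # Known but restricted: point the client at the explanation instead.
    return HTTPStatus.SEE_OTHER, {"Location": target}


# Hypothetical example paths.
INFO_PAGES = {"/data/patients.csv": "/about/restricted-data"}
status, headers = respond_to_restricted("/data/patients.csv", INFO_PAGES)
print(int(status), headers["Location"])
```

The consumer then learns that the data exists but is held under different conditions, rather than being left to guess whether the link is simply broken.
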
>>
>> #BulkAccess
>> ===========
>>
>> I don't think this should only refer to cases where data is spread
>> across multiple locations. I think it should also cover the simple
>> case of making a file available, as opposed to only providing an API.
>> This is in addition to, not instead of what is written about multiple
>> locations - which I think is very good.
>>
>> I'd phrase the intended outcome as:
>>
>> "Bulk download enables developers to access the complete dataset for
>> local processing without the need for further calls to the Web."
>>
>> #ProvideSubsets
>> ===============
>>
>> The intended outcome section is too long IMO. All the content is
>> valid, I just think some of it could be moved to the Why section.
>>
>> Really not sure about including an example of making a set of PDFs
>> available.
>>
>>
>> #Conneg
>> =======
>>
>> In tidying up the language of this BP I pretty much rewrote it. I hope
>> without changing your meaning significantly.
>>
>> I suggest the intended outcome could be phrased as: "To enable
>> different representations of the same resource to be served from the
>> same URI according to the request made by the client."
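
As a rough illustration of that outcome, here is a simplified Python sketch of server-side selection based on the Accept header. The function names, file names and the choice of a default are my own assumptions. It deliberately ignores partial wildcards such as text/* and falls back to a default rather than answering 406, so treat it as a sketch and not a complete implementation of HTTP content negotiation.

```python
def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the client gives no weight
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so equal weights keep the client's own order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose_representation(accept, available, default):
    """Pick the media type to serve for one URI, given the Accept header."""
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return default
    return default  # simplification: a stricter server might send 406


# Hypothetical representations of a single dataset URI.
AVAILABLE = {
    "text/turtle": "dataset.ttl",
    "application/json": "dataset.json",
    "text/csv": "dataset.csv",
}
chosen = choose_representation("text/csv;q=0.5, application/json", AVAILABLE, "text/turtle")
print(chosen, AVAILABLE[chosen])
```

Here the client lists CSV first but weights it lower, so the JSON representation is served from the same URI.
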
>>
>> #AccessRealTime
>> ===============
>>
>> I would word the intended outcome as:
>>
>> "To enable applications to access time-critical data in real time or
>> near real time, where real-time means a range from milliseconds to a
>> few seconds after the data creation, and near real time is a
>> predetermined delay for expected data delivery."
>>
>> #AccessUptoDate
>> ===============
>>
>> I think this sentence: "The international date format is recommended
>> to avoid any ambiguity <a
>> href="https://www.w3.org/International/questions/qa-date-format">https://www.w3.org/International/questions/qa-date-format</a>."
>>
>>
>> Would be better as:
>>
>> "Datestamps should be formatted using the XML Schema <a
>> href="/TR/xmlschema11-2/#dateTimeStamp">dateTimeStamp</a> datatype
>> [[xmlschema11-2]]."
>>
>> Although I note that the NOAA example uses the horrible "Mar, 3rd 2016
>> at 9:03:07 pm PST" format which breaks this advice :-(
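
To make the contrast concrete, here is a small Python sketch that renders that same moment in the dateTimeStamp lexical form. It assumes "PST" means a fixed UTC-08:00 offset; the helper name is hypothetical. The key property of dateTimeStamp is that the timezone is mandatory, so the sketch rejects naive datetimes.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "PST" is taken to be a fixed offset of UTC-08:00.
PST = timezone(timedelta(hours=-8), "PST")


def to_datetimestamp(dt):
    """Format a timezone-aware datetime in the xsd:dateTimeStamp lexical form."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("dateTimeStamp requires an explicit timezone")
    return dt.isoformat()


stamp = to_datetimestamp(datetime(2016, 3, 3, 21, 3, 7, tzinfo=PST))
print(stamp)  # 2016-03-03T21:03:07-08:00
```

Unlike the prose form, this value sorts correctly as a string within one offset and needs no locale knowledge to parse.
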
>>
>> #documentYourAPI
>> ================
>>
>> I'd write the intended outcome as:
>>
>> "Developers can obtain detailed information about each call to the
>> API, including the parameters it takes and what it is expected to
>> return."
>>
>> #documentYourAPI
>> ================
>>
>> This is very spatial, ideally we should have some non-spatial examples
>> as well. I can tell this came from Linda and Jeremy et al :-)
>>
>> #EvaluateCoverage
>> =================
>>
>> I'd phrase the intended outcome as
>>
>> "To enable data consumers to appreciate the coverage and external
>> dependencies of a given dataset."
>>
>> #Serialisation
>> ==============
>>
>> Intended outcome suggestion:
>>
>> To enable machines to process a dataset even if the original software
>> that was used to create it is no longer available or supported.
>>
>> More later
>>
>> Phil.
>>
>>
>>
>>
>
>

-- 


Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1

Received on Thursday, 21 April 2016 08:28:47 UTC