- From: Phil Archer <phila@w3.org>
- Date: Thu, 21 Apr 2016 09:28:32 +0100
- To: Antoine Isaac <aisaac@few.vu.nl>, public-dwbp-wg@w3.org
Thanks Antoine,

I see that you and Annette have been very active in making comments. Having spent a lot of time on the doc in the last week I must focus on other stuff today. The editors are working on a list of issues arising from all the comments and Dee will guide us through all that tomorrow.

Cheers for now

Phil.

On 21/04/2016 09:18, Antoine Isaac wrote:
> Hi Phil,
>
> Regarding your suggestions about the BP vocabularies.
> I'm ok with them, but I won't be able to handle the dependencies with
> Annette's comments on the same parts [1,2] and other changes on the
> intended outcome that we had suggested but are not yet implemented [3].
> This is quite a mess...
>
> Antoine
>
> [1] https://lists.w3.org/Archives/Public/public-dwbp-wg/2016Apr/0135.html
> [2] https://lists.w3.org/Archives/Public/public-dwbp-wg/2016Apr/0136.html
> [3] https://lists.w3.org/Archives/Public/public-dwbp-wg/2016Apr/0056.html
>
> On 15/04/16 09:32, Phil Archer wrote:
>> As flagged, I've been working on my native speaker review of the doc
>> and, in doing so, have been paying very close attention to the text.
>> This leads me to make a number of comments that go beyond simple
>> native speaker edits and are therefore ones that should be assessed like
>> any other comment.
>>
>> My review begins at the Data Formats section.
>>
>> #MachineReadableStandardizedFormat
>> ===================================
>>
>> There is no definition of 'machine readable', or of proprietary
>> software. "computational tools typically available in the relevant
>> domain" will surely include .docx and .xlsx, for example.
>>
>> I looked at the Wikipedia page which links to a doc from the US
>> government https://en.wikipedia.org/wiki/Machine-readable_data. From
>> that I suggest the following:
>>
>>
>> <p>There is an important distinction between formats that can be read
>> and edited by humans using a computer and formats that are <em>machine
>> readable</em>.
>> The latter term implies that the data is readily
>> extracted, transformed and processed by a computer. The following
>> definition of machine readable is based on that provided by the US
>> Office of Management and Budget in their Preparation and
>> Submission of Strategic Plans, Annual Performance Plans, and Annual
>> Program Performance Reports [[OMB-A11]]</p>
>> <p><strong>Machine readable</strong>: A format in a standard computer
>> language (not natural language text) that can be read automatically by
>> a computer system. Traditional word processing documents and portable
>> document format (PDF) files are easily read by humans but typically
>> are difficult for machines to interpret. Formats such as XML, JSON,
>> NetCDF, RDF or spreadsheets with header columns that can be exported
>> as CSV are machine readable formats.</p>
>>
>>
>> Biblio entry
>>
>> "OMB-A11": {
>>     "title": "Preparation and Submission of Strategic Plans, Annual Performance Plans, and Annual Program Performance Reports",
>>     "href":"https://www.whitehouse.gov/sites/default/files/omb/assets/a11_current_year/s200.pdf",
>>     "date": "2015",
>>     "publisher":"Office of Management and Budget (OMB)",
>>     "id":"OMB Circular A-11"
>>
>> #MultipleFormats
>> ================
>>
>> Suggest that the intended outcome could be worded along the lines of:
>>
>> "As many users as possible will be able to use the data without first
>> having to transform it into their preferred format."
>>
>> I have many similar comments on intended outcomes. I think they
>> should be statements of the specific benefit that is gained, so "to
>> enable X" rather than "Doing X will enable Y."
>>
>>
>> I very much dislike the word 'intended' in the sentence: "Consider the
>> data formats most likely to be needed by intended users, and consider
>> alternatives that are likely to be useful in the future."
>> The idea of
>> making data on the Web is that it's up to the user to decide what
>> he/she intends to do with it, not the publisher.
>>
>> Suggest simply making it "Consider the data formats most likely to be
>> needed and consider alternatives that are likely to be useful in the
>> future."
>>
>>
>> #MetadataStandardized
>> =====================
>>
>> Suggest rewording the intended outcome
>>
>> Currently:
>> Standardized code lists and other commonly used terms will enhance
>> interoperability and consensus among data publishers and consumers.
>>
>> Could be:
>> Enhanced interoperability and consensus among data publishers and
>> consumers.
>>
>> #ReuseVocabularies
>> ==================
>> Again, the intended outcome could be worded more succinctly I think.
>>
>> "Using the same vocabulary to describe metadata will make datasets and
>> metadata sets easier to be compared by humans or machines. When two
>> datasets or metadata sets use the same vocabulary, (automatic)
>> processing tools designed for one can be more easily applied to the
>> other. This greatly facilitates re-use of datasets"
>>
>> could be simply
>>
>> To make datasets and metadata easier to compare and integrate by
>> humans or machines.
>>
>> (I added 'and integrate', which I personally think is important but
>> this is more than an editorial change).
>>
>>
>> #ChooseRightFormalizationLevel
>> ==============================
>>
>> I would word the intended outcome as:
>>
>> The data supports a wide range of application cases but is not more
>> complex to produce and reuse than necessary, or, to paraphrase Albert
>> Einstein, "Everything should be made as simple as possible, but no
>> simpler."
>>
>> The Einstein line is often quoted but, like so many quotations, is
>> probably a misquote.
>>
>> And I'd say that the how to test line would be improved by using the
>> word 'typical' rather than target:
>>
>> For formal knowledge representation languages, applying an inference
>> engine on top of the data that uses a given vocabulary does not
>> produce too many statements that are unnecessary for typical
>> applications.
>>
>> #Sensitive
>> ==========
>>
>> I'd word the intended outcome as:
>>
>> "To enable data consumers to know that data that is referred to from
>> the current dataset is unavailable or only available under different
>> conditions."
>>
>> I changed the reference to HTTP status code 404 to 303 (see other)
>> when doing the native speaker review. I *really* don't want us to
>> include deliberate 404s as a Best Practice :-(
>>
>> #BulkAccess
>> ===========
>>
>> I don't think this should only refer to cases where data is spread
>> across multiple locations. I think it should also cover the simple
>> case of making a file available, as opposed to only providing an API.
>> This is in addition to, not instead of what is written about multiple
>> locations - which I think is very good.
>>
>> I'd phrase the intended outcome as:
>>
>> "Bulk download enables developers to access the complete dataset for
>> local processing without the need for further calls to the Web."
>>
>> #ProvideSubsets
>> ===============
>>
>> The intended outcome section is too long IMO. All the content is
>> valid, I just think some of it could be moved to the Why section.
>>
>> Really not sure about including an example of making a set of PDFs
>> available.
>>
>>
>> #Conneg
>> =======
>>
>> In tidying up the language of this BP I pretty much rewrote it. I hope
>> without changing your meaning significantly.
>>
>> I suggest the intended outcome could be phrased as: "To enable
>> different representations of the same resource to be served from the
>> same URI according to the request made by the client."
>>
>> #AccessRealTime
>> ===============
>>
>> I would word the intended outcome as:
>>
>> "To enable applications to access time-critical data in real time or
>> near real time, where real-time means a range from milliseconds to a
>> few seconds after the data creation, and near real time is a
>> predetermined delay for expected data delivery."
>>
>> #AccessUptoDate
>> ===============
>>
>> I think this sentence: "The international date format is recommended
>> to avoid any ambiguity <a
>> href="https://www.w3.org/International/questions/qa-date-format">https://www.w3.org/International/questions/qa-date-format</a>."
>>
>>
>> Would be better as:
>>
>> "Datestamps should be formatted using the XML Schema <a
>> href="/TR/xmlschema11-2/#dateTimeStamp">dateTimeStamp</a> datatype
>> [[xmlschema11-2]]."
>>
>> Although I note that the NOAA example uses the horrible "Mar, 3rd 2016
>> at 9:03:07 pm PST" format which breaks this advice :-(
>>
>> #documentYourAPI
>> ================
>>
>> I'd write the intended outcome as:
>>
>> "Developers can obtain detailed information about each call to the
>> API, including the parameters it takes and what it is expected to
>> return."
>>
>> #documentYourAPI
>> ================
>>
>> This is very spatial, ideally we should have some non-spatial examples
>> as well. I can tell this came from Linda and Jeremy et al :-)
>>
>> #EvaluateCoverage
>> =================
>>
>> I'd phrase the intended outcome as
>>
>> "To enable data consumers to appreciate the coverage and external
>> dependencies of a given dataset."
>>
>> #Serialisation
>> ==============
>>
>> Intended outcome suggestion:
>>
>> To enable machines to process a dataset even if the original software
>> that was used to create it is no longer available or supported.
>>
>> More later
>>
>> Phil.
>>
>>
>

-- 
Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1
Received on Thursday, 21 April 2016 08:28:47 UTC