Re: Proposals for Annette's comments to be considered before publishing the last working draft from Annette Greiner on 2016-04-26 (public-dwbp-wg@w3.org from April 2016)

From: Annette Greiner <amgreiner@lbl.gov>
Date: Tue, 26 Apr 2016 16:30:35 -0700
To: Bernadette Farias Lóscio <bfl@cin.ufpe.br>
Cc: "public-dwbp-wg@w3.org" <public-dwbp-wg@w3.org>
Message-ID: <571FFA1B.4070401@lbl.gov>
On 4/26/16 2:48 PM, Bernadette Farias Lóscio wrote:
> Hi Annette,
>
> Thanks for your message and your efforts too :) I have just a few comments.
>
> cheers,
> Berna
>
>>     23 (Introduction):
>>
>>     Phil made the native-speaker review. Phenomenon was removed. We
>>     propose to keep the examples [1].
>     We need to use examples that are examples of the thing we are
>     talking about, which is the expansion of the Web as a medium for
>     the exchange of data. These examples don't represent use of the
>     web per se, though they are things that could drive more usage of
>     the web, if people decided to do that. The worst offender in this
>     regard is "the provision of important cultural heritage
>     collections". Important cultural heritage collections have been
>     around for millennia. That only works as an example if it refers
>     to putting those collections on the web.
>
>
> --> If we say "...the Web as a medium for data sharing" rather than 
> "the Web as a medium for the exchange of data", would that be OK?
>
No, sorry, but that doesn't address my concern. What is missing in the 
examples is a connection to the web.
>>
>> 27 (Context): Eric helped us to rewrite the diagram description:
>>
>> The following is a composite diagram illustrating the anatomy of a 
>> published and accessible Web dataset. Data values correspond to the 
>> data itself and may be available in one or more distributions, which 
>> should be defined by the publisher considering data consumers' 
>> expectations. The Metadata component corresponds to the additional 
>> information that describes the dataset and dataset distributions, 
>> helping consumers manipulate and reuse the data. In order to allow 
>> easy access to the dataset and its corresponding distributions, 
>> multiple dataset access mechanisms can be available. Finally, to 
>> promote interoperability among datasets, it is important to adopt 
>> data vocabularies and standards.
>>
> Eric's description is very helpful in understanding the right side of 
> the figure, and I think the right-hand side is helpful, but the 
> left-hand side is still not working for me.  The colored rectangles 
> are very abstract concepts, and representing them in this way doesn't 
> make them less abstract. Also, if you inserted the details of the 
> distributions into the dataset, you would have metadata represented at 
> two different levels. It's not clear to me why that choice was made, 
> but it seems to suggest that there is metadata for the dataset that 
> isn't to be included in the distributions. It also appears that the 
> concept of a dataset only exists before it is distributed. Is the left 
> side about storage of the data? If so, then the colored rectangles 
> make little sense being there. I think the goal of the diagram was to 
> explain the relationship between datasets, distributions, data, and 
> metadata. If it concentrated on those elements, it would be more useful.
>
> --> ok! I'm gonna try to redraw the diagram.
>
>> Machine-readable: A format in a standard computer language (not 
>> natural language text) that can be read automatically by a computer 
>> system. Traditional word processing documents and portable document 
>> format (PDF) files are easily read by humans but typically are 
>> difficult for machines to interpret. Formats such as XML, JSON, 
>> NetCDF, RDF or spreadsheets with header columns that can be exported 
>> as CSV are machine-readable formats.
>>
>> This definition of machine-readable was proposed by Phil and it is 
>> from [2].
>>
> I disagree with the word "language" here, as a computer language 
> usually refers to a programming language, like C++ or Java.
>
> How about
> "Machine-readable data: Data in a standard format that can be read and 
> processed automatically by a computing system. Traditional word 
> processing documents and portable document format (PDF) files are 
> easily read by humans but typically are difficult for machines to 
> interpret and manipulate. Formats such as XML, JSON, HDF5, RDF and CSV 
> are machine-readable data formats."
>
> --> I understand your point, but I'm not sure if we should change the 
> definition and still make a reference to it.
"adapted from [1]"
>
>> 69 (license):
>>
>> Could you contact Renato Iannella? Do you have any updates about this 
>> comment?
>>
> I think I understand what Renato is after. He is pointing out that for 
> ODRL, they pretty much avoided using the word "license" altogether. 
> For the verb, they use "grantUse" (though I don't think we have the 
> option of using that term in our text, since it's not standard 
> English on either side of the Atlantic), and for the noun they use 
> "agreement". I'm sure there are many (of the 66) places in our text 
> where "agreement" would work. We could read through and look for 
> opportunities to substitute "agreement" for the noun "license". We 
> would still have to use "license" for the verb and for the noun in 
> places where "agreement" didn't provide enough context.
>
> --> I think we should keep using license rather than changing to 
> agreement.
fine with me
>
> The comment that I was referring to is the following:
> "We say "Data license information can be provided as a link to a 
> human-readable license or as a link/embedded machine-readable 
> license." Since licensing info is part of metadata, and we tell people 
> to provide metadata for both humans and machines, we should also 
> require licensing info for both humans and machines"
>
> We discussed this comment in one of our Skype meetings and the idea of 
> having "link/embedded machine-readable license" was not clear to you.
>
> I have a proposal:
>
> Data license information can be provided as a link to a human-readable 
> license or to a machine-readable license, as well as an embedded 
> machine-readable license.
My point was that they should make the license info available to humans 
and to machines, not just one or the other. How about:
"Data license information should be available via a link to, or embedded 
copy of, a human-readable license agreement. It should also be made 
available for processing via a link to, or embedded copy of, a 
machine-readable license agreement."
> --------------------------
>
> ----------------------------------------------------------------------------
> Bernadette Farias Lóscio
> Centro de Informática
> Universidade Federal de Pernambuco - UFPE, Brazil
> ----------------------------------------------------------------------------
>
> -- 
> Annette Greiner
> NERSC Data and Analytics Services
> Lawrence Berkeley National Laboratory

-- 
Annette Greiner
NERSC Data and Analytics Services
Lawrence Berkeley National Laboratory
Received on Tuesday, 26 April 2016 23:31:06 UTC