Comments on BP document

Dear DWBP WG,

Congratulations on the great work done!

I would like to contribute a couple of comments on the BP document, 
concerning data versioning and access. I don't know if they can be 
addressed at this stage, but I thought they may be worth mentioning 
- at least to know the position of the WG.

Thanks!

Andrea

----

1. Data versioning

The issue is about a specific metadata field, namely the date of last 
modification of a dataset (dct:modified in DCAT).

This field conveys useful information for end users - e.g., they can 
check whether the data are actually recent enough for their purposes - 
and it is sometimes considered more important than the dataset issue date.

Going through the BP doc, I realised dct:modified occurs in just one of 
the examples (#4), and it is not included in BP2 in the list of 
recommended fields for datasets and distributions. There's actually 
another field (dct:accrualPeriodicity) that is referred to from the data 
versioning section, as a way to inform end users about the data update 
frequency. Nonetheless, the two fields are not mutually exclusive, and 
dct:accrualPeriodicity cannot replace dct:modified when the update 
frequency is "irregular" or "unknown".
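To illustrate the point, here is a minimal Python sketch of the check an 
end user might run. It assumes the dataset metadata has already been read 
into a plain dictionary keyed by property name; the function name, the 
dictionary layout and the sample values are hypothetical, not something 
defined in the BP doc or in DCAT.

```python
from datetime import date


def is_recent_enough(metadata, max_age_days, today):
    """Return True/False when dct:modified allows a decision, else None."""
    modified = metadata.get("dct:modified")
    if modified is None:
        # dct:accrualPeriodicity alone says how often the data change,
        # not when they last changed, so no decision is possible here.
        return None
    age_in_days = (today - date.fromisoformat(modified)).days
    return age_in_days <= max_age_days


# Hypothetical dataset description with an "irregular" update frequency.
dataset = {
    "dct:title": "Example dataset",
    "dct:accrualPeriodicity": "irregular",
    "dct:modified": "2016-05-20",
}

print(is_recent_enough(dataset, 30, date(2016, 5, 31)))
```

With an "irregular" or "unknown" frequency and no dct:modified, the 
function can only return None, which is the gap described above.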

May I ask what the position of the WG is on this issue?


2. Data access

There's a scenario that I'm not sure is addressed, at least 
explicitly. It concerns data that users must register in order to 
access. This is different from data that can be accessed only by 
authorised users. It's basically just about preventing data from being 
anonymously accessed, because, for some data providers, it is important 
to know who downloads / uses the data.

This is quite common for research data, but there are also quite a few 
examples from the public sector.

A first issue here is that, usually, this compulsory registration does 
not result in clear benefits for end users, who may reasonably be 
reluctant to provide personal information - which, in many cases, is 
not limited to an email address: users are also asked for their real 
name, the organisation they work for, etc.

To address this, a recommendation to data providers could be: if you 
require users to register / authenticate to get to the data, you should 
explain (a) why, (b) how their personal information will be used, and 
(c) what benefits (if any) they will get (e.g., they will be 
allowed to submit feedback, they will be notified of updates to data 
they're interested in).

I think this could be addressed by extending BP22 accordingly ("Provide 
an explanation for data that is not available").

Another issue is that, although these data are open to everyone, the 
need to authenticate creates a barrier to machine-based data access. 
This can be addressed by supporting Web-based authentication / 
authorisation protocols, but such support is usually missing.

Of course, this also applies to data subject to access control.

Maybe BP23 ("Make data available through an API") could be extended to 
mention that, whenever direct data access is prevented, data providers 
should support standard authentication / authorisation APIs.
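
As a rough sketch of what machine-based access could look like when a 
standard mechanism is supported: the example below assumes the provider 
accepts OAuth 2.0 bearer tokens (RFC 6750), which is only one possible 
choice and is not named in the BP doc. The URL, the token value and the 
function name are hypothetical, and the request is built but not sent.

```python
from urllib.request import Request


def build_data_request(url, access_token):
    """Build a GET request carrying a bearer token in the Authorization header."""
    return Request(
        url,
        headers={
            # Assumption: the token was obtained beforehand, e.g. after the
            # user registered with the data provider.
            "Authorization": f"Bearer {access_token}",
            "Accept": "text/csv",
        },
    )


# Hypothetical dataset URL and placeholder token.
req = build_data_request("https://data.example.org/datasets/42.csv", "TOKEN")
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Passing the request to urllib.request.urlopen would perform the actual 
download; a script can then access the data without going through an 
interactive login form.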

----

Received on Tuesday, 31 May 2016 09:19:34 UTC