Re: Sensitive data text for enrichment section from Bernadette Farias Lóscio on 2016-05-13 (public-dwbp-wg@w3.org from May 2016)

From: Bernadette Farias Lóscio <bfl@cin.ufpe.br>
Date: Fri, 13 May 2016 09:42:38 -0300
To: Annette Greiner <amgreiner@lbl.gov>
Cc: Phil Archer <phila@w3.org>, Public DWBP WG <public-dwbp-wg@w3.org>
Message-ID: <CANx1Pzwez0J4Y7Kte59n3CHVtJC15yKNT=YLqxgBzKqBFbD2dg@mail.gmail.com>

Hi Phil and Annette,

Thanks a lot for your contributions!

cheers,
Berna

2016-05-07 17:05 GMT-03:00 Annette Greiner <amgreiner@lbl.gov>:

> Hi Phil,
> Thanks for letting me weigh in. I understand the connection you’re making
> here, and I think it’s a good thing to mention in the enrichment section.
> What I think is crucial but is not yet reflected in here is the issue of
> privacy breach arising from putting together disparate data that presents
> less risk separately. The second paragraph here is a good but more general
> discussion of security and privacy issues that strikes me as not belonging
> in this particular section. I would suggest instead addressing the more
> general issues in the introduction to our document. Most of the third
> paragraph would also be better in the document introduction, but the last
> sentence is relevant here. As I see it, the real issue with data enrichment
> is combining datasets that each hold so little information about any
> individual that they cannot be identified but that together offer enough
> information that they can be. I would suggest that here we just say,
>
> > Data enrichment refers to a set of processes that can be used to
> enhance, refine or otherwise improve raw or previously processed data. This
> idea and other similar concepts contribute to making data a valuable asset
> for almost any modern business or enterprise. It is a diverse topic in
> itself, details of which are beyond the scope of the current document.
> However, it is worth noting that some of these techniques should be
> approached with caution, as ethical concerns may arise. In scientific
> research, care must be taken to avoid enrichment that distorts results or
> statistical outcomes. For data about individuals, privacy issues may arise
> when combining datasets. That is, enriching one dataset with another, when
> neither contains sufficient information about any individual to identify
> them, may yield a combined dataset that compromises privacy. Furthermore,
> these techniques can be carried out at scale, which in turn highlights the
> need for caution.
>
> Then, in the document introduction, I would suggest adding the following,
> after the paragraph that begins “In this context…”.
>
> > Not all data should be shared openly, however. Security, commercial
> sensitivity and, above all, individuals' privacy need to be taken into
> account. It is for data publishers, not a technical standards working
> group, to determine policy on which data should be shared and under what
> circumstances. Data sharing policies are likely to assess the exposure risk
> and determine the appropriate security measures to be taken to protect
> sensitive data, such as secure authentication and authorization.
> >
> > Depending on circumstances, sensitive information about individuals
> might include full name, home address, email address, national
> identification number, IP address, vehicle registration plate number,
> driver's license number, face, fingerprints, handwriting, credit card
> numbers, digital identity, date of birth, birthplace, genetic information,
> telephone number, login name, screen name, nickname, health records, etc.
> Although it is likely to be safe to share some of that information openly,
> and even more within a controlled environment, publishers should bear in
> mind that combining data from multiple sources may allow inadvertent
> identification of individuals.
>
> (I took out mention of https, as it will soon be everywhere, which would
> make our doc out of date.)
>
> Also, I noticed a grammatical error in the implementation section of BP
> 31. (Subject-verb agreement is off.) It should read "Techniques for data
> enrichment are complex and go well beyond the scope of this document, which
> can only highlight the possibilities."
> -Annette
>
> > On May 6, 2016, at 7:50 AM, Phil Archer <phila@w3.org> wrote:
> >
> > Berna,
> >
> > As promised, I've copied the text from the sensitive data section and
> merged some of it with the data enrichment intro to end up with this as a
> suggestion.
> >
> > @Annette - we resolved to do this and move the BP about data
> unavailability to the data access section. Do you agree with this?
> >
> > ===Begins==
> >
> > Data enrichment refers to a set of processes that can be used to
> enhance, refine or otherwise improve raw or previously processed data. This
> idea and other similar concepts contribute to making data a valuable asset
> for almost any modern business or enterprise. It is a diverse topic in
> itself, details of which are beyond the scope of the current document.
> However, it is worth noting that techniques exist to carry out such
> enrichment at scale which in turn highlights the need for caution.
> >
> > Not all data should be shared openly. Security, commercial sensitivity
> and, above all, individuals' privacy need to be taken into account. It is
> for data publishers, not a technical standards working group, to determine
> policy on which data should be shared and under what circumstances. Data
> sharing policies are likely to assess the exposure risk and determine the
> appropriate security measures to be taken to protect sensitive data, such
> as secure authentication and use of HTTPS.
> >
> > Depending on circumstance, sensitive information about individuals might
> include: full name, home address, email address, national identification
> number, IP address, vehicle registration plate number, driver's license
> number, face, fingerprints, or handwriting, credit card numbers, digital
> identity, date of birth, birthplace, genetic information, telephone number,
> login name, screen name, nickname, health records etc. Although it is
> likely to be safe to share some of that information openly, and even more
> within a controlled environment, publishers should bear in mind that data
> enrichment techniques may allow some elements to be discovered and linked
> from elsewhere.
> >
> > Notwithstanding that caution, data enrichment offers exciting
> possibilities for both data publishers and consumers.
> >
> >
> >
> > == ends==
> >
> > --
> >
> >
> > Phil Archer
> > W3C Data Activity Lead
> > http://www.w3.org/2013/data/
> >
> > http://philarcher.org
> > +44 (0)7887 767755
> > @philarcher1
>
>


-- 
Bernadette Farias Lóscio
Centro de Informática
Universidade Federal de Pernambuco - UFPE, Brazil
----------------------------------------------------------------------------
Received on Friday, 13 May 2016 12:43:30 UTC