Re: SPARQL Protocol Privacy section from Rigo Wenning on 2005-11-21 (public-rdf-dawg-comments@w3.org from November 2005)

From: Rigo Wenning <rigo@w3.org>
Date: Mon, 21 Nov 2005 15:49:03 +0100
To: Eric Prud'hommeaux <eric@w3.org>
Cc: Kendall Clark <kendall@monkeyfist.com>, "'public-p3p-spec'" <public-p3p-spec@w3.org>, public-rdf-dawg-comments@w3.org
Message-Id: <200511211549.04864.rigo@w3.org>
On Wednesday 16 November 2005 13:30, Eric Prud'hommeaux wrote:
> He suggested that Rigo should review the Privacy section. 
>   http://www.w3.org/2001/sw/DataAccess/proto-wd/#policy-privacy
>
> I propose that Thomas and Rigo send any suggested changes to the
> comments list.
>   mailto:public-rdf-dawg-comments@w3.org

My comments on Privacy: 

Dear all, I've discussed a bit with Eric Prud'hommeaux and found the 
following things interesting and worth a remark to the Group.

I think the current section is rather a template text that can be found 
in a lot of places but does not actually carry much meaning, while 
imposing huge hidden requirements. What would it mean for every 
SPARQL request or engine to comply with Directive 95/46/EC of 24 
October 1995 on the protection of individuals with regard to the 
processing of personal data and on the free movement of such data, or 
with Directive 2002/58/EC of 12 July 2002 concerning the processing of 
personal data and the protection of privacy in the electronic 
communications sector? There are a lot of rules ... The conformance of 
a SPARQL engine would depend on such rules.

On the other hand, the real issues are not addressed by the current 
text. Most queries coming from natural persons who are identifiable 
(via IP or other IDs) will be personal data and thus should be treated 
with care. By contrast, data about companies is protected by data 
protection laws only in Italy; it shouldn't trigger too much data 
protection/privacy attention and is dealt with in the 
security/confidentiality area.

Now if the query is sent over the wire by the individual, imagine for a 
moment someone looking for information about AIDS or other medical 
information. This is highly sensitive information that floats over all 
those hubs and routers etc.

Another example that Eric showed me has three parties involved: 1/ the 
requester, 2/ some SPARQL-service and 3/ some content/RDF repository. 
In this case, it might be a question of privacy whether the personal 
data (e.g. login name) of the requester 1/ is passed on by 2/ to the 
repository. Normally in those setups, 2/ and 3/ have some business 
agreement. This means that 3/ normally does not need to know the 
identity of 1/ to fulfil the request. Such a scenario could be very 
privacy enhancing, as an individual might even be able to access 
the repository 3/ via two different SPARQL-services, thus making 
tracking harder.

All those considerations lead to the following suggested text for the 
SPARQL Protocol Specification: 

<header>Privacy Considerations</header>

Query strings and the URIs attached to them can reveal very sensitive 
information. If this sensitive information is linked or linkable to a 
company, we normally speak about security. If it is linked or linkable 
to a person, we talk about privacy. This section gives recommendations 
and hints on how to treat the latter. Cases where sensitive personal 
data might appear in SPARQL queries over the SPARQL Protocol require 
special attention. 

If a setup concerns mostly consumers and natural persons, the personal 
information should be protected in some way. This can be achieved using 
SSL. But not every setup is so sensitive that it needs the burden of 
the full encryption engine. Nevertheless, in cases involving personal 
data, this personal data should be obfuscated in some way in the query 
string. This could be done by using some known techniques like base-64 
or ROT13. It might also be possible to use must-understand symmetric 
encryption. 
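
(Illustration only, not part of the proposed text.) To make the 
obfuscation idea a bit more concrete, here is a rough Python sketch. 
The endpoint URL, the FOAF-based query and the function names are 
assumptions made up for this example. Note that base-64 merely hides 
the value from casual inspection and is trivially reversible.

```python
import base64
from urllib.parse import urlencode

def obfuscate(value: str) -> str:
    """Base-64 encode a personal value (URL-safe alphabet)."""
    return base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")

def deobfuscate(token: str) -> str:
    """Reverse obfuscate(); the receiving service must know to do this."""
    return base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")

def build_request_url(endpoint: str, login: str) -> str:
    # Hypothetical query: the login name only appears in obfuscated form.
    query = (
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/> "
        'SELECT ?doc WHERE { ?doc foaf:maker ?p . ?p foaf:nick "%s" }'
        % obfuscate(login)
    )
    # The query travels in the "query" parameter of the request URL.
    return endpoint + "?" + urlencode({"query": query})

url = build_request_url("http://example.org/sparql", "alice")
```

Both ends have to agree on the encoding beforehand, which is where a 
must-understand mechanism would come in.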

In cases where more than two parties are involved and the party making 
the request is a consumer or a natural person (acting on its own 
behalf), the party that receives the first request MUST NOT pass 
personal data on to subsequent services UNLESS this data is necessary 
for the completion of the request.
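
(Illustration only, not part of the proposed text.) A minimal Python 
sketch of how an intermediate SPARQL-service could apply this rule 
before contacting a repository. The field names and the set of 
"personal" fields are hypothetical; a real service would have to decide 
for itself what counts as personal data.

```python
# Hypothetical request fields that this service treats as personal data.
PERSONAL_FIELDS = {"login", "email", "client_ip"}

def forwardable(request: dict, needed: set) -> dict:
    """Return only the fields that may be passed to the next service.

    Personal fields are dropped unless they are listed in `needed`,
    i.e. unless they are necessary to complete the request.
    """
    return {
        key: value
        for key, value in request.items()
        if key not in PERSONAL_FIELDS or key in needed
    }

incoming = {
    "query": "SELECT ?s WHERE { ?s ?p ?o }",
    "login": "alice",
    "client_ip": "192.0.2.7",
}

# The repository does not need the requester's identity here.
outgoing = forwardable(incoming, needed=set())
```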

Personal data might be found in query strings but also in URIs that are 
sent with the query. Personal data as understood here is any 
information that is linked or reasonably linkable to a natural person.

In case the query also serves as a point of data collection, the 
description of data handling practices via P3P is recommended. The P3P 
generic attribute can be used in schemata to link a data handling 
policy (P3P Policy) to a certain XML element. See the [P3P generic 
attribute 1] for more information.

1. http://www.w3.org/TR/P3P11/#generic_attribute


Best, 

-- 
Rigo Wenning            W3C/ERCIM
Staff Counsel           Privacy Activity Lead
mail:rigo@w3.org        2004, Routes des Lucioles
http://www.w3.org/      F-06902 Sophia Antipolis
Received on Monday, 21 November 2005 14:49:24 UTC