Re: [w3ctag/design-reviews] RDF Canonicalization (Issue #855) from Phil Archer on 2023-08-31 (public-webapps-github@w3.org from August 2023)

From: Phil Archer <notifications@github.com>
Date: Thu, 31 Aug 2023 02:08:46 -0700
To: w3ctag/design-reviews <design-reviews@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <w3ctag/design-reviews/issues/855/1700655692@github.com>

@rhiaro, @hadleybeeman
We have added text about the potential for hash algorithms to be shown to be insecure (Markus's addition above can be seen as a short section at https://www.w3.org/TR/rdf-canon/#insecure-hash-algorithms). A further addition concerning the use of alternative hash mechanisms is in preparation (https://github.com/w3c/rdf-canon/pull/161).
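
The hash algorithm is the part that might later be shown to be insecure, so an application can prepare by treating it as a parameter rather than hard-coding it. Below is a minimal Python sketch. It assumes the application hashes the canonical N-Quads output, which is a common use (for example, before signing). `dataset_digest` and the `DISALLOWED` policy set are hypothetical names, not part of the specification. The SHA-256 default matches the hash RDFC-1.0 uses. The sketch only covers the application's own digest of the output. The hash used inside the canonicalization algorithm itself is a separate choice.

```python
import hashlib

# Hypothetical application policy: algorithms this application refuses to use.
DISALLOWED = {"md5", "sha1"}


def dataset_digest(canonical_nquads: str, algorithm: str = "sha256") -> str:
    """Hash a canonical N-Quads document with a caller-chosen algorithm (sketch)."""
    name = algorithm.lower()
    if name in DISALLOWED:
        raise ValueError(f"{name} is not accepted by this application's policy")
    if name not in hashlib.algorithms_available:
        raise ValueError(f"{name} is not available in this Python build")
    return hashlib.new(name, canonical_nquads.encode("utf-8")).hexdigest()


doc = '_:c14n0 <http://example.org/name> "Alice" .\n'
print(dataset_digest(doc))            # SHA-256 digest, 64 hex characters
print(dataset_digest(doc, "sha384"))  # SHA-384 digest, 96 hex characters
```

Swapping algorithms then becomes a configuration change rather than a code change.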

Meanwhile, we have worked through the Security and Privacy questionnaire and offer the following responses.

As an overall comment, RDF Dataset Canonicalization takes an RDF dataset as input and returns a different form of the same dataset as output (unless the input is already canonicalized; the process is idempotent). The questionnaire is well suited to highlighting potential security and privacy issues with Web applications running in browsers. Our specification only defines an algorithm for processing data, so many of the questions don't apply to our work.
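
The input/output relationship and the idempotence claim can be illustrated with a minimal Python sketch. It assumes quads are plain 4-tuples of terms already written in N-Quads syntax, with `""` standing for the default graph. It follows the first-degree hashing step of RDFC-1.0:

* For each blank node, its quads are serialized with that node written as `_:a` and other blank nodes as `_:z`.
* The serialized quads are sorted and hashed with SHA-256.
* Canonical labels `_:c14n0`, `_:c14n1` and so on are issued in hash order.

The sketch deliberately omits the N-degree step, so it gives up when two blank nodes share a hash. The function names are illustrative only.

```python
import hashlib
from collections import defaultdict

# Simplifying assumption: (subject, predicate, object, graph), each term in
# N-Quads syntax; graph "" stands for the default graph.
Quad = tuple[str, str, str, str]


def is_blank(term: str) -> bool:
    return term.startswith("_:")


def to_nquad(quad: Quad) -> str:
    # One N-Quads line ending in a newline; an empty default-graph slot is dropped.
    return " ".join(t for t in quad if t) + " .\n"


def hash_first_degree(bnode: str, quads: set[Quad]) -> str:
    # Serialize every quad mentioning `bnode`, writing it as _:a and any other
    # blank node as _:z, so the hash does not depend on the input's labels.
    lines = []
    for q in quads:
        if bnode in q:
            masked = tuple(("_:a" if t == bnode else "_:z") if is_blank(t) else t for t in q)
            lines.append(to_nquad(masked))
    return hashlib.sha256("".join(sorted(lines)).encode("utf-8")).hexdigest()


def canonicalize(quads: list[Quad]) -> list[Quad]:
    dataset = set(quads)  # an RDF dataset is a set, so duplicates are dropped
    bnodes = {t for q in dataset for t in q if is_blank(t)}
    by_hash: dict[str, list[str]] = defaultdict(list)
    for b in bnodes:
        by_hash[hash_first_degree(b, dataset)].append(b)
    if any(len(group) > 1 for group in by_hash.values()):
        # RDFC-1.0 resolves shared hashes with N-degree hashing; not sketched here.
        raise NotImplementedError("two or more blank nodes share a first-degree hash")
    # Issue _:c14n0, _:c14n1, ... in code point order of the hashes.
    issued = {by_hash[h][0]: f"_:c14n{i}" for i, h in enumerate(sorted(by_hash))}
    relabeled = {tuple(issued.get(t, t) for t in q) for q in dataset}
    return sorted(relabeled, key=to_nquad)


# The same dataset written with two different sets of blank node labels.
first = [
    ("_:x", "<http://example.org/knows>", "_:y", ""),
    ("_:x", "<http://example.org/name>", '"Alice"', ""),
    ("_:y", "<http://example.org/name>", '"Bob"', ""),
]
second = [
    ("_:b1", "<http://example.org/knows>", "_:b0", ""),
    ("_:b1", "<http://example.org/name>", '"Alice"', ""),
    ("_:b0", "<http://example.org/name>", '"Bob"', ""),
]

assert canonicalize(first) == canonicalize(second)                  # same canonical form
assert canonicalize(canonicalize(first)) == canonicalize(first)     # idempotent
print("".join(to_nquad(q) for q in canonicalize(first)), end="")
```

Only blank node labels change. The quads, IRIs and literals pass through untouched, which is why the output carries no information beyond what was in the input.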

Implementations may interact with the Web, of course, but such interactions are not specified in the document and are therefore out of scope. That said, the privacy and security considerations sections of the document highlight issues of which any implementation should be aware.

*2.1 What information might this feature expose to Web sites or other parties, and for what purposes is that exposure necessary?*

The document defines an algorithm that canonicalizes an RDF dataset. It does not introduce or remove any information from the dataset, and does not expose any new information.

*2.2 Do features in your specification expose the minimum amount of information necessary to enable their intended uses?*

Yes. The specification defines an algorithm that canonicalizes whatever data it is given. The output includes canonical identifiers for blank nodes, and these are derived solely from the input. No information is introduced that was not already in the dataset being processed.

*2.3 How do the features in your specification deal with personal information, personally-identifiable information (PII), or information derived from them?*

The algorithm canonicalizes any data given to it. Decisions on handling personally identifiable information are up to the application. Therefore these issues, while obviously important, are out of scope for the draft standard.

*2.4 How do the features in your specification deal with sensitive information?*

See previous answer. Data is only used internally within the application. How any sensitive data is handled is up to the implementation.

*2.5 Do the features in your specification introduce new state for an origin that persists across browsing sessions?*

No.

*2.6 Do the features in your specification expose information about the underlying platform to origins?*

No.

*2.7 Does this specification allow an origin to send data to the underlying platform?*

No.

*2.8 Do features in this specification enable access to device sensors?*

No.

*2.9 Do features in this specification enable new script execution/loading mechanisms?*

No.

*2.10 Do features in this specification allow an origin to access other devices?*

No.

*2.11 Do features in this specification allow an origin some measure of control over a user agent’s native UI?*

No.

*2.12 What temporary identifiers do the features in this specification create or expose to the web?*

None. While the specification defines an algorithm that transforms identifiers, the algorithm itself does not expose these to the web. It is up to the application that uses the algorithm to decide whether or how to expose any output from the algorithm.

*2.13 How does this specification distinguish between behavior in first-party and third-party contexts?*

It does not. The specification defines a canonicalization algorithm that internally rearranges input data into output data. It is up to the application to decide what data to feed into the algorithm and how to use its output.

*2.14 How do the features in this specification work in the context of a browser’s Private Browsing or Incognito mode?*

This is out of scope. The specification defines an algorithm that can be run in whatever context the application chooses, and the algorithm only rearranges input data into a canonical form. Whether the application runs in a browser at all is not defined by this specification.

*2.15 Does this specification have both "Security Considerations" and "Privacy Considerations" sections?*

Yes. [Privacy considerations](https://www.w3.org/TR/rdf-canon/#privacy-considerations). [Security considerations](https://www.w3.org/TR/rdf-canon/#security-considerations).

*2.16 Do features in your specification enable origins to downgrade default security protections?*

No.

*2.17 How does your feature handle non-"fully active" documents?*

It does not; this is out of scope for a canonicalization algorithm. The algorithm works on RDF datasets, which are unrelated to non-"fully active" documents.

*2.18 What should this questionnaire have asked?*

As noted in the preamble, the questionnaire focuses on browsers and Web apps. It does not address the needs of data representation formats, so it is of limited use for a whole category of specifications. This may be useful long-term feedback for the privacy group, who could add questions that cover such specifications.

--
Reply to this email directly or view it on GitHub:
https://github.com/w3ctag/design-reviews/issues/855#issuecomment-1700655692

Received on Thursday, 31 August 2023 09:08:54 UTC