Re: CR: Tracking Preference Expression (DNT) from timeless on 2015-08-26 (public-tracking-comments@w3.org from August 2015)

From: timeless <timeless@gmail.com>
Date: Tue, 25 Aug 2015 21:24:28 -0400
To: public-tracking-comments@w3.org
Message-ID: <CACsW8eGWS3KytCiSFGNS4A-W1CdrHb4EcN30HRbcHH2btHsCog@mail.gmail.com>
Review of http://www.w3.org/TR/2015/CR-tracking-dnt-20150820/

> A Web page is often composed of many information sources beyond the initial resource request,

Web alone shouldn't be capitalized.

> Confirms that there exists in the database a web-wide exception for a specific site.

"web" isn't capitalized here.

WebIDL [2] has:
> This document defines an interface definition language, Web IDL, that can be used to describe interfaces that are intended to be implemented in web browsers.

IndieUI [3] has:
> This provides an intermediate layer between device- and modality-specific user interaction events, and the basic user interface functionality used by web applications.

> A user is a natural person who is making, or has made, use of the Web.

This is an exceptional case where it /might/ be ok to capitalize "the
Web". I'd actually suggest it isn't really necessary here. It's a
different story if you want to write "The Web Platform" or something
similar... If you're really interested in capitalizing Web, you can
write "the World Wide Web" ....

That said, suppose that HitchBOT [4] wants to use the web, and has
some privacy preferences. (As it happens, HitchBOT was fairly public,
but perhaps a slightly more paranoid version might prefer not
to be targeted by "personal defense" advertising....)

> A party is a natural person, a legal entity, or a set of legal entities that share common owner(s), common controller(s),
> and a group identity that is easily discoverable by a user.

If there's a group that isn't easily discoverable, what happens?

> Common branding or providing a list of affiliates that is available via a link from a resource where a party describes DNT practices are examples of ways to provide this discoverability.

> Each of those parties is considered a first party if a user would reasonably expect to communicate with all of them when accessing that resource.

all => each

> For any data collected as a result of one or more network interactions resulting from a user's action,

This is a fairly long and unwieldy sentence. Could you split it into
multiple sentences, as in:

When a user intentionally makes an action, the intended recipient of
that action is the first party (or parties).
A third party is any party that is neither the user, nor the first
party, nor any service provider acting on behalf of either of those
parties.
It is understood that data may be collected by the user (or any
agents), the first party (or any agents), and potentially by other
unidentified third parties.
While this collection may happen, it may be seen as undesirable by the user.

> Conformance criteria and considerations regarding error handling are defined in Section 2.5 of [RFC7230].

Why isn't this linked? (It was linked in the previous occurrence, and
[RFC7231] was linked at least twice prior to this point.)

> If an origin server has multiple, request-specific tracking policies, such that the tracking status might differ depending on some aspect of the request
> (e.g., method, target URI, header fields, data, etc.)

e.g. & etc. -> drop etc.

> indicates that data collected via the target resource might be used for tracking and that an applicable tracking status representation can be obtained by performing a retrieval request on

should this end with a `:` ?

> /.well-known/dnt/fRx42

as in:

> A tracking status resource space is defined by the following URI Template [RFC6570]:
> /.well-known/dnt/{+status-id}

> A site-wide tracking status resource provides information about the potential tracking behavior of resources located at that origin server. A site-wide tracking status resource has the well-known identifier
> /.well-known/dnt/

ibid

> The representation can be cached, as described in section 6.4.4 Caching.

can -> may (?)

> where the value of status-id is a string of URI-safe characters provided by a Tk field-value in response to a prior request.

could this please link back to earlier where those characters were defined?

> where the value of status-id is a string of URI-safe characters provided by a Tk field-value in response to a prior request.
> For example, a prior response containing

`:`

> Example 5
> Tk: ?;ahoy

> refers to the specific tracking status resource

`:`

> /.well-known/dnt/ahoy
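
A minimal sketch of the mapping from a Tk field-value to a tracking
status resource, as the quoted text describes it. The helper name
statusResourceFor, the plain split on ";", and the fallback to the
site-wide resource when no status-id is present are assumptions made
for illustration, not something the draft defines.

```javascript
// Sketch: derive the tracking status resource path from a Tk field-value.
// statusResourceFor is a hypothetical helper, not part of the draft.
function statusResourceFor(tkFieldValue) {
  // A Tk field-value is a tracking status value, optionally followed
  // by ";" and a status-id.
  const [status, statusId] = tkFieldValue.trim().split(";");
  if (!statusId) {
    // Assumption: with no status-id, use the site-wide resource.
    return { status, path: "/.well-known/dnt/" };
  }
  return { status, path: "/.well-known/dnt/" + statusId };
}

console.log(statusResourceFor("?;ahoy").path); // /.well-known/dnt/ahoy
```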

> If the tracking status is applicable to all users, regardless of the received DNT-field-value or other data received via the request,
> then the origin server SHOULD mark the response as cacheable [RFC7234] and assign a time-to-live (expiration or max-use)
> that is sufficient to enable shared caching but not greater than the earliest point at which the service's tracking behavior might increase.

My, what a very long sentence.

Could you rewrite `service's tracking behavior might increase`?
Possibly as `the service might increase its tracking behavior`?

> The following example representation demonstrates a status object with all of the properties defined by this specification.

`:`

> Instead, the tracking status provides the ability to identify a set of compliance regimes to which the server claims to comply, with the assumption being that each regime defines its own requirements on compliant behavior.
> For example, [TCS] is a work-in-progress that intends to define such a compliance regime.

regime isn't defined [5]

> However, there is no single standard for extension interfaces.

If this changes [6].

> The DNT header field is a mechanism for expressing the user's tracking preference in an HTTP request ([RFC7230]).

these ()s seem superfluous -- certainly not

> At most one DNT header field can be present in a valid request.

`can` isn't RFC speak

> A proxy MUST NOT generate a DNT header field unless it has been specifically installed or
> configured to do so by the user making the request and adheres to the above requirements
> as if it were a user agent.

a user => the user's user ?

> A tracking status value of U means that the request resulted in a potential change to the tracking status applicable to this user, user agent, or device.
U is bold+italic

> An origin server MUST NOT send U as a tracking status value anywhere other than a Tk header field that is in response to a state-changing request.
U is orange+tt

> An origin server is REQUIRED to send a Tk header field if its site-wide tracking status value is ? (dynamic) or G (gateway), or when an interactive change is made to the tracking status and indicated by U (updated).
U is underlined

> This indication of an interactive status change is accomplished by sending a Tk header field in the response with a tracking status value of U (updated).
U is orange+tt+underlined

> If the tracking status value indicates prior consent (C),
C is orange+tt+parenthesized

That is too many possible presentations. In particular, the last one is the
only case of parentheses outside the TOC and section headings.

> SHOULD check responses to its state-changing requests (e.g., POST, PUT, DELETE, etc.) for a Tk header field with the U tracking status value,
U is underlined

I happen to dislike this style / I favor a style with orange+tt

> If the tracking status value is N, then the origin server claims that no tracking is performed for the designated resource for at least the next 24 hours or until the Cache-Control information indicates that this response expires.

instead of `at least`, how about `the longer of` ?
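
A small sketch of what the `the longer of` reading would mean in
practice: the claim holds until whichever comes later, 24 hours after
the response or the cache expiry. The function name and the use of a
max-age value in seconds are assumptions for illustration only.

```javascript
// Sketch of the "longer of" reading. All names here are hypothetical.
const DAY_MS = 24 * 60 * 60 * 1000;

function claimValidUntil(receivedAtMs, maxAgeSeconds) {
  const minimum = receivedAtMs + DAY_MS;
  const cacheExpiry = receivedAtMs + maxAgeSeconds * 1000;
  // Whichever is later wins.
  return Math.max(minimum, cacheExpiry);
}

// A one-hour max-age still yields a 24-hour claim;
// a 48-hour max-age extends it.
console.log(claimValidUntil(0, 3600) === DAY_MS);          // true
console.log(claimValidUntil(0, 48 * 3600) === 2 * DAY_MS); // true
```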

> If the tracking status value is not N,
> then the origin server claims that it might track the user agent for requests on the URI being checked for at least the next 24 hours or until the Cache-Control information indicates that this response expires.

This feels convoluted.

Imagine that the UA makes a request at +6 hours. Does the UA have any
indication that, potentially after +30 hours, the records made at +6
hours will be discarded?

I think you want `requests made by the user agent` instead of `the
user agent for requests`. Although, I don't think that helps with my
question.

I think to address my question, you need to change `for` to `within`.

> The following principles guide the design of user-agent-managed exceptions.

`:` ?

> When asking for a site-specific exception,
> the top-level origin making the request might make some implicit or explicit claims as to the actions and behavior of its third parties;
> for this reason,
> it might want to establish exceptions for only those for which it is sure that those claims are true.

My what a long sentence.

> (Consider a site that has some trusted advertisers and analytics providers, along with some mashed-up content from less-trusted sites).

period should be inside parentheses

> To allow the user to see and possibly revoke stored exceptions;
and/or ?
> Other aspects of the exception mechanism, as desired.

> The top-level origin of the current browser context;
and ?
> The target of the request.

as in:
> the user has not yet made a choice for a specific preference; or,
> the user has chosen not to transmit a preference.

> The user agent adds to its local database

Does this document consider private browsing? (local database is only
used here, never defined)

> A pair of values A and X match if and only if one of the following is true:

How does matching handle case and IDN?

> User-agents MUST handle each API request as a 'unit',
> whether or not the user-agent has stored the exception immediately.
> The following principles guide the design of user-agent-managed exceptions.

These are the only instances of "user-agent"; there are ~75 "user agent"
instances in the document. I'd suggest you remove the hyphen from the
first two of these, and probably the third....

> A and X are the same string;
or ?
> A has the form '*.domain' and X is 'domain' or is of the form 'string.domain', where 'string' is any sequence of characters.

Normally you use double quotes around strings.

What if a user wants to say ["*", "advertiser"] but NOT ["site", "advertiser"] ?
This seems like something a user might want to do...
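
Going back to the matching rule quoted above, a literal reading of it
can be sketched as follows. The function name matches is hypothetical,
and the sketch does exactly what the quoted text says and nothing more,
which is why case differences (and IDN forms) are not normalized.

```javascript
// Literal reading of the matching rule; matches() is a hypothetical name.
function matches(a, x) {
  if (a === x) return true; // A and X are the same string
  if (a.startsWith("*.")) {
    const domain = a.slice(2);
    // X is "domain" or of the form "string.domain"
    return x === domain || x.endsWith("." + domain);
  }
  return false;
}

console.log(matches("*.example.com", "www.example.com")); // true
console.log(matches("*.example.com", "example.com"));     // true
console.log(matches("*.example.com", "EXAMPLE.com"));     // false (no case folding)
console.log(matches("*.example.com", "badexample.com"));  // false
```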

> User-agents MUST handle each API request as a 'unit',

Usually you use fancy quotes for non technical prose.

> Each separate call to an API is a separate unit.

trailing whitespace.

How does one envision this working?

Imagine I make 20 calls to this API, once for (a, b, c), once for (a,
c), once for (a, b), once for (b, c), once for (a), once for (b), once
for (c), once for (b, c, d), ....

If the user wants to remove something (e.g. b), surely they wouldn't
be forced to manually delete each exception record that mentions b
individually. I guess the UA would remove all records mentioning b. But would
that encourage sites to establish webs of records as I describe to
protect their tracking preferences?
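
To make the question concrete, here is a toy model of the scenario. It
assumes each API call is stored as one unit of targets for an origin;
the data layout and both function names are hypothetical. The two
options give different results, and the draft does not say which one a
user agent is expected to implement.

```javascript
// Hypothetical model: each API call is stored as one "unit".
const units = [
  { origin: "site", targets: ["a", "b", "c"] },
  { origin: "site", targets: ["a", "c"] },
  { origin: "site", targets: ["a", "b"] },
  { origin: "site", targets: ["b"] },
];

// Option 1: drop every unit that mentions the target.
function removeUnitsMentioning(db, target) {
  return db.filter((unit) => !unit.targets.includes(target));
}

// Option 2: strip the target from each unit and keep the rest,
// which breaks the idea that a call is handled as a unit.
function stripTarget(db, target) {
  return db
    .map((unit) => ({ ...unit, targets: unit.targets.filter((t) => t !== target) }))
    .filter((unit) => unit.targets.length > 0);
}

console.log(removeUnitsMentioning(units, "b").length); // 1
console.log(stripTarget(units, "b").length);           // 3
```

Under option 1, "a" and "c" survive only because a separate (a, c) call
was made, which is the kind of overlapping web of records described
above.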

> domain of type DOMString, nullable
> cookie-domain as defined in [RFC6265], to which the exception applies.

IDN?

> explanationString of type DOMString, nullable
> A short explanation of the request.

There's no definition of short. May I store 1GB of essay text here?
(These days most computers have >4GB of ram, and perhaps 1TB of
storage, making the essay "short")

> siteName of type DOMString, nullable
> A user-readable string for the name of the top-level origin.

I look forward to people entering misleading values here...

> arrayOfDomainStrings of type sequence<DOMString>,
I don't think you want this trailing comma
> A JavaScript array of strings.

>    void storeSiteSpecificTrackingException (StoreSiteSpecificExceptionPropertyBag properties);
> If the list arrayOfDomainStrings is supplied, the user agent MAY choose to store a site-wide exception. If it does so it MUST indicate this in the return value.

What part of void exposes this?

> If permission is stored for an explicit list, then the set of duplets (one per target):
> [document-origin, target]
> is added to the database of remembered grants.

This doesn't match any of the processing model above (see removing
`b`), nor does it fit with expiry.

> If domain is supplied and not empty then it is treated in the same way as the domain parameter to cookies and allows setting for subdomains.

comma before then

> The domain argument can be set to fully-qualified right-hand segment of the document host name, up to one level below TLD.

> For example, www.foo.bar.example.com can set the domain parameter as as "bar.example.com" or "example.com", but not to "something.else.example.com" or "com".

not/or => neither/nor

the example is www.foo.bar.example.com, but what about `example.com`?
that would be able to set for `com`, which seems problematic.

There's no reference to https://publicsuffix.org/
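
A rough sketch of a domain check that consults a public suffix list
instead of counting levels below the TLD. The three-entry set is only a
stand-in for the real list, and isAllowedDomain is a hypothetical
helper, not something the draft defines.

```javascript
// Tiny stand-in for the Public Suffix List; a real check would
// load the full list.
const PUBLIC_SUFFIXES = new Set(["com", "org", "co.uk"]);

// isAllowedDomain is a hypothetical helper, not part of the draft.
function isAllowedDomain(host, domain) {
  const isRightHandSegment = host === domain || host.endsWith("." + domain);
  return isRightHandSegment && !PUBLIC_SUFFIXES.has(domain);
}

console.log(isAllowedDomain("www.foo.bar.example.com", "bar.example.com")); // true
console.log(isAllowedDomain("www.foo.bar.example.com", "com"));             // false
console.log(isAllowedDomain("www.example.co.uk", "co.uk"));                 // false
console.log(isAllowedDomain("www.foo.bar.example.com", "something.else.example.com")); // false
```

The "co.uk" case is the interesting one: it is one level below the TLD,
so a rule based only on counting levels would accept it.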

> A particular response to the API — like a DNT response header field — is only valid immediately; a user might later choose to edit stored exceptions and revoke some or all of them.

what's a response here? the lack of an exception?
keeping in mind that the function must return immediately:

> When called, storeSiteSpecificTrackingException MUST return immediately.

> If expires is supplied and not null or empty the remembered grant will be cancelled

-eled

> (i.e. processed as if the relevant Cancel API had been called)
> no later than the specified date and time.

what if the expires field fails to parse?
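
A sketch of the three cases a user agent would have to distinguish.
Using Date.parse here is an assumption made for illustration (the
draft's own encoding for expires may differ), and parseExpires is a
hypothetical helper. The "invalid" branch is the one the quoted text
leaves open.

```javascript
// Sketch: classify an expires value before storing a grant.
// parseExpires is a hypothetical helper.
function parseExpires(expires) {
  if (expires === undefined || expires === null || expires === "") {
    return { kind: "none" };    // not supplied, null, or empty
  }
  const ms = Date.parse(expires);
  if (Number.isNaN(ms)) {
    return { kind: "invalid" }; // the draft does not say what happens here
  }
  return { kind: "date", ms };
}

console.log(parseExpires("2015-08-26T01:24:57Z").kind); // date
console.log(parseExpires("tomorrow").kind);             // invalid
console.log(parseExpires(null).kind);                   // none
```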

> After this the database of remembered grants will no longer contain any duplets for which the first part is the current document origin;
> i.e., no duplets [document-origin, target] for any target.

Afaict, if i do:
storeSiteSpecificTrackingException({arrayOfDomainStrings:["first.marketer", "marketer.second"]})
storeSiteSpecificTrackingException({arrayOfDomainStrings:["first.marketer", "marketer.third"], expires: tomorrowAsProperString() })
when tomorrow comes, the first set with marketer.second will be wiped
out. That doesn't match my understanding from earlier with "b".
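
The same scenario as a toy model, following the quoted text literally:
expiry is processed as if the Cancel API had been called, and cancelling
removes every duplet whose first part is the document origin. The
origin "site.example" and the function names store and cancel are
hypothetical.

```javascript
// Hypothetical model of the database of remembered grants
// as [origin, target] duplets.
let grants = [];

function store(origin, targets) {
  for (const target of targets) grants.push([origin, target]);
}

// Per the quoted text, cancelling removes every duplet whose
// first part is the origin.
function cancel(origin) {
  grants = grants.filter(([first]) => first !== origin);
}

store("site.example", ["first.marketer", "marketer.second"]); // no expiry
store("site.example", ["first.marketer", "marketer.third"]);  // expires tomorrow

// Tomorrow: expiry is processed as if the Cancel API had been called.
cancel("site.example");
console.log(grants.length); // 0
```

Under this reading, the grant for marketer.second disappears even
though it was stored without an expiry.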

> There is no callback.

There aren't any callbacks anywhere that I can see, why mention it here?

>    void removeSiteSpecificTrackingException (RemoveExceptionPropertyBag properties);
> If domain is supplied and is not empty then this ensures that the database of remembered grants no longer contains any duplets for which the first part is the domain wildcard; i.e., no duplets [*.domain, target] for any target.
> domain of type DOMString, nullable
> a cookie-domain as defined in [RFC6265], to which the exception applies.

storeSiteSpecificTrackingException had complicated prose about domain
being related to the actual page's domain. It feels like evil.com can
call removeSiteSpecificTrackingException("example.com").

> if some kind of processing error occurred then an appropriate exception will be thrown.

will -> should ?

>    boolean confirmSiteSpecificTrackingException (ConfirmSiteSpecificExceptionPropertyBag properties);

> If the domain argument is not supplied or is null or empty then the execution of this API uses the 'implicit' parameter, when the API is called, the document origin. This forms the first part of the duplet in the logical model.

the following isn't nested under the preceding:
> If the user agent stores explicit lists, and the call includes one, the database is checked for the existence of all the duplets (one per target):
> [document-origin, target]

As written, it discards the domain argument.

compare:
> If the user agent stores explicit lists, the call includes one, and the domain argument is provided and is not empty, then the database is checked for the existence of all the duplets (one per target):

Note that here, it feels like "evil.com" can ask about privacy
preferences for "example.com"

> true all the duplets exist in the database;
or ?
> false one or more of the duplets does not exist in the database.


> An API is provided so that a site might obtain such a web-wide exception from the user.

"web-wide" is not a common phrase [7] -- the hits are all poor.
"Global" is a much more normative phrase.

If this API hasn't been heavily adopted, I'd request that WebWide not
be used in the API.


> The properties of the StoreExceptionPropertyBag dictionary are as described above in the request for site-specific exceptions.

change `above` (unhelpful) to `storeSiteSpecificTrackingException` or
`7.4 Site-specific Exceptions`

in doing so, you can remove the duplicated text...

> The single duplet [ * , document-origin] or [ * , *.domain] (based on if domain is provided and is not null and not empty) is added to the database of remembered grants. The properties of the StoreExceptionPropertyBag dictionary are as described above in the request for site-specific exceptions.

Give `the properties` its own paragraph.

> Ensures that the database of remembered grants no longer contains the duplet [ * , document-origin] or [ * , *.domain] (based on if domain is provided and is not null and not empty). There is no callback. After the call has been made, the indicated pair is assured not to be in the database. The same matching process defined for determining which header field to send is also used to detect which entry (if any) to remove from the database.

It feels like you lost some paragraphs here.

>    boolean confirmWebWideTrackingException (ConfirmExceptionPropertyBag properties);

can evil.com ask about "example.com" here?

> an icon in the status bar indicating that an exception has been stored, which, when clicked on, gives the user more information about the exception and an option to revoke such an exception.
> an infobar stating "Example News (news.example.com) has indicated to Browser that you have consented to granting it exceptions to your general Do Not Track preference. If you believe this is incorrect, click Revoke."
> no UI at all.

Capitalize sentences?

> Some user agents might choose to provide ambient notice that user-opted tracking is ongoing, or easy access to view and control these preferences.

This reminds me of P3P.

What I'd like to see is something encouraging crowd(friend)sourcing
rejections/requests of privacy requests. If you're going to suggest
things, perhaps suggest surfacing something like this....

> To obtain an exception, a document (page, frame, etc.) that loads the Javascript [sp] is needed.

> If there is a problem with the calling parameters, then a Javascript exception will be raised.

trailing whitespace;
The claim is wrong, since there was no promise about expires
triggering an exception.

> Even though the site has acquired the user's informed consent before calling the 'Store' API,

I'm not a fan of the quotes here, and you don't use them below:

> As stated in the normative text, the site needs to explain and acquire consent immediately prior to calling the Store API,

you mix singular API:
> Even though the site has acquired the user's informed consent before calling the 'Store' API,
with plural APIs:
> On other visits, a site can call the 'Confirm' APIs to enquire [sp] whether a specific exception has been granted and stands in the user agent.

"ask" (enquire is British and really too formal)

> If they do grant it (using some positive interaction such as a button), the site can return to checking the 'Confirm' API.

This doesn't match the earlier text at all. Unless it means "for future visits".



[1] http://www.w3.org/TR/2015/CR-tracking-dnt-20150820/
[2] http://www.w3.org/TR/2015/WD-WebIDL-1-20150804/
[3] http://www.w3.org/TR/indie-ui-events/
[4] http://m.hitchbot.me/
[5] http://www.w3.org/TR/2015/CR-tracking-dnt-20150820/#terminology
[6] https://blog.mozilla.org/addons/2015/08/21/the-future-of-developing-firefox-add-ons/
[7] http://www.bing.com/search?q=%22web-wide%22&qs=n&form=QBLH&pq=%22web-wide%22&sc=2-10&sp=-1&sk=&cvid=75fba4efa58c414f854ac2cac9712cd7
Received on Wednesday, 26 August 2015 01:24:57 UTC