Google indexing chat discussions - URIs as capabilities from Noah Mendelsohn on 2023-09-27 (www-tag@w3.org from September 2023)

From: Noah Mendelsohn <noah@cs.tufts.edu>
Date: Wed, 27 Sep 2023 10:45:53 -0400
To: "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <3ef80c80-4da5-8003-9305-73f0126e9561@cs.tufts.edu>

Hello TAG folks after all these years. I hope you are doing well!

Back when I was in the group (centuries ago), one of the things we discussed was the degree to which the Web was architected to support the use of URIs as capabilities. By that I mean, roughly: to what degree and in which circumstances can we assume that distribution or publication of some particular URI is sufficiently well controlled and limited that possession of the URI can be taken as permission to make some use of the identified resource?

Obviously, such URI-as-capability designs have become common in corners of the Web. For example, many Web-based file storage systems (Dropbox, Google Drive, among many others) allow you to mint a URI with the understanding that you will probably share it, perhaps via email, only with those who are to have access. (...and yes, precisely because URIs are suspect as a capability mechanism, the better among these storage systems also allow you to mint URIs in which access control is properly orthogonal to identification, i.e., for which you need to log on to get access to some particular document.)

Back in the day, I was somewhat of an outlier on the TAG in being nervous about endorsing this URI-as-capability practice. My view was and is that mechanisms for controlling the distribution and use of URIs are at best ad hoc and unreliable. Most obviously, URIs wind up in places like traffic logs, screenshots of browser windows, etc., but in general there is no clean architecture for managing their transmission or distribution. Of course, it's a feature of the Web that, from a technological point of view, plain-text URIs can be freely copied anywhere.

Anyway, what reminded me of all this was a recent Slashdot posting [1], pointing to a set of discussions relating to Google's apparent use of Bard conversations as input to the Google search index. What caught my eye was the explanation:

"Google Brain research scientist Peter J. Liu replied to Ghotra on X by noting that the Google Search indexing only occurred for those conversations that users had elected to click the share link on, not all Bard conversations, to which Ghotra patiently explained: "Most users wouldn't be aware of the fact that shared conversation mean it would be indexed by Google and then show up in SERP, most people even I was thinking of it as a feature to share conversation with some friend or colleague & it being just visible to people who have conversation URL."

Ultimately, Google's Search Liaison account on X, which provides "insights on how Google Search works," wrote back to Ghotra to say "Bard allows people to share chats, if they choose. We also don't intend for these shared chats to be indexed by Google Search. We're working on blocking them from being indexed now."

So, yes, as a policy matter Google has decided not to do this in the future for this particular class of URIs, but this incident reminds me how easy it is to conflate sending a URI to some particular party for some particular purpose, with ensuring that the same URI won't, accidentally or on purpose, be used for other unintended purposes.
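
For illustration only, one common way a site can ask crawlers not to index a class of pages is the X-Robots-Tag response header. The sketch below is an assumption about how such blocking might look, not a description of what Google did for Bard; the function name and its parameter are hypothetical.

```python
def shared_page_headers(is_capability_link: bool) -> dict[str, str]:
    """Build response headers for a page (hypothetical helper)."""
    headers = {"Content-Type": "text/html; charset=utf-8"}
    if is_capability_link:
        # Ask cooperating crawlers not to add this page to their index.
        headers["X-Robots-Tag"] = "noindex"
        # Keep the URI out of Referer headers sent to other sites.
        headers["Referrer-Policy"] = "no-referrer"
    return headers
```

Both headers are requests to well-behaved clients rather than enforcement: they narrow a couple of leak paths, but anyone who holds the URI can still copy it anywhere.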

BTW: I'm not specifically asking the TAG to do anything about this, unless you happen to feel that this needs renewed attention around now. Just noting that things like this somewhat reinforce my nervousness about relying on URIs as capabilities. In the world I come from, capability-based systems are built so that access to capabilities is carefully and explicitly managed, typically by some trusted security kernel.

Noah

[1] https://tech.slashdot.org/story/23/09/27/0235250/google-search-caught-publicly-indexing-users-conversations-with-bard-ai

Received on Thursday, 28 September 2023 09:16:41 UTC