- From: Adrian Gropper <agropper@healthurl.com>
- Date: Sun, 1 Dec 2024 22:35:32 -0500
- To: W3C Credentials Community Group <public-credentials@w3.org>
- Message-ID: <CANYRo8g1Ba0EYx56x=E_BDoLFvvQ4ztb6kruQt+KkVaWOinuFw@mail.gmail.com>
Christine's analysis "How decentralized is Bluesky really?"[1] discusses the need for protection against the organization becoming an adversary in the future. Social networks, like X and Bluesky, are subject to very strong network effects that drive huge investment of resources. Other organizations subject to similarly strong network effects include Wikipedia, search engines, and large language models (LLMs). Some service categories subject to strong network effects try to lock in users as a core strategy. In that way, investor funds spent on customer acquisition to seed the network effect are protected in subsequent stages of enshittification[2].

LLMs, like the search engines they are already replacing, are subject to strong network effects with one major new twist. Nobody expects to dump megabytes of private data as "context" for a Google search the way we might as part of an LLM chat. Furthermore, LLM operators treasure the opportunity to improve their models based on sensitive and hard-to-come-by data such as medical records. Society at large also benefits from improved models and, all other things being equal, would gladly contribute private and sensitive data to promote medical science or economic progress. As a consequence of their need for private data under the control of the data subject, LLMs will present themselves as trustworthy. But what's to keep them honest? Regulation is one approach, but there's plenty of reason to believe that regulating proprietary, closed LLMs will be difficult, and even the definition of open-source AI is controversial.[3]

The best models also need high-quality training data curated by experts who need to be paid. Plenty of capital for software development, expert curation, and access to private data is therefore essential for a network effect around a world-leading LLM. What's to keep locked-in customers using and paying for the LLM as the organization behind it transitions toward adversarial policies and enshittification?

Avoiding customer lock-in is an obvious way to reduce the incentive for trusted LLMs to become adversarial. Search engines and browsers as user agents, for example, are kept honest by how easy it is for customers to switch. But browsers are not semi-autonomous the way AI user agents are likely to be. This means that users will invest in training their AI agent as an authorization server for access to their private data (sketched below) and that LLM vendors will try to convince users to "park" private data with them as a way to speed results, improve personalization, and reduce API cost. Simply put, LLMs will want to be your trusted agent, and lock-in will be an important strategy. The standards that keep our browsers from locking us in are not sufficient for a semi-autonomous agent with access to orders of magnitude more private data.

The current state of the art is unsettled. Many personal agent projects forgo the benefits of access to a world-class LLM. Apple Intelligence takes a different approach, with strong branding and lock-in to their privacy-preserving user agent but inclusion of LLM access at the user's risk. The third approach, currently favored by large LLMs, is to operate proprietary app stores and offer incentives for users to park private data with the LLM operators themselves.

The browser as user agent has been challenged by apps for many years. Biometric secure elements and more powerful voice interfaces like Siri will further erode the browser model. We need new standards to protect us from the temptation to enshittify LLMs.
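To make the agent-as-authorization-server idea concrete, here is a minimal TypeScript sketch, loosely modeled on OAuth/GNAP-style token grants. Everything in it is a hypothetical illustration rather than a proposed standard: the `PersonalAuthorizationServer` class, its method names, and the scopes are invented for this example. The point is only that the user's agent, not the LLM operator, holds the access policy and issues short-lived, narrowly scoped grants, so there is no durable copy of the data for the vendor to park.

```typescript
// Sketch: a user-controlled authorization server that grants an LLM client
// scoped, time-limited access to private data. Hypothetical names throughout;
// loosely OAuth/GNAP-shaped, not a real protocol.

interface AccessGrant {
  resource: string;                 // e.g. "medical-records/2024"
  scope: "read" | "summarize";      // the narrowest action that serves the task
  audience: string;                 // the LLM operator the grant is bound to
  expiresAt: number;                // epoch ms; short-lived to prevent parking
  token: string;
}

class PersonalAuthorizationServer {
  // Per-resource policies the user has "trained" into the agent.
  private policies = new Map<string, { scope: "read" | "summarize"; ttlMs: number }>();

  // The user records a standing permission for one resource.
  allow(resource: string, scope: "read" | "summarize", ttlMs: number): void {
    this.policies.set(resource, { scope, ttlMs });
  }

  // An LLM client asks for access; the agent answers with a narrow grant
  // or refuses. Deny by default: no policy means no access.
  requestGrant(resource: string, audience: string): AccessGrant | null {
    const policy = this.policies.get(resource);
    if (!policy) return null;
    return {
      resource,
      scope: policy.scope,
      audience,
      expiresAt: Date.now() + policy.ttlMs,
      token: Math.random().toString(36).slice(2), // stand-in for a signed, audience-bound token
    };
  }
}

// Usage: permit one vendor to summarize one record for five minutes.
const pas = new PersonalAuthorizationServer();
pas.allow("medical-records/2024", "summarize", 5 * 60 * 1000);
const grant = pas.requestGrant("medical-records/2024", "llm.example.com");
console.log(grant ? `granted "${grant.scope}" until ${new Date(grant.expiresAt).toISOString()}` : "denied");
```

The deny-by-default policy and short expiry are what remove the parking incentive in this sketch: the vendor ends up holding a token, not the data.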
Adrian

[1] https://dustycloud.org/blog/how-decentralized-is-bluesky/
[2] https://americandialect.org/2023-word-of-the-year-is-enshittification/
[3] https://hackmd.io/@opensourceinitiative/osaid-faq