Principles of Identity in Web Architecture from Melvin Carvalho on 2021-06-06 (www-tag@w3.org from June 2021)

From: Melvin Carvalho <melvincarvalho@gmail.com>
Date: Sun, 6 Jun 2021 12:00:29 +0200
To: TAG List <www-tag@w3.org>
Message-ID: <CAKaEYhJuHqj8PqKuUR9yz-FgGkTSAm-6gftiThWTZHdn_cFcTA@mail.gmail.com>
At TPAC 2012 I proposed to timbl a modular approach to Identity on the
Web.

Back then the majority of systems tightly coupled identity,
authorization and authentication.  My proposal was that the identity part
should stand on its own merits, and be a modular piece of a wider
architecture.

To my surprise and delight, he agreed with this, and persuaded our group to
take this approach and rewrite the specs into what was to become the WebID
suite.

*Architectural Principles*

I now want to propose some further architectural principles, based on
what we've learnt in the following decade, and align them with web
architecture.  They are as follows:

1. Separate identifiers from identity
2. Identifiers are a string of characters, a global primary key
3. Your identity is attributes, values and links tied to your identifier
4. Your identity is protocol, medium and transport agnostic
5. Separate data and protocol meta data from identity data

Applying these 5 architectural principles, I believe it would be possible
for every identity system on the web to be largely interoperable.  And by
web I include URI schemes other than http, and the P2P web.

A few words on each point

*1. Separate identifiers from identity*

Identity comes in many shapes and forms.  People tend to talk about
identity and identifiers interchangeably, and we seem not to have a common
vocabulary that everyone can live with.  I'll use the term identifier
loosely to mean a string of characters that denotes a user (or agent), and
identity to mean the attributes associated with that identifier.

*2. Identifiers are a string of characters, a global primary key*

When talking about identifiers in a system, it's important to actually get
down to what that identifier looks like.  What is the string of
characters?  In order to interoperate with other systems, this must be well
defined, and should be a primary key in your system.  Too often this is not
done, and either there is more than one primary key or overloading occurs,
as in "your public key is your identity".  Ideally the identifier should be
a URI, though not all large systems on the web use URIs, which leads to
balkanization.  Many databases work on the principle of primary and foreign
keys.  Identity needs this.
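
As a rough illustration of the primary/foreign key idea, here is a minimal
Python sketch.  The identifiers, table names and fields are all made up for
the example; the only point is that one well-defined string per agent is
used as the key everywhere, whatever its URI scheme.

```python
from urllib.parse import urlsplit

# Hypothetical identifiers: any URI scheme can serve as the key.
ALICE = "https://alice.example/profile#me"
BOB = "urn:uuid:123e4567-e89b-12d3-a456-426614174000"

# One table, one primary key per agent: the identifier string itself.
agents = {
    ALICE: {"name": "Alice"},
    BOB: {"name": "Bob"},
}

# Another table refers to agents by that same string, as a foreign key.
messages = [
    {"author": ALICE, "text": "hello"},
    {"author": BOB, "text": "hi"},
]


def author_name(message):
    """Resolve the foreign key against the agents table."""
    return agents[message["author"]]["name"]


def scheme_of(identifier):
    """Return the URI scheme, showing that the key is scheme-independent."""
    return urlsplit(identifier).scheme
```

Because the key is just a string, a second system can join against the same
identifiers without knowing how the first one stores its data.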

*3. Your identity is attributes, values and links tied to your identifier*

I'm going to loosely describe your identity as attributes, values and links
tied to your identifier.  Most identity systems do this under the hood.
For a while RDF was recommended by the TAG as the solution to this, but
different systems will use different solutions such as JSON(-LD) or CBOR.
What's important, I think, is the Entity Attribute Value (EAV) model of
tying attributes to an identifier.  It is also important that links are
allowed in that structure.  Unfortunately JSON doesn't have a native syntax
for links like Turtle does.  Perhaps this is an area for standardization.
Links enable heterogeneous systems to work together.
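
To make the EAV-plus-links idea concrete, here is a small Python sketch.
It is only one possible shape: the Link class, the attribute names and the
identifiers are assumptions for the example, not part of any spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A value that points at another identifier instead of a literal."""
    target: str


ME = "https://alice.example/profile#me"      # hypothetical identifier
FRIEND = "https://bob.example/profile#me"    # hypothetical identifier

# Each row is (entity, attribute, value); a value is a literal or a Link.
rows = [
    (ME, "name", "Alice"),
    (ME, "knows", Link(FRIEND)),
    (FRIEND, "name", "Bob"),
]


def attributes_of(entity):
    """Collect every attribute/value pair tied to one identifier."""
    return {attr: value for ent, attr, value in rows if ent == entity}


def follow(entity, attribute):
    """Follow a link-valued attribute to the attributes of its target."""
    value = attributes_of(entity)[attribute]
    if not isinstance(value, Link):
        raise TypeError(f"{attribute!r} is a literal, not a link")
    return attributes_of(value.target)
```

Here follow(ME, "knows") returns Bob's attributes, even though they could
just as well live in a different system keyed by the same identifier.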

*4. Your identity is protocol, medium and transport agnostic*

When people talk about the web they talk about http.  However, there is
every indication that the web was designed to bring together many large
systems: http: URIs working with file:, irc:, ftp: etc.  It should even
work with systems that have UUIDs and not (yet) URIs.  The principle is
that any data that you want to share should not include anything about the
transport.  Instead, that can be cleanly separated into meta data.
*5. Separate data and protocol meta data from identity data*

The http/html web quite cleanly separates a document from its data, and
protocol from content.  It does this using headers for a document.  Also,
within the document, the HEAD and BODY tags aim to cleanly separate data
about the document from data about the thing within.  In http the thing
within is cleanly separated from the protocol data using the "#" character.
In JSON-LD 1.1 you can do something similar using "@id" : "".  Put your
meta data in there, and your identity data is linked to that.  In this way
it can be reused in different systems (publishing, messaging, ledgers,
auth), leading to increased functionality for the end user, tied together
seamlessly.
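
A sketch of what that could look like, written as a Python dict shaped like
JSON-LD.  The document URL and the property names ("modified",
"primaryTopic", "name") are placeholders, and no @context is supplied, so
this is an illustration of the "@id": "" and "#" pattern rather than a
complete JSON-LD document.

```python
from urllib.parse import urldefrag, urljoin

BASE = "https://alice.example/profile"  # hypothetical document URL

doc = {
    "@id": "",                     # empty reference: the document itself
    "modified": "2021-06-06",      # meta data about the document
    "primaryTopic": {              # link from the document to the agent
        "@id": "#me",              # fragment: the thing described within
        "name": "Alice",           # identity data about the agent
    },
}


def resolve(ref):
    """Resolve a relative reference against the document URL."""
    return urljoin(BASE, ref)


def document_of(identifier):
    """Strip the fragment to find the document that describes a thing."""
    return urldefrag(identifier).url
```

With these definitions, resolve("") gives the document URL and
resolve("#me") gives the agent's identifier, so the two kinds of data stay
linked but distinct.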

*Summary*

There's growing interest in using the web in a more distributed and
decentralized way.  IMHO, by employing some or all of the 5 rough
architectural principles above, it's possible to bring together the
different systems operating on the internet in that way.

Related:  timbl's essay on the giant global graph:
https://web.archive.org/web/20160713021037/http://dig.csail.mit.edu/breadcrumbs/node/215

Feedback on any or all of the points welcome!
Received on Sunday, 6 June 2021 10:01:07 UTC