Re: Uniquely identifying instances from Lorenzo Moriondo on 2018-07-03 (public-hydra@w3.org from July 2018)

From: Lorenzo Moriondo <tunedconsulting@gmail.com>
Date: Tue, 3 Jul 2018 18:05:57 +0100
To: Markus Lanthaler <markus.lanthaler@gmx.net>
Cc: Hydra <public-hydra@w3.org>, Akshay dahiya <xadahiya@gmail.com>, chris andrew <chrisandrew119@gmail.com>, Vaibhav Chellani <vaibhavchellani223@gmail.com>, sandeep chauhan <sandeepsajan0@gmail.com>
Message-ID: <CAKgLLmsO4tHcs_uxU8-Chca3hbD7u_yUPwmtY=crEk6r9SiwEA@mail.gmail.com>

Hi Markus,

I take the opportunity of this discussion to also address a more general
theme.
I thought of this first implementation of unique ids as the first step
towards addressing resources by using only properties known by the
servers/agents in the network, with no need for primary keys at all.

As an initial temporary step we are just defaulting the id field to a
function generating a uuid.
But in the future we may want to have something more advanced. For example,
let's say there is an object like:


{
  "device_ip": "192.168.1.99",
  "kind": "sensor",
  "network": "Infra2",
  "added": "123456789",
  "serial_no": "123456-4124-13213"
}

The resource's type may support a declaration to define some kind of unique
constraint that specifies that every object with a unique combination of
some of the above properties is unique (equivalent to
`UniqueConstraint(device_ip, network, serial_no)` used by some Python ORMs).
With these two pieces (the object and the constraint definition) the server
and the client may address the object by calling something like
`md5(value|value|value).hexdigest`; this would allow any peer in the network
to use a standard naming/identification "mini-protocol" for the given type.
This can be even more powerful in the case of ids that are
"unique-by-design" (like ISBN for books or patent ids for patents). By
defining a single property as "globally unique" we can be sure to identify
the object in the whole domain of its type, even over different networks.
We are then sure that every graph in every agent can easily recognize the
node in its graph as the same node in another agent's graph representation.
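
A minimal Python sketch of the composite id described above, applied to the
example object. The function name `constraint_id`, the `UNIQUE_FIELDS` tuple
and the `|` separator are assumptions made for illustration; nothing here is
defined by the Hydra spec.

```python
import hashlib

def constraint_id(obj, fields, sep="|"):
    """Join the values of the constrained properties and hash them with MD5."""
    joined = sep.join(str(obj[name]) for name in fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

# The example object from this message.
device = {
    "device_ip": "192.168.1.99",
    "kind": "sensor",
    "network": "Infra2",
    "added": "123456789",
    "serial_no": "123456-4124-13213",
}

# Hypothetical constraint: the same three properties as in
# UniqueConstraint(device_ip, network, serial_no).
UNIQUE_FIELDS = ("device_ip", "network", "serial_no")

device_id = constraint_id(device, UNIQUE_FIELDS)
print(device_id)
```

Any peer that knows the object and the field list computes the same id, and
properties outside the constraint (such as `added`) do not affect it. A real
scheme would also have to fix the field order and the handling of values
that contain the separator.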


If this makes sense, we may need a part of the spec with the *semantics to
specify constraints* on properties (like what already happens in SQL) and
another part to let the resource holders publish the *semantics to let
other agents build the same unique ids from the same props' values*, so that
the id of an object may be the result of calling a function over the values
of the properties declared as part of the constraint.
In this scenario, a constraint may be defined as a sequence of supported
properties that are flagged with `hydra:required`, for example; or a
property may be declared globally unique and the id be just
`md5(value).hexdigest`.
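
As a rough sketch of the first variant, the constraint could be read from
the class description itself. The dictionary below is a simplified,
hypothetical stand-in for a class in the ApiDoc; only the idea of a
`required` flag per supported property comes from Hydra, while treating
required properties as the unique constraint is the assumption discussed
here.

```python
import hashlib

# Hypothetical, simplified class description. Only the "required" flag
# mirrors hydra:required; the rest of the layout is an assumption.
SENSOR_CLASS = {
    "title": "Sensor",
    "supportedProperty": [
        {"title": "device_ip", "required": True},
        {"title": "kind", "required": False},
        {"title": "network", "required": True},
        {"title": "added", "required": False},
        {"title": "serial_no", "required": True},
    ],
}

def constraint_fields(class_doc):
    """Return the titles of the required properties, in declaration order."""
    return [p["title"] for p in class_doc["supportedProperty"] if p.get("required")]

def derive_id(class_doc, obj):
    """Hash the values of the required properties of obj with MD5."""
    values = [str(obj[name]) for name in constraint_fields(class_doc)]
    return hashlib.md5("|".join(values).encode("utf-8")).hexdigest()

sensor = {
    "device_ip": "192.168.1.99",
    "kind": "sensor",
    "network": "Infra2",
    "added": "123456789",
    "serial_no": "123456-4124-13213",
}
print(constraint_fields(SENSOR_CLASS))
print(derive_id(SENSOR_CLASS, sensor))
```

Because the field list comes from the published doc, a client needs no
out-of-band agreement with the server beyond the hashing rule.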

This feature may avoid a lot of data management issues in the future and
avoid possible conflicts between servers providing the same resources/types
(like data duplication issues). It would also, for example, allow basic
schema validation checks where the schema is the ApiDoc itself. In general,
we want to implement a synchronisation protocol between data providers and
data agents/clients so as to allow caching/querying of RDF graphs.
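
To illustrate the duplication point, here is a small sketch of a client
merging objects received from two servers by their derived id. The helper
names and the second device's values are made up for the example.

```python
import hashlib

def derive_id(obj, fields):
    """MD5 over the '|'-joined values of the constrained properties."""
    joined = "|".join(str(obj[f]) for f in fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

def merge_by_id(fields, *sources):
    """Merge object lists from several servers, one entry per derived id."""
    merged = {}
    for objects in sources:
        for obj in objects:
            # The first copy seen wins; later duplicates are ignored.
            merged.setdefault(derive_id(obj, fields), obj)
    return merged

FIELDS = ("device_ip", "network", "serial_no")

server_a = [
    {"device_ip": "192.168.1.99", "network": "Infra2", "serial_no": "123456-4124-13213"},
]
server_b = [
    {"device_ip": "192.168.1.99", "network": "Infra2", "serial_no": "123456-4124-13213"},
    # Hypothetical second device, not part of the original example.
    {"device_ip": "192.168.1.100", "network": "Infra2", "serial_no": "123456-4124-13214"},
]

merged = merge_by_id(FIELDS, server_a, server_b)
print(len(merged))  # 2: the device known to both servers is stored once
```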

I know Hydra is meant to be an API framework, but the trend and the
experiments we made seem to be moving towards a closer integration of the
interface and persistence layers (as, for example, in pure REST
microservices), so that both use the same semantics. This is particularly
true given that the interface, web-server and database are autodeployed from
the very same Hydra doc file (obviously avoiding lock-in to any particular
DBMS).

As we move into a more operational field, the spec may need new features of
this kind, because data operations add a different layer of needs to the
core modeling layer.

Let me know if this makes sense and whether this layer of features is
supposed to be part of a separate documentation or be added to the core
spec. In my opinion, the first option would only increase the complexity in
usage; the second may be a useful addition to usability and maintainability.


> The semantics of an identifier shouldn't change.. otherwise pretty much
> all bets are off.
>

I think the semantics may differ from type to type, as long as the
publisher of the resource/type can define it according to the unique
properties in the type, or indicate one of the properties as a globally
unique identifier for the domain of the type (a sort of "mini-protocol"
with the semantics to allow the publisher to define a multi-prop constraint
and a hashing function).

Using the id plus timestamp may partially solve the problem initially, in
the very same way that the uuid does, and that is what we are doing at the
moment. But assigning a new pk to an object that already has a globally
unique identifier (like ISBN for a book again) just increases the
complexity of keeping maps of ids, when this can be avoided by just
declaring `md5(ISBN).hexdigest` to be globally unique for the type "Books".
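
A short sketch of the difference, assuming a helper `isbn_id` and a
placeholder ISBN. Stripping hyphens and spaces before hashing is an extra
assumption, added so that two spellings of the same ISBN give the same id.

```python
import hashlib
import uuid

def isbn_id(isbn):
    """Id from a globally unique property: strip hyphens/spaces, then MD5."""
    normalized = isbn.replace("-", "").replace(" ", "")
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

# Two servers that each assign their own uuid get unrelated ids
# for the same book and need a map between them.
id_on_server_a = uuid.uuid4().hex
id_on_server_b = uuid.uuid4().hex

# Both servers derive the same id from the ISBN itself (placeholder value).
derived_a = isbn_id("978-3-16-148410-0")
derived_b = isbn_id("9783161484100")

print(id_on_server_a == id_on_server_b)
print(derived_a == derived_b)
```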



Best,

Lorenzo



>
>
> Cheers,
> Markus
>
>
>
> --
> Markus Lanthaler
> @markuslanthaler
>
>
>
>

Received on Tuesday, 3 July 2018 17:06:31 UTC