[Specifications] Semantics to define constraints or globally unique ids from Lorenzo via GitHub on 2018-07-06 (public-hydra-logs@w3.org from July 2018)

From: Lorenzo via GitHub <sysbot+gh@w3.org>
Date: Fri, 06 Jul 2018 16:37:00 +0000
To: public-hydra-logs@w3.org
Message-ID: <issues.opened-339001758-1530895019-sysbot+gh@w3.org>
Mec-iS has just created a new issue for https://github.com/HydraCG/Specifications:

== Semantics to define constraints or globally unique ids ==
As explained in a recent [post in the mailing list](https://lists.w3.org/Archives/Public/public-hydra/2018Jul/0004.html):

> 
> I take the opportunity of this discussion to address also a more general theme.
> I intended this first implementation of unique ids as a first step toward addressing resources using only properties known by the servers/agents in the network, with no need for primary keys at all.
> 
> As an initial, temporary step we are just defaulting the id field to a function that generates a UUID. But in the future we may want something more advanced. For example, let's say there is an object like:
> 
```
{
  "device_ip": "192.168.1.99",
  "kind": "sensor",
  "network": "Infra2",
  "added": "123456789",
  "serial_no": "123456-4124-13213"
}
```
> 
> The resource's type may support a declaration of a uniqueness constraint, stating that each combination of values for some of the above properties identifies exactly one object (equivalent to `UniqueConstraint(device_ip, network, serial_no)` used by some Python ORMs). With these two pieces (the object and the constraint definition) the server and the client may address the object by calling something like `md5(value|value|value).hexdigest`; this would allow any peer in the network to use a standard naming/identification "mini-protocol" for the given type.
> This can be even more powerful in the case of ids that are "unique-by-design" (like ISBNs for books or patent ids for patents). By defining a single property as "globally unique" we can be sure to identify the resource in the whole domain of its type, even across different networks. We can then be sure that every agent can easily recognize a node in its own graph as the same node in another agent's graph representation.
> 
> If this makes sense, we may need one part of the spec with the semantics to specify constraints on properties (like what already happens in SQL) and another part to let resource holders publish the semantics that allow other agents to build the same unique ids from the same property values, so that the id of an object may be the result of calling a function over the values of the properties declared as part of the constraint.
> In this scenario, a constraint may be defined as a sequence of supported properties that are flagged with `hydra:required` for example; or a property may be declared globally unique and the id be just `md5(value).hexdigest`.
> 
> This feature may avoid a lot of data management issues in the future and avoid possible conflicts between servers providing the same resources/types (like data duplication issues). It may also, for example, allow basic schema validation checks where the schema is the ApiDoc itself. In general, we want to implement a synchronisation protocol between data providers and data agents/clients so as to allow caching/querying of RDF graphs.
> 
> I know Hydra is meant to be an API framework, but the trend, and the experiments we have run, seem to be moving towards a closer integration of the interface and persistence layers (as, for example, in pure REST microservices), so that both use the same semantics. This is particularly true given that the interface, web server and database are autodeployed from the very same Hydra doc file (obviously avoiding lock-in to any particular DBMS).
> 
> As we move into a more operational field, the spec may need new features in this sense, because data operations add a different layer of needs to the core modeling layer. Let me know if this makes sense and whether this layer of features should be part of a separate document or be added to the core spec. In my opinion, the first option would only increase the complexity in usage; the second may be a useful addition to usability and maintainability.
> 
> 
>> The semantics of an identifier shouldn't change.. otherwise pretty much all bets are off.
> 
> I think the semantics may differ from type to type, as long as the publisher of the resource/type can define them according to the unique properties in the type, or indicate one of the properties as a globally unique identifier for the domain of the type (a sort of "mini-protocol" with the semantics to allow the publisher to define a multi-property constraint and a hashing function).
> 
> Using the id plus a timestamp may partially solve the problem initially, in the very same way that the uuid does, and that is what we are doing at the moment. But assigning a new pk to an object that already has a globally unique identifier (like the ISBN for a book again) just increases the complexity of keeping maps of ids, when this can be avoided by just declaring `md5(ISBN).hexdigest` to be globally unique for the type "Books".
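
As a rough illustration of the idea quoted above, the following Python sketch derives an id by hashing the values of the properties named in a uniqueness constraint. It is only a sketch under stated assumptions: the function name `derive_id`, the `UNIQUE_TOGETHER` tuple, the `|` separator and the sample ISBN are hypothetical and not part of Hydra or of any existing implementation. MD5 is used only because the post mentions it.

```python
import hashlib
from typing import Mapping, Sequence


def derive_id(resource: Mapping[str, object],
              constraint: Sequence[str],
              sep: str = "|") -> str:
    """Hash the constrained property values, in the declared order.

    Hypothetical helper: neither the name nor the separator comes from Hydra.
    """
    missing = [prop for prop in constraint if prop not in resource]
    if missing:
        raise KeyError(f"missing constrained properties: {missing}")
    # Join the values in constraint order so every peer builds the same string.
    joined = sep.join(str(resource[prop]) for prop in constraint)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# The example object from the post.
sensor = {
    "device_ip": "192.168.1.99",
    "kind": "sensor",
    "network": "Infra2",
    "added": "123456789",
    "serial_no": "123456-4124-13213",
}

# Assumed constraint, mirroring UniqueConstraint(device_ip, network, serial_no).
UNIQUE_TOGETHER = ("device_ip", "network", "serial_no")
sensor_id = derive_id(sensor, UNIQUE_TOGETHER)

# A peer holding the same resource with a different "added" value gets the
# same id, because "added" is not part of the constraint.
peer_copy = dict(sensor, added="987654321")
peer_id = derive_id(peer_copy, UNIQUE_TOGETHER)

# A single "globally unique" property is the one-element case.
book_id = derive_id({"isbn": "978-3-16-148410-0"}, ("isbn",))
```

Two peers agree on an id only if they also agree on the property order, the separator and the string encoding of each value, so a published constraint would need to fix those as well. A plain separator can also produce collisions when a value itself contains `|`; a length-prefixed or otherwise canonical encoding would avoid that.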


Please view or discuss this issue at https://github.com/HydraCG/Specifications/issues/168 using your GitHub account
Received on Friday, 6 July 2018 16:37:03 UTC