Re: IDs (was EasierRDF) from Hugh Glaser on 2022-02-18 (semantic-web@w3.org from February 2022)

From: Hugh Glaser <hugh@glasers.org>
Date: Fri, 18 Feb 2022 12:32:22 +0000
To: hans.teijgeler@quicknet.nl
Cc: rowen.rathling@gmail.com, Matthew Lange <matthew@ic-foods.org>, Semantic Web <semantic-web@w3.org>
Message-Id: <407FB112-2A90-4797-9DD7-18A2561435C2@glasers.org>
Thanks Hans,

It seems to me that, roughly, your approach treats the UUID as the only strong ID, and any other “ID”s are simply labels to you.
And you manage all the data during ingestion by generating a brand new UUID ( https://xkcd.com/927/ :-) ).
So you have a relatively closed system: you create your own representations of knowledge coming in, and people consuming have to use your IDs.

Not saying that’s bad - this stuff is hard enough without getting involved in the problems I am asking about.

BTW, I am aware of ISO 15926, DT & CDBB, and somewhat involved in the developing 4D stuff.
And I think that Matthew & Al & I have a gentle agreement that the standards don’t really address the ID management problems, and that something will be needed to do so in due course.
Especially (National) Digital Twin stuff - you just can’t rely on everyone using the same IDs for everything; and you can’t expect to put all the data in a single, curated, store, certainly not at a national scale.

Best
Hugh


> On 17 Feb 2022, at 23:07, hans.teijgeler@quicknet.nl wrote:
> 
> Hi Hugh,
>  
> First of all our domain is a process plant, and all plant items are declared and get their UUID and a label.
> These are stored in the consolidating triple store of the plant.
>  
> Then comes a project to revamp, or to extend the plant with a new unit. In most cases that is handled by a so-called EPC contractor (EPC = Engineering, Procurement, Construction).
> In the case of a revamp the plant owner must share the relevant data about the existing situation. Ideally this is done by federation, leaving that information under the control of the plant owner since that plant is still in operation for the, say, two years that such a revamp takes (which makes it an update of a moving target).
>  
> In either case the EPC contractor starts to design the new situation, as the design part of a Digital Twin (to be: DTs are making inroads in this field; Messrs Aveva, for instance, are designing a new one based on ISO 15926, or at least that is what they told us). Once the design is finished, the procurement has been done, and the plant has been constructed, the contractor hands over the design & engineering information to the plant owner. It is this rather complex use case that triggered the development of ISO 15926, also because plant owners and EPC contractors have a ‘promiscuous’ relationship in most cases, causing endless interfacing.
>  
> Now to your questions:
> [HG] How do you manage the UUIDs?
> [HT] The issue of UUIDs is not managed because the chance of a UUID occurring twice is, for all practical purposes, zero. I read: ‘128-bits is big enough and the generation algorithm is unique enough that if 1,000,000,000 GUIDs per second were generated for 1 year the probability of a duplicate would be only 50%.’ On top of that they reside in an endpoint, of which there are also a gazillion. I use this generator, but undoubtedly there are more of those.
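A rough check of the collision odds quoted above, as a minimal sketch. It assumes version-4 (random) UUIDs, which carry 122 random bits, and uses the standard birthday approximation p = 1 - exp(-n^2 / 2N); the function names are illustrative only. Under those assumptions a billion UUIDs per second for one year gives a duplicate probability of roughly 0.01%, and the 50% point is reached only after roughly 86 years at that rate, so the practical conclusion holds with even more margin than the quoted figure suggests.

```python
import math

RANDOM_BITS = 122  # assumption: version-4 UUIDs (128 bits minus 6 fixed version/variant bits)
SPACE = 2 ** RANDOM_BITS
SECONDS_PER_YEAR = 365 * 24 * 3600


def collision_probability(n: float, space: int = SPACE) -> float:
    """Birthday approximation: chance of at least one duplicate among n random draws."""
    return -math.expm1(-(n * n) / (2 * space))


def draws_for_probability(p: float, space: int = SPACE) -> float:
    """Number of draws at which the duplicate probability reaches p (same approximation)."""
    return math.sqrt(2 * space * math.log(1 / (1 - p)))


rate = 1_000_000_000  # UUIDs generated per second, as in the quoted claim
one_year = collision_probability(rate * SECONDS_PER_YEAR)
years_to_half = draws_for_probability(0.5) / (rate * SECONDS_PER_YEAR)

print(f"duplicate probability after 1 year: {one_year:.2e}")
print(f"years until 50% duplicate probability: {years_to_half:.0f}")
```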
>  
> [HG] For example, when you start to use data from a new DB, do you modify it to re-write the primary keys (or whatever) to the UUID?
> [HT] In the treatise I sent to you on Feb. 15th you can read that we intend to map the data of all applications that are used in the context of a plant to ISO 15926-8 in Turtle. In those apps the identifiers (tag numbers) are those dictated by the plant owner. When the data are mapped, the software checks whether that identifier already exists. If not, the technical discipline involved must decide what to do. But that is exceptional, unless the users of the app spelled the identifier incorrectly. In the little diagram I sent, an in-between triple store is shown where the responsible discipline can correct, ignore, or transfer the triples to the consolidating triple store. Because of the uniqueness of the UUIDs, that transfer can basically be done by changing their endpoint during the transfer.
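One possible reading of the identifier check described above, as a minimal sketch rather than the actual ISO 15926 tooling. The `Stores` class, the `ingest` function, the predicate `ex:designPressure`, and the value `12 bar` are all hypothetical; only the tag number and UUID come from the example further down. Statements whose tag number is already declared go to the consolidating store under the UUID, and anything else is parked for the responsible discipline.

```python
from dataclasses import dataclass, field


@dataclass
class Stores:
    """Hypothetical stand-ins for the two triple stores mentioned in the text."""
    known_tags: dict[str, str]  # tag number -> UUID, as declared in the consolidating store
    consolidated: list[tuple[str, str, str]] = field(default_factory=list)
    staging: list[tuple[str, str, str]] = field(default_factory=list)


def ingest(stores: Stores, tag: str, predicate: str, value: str) -> str:
    """Route one mapped statement and report which store received it."""
    uuid_for_tag = stores.known_tags.get(tag)
    if uuid_for_tag is None:
        # Unknown (possibly misspelled) identifier: hold it for correction, rejection, or transfer.
        stores.staging.append((tag, predicate, value))
        return "staging"
    # Known identifier: store the statement against the lifelong UUID, not the label.
    stores.consolidated.append((uuid_for_tag, predicate, value))
    return "consolidated"


stores = Stores(known_tags={"HG-ey37": "847931fd-eade-4beb-b07d-a9e889611c19"})
print(ingest(stores, "HG-ey37", "ex:designPressure", "12 bar"))  # declared tag number
print(ingest(stores, "HG-ey73", "ex:designPressure", "12 bar"))  # transposed digits
```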
>  
> [HG] Or do you add the UUID to the DB, with an internal mapping table to the keys, and then modify everyone’s existing queries to include the UUIDs?
> [HT] The mapping between identifier and UUID is in the declaration, e.g.:
>  
> ex:847931fd-eade-4beb-b07d-a9e889611c19
>             rdf:type lci:InanimatePhysicalObject, dm:WholeLifeIndividual, dm:ActualIndividual, rdl:RDS414674 ; # VESSEL
>             rdfs:label "HG-ey37" ;
>             meta:valEffectiveDate "2021-04-13T15:29:00Z"^^xsd:dateTime .
>  
> That mapping is used to fetch the UUID for a human-readable label. When an object has more identifiers, such as a serial number, asset number, maintenance number, etc., each identification is covered by a template instance, for example:
>  
> ex:763c75da-97c1-4b4e-b699-cf616c7c7a5d
>       rdf:type tpl:ClassifiedIdentificationOfIndividual ;
>       rdfs:label "[VESSEL] individual [HG-ey37] has an [IDENTIFICATION BY ASSET NUMBER] [AN-45348832]"@en ; # storage of this label is optional – it could be generated on the fly
>       tpl:hasIdentified ex:847931fd-eade-4beb-b07d-a9e889611c19  ; # HG-ey37
>       tpl:hasIdentifier "AN-45348832" ;
>       tpl:hasIdentificationType rdl:RDS2221102  ; # IDENTIFICATION BY ASSET NUMBER
>       meta:valEffectiveDate "2021-09-21T10:24:00Z"^^xsd:dateTime .
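To illustrate how either identifier could lead back to the same UUID, here is a minimal sketch. It holds a few triples transcribed from the two Turtle snippets above in a plain Python list instead of a triple store; the `resolve` function is hypothetical and not part of any ISO 15926 software.

```python
# Triples transcribed from the two Turtle snippets above: (subject, predicate, object).
VESSEL = "ex:847931fd-eade-4beb-b07d-a9e889611c19"
TEMPLATE = "ex:763c75da-97c1-4b4e-b699-cf616c7c7a5d"

TRIPLES = [
    (VESSEL, "rdfs:label", "HG-ey37"),
    (TEMPLATE, "rdf:type", "tpl:ClassifiedIdentificationOfIndividual"),
    (TEMPLATE, "tpl:hasIdentified", VESSEL),
    (TEMPLATE, "tpl:hasIdentifier", "AN-45348832"),
    (TEMPLATE, "tpl:hasIdentificationType", "rdl:RDS2221102"),
]


def resolve(identifier: str, triples=TRIPLES) -> set[str]:
    """Return the UUID-based IRIs that a label or a template identifier points to."""
    # Case 1: the identifier is the rdfs:label in the object's own declaration.
    found = {s for s, p, o in triples if p == "rdfs:label" and o == identifier}
    # Case 2: the identifier is carried by a template instance; follow tpl:hasIdentified.
    templates = {s for s, p, o in triples if p == "tpl:hasIdentifier" and o == identifier}
    found |= {o for s, p, o in triples if p == "tpl:hasIdentified" and s in templates}
    return found


print(resolve("HG-ey37"))      # tag number
print(resolve("AN-45348832"))  # asset number
```

Both calls return the same single IRI, which is the point of keeping the UUID as the only strong identifier and treating everything else as a label.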
>  
> Please note that templates represent elementary, autonomous information chunks, not the object(s) that the information is about. Each one is actually an elementary KG.
>  
> [HG] Or is there some wrapping layer around it somehow?
> [HT] No
>  
> [HG] How do you make your UUIDs discoverable, in particular in relation to external IDs that come from different DBs?
> [HT] In addition to what I wrote above, our Reference Data Library has extensions for a number of standardization bodies, like ASTM, ASME, DIN, BS, IEC, etc. What we do is assign our own number and make reference to a particular standard class. For instance:
>  
> Transmitter
> id                                 http://data.15926.org/iec/ABA880
> rdfs:label                     Transmitter
> skos:definition             A <Transmitter> is a <Measuring instrument component> and a <PROCESS VARIABLE TRANSMITTER> that accepts a process variable and converts it according to a definite law into a standardized output signal.
> owl:sameAs                https://cdd.iec.ch/cdd/iec61987/iec61987.nsf/TU0/0112-2---61987%23ABA880
> meta:valEffectiveDate 2021-10-03Z
> rdf:type                       ClassOfFunctionalObject
> rdfs:subClassOf         Measuring instrument component
> rdfs:subClassOf         PROCESS VARIABLE TRANSMITTER
>  
> In other cases, where the standardization body has no endpoint, we just refer to a standard, such as:
>  
> FLANGED END RING JOINT ASME B16.5 CLASS 2500 NPS 10
> id                                 http://data.15926.org/asme/RDS730304
> rdfs:label                     FLANGED END RING JOINT ASME B16.5 CLASS 2500 NPS 10
> rdf:type                       ClassOfFeature
> rdfs:subClassOf         FLANGED END RING JOINT ASME B16.5
> rdfs:subClassOf         FLANGED END ASME B16.5 CLASS 2500 NPS 10
> skos:definition             A <FLANGED END RING JOINT ASME B16.5 CLASS 2500 NPS 10> is a <FLANGED END ASME B16.5 CLASS 2500 NPS 10> and a <FLANGED END RING JOINT ASME B16.5> conforming to the specification for Class 2500, NPS 10
>  
> You see: small questions – large answers.
>  
> Regards, Hans 
> 15926.org
>  
> PS This presentation may interest you: https://www.youtube.com/watch?v=tRGHBYsz2KM  It describes the next step after ISO 15926: the CDBB Project in the UK: https://www.cdbb.cam.ac.uk/
> _________________________________________________________________________________________________________________________
> 
> From: Hugh Glaser <hugh@glasers.org> 
> Sent: Thursday 17 February 2022 20:09
> To: hans.teijgeler@quicknet.nl
> Cc: Matthew Lange <matthew@ic-foods.org>; Semantic Web <semantic-web@w3.org>
> Subject: Re: EasierRDF
>  
> By the way, how do you manage the UUIDs?
> For example, when you start to use data from a new DB, do you modify it to re-write the primary keys (or whatever) to the UUID?
> Or do you add the UUID to the DB, with an internal mapping table to the keys, and then modify everyone’s existing queries to include the UUIDs?
> Or is there some wrapping layer around it somehow?
> How do you make your UUIDs discoverable, in particular in relation to external IDs that come from different DBs?
>  
> Or maybe I am thinking of a different world.
> Cheers
>  
> > On 17 Feb 2022, at 14:03, <hans.teijgeler@quicknet.nl> wrote:
> > 
> > Hi Hugh,
> > 
> > We use UUIDs because of the long time span and the many contributors of life-cycle information.
> > Next to the UUID we use rdfs:label for easy access. The label can change; the UUID stays lifelong.
> > 
> > Regards, Hans
Received on Friday, 18 February 2022 12:32:46 UTC