RE: IDs (was EasierRDF) from hans.teijgeler@quicknet.nl on 2022-02-18 (semantic-web@w3.org from February 2022)

From: <hans.teijgeler@quicknet.nl>
Date: Fri, 18 Feb 2022 19:22:41 +0100
To: "'Hugh Glaser'" <hugh@glasers.org>
Cc: "'Matthew Lange'" <matthew@ic-foods.org>, "'Semantic Web'" <semantic-web@w3.org>
Message-ID: <013901d824f4$84afe610$8e0fb230$@quicknet.nl>
Hugh,

 

[HG] ... you manage all the data during ingestion by generating a brand new
UUID

[HT] Only for new template instances, since these represent new
information

 

[HG] ..... you create your own representations of knowledge coming in, and
people consuming have to use your IDs.

[HT] The users aren't using those UUIDs; they keep using their own identifiers,
and the fetched input information is also UUID-ignorant. The use of "our"
IDs is limited to the reference data, because of the adherence to the upper
ontology. Users can set up their local RDL extension, using their own
identifiers, as long as these are made specializations of the 15926 RDL.
CFIHOS is using theirs, and these are subclasses of RDL classes, for
example:

transmitter
id                 http://data.15926.org/cfihos/30000661
rdfs:label         transmitter
rdf:type           ClassOfFunctionalObject   # this is an ISO 15926-2 entity type
rdfs:subClassOf    instrument equipment      # this is a CFIHOS superclass
rdfs:subClassOf    TRANSMITTER               # this is an RDL class
skos:definition    A physical object that is an element that receives a process variable signal from a sensor and converts it into an output signal

 

The system is wide open because everybody does his/her thing and is unaware
of ISO 15926, and yet they can share information via the, oftentimes
federated, triple stores. And yes, a public "sameAs store", which you
mentioned earlier, is a good idea. When using UUIDs, duplicates in that store
will be, for all intents and purposes, non-existent. But put some strong
protection on that store to keep hackers from creating havoc.

 

[HG] Especially (National) Digital Twin stuff - you just can't rely on
everyone using the same IDs for everything; and you can't expect to put all
the data in a single, curated, store, certainly not at a national scale.

[HT] They don't have to, but they need to use an upper ontology that
overarches all participating domains. Without that you keep writing and
maintaining interfaces till Doomsday.

 

Regards, Hans 

__________________________________________________________________

-----Original Message-----
From: Hugh Glaser <hugh@glasers.org> 
Sent: Friday, 18 February 2022 13:32
To: hans.teijgeler@quicknet.nl
Cc: rowen.rathling@gmail.com; Matthew Lange <matthew@ic-foods.org>; Semantic
Web <semantic-web@w3.org>
Subject: Re: IDs (was EasierRDF)

 

Thanks Hans,

 

It seems to me that your approach is that roughly the UUID is the only
strong ID, and any other "ID"s are simply labels to you.

And you manage all the data during ingestion by generating a brand new UUID
(https://xkcd.com/927/ :-) ).

So you have a relatively closed system: you create your own representations
of knowledge coming in, and people consuming have to use your IDs.

 

Not saying that's bad - this stuff is hard enough without getting involved
in the problems I am asking about.

 

BTW, I am aware of ISO 15926, DT & CDBB, and somewhat involved in the
developing 4D stuff.

And I think that Matthew & AlI have a gentle agreement that the standards
don't really address the ID management problems, which will need something
to do so in due course.

Especially (National) Digital Twin stuff - you just can't rely on everyone
using the same IDs for everything; and you can't expect to put all the data
in a single, curated, store, certainly not at a national scale.

 

Best

Hugh

 

 

> On 17 Feb 2022, at 23:07, hans.teijgeler@quicknet.nl wrote:

> 

> Hi Hugh,

>  

> First of all, our domain is a process plant, and all plant items are declared and get their UUID and a label.

> These are stored in the consolidating triple store of the plant.

>  

> Then comes a project to revamp, or to extend the plant with a new unit. In most cases that is handled by a so-called EPC contractor (EPC = Engineering, Procurement, Construction).

> In the case of a revamp the plant owner must share the relevant data about the existing situation. Ideally this is done by federation, leaving that information under the control of the plant owner, since that plant is still in operation for the, say, two years that such a revamp takes (which makes it an update of a moving target).

>  

> In either case the EPC contractor starts to design the new situation, as the design part of a Digital Twin (to be: DTs are making inroads in this field; Messrs Aveva, for instance, are designing a new one based on ISO 15926, or at least that is what they told us). Once the design is finished, the procurement has been done, and the plant has been constructed, the contractor hands over the design & engineering information to the plant owner. It is this rather complex use case that triggered the development of ISO 15926, also because plant owners and EPC contractors have a 'promiscuous' relationship in most cases, causing endless interfacing.

>  

> Now to your questions:

> [HG] How do you manage the UUIDs?

> [HT] The UUIDs are not managed, because the chance of a UUID occurring twice is, for all practical purposes, zero. I read: '128-bits is big enough and the generation algorithm is unique enough that if 1,000,000,000 GUIDs per second were generated for 1 year the probability of a duplicate would be only 50%.' On top of that they reside in an endpoint, of which there are also a gazillion. I use this generator, but undoubtedly there are more of those.
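
The collision claim can be checked with a little arithmetic. What follows is a minimal sketch, not the generator referred to above: it assumes version-4 (random) UUIDs, in which 122 of the 128 bits are random, and uses the standard birthday approximation. The function names are hypothetical. Under those assumptions, one year at a billion UUIDs per second gives a collision probability far below 50%, and the 50% point is only reached after decades, which supports the "practically zero" conclusion even more strongly than the quoted figure.

```python
import math
import uuid

RANDOM_BITS = 122  # assumption: version-4 UUIDs, 122 of 128 bits are random


def collision_probability(n: float, bits: int = RANDOM_BITS) -> float:
    """Birthday approximation: p ~= 1 - exp(-n^2 / (2 * 2^bits))."""
    return -math.expm1(-(n * n) / (2.0 * 2.0 ** bits))


def count_for_probability(p: float, bits: int = RANDOM_BITS) -> float:
    """Number of random UUIDs needed to reach collision probability p."""
    return math.sqrt(2.0 * 2.0 ** bits * -math.log1p(-p))


SECONDS_PER_YEAR = 365.25 * 24 * 3600
RATE = 1_000_000_000  # UUIDs generated per second, as in the quoted claim

# Probability of at least one duplicate after one year at that rate.
p_one_year = collision_probability(RATE * SECONDS_PER_YEAR)

# Years of generation at that rate before a duplicate becomes a coin flip.
years_to_half = count_for_probability(0.5) / (RATE * SECONDS_PER_YEAR)

# Generating a fresh identifier needs no coordination between parties.
new_id = uuid.uuid4()

print(f"p(duplicate) after 1 year: {p_one_year:.2e}")
print(f"years to reach 50%: {years_to_half:.0f}")
print(new_id)
```
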

>  

> [HG] For example, when you start to use data from a new DB, do you modify it to re-write the primary keys (or whatever) to the UUID?

> [HT] In the treatise I sent to you on Feb. 15th you can read that we intend to map the data of all applications that are used in the context of a plant to ISO 15926-8 in Turtle. In those apps the identifiers (tag numbers) are those dictated by the plant owner. During mapping the software checks whether that identifier already exists. If not, the technical discipline involved must decide what to do. But that is exceptional, unless the users of the app spelled the identifier incorrectly. The little diagram I sent shows an in-between triple store where the responsible discipline can correct, ignore, or transfer the triples to the consolidating triple store. Because the UUIDs are unique, that transfer can basically be done by changing their endpoint during the transfer.
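
The check described above could be sketched roughly as follows. This is an illustration only, not the actual mapping software: the `Stores` class, the `ingest` function, and the use of plain dictionaries and lists in place of triple stores are all assumptions made for the example. A record whose tag number is known gets a fresh UUID for the new piece of information; a record with an unknown (for instance misspelled) tag number is parked in the in-between store for the discipline to review.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Stores:
    """Hypothetical stand-ins for the consolidating and in-between stores."""
    consolidated: dict = field(default_factory=dict)  # tag number -> object UUID
    accepted: list = field(default_factory=list)      # (template UUID, object UUID, payload)
    staging: list = field(default_factory=list)       # (tag number, payload) awaiting review


def ingest(stores: Stores, tag: str, payload: str):
    """Route one mapped record, keyed by the plant owner's tag number."""
    object_id = stores.consolidated.get(tag)
    if object_id is None:
        # Unknown or misspelled identifier: leave the decision to the discipline.
        stores.staging.append((tag, payload))
        return None
    # Known identifier: the new information gets its own fresh UUID.
    template_id = uuid.uuid4()
    stores.accepted.append((template_id, object_id, payload))
    return template_id


stores = Stores()
stores.consolidated["HG-ey37"] = uuid.UUID("847931fd-eade-4beb-b07d-a9e889611c19")

ok = ingest(stores, "HG-ey37", "asset number AN-45348832")
parked = ingest(stores, "HG-ey73", "asset number AN-45348832")  # misspelled tag
```

The caller never supplies a UUID: the apps keep working with tag numbers, and only the ingestion step touches the UUIDs.
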

>  

> [HG] Or do you add the UUID to the DB, with an internal mapping table to the keys, and then modify everyone's existing queries to include the UUIDs?

> [HT] The mapping between identifier and UUID is in the declaration, e.g.:

>  

> ex:847931fd-eade-4beb-b07d-a9e889611c19
>             rdf:type lci:InanimatePhysicalObject, dm:WholeLifeIndividual, dm:ActualIndividual, rdl:RDS414674 ; # VESSEL
>             rdfs:label "HG-ey37" ;
>             meta:valEffectiveDate "2021-04-13T15:29:00Z"^^xsd:dateTime .

>  

> That mapping is used to fetch the UUID for a human-readable label. When an object has more labels, such as serial number, asset number, maintenance number, etc., each identification is covered by a template instance, for example:

>  

> ex:763c75da-97c1-4b4e-b699-cf616c7c7a5d
>       rdf:type tpl:ClassifiedIdentificationOfIndividual ;
>       rdfs:label "[VESSEL] individual [HG-ey37] has an [IDENTIFICATION BY ASSET NUMBER] [AN-45348832]"@en ; # storage of this label is optional - it could be generated on the fly
>       tpl:hasIdentified ex:847931fd-eade-4beb-b07d-a9e889611c19 ; # HG-ey37
>       tpl:hasIdentifier "AN-45348832" ;
>       tpl:hasIdentificationType rdl:RDS2221102 ; # IDENTIFICATION BY ASSET NUMBER
>       meta:valEffectiveDate "2021-09-21T10:24:00Z"^^xsd:dateTime .

>  

> Please note that templates represent elementary, autonomous information chunks, not the object(s) that the information is about. Each is, in effect, an elementary knowledge graph (KG).
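
To illustrate how the declaration and the identification template together resolve a human-readable identifier to a UUID, here is a minimal sketch. It holds the triples from the two examples above as plain Python tuples rather than in a triple store, and the helper names `by_label` and `by_identifier` are hypothetical. A real deployment would presumably run an equivalent query against the endpoint.

```python
VESSEL = "ex:847931fd-eade-4beb-b07d-a9e889611c19"
TEMPLATE = "ex:763c75da-97c1-4b4e-b699-cf616c7c7a5d"

# (subject, predicate, object) triples taken from the two examples above.
TRIPLES = [
    (VESSEL, "rdfs:label", "HG-ey37"),
    (TEMPLATE, "rdf:type", "tpl:ClassifiedIdentificationOfIndividual"),
    (TEMPLATE, "tpl:hasIdentified", VESSEL),
    (TEMPLATE, "tpl:hasIdentifier", "AN-45348832"),
    (TEMPLATE, "tpl:hasIdentificationType", "rdl:RDS2221102"),
]


def by_label(triples, label):
    """UUID-based subjects declared with the given rdfs:label."""
    return [s for s, p, o in triples if p == "rdfs:label" and o == label]


def by_identifier(triples, identifier):
    """Objects identified by any template instance carrying this identifier."""
    templates = {s for s, p, o in triples
                 if p == "tpl:hasIdentifier" and o == identifier}
    return [o for s, p, o in triples
            if p == "tpl:hasIdentified" and s in templates]


print(by_label(TRIPLES, "HG-ey37"))
print(by_identifier(TRIPLES, "AN-45348832"))
```

Both lookups end at the same UUID-based subject, so the tag number and the asset number stay interchangeable entry points while the UUID itself never has to be typed by a user.
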

>  

> [HG] Or is there some wrapping layer around it somehow?

> [HT] No

>  

> [HG] How do you make your UUIDs discoverable, in particular in relation to external IDs that come from different DBs?

> [HT] In addition to what I wrote above, our Reference Data Library has extensions for a number of standardization bodies, like ASTM, ASME, DIN, BS, IEC, etc. What we do is assign our own number and make reference to a particular standard class. For instance:

>  

> Transmitter
> id                     http://data.15926.org/iec/ABA880
> rdfs:label             Transmitter
> skos:definition        A <Transmitter> is a <Measuring instrument component> and a <PROCESS VARIABLE TRANSMITTER> that accepts a process variable and converts it according to a definite law into a standardized output signal.
> owl:sameAs             https://cdd.iec.ch/cdd/iec61987/iec61987.nsf/TU0/0112-2---61987%23ABA880
> meta:valEffectiveDate  2021-10-03Z
> rdf:type               ClassOfFunctionalObject
> rdfs:subClassOf        Measuring instrument component
> rdfs:subClassOf        PROCESS VARIABLE TRANSMITTER

>  

> In other cases, where the standardization body has no endpoint, we just refer to a standard, such as:

>  

> FLANGED END RING JOINT ASME B16.5 CLASS 2500 NPS 10
> id                     http://data.15926.org/asme/RDS730304
> rdfs:label             FLANGED END RING JOINT ASME B16.5 CLASS 2500 NPS 10
> rdf:type               ClassOfFeature
> rdfs:subClassOf        FLANGED END RING JOINT ASME B16.5
> rdfs:subClassOf        FLANGED END ASME B16.5 CLASS 2500 NPS 10
> skos:definition        A <FLANGED END RING JOINT ASME B16.5 CLASS 2500 NPS 10> is a <FLANGED END ASME B16.5 CLASS 2500 NPS 10> and a <FLANGED END RING JOINT ASME B16.5> conforming to the specification for Class 2500, NPS 10

>  

> You see: small questions - large answers.

>  

> Regards, Hans

> 15926.org

>  

> PS This presentation may interest you: https://www.youtube.com/watch?v=tRGHBYsz2KM It describes the next step after ISO 15926: the CDBB Project in the UK: https://www.cdbb.cam.ac.uk/

> __________________________________________________________________

> 

> From: Hugh Glaser <hugh@glasers.org>
> Sent: Thursday, 17 February 2022 20:09
> To: hans.teijgeler@quicknet.nl
> Cc: Matthew Lange <matthew@ic-foods.org>; Semantic Web <semantic-web@w3.org>
> Subject: Re: EasierRDF

>  

> By the way, how do you manage the UUIDs?

> For example, when you start to use data from a new DB, do you modify it to re-write the primary keys (or whatever) to the UUID?

> Or do you add the UUID to the DB, with an internal mapping table to the keys, and then modify everyone's existing queries to include the UUIDs?

> Or is there some wrapping layer around it somehow?

> How do you make your UUIDs discoverable, in particular in relation to external IDs that come from different DBs?

>  

> Or maybe I am thinking of a different world.

> Cheers

>  

> > On 17 Feb 2022, at 14:03, <hans.teijgeler@quicknet.nl> wrote:

> > 

> > Hi Hugh,

> > 

> > We use UUIDs, because of the long period in time and the many contributors of life-cycle information.

> > Next to the UUID we use rdfs:label for easy access. The label can change; the UUID stays for life.

> > 

> > Regards, Hans
Received on Friday, 18 February 2022 18:22:58 UTC