- From: Blake Regalia <blake.regalia@gmail.com>
- Date: Tue, 24 Jul 2018 17:36:47 -0700
- To: hugh@glasers.org
- Cc: tl@rat.io, semantic-web@w3.org, hans.teijgeler@quicknet.nl
- Message-ID: <CANMU0MGaFo3ngOr4DC-JmSGmk7naLs85rYL-OGnxzEdmwSmr+g@mail.gmail.com>
Thomas,

It's worth checking out the RDF Dataset Normalization Algorithm (aka the Universal RDF Dataset Normalization Algorithm 2015 or URDNA2015): https://json-ld.github.io/normalization/spec/

Essentially, you would normalize the triple as if it were the only triple in a graph, then perform a hash (SHA-256) on the resulting (normalized) N-Quads string and use that as the suffix to some IRI you mint.

- Blake

On Tue, Jul 24, 2018 at 12:44 PM Hugh Glaser <hugh@glasers.org> wrote:
> Hi,
>
> I think UUIDs don't satisfy
>
>> - reproducible: the same triple must always get the same identifier
> but come close.
>
> For the non-domain bit:
> Can't you just use a (big) hash of the (normalised?) text of the triple?
> (So, perhaps expand any namespace declarations, if you want to be
> independent of the namespace identifiers chosen, ensure the separators are
> all something appropriate, and think about case-sensitivity.)
>
> Then do something like (php)
> function hashIt($name) {
>     $hash = sha1(mb_strtolower(trim($name)));
>     return substr($hash, 0, 8) .'-'. substr($hash, 8, 8) .'-'.
>         substr($hash, 16, 8) .'-'. substr($hash, 24, 8) .'-'. substr($hash, 32, 8);
> }
> (I use this for a similar purpose to make text fuse, hence the
> mb_strtolower, but you probably don't want that.)
>
> which gives the ability to do
> http://www.example.com/triple/ee2becf1-1fc8a9c5-55f98825-38c859a6-81e24602
> for
> <http://www.w3.org/2001/08/rdf-test/> <http://purl.org/dc/elements/1.1/creator> "Dave Beckett" .
>
> It looks like a UUID, but definitely isn't, because it doesn't have the
> stuff at the start of the hash or conform to any of the versions of the ISO
> standard.
> You could add the start stuff if you want (at the cost of a little
> entropy, of course.)
>
> It also doesn't have any of the time-stamp or MAC address salting of some
> UUID versions.
> You aren't quite clear on that bit:
>
> - reproducible: the same triple must always get the same identifier
> Do you mean if generated by a machine anywhere?
> If so, then the straight hash is good - if not you want to add salt of a
> local machine parameter, such as MAC address.
>
> For the domain(s):
> I'm not sure what your requirements are.
> Do you just want the identifiers to fuse for the same triples?
> But not be able to find out anything about them?
> That is, a one-way encoding?
> That is, opaque IDs?
> If so, you can choose anything you like, so long as it doesn't resolve to
> anything sensible.
> A more interesting thing is to use the domain of the machine doing the ID
> generation, and then put the triple at the endpoint for the ID - so
> resolution of the ID will return the triple (in text, or whatever you
> fancy, maybe RDF?)
> You can always make the resolution require authentication if you want the
> triple to stay opaque for normal use.
>
> Best
> Hugh
>
>
> On 24 Jul 2018, at 17:17, Hans Teijgeler <hans.teijgeler@quicknet.nl> wrote:
> >
> > Hi Thomas,
> >
> > We use UUIDs throughout, such as
> > http://www.example.com/c00c7d47-a04a-46c5-9d81-623def3ff31c
> >
> > Look, for example, at https://www.uuidgenerator.net/
> >
> > Regards, Hans
> >
> > 15826,org
> >
> > On 24-7-2018 17:56, thomas lörtsch wrote:
> >> I’m looking for an algorithm to generate identifiers for any possible
> >> RDF statements that are themselves IRIs, globally unique, reproducible and
> >> not dependent on the source of the statement (i.e. document URI).
> >>
> >> In other words, the algorithm has to have the following properties:
> >> - reproducible: the same triple must always get the same identifier
> >> - global: the identifier must be independent from the source of the
> >>   triple (a document title, named graph name, graph URI etc)
> >> - unique: triples that differ in subject, predicate or object must have
> >>   different identifiers
> >> - conformant: the identifier must be an IRI
> >> - autonomous and finite: it can’t rely on any central repository,
> >>   secondary service etc
> >>
> >> Does such an algorithm exist? Is it even possible given that each node
> >> in the statement could be an IRI of maximal length and entropy?
> >>
> >>
> >> Thanks,
> >> Thomas
>
> --
> Hugh
> 023 8061 5652
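A minimal Python sketch of the hash-the-normalized-triple approach suggested at the top of this message, under loudly stated assumptions: the triple contains no blank nodes, so normalizing a one-triple graph reduces to writing that triple as a single N-Quads line in the default graph (URDNA2015's real work is relabelling blank nodes deterministically, which this sketch does not do). Literal escaping covers only backslash, double quote, line feed and carriage return. The base IRI `http://www.example.com/triple/` (borrowed from Hugh's example) and the names `IRI`, `Literal`, `canonical_nquad` and `mint_id` are hypothetical, not part of any library or spec.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Union

# Hypothetical base IRI for minted identifiers (example.com is a placeholder).
BASE_IRI = "http://www.example.com/triple/"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class IRI:
    value: str  # assumed to be an absolute IRI needing no escaping


@dataclass(frozen=True)
class Literal:
    lexical: str
    language: Optional[str] = None
    datatype: Optional[str] = None  # None is treated as xsd:string


Term = Union[IRI, Literal]


def escape_lexical(s: str) -> str:
    """Escape a lexical form; backslash goes first so later escapes aren't doubled."""
    return (s.replace("\\", "\\\\")
             .replace('"', '\\"')
             .replace("\n", "\\n")
             .replace("\r", "\\r"))


def serialize_term(t: Term) -> str:
    """Write one term in N-Triples/N-Quads syntax."""
    if isinstance(t, IRI):
        return f"<{t.value}>"
    out = f'"{escape_lexical(t.lexical)}"'
    if t.language is not None:
        return f"{out}@{t.language}"
    if t.datatype is not None and t.datatype != XSD_STRING:
        # xsd:string is the default datatype, so it is left implicit.
        return f"{out}^^<{t.datatype}>"
    return out


def canonical_nquad(s: IRI, p: IRI, o: Term) -> str:
    """Serialize a blank-node-free triple as the only quad in the default graph."""
    if not isinstance(s, IRI) or not isinstance(p, IRI):
        raise ValueError("subject and predicate must be IRIs in this sketch")
    return f"{serialize_term(s)} {serialize_term(p)} {serialize_term(o)} .\n"


def mint_id(s: IRI, p: IRI, o: Term) -> str:
    """SHA-256 the canonical N-Quads string and append the hex digest to BASE_IRI."""
    digest = hashlib.sha256(canonical_nquad(s, p, o).encode("utf-8")).hexdigest()
    return BASE_IRI + digest


if __name__ == "__main__":
    s = IRI("http://www.w3.org/2001/08/rdf-test/")
    p = IRI("http://purl.org/dc/elements/1.1/creator")
    o = Literal("Dave Beckett")
    # prints: <http://www.w3.org/2001/08/rdf-test/> <http://purl.org/dc/elements/1.1/creator> "Dave Beckett" .
    print(canonical_nquad(s, p, o), end="")
    print(mint_id(s, p, o))
```

Unlike the `hashIt` function quoted above, this does not lowercase its input, since IRIs and literal lexical forms are case-sensitive in RDF. Because the digest has a fixed length (64 hex characters), the "unique" property holds only with overwhelming probability, not absolutely; that is the usual answer to the question of how a finite identifier can cover statements of unbounded length.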
Received on Wednesday, 25 July 2018 00:37:27 UTC