- From: Dimitris Kontokostas <jimkont@gmail.com>
- Date: Wed, 25 Jul 2018 10:10:09 +0300
- To: blake.regalia@gmail.com
- Cc: Hugh Glaser <hugh@glasers.org>, tl@rat.io, semantic-web@w3.org, hans.teijgeler@quicknet.nl
- Message-ID: <CA+u4+a2kmvDb1M4H_==8si2bnv8SnGzjNYL69AjvUxR4qS96FQ@mail.gmail.com>
Hi,

a simple approach would be to generate a unique string from the triple, e.g. in the form

"$subjectIri - $predicateIri - $objectStringValueOrIri - $datatypeOREmpty - $langOrEmpty"

(the string could also be the N-Triples representation of the triple),
then apply a strong hash algorithm to the string, e.g. SHA-256, and append the hash to a namespace of your choice.

If you find the IRI too long, you can encode the hash with a URL-safe Base64 function.

Best,
Dimitris

P.S. Of course, blank nodes cannot go this route.

On Wed, Jul 25, 2018 at 3:43 AM Blake Regalia <blake.regalia@gmail.com> wrote:

> Thomas,
>
> It's worth checking out the RDF Dataset Normalization Algorithm (aka the
> Universal RDF Dataset Normalization Algorithm 2015 or URDNA2015):
> https://json-ld.github.io/normalization/spec/
>
> Essentially, you would normalize the triple as if it were the only triple
> in a graph, then perform a hash (SHA-256) on the resulting (normalized)
> n-quads string and use that as the suffix to some IRI you mint.
>
> - Blake
>
>
> On Tue, Jul 24, 2018 at 12:44 PM Hugh Glaser <hugh@glasers.org> wrote:
>
>> Hi,
>>
>> I think UUIDs don't satisfy
>>
>> - reproducible: the same triple must always get the same identifier
>> but come close.
>>
>> For the non-domain bit:
>> Can't you just use a (big) hash of the (normalised?) text of the triple?
>> (So, perhaps expand any namespace declarations, if you want to be
>> independent of the namespace identifiers chosen, ensure the separators are
>> all something appropriate, and think about case-sensitivity.)
>>
>> Then do something like (php)
>> function hashIt($name) {
>>     $hash = sha1(mb_strtolower(trim($name)));
>>     return substr($hash, 0, 8) .'-'. substr($hash, 8, 8) .'-'.
>>         substr($hash, 16, 8) .'-'. substr($hash, 24, 8) .'-'. substr($hash, 32, 8);
>> }
>> (I use this for a similar purpose to make text fuse, hence the
>> mb_strtolower, but you probably don't want that.)
>>
>> which gives the ability to do
>> http://www.example.com/triple/ee2becf1-1fc8a9c5-55f98825-38c859a6-81e24602
>> for
>> <http://www.w3.org/2001/08/rdf-test/> <http://purl.org/dc/elements/1.1/creator> "Dave Beckett" .
>>
>> It looks like a UUID, but definitely isn't, because it doesn't have the
>> stuff at the start of the hash or conform to any of the versions of the ISO
>> standard.
>> You could add the start stuff if you want (at the cost of a little
>> entropy, of course.)
>>
>> It also doesn't have any of the time-stamp or MAC address salting of some
>> UUID versions.
>> You aren't quite clear on that bit:
>> > - reproducible: the same triple must always get the same identifier
>> Do you mean if generated by a machine anywhere?
>> If so, then the straight hash is good - if not you want to add salt of a
>> local machine parameter, such as MAC address.
>>
>> For the domain(s):
>> I'm not sure what your requirements are.
>> Do you just want the identifiers to fuse for the same triples?
>> But not be able to find out anything about them?
>> That is, a one-way encoding?
>> That is, opaque IDs?
>> If so, you can choose anything you like, as long as it doesn't resolve to
>> anything sensible.
>> A more interesting thing is to use the domain of the machine doing the ID
>> generation, and then put the triple at the endpoint for the ID - so
>> resolution of the ID will return the triple (in text, or whatever you
>> fancy, maybe RDF?)
>> You can always make the resolution require authentication if you want the
>> triple to stay opaque for normal use.
>>
>> Best
>> Hugh
>>
>> > On 24 Jul 2018, at 17:17, Hans Teijgeler <hans.teijgeler@quicknet.nl> wrote:
>> >
>> > Hi Thomas,
>> >
>> > We use UUIDs throughout, such as
>> > http://www.example.com/c00c7d47-a04a-46c5-9d81-623def3ff31c
>> >
>> > Look, for example, at https://www.uuidgenerator.net/
>> > Regards, Hans
>> > 15826,org
>> > On 24-7-2018 17:56, thomas lörtsch wrote:
>> >> I’m looking for an algorithm to generate identifiers for any possible
>> >> RDF statements that are themselves IRIs, globally unique, reproducible and
>> >> not dependent on the source of the statement (i.e. document URI).
>> >>
>> >> In other words, the algorithm has to have the following properties:
>> >> - reproducible: the same triple must always get the same identifier
>> >> - global: the identifier must be independent from the source of the triple
>> >>   (a document title, named graph name, graph URI etc)
>> >> - unique: triples that differ in subject, predicate or object must
>> >>   have different identifiers
>> >> - conformant: the identifier must be an IRI
>> >> - autonomous and finite: it can’t rely on any central repository,
>> >>   secondary service etc
>> >>
>> >> Does such an algorithm exist? Is it even possible given that each node
>> >> in the statement could be an IRI of maximal length and entropy?
>> >>
>> >>
>> >> Thanks,
>> >> Thomas
>> >>
>>
>> --
>> Hugh
>> 023 8061 5652
>>
>>

--
Kontokostas Dimitris
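
As a rough illustration of the approach described at the top of this message (build one string from the triple, hash it with SHA-256, append the result to a namespace, optionally shorten it with URL-safe Base64), here is a minimal Python sketch. The namespace, the function names and the `short` flag are assumptions made up for this example, not part of any library or of the original suggestion.

```python
import base64
import hashlib

# Hypothetical namespace, used only for illustration.
NAMESPACE = "http://www.example.com/triple/"


def triple_key(subject, predicate, obj, datatype="", lang=""):
    """Join the parts of a triple into one string, following the
    "$subject - $predicate - $object - $datatype - $lang" template."""
    return " - ".join([subject, predicate, obj, datatype, lang])


def triple_iri(subject, predicate, obj, datatype="", lang="", short=False):
    """Return NAMESPACE plus the SHA-256 of the triple string.

    short=False: hex digest (64 characters).
    short=True:  URL-safe Base64 of the raw digest, padding removed
                 (43 characters).
    """
    key = triple_key(subject, predicate, obj, datatype, lang)
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    if short:
        suffix = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    else:
        suffix = digest.hex()
    return NAMESPACE + suffix


if __name__ == "__main__":
    # The example triple quoted earlier in the thread.
    s = "http://www.w3.org/2001/08/rdf-test/"
    p = "http://purl.org/dc/elements/1.1/creator"
    o = "Dave Beckett"
    print(triple_iri(s, p, o))
    print(triple_iri(s, p, o, short=True))
```

The same inputs always produce the same IRI on any machine, which is the "reproducible" and "autonomous" part of the requirements. One caveat of this sketch: a " - " separator is ambiguous if a literal itself contains " - ", which is one reason hashing the N-Triples form of the triple may be the safer choice. Blank nodes are not handled, as noted in the P.S.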
Received on Wednesday, 25 July 2018 07:11:00 UTC