- From: Eric Korb <eric.korb@accreditrust.com>
- Date: Thu, 14 May 2015 08:03:28 -0400
- To: Dave Longley <dlongley@digitalbazaar.com>
- Cc: Gregg Kellogg <gregg@greggkellogg.com>, Credentials Community Group <public-credentials@w3.org>, ahogan@dcc.uchile.cl
- Message-ID: <CAMX+RnDvcnYpvBDyOgzy3QhL80eS-p4wNirXuTsUhF=ACtJZNw@mail.gmail.com>
+1 *"Trust only credentials that are TrueCred*™ *verified."* ---------------------------------- Eric Korb, President/CEO - accreditrust.com <https://www.accreditrust.com> On Wed, May 13, 2015 at 11:20 AM, Dave Longley <dlongley@digitalbazaar.com> wrote: > On 05/13/2015 02:47 AM, Gregg Kellogg wrote: > > Worth looking at for a comparison. > > > I think Aidan Hogan's paper actually describes a nearly identical > algorithm to what we created. Which is a good thing. His paper does a much > better job of laying it out in mathematical terms. > > > Gregg Kellogg > > Sent from my iPad > > Begin forwarded message: > > *Resent-From:* semantic-web@w3.org > *From:* ahogan@dcc.uchile.cl > *Date:* May 12, 2015 at 11:14:50 PM PDT > *To:* semantic-web@w3.org > *Subject:* *Re: deterministic naming of blank nodes* > > Hi, > > Sorry for resurrecting old threads but this discussion sort of prompted me > to work on the topic of "deterministic naming of blank nodes" and I > thought that it might be time (six months or so later) to report back on > some findings. > > The result is the paper here: > > Aidan Hogan. "Skolemising Blank Nodes while Preserving Isomorphism". In > WWW, Florence, Italy, May 18–22, 2015 (to appear). > Available from: http://aidanhogan.com/docs/skolems_blank_nodes_www.pdf > > I'll be presenting it next Friday if anyone happens to be around the > Florence area. > > Roughly speaking, the paper presents an algorithm for deterministically > labelling blank nodes of any RDF graph such that the labels of two > isomorphic RDF graphs will "correspond". This could be used to find > isomorphic RDF graphs in a large set of RDF graphs (without needing > pairwise checks), for computing signatures, or for Skolemising in such a > way that the output of two RDF graphs will be equal if and only if they > are isomorphic. > > The algorithm again works for general RDF graphs and doesn't assume any > well-behavedness or similar. The algorithm is exponential (based roughly > on a Nauty-style recursion) but the experiments show that, in terms of > isomorphism at least, you would have to have really really really *really* > weird RDF graphs to run into any real trouble (we're talking cliques of > blank nodes with k>55 or Miyazki graphs or similar). We skolemised all the > individual graphs in the BTC-14 collection ... the slowest took 31–40 > seconds, mainly due to its size: 7.3 million triples with 254 thousand > blank nodes. > > One drawback of the current approach is that at the moment you have to > load the RDF graph into memory, which may be prohibitive for very very > large graphs (of connected blank nodes), but I'm not sure how often that > happens in practice. (It wasn't a problem for any of the graphs in the > BTC-14 dataset for example.) > > There's also a software package available here: > > https://github.com/aidhog/blabel/wiki > > Of course the usual disclaimers about research code apply! > > Anyways, I thought these results might be of interest and hopefully might > have applications in the context of Skolemisation or signing RDF graphs or > similar. > > Best, > Aidan > > On 21/10/2014 04:30, Laurens Rietveld wrote: > > Coincidentally, I will present SampLD[1] in two hours at the ISWC (11.00 > > local time, sala 300), which might be suitable for your experiment. > > SampLD is a scalable approach for sampling RDF graphs, solely using the > > topology of the graph: > > > The Linked Data cloud has grown to become the largest knowledge base > > ever constructed. > > Its size is now turning into a major bottleneck for many > > applications. In order to facilitate access to this structured > > information, this paper proposes an automatic sampling method > > targeted at maximizing answer coverage for applications using SPARQL > > querying. > > The approach presented in this paper is novel: no similar RDF > > sampling approach exist. Additionally, the concept of creating a > > sample aimed at maximizing SPARQL answer coverage, is unique. > > We empirically show that the relevance of triples for sampling (a > > semantic notion) is influenced by the topology of the graph (purely > > structural), and can be determined > > without prior knowledge of the queries. > > Experiments show a significantly higher recall of topology based > > sampling methods over random and naive baseline approaches (e.g. up > > to 90% for Open-BioMed at a sample size of 6%). > > > > And, to support LOD-scale experiments, you might be interested in the > > LOD Laundromat[2,3] as well, presented this Thursday at ISWC as well. > > Here, we (re)publish clean, sanitized versions of Linked Data sets as > > sorted gzipped n-triples (13 billion triples and counting), with > > corresponding meta-data (e.g. the number of blank nodes in this > > dataset[4]). > > > Best, Laurens > > > [1] > > > http://laurensrietveld.nl/pdf/Structural_Properties_as_Proxy_for_Semantic_Relevance_in_RDF_Graph_Sampling.pdf > > [2] > > > http://laurensrietveld.nl/pdf/LOD_Laundromat_-_A_Uniform_Way_of_Publishing_Other_Peoples_Dirty_Data.pdf > > [3] http://lodlaundromat.org > > [4] > > > http://lodlaundromat.org/sparql/?query=PREFIX+llm%3A+%3Chttp%3A%2F%2Flodlaundromat.org%2Fmetrics%2Fontology%2F%3E%0APREFIX+void%3A+%3Chttp%3A%2F%2Frdfs.org%2Fns%2Fvoid%23%3E%0ASELECT+*+WHERE+%7B%0A++%5B%5D+llm%3Ametrics%2Fllm%3AdistinctBlankNodes+%3FnumBnodes+%3B%0A++++%09void%3AdataDump+%3Fdownload%0A%7D+ORDER+BY+DESC(%3FnumBnodes)+LIMIT+100 > <http://lodlaundromat.org/sparql/?query=PREFIX+llm%3A+%3Chttp%3A%2F%2Flodlaundromat.org%2Fmetrics%2Fontology%2F%3E%0APREFIX+void%3A+%3Chttp%3A%2F%2Frdfs.org%2Fns%2Fvoid%23%3E%0ASELECT+*+WHERE+%7B%0A++%5B%5D+llm%3Ametrics%2Fllm%3AdistinctBlankNodes+%3FnumBnodes+%3B%0A++++%09void%3AdataDump+%3Fdownload%0A%7D+ORDER+BY+DESC%28%3FnumBnodes%29+LIMIT+100> > > -- > > > VU University Amsterdam____ > > > Faculty of Exact Sciences____ > > > Department of Computer Science____ > > > De Boelelaan 1081 A____ > > > 1081 HV Amsterdam____ > > > The Netherlands > > > www.laurensrietveld.nl <http://www.laurensrietveld.nl> > > > laurens.rietveld@vu.nl <mailto:laurens.rietveld@vu.nl > <laurens.rietveld@vu.nl>> > > > Visiting address: > > > De Boelelaan 1081____ > > > Science Building Room T312 > > > > On Mon, Oct 20, 2014 at 5:47 PM, Sandro Hawke <sandro@w3.org > > <mailto:sandro@w3.org <sandro@w3.org>>> wrote: > > > On 10/07/2014 10:25 PM, Sampo Syreeni wrote: > > On 2014-10-07, Sandro Hawke wrote: > > > That may be true, but it is hard for me to see how any benefit this > > could bring would outweigh the absolute pain in the ass it would be > > for everyone to change their RDF stacks. > > > It was not me who said that. That was Alan Ruttenberg. > > > > So, why not subdivide the process? It ought to be easy and efficient > > enough to detect a rather expansive subset of graphs which do admit > > unique and efficient labeling. At the very least graphs which only > > use > > a blank node precisely twice (to define something and to refer to it > > once as in the bracket notation) are pretty simple, using a simple > > hash table with counters -- that perhaps being the commonest case as > > well. > > > If the test succeeds, define a unique labeling based on the rest of > > the attributes of the triple and lexical ordering; if not, ask the > > user whether general graph isomorphism comparison is wanted, and if > > so, do that, somehow signaling that it really went that far (perhaps > > inband in the format of the labels? or out of band as the case may > > be); if not, or if you can't do graph isomorphism in your code, then > > slap on nonunique labels, again differentiating them somehow from the > > first two cases. > > > That is certainly not an easy or clean solution, but it doesn't break > > the stack, and it works in most of the places where you want to do > > fast path processing under the assumption that in fact the labels are > > canonical, and can be relied upon to have 1-1 correpondence from > > syntax to node. > > > I agree. Does anyone have a good sampling of the LOD cloud we could > > easily use for this experiment? > > > -- Sandro > > > > > > > -- > Dave Longley > CTO > Digital Bazaar, Inc.http://digitalbazaar.com > >
Received on Thursday, 14 May 2015 12:04:17 UTC