- From: Stephen D. Williams <sdw@lig.net>
- Date: Fri, 21 Feb 2014 02:19:56 -0800
- To: Miel Vander Sande <miel.vandersande@ugent.be>
- CC: SWIG Web <semantic-web@w3.org>
- Message-ID: <5307284C.3050802@lig.net>
Thanks! That is a very helpful pointer. I've been concentrating on other areas too long...

At first glance, I don't see any active standardization work, which is good, since it doesn't seem to have all the features I would want... In particular, I'd want N-Quads support (I'm especially interested in optimizing fine-grained named graph metadata handling), possibly better encoding methods, an in-place modifiable version, and explicit support for deltas / chunks / baseline. It makes some very interesting choices (e.g., its bitmap graph representation), and there are a lot of related papers to digest. I'm glad people have recognized the need and have put good effort into solving the problem. I'll see what I can add as I get into it soon.

Stephen

On 2/21/14, 1:26 AM, Miel Vander Sande wrote:
> Hi Stephen,
>
> I think DERI has created exactly what you're looking for. It's called http://www.rdfhdt.org/ and we've recently started using it.
> It's not only compact, but it also allows incredibly fast lookup.
>
> Kind regards,
>
> Miel Vander Sande
> Researcher Semantic Web - Linked Open Data
> Multimedia Lab [Ghent University - iMinds]
>
> On Feb 18, 2014, at 7:20 PM, Stephen Williams <sdw@lig.net> wrote:
>
>> I worked on W3C Efficient XML Interchange (EXI) from before the formation of the working group almost all the way through
>> standardization when my work situation changed. A number of my ideas are in there, although a number that I felt strongly about
>> are not (deltas, standardization of interchange of compiled schema baseline, byte alignment of byte data through a fast,
>> efficient novel peephole algorithm that adds almost no padding). At the end, I became much more interested in the RDF
>> interchange problem, but have worked on other things since.
>>
>> At that time and somewhat since I developed an architecture and design for efficient RDF / N-tuples.
>> There are many tradeoffs,
>> but we spent years working together on EXI examining very similar issues but for a significantly different problem space. RDF
>> and other graph data has a wider range of possible uses, characteristics of data, and possibilities for specific and general
>> optimization. I'm planning to finish that design and implementation soon for my own work that leverages the semantic web
>> technologies, including RDF or RDF-like data. I'm more focused on the user interface paradigm, app, ecosystem design than
>> interchange, but that is a big part of the problem.
>>
>> One of the main points of EXI, and of ERI, is compactness and fast usable representation, avoiding and/or minimizing parsing,
>> without necessarily requiring decompression. Decompression is optionally layered when it makes sense because of repetition or
>> data names/values, but the structure can be compact without compression. In the case of triples (and quads, etc.), it is very
>> easy to fully separate structure at two levels from values which are naturally reused, dictionary "compressed", and then
>> optionally compressed.
>>
>> I'm very interested in quads or n-tuples (probably N-Quads) where I don't have to represent provenance, document/database/group
>> membership, and other metadata strictly as triples (although they can always be recast as triples). I'm also interested in ways
>> of chunking/delta graph data for efficiency of transport, memory, computation, etc.
>>
>> Has anyone been working on compact, efficient binary representation of RDF/N-Quads or similar? Chunking / deltas?
>> Does anyone want to work on these problems? I'm deep into some projects, but I might be interested in some arrangement to push
>> this forward, consulting or co-founding or something otherwise mutually beneficial. As was my early binary / efficient XML work,
>> this is all independent research for me.
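To make the separation of structure from dictionary-compressed values concrete, here is a minimal sketch, assuming quads have already been parsed into 4-tuples of term strings. Every distinct term gets an integer ID in a dictionary, so the graph's structure reduces to small integer tuples while each reused term string is stored once (and could then be compressed separately). Once quads are integers, a delta between two versions of a graph can be expressed as the sets of added and removed ID tuples. `TermDictionary`, `encode_quads`, and `quad_delta` are hypothetical names for illustration; this is not the ERI design or HDT's actual format, neither of which is specified in this thread.

```python
from typing import Dict, Iterable, List, Set, Tuple

# A quad as parsed term strings: (subject, predicate, object, graph).
Quad = Tuple[str, str, str, str]
# The same quad with every term replaced by its dictionary ID.
IdQuad = Tuple[int, int, int, int]


class TermDictionary:
    """Hypothetical term dictionary: assigns each distinct RDF term a dense integer ID."""

    def __init__(self) -> None:
        self.term_to_id: Dict[str, int] = {}
        self.id_to_term: List[str] = []

    def encode(self, term: str) -> int:
        # Reuse the ID of a term seen before; otherwise assign the next free ID.
        term_id = self.term_to_id.get(term)
        if term_id is None:
            term_id = len(self.id_to_term)
            self.term_to_id[term] = term_id
            self.id_to_term.append(term)
        return term_id

    def decode(self, term_id: int) -> str:
        return self.id_to_term[term_id]


def encode_quads(quads: Iterable[Quad], terms: TermDictionary) -> List[IdQuad]:
    """Return the structure as integer tuples; the values stay in `terms`."""
    return [
        (terms.encode(s), terms.encode(p), terms.encode(o), terms.encode(g))
        for s, p, o, g in quads
    ]


def quad_delta(old: Set[IdQuad], new: Set[IdQuad]) -> Tuple[Set[IdQuad], Set[IdQuad]]:
    """Hypothetical delta: (quads to add, quads to remove) to turn `old` into `new`."""
    return new - old, old - new


if __name__ == "__main__":
    EX = "http://example.org/"
    terms = TermDictionary()
    v1 = [
        (f"<{EX}a>", f"<{EX}knows>", f"<{EX}b>", f"<{EX}g1>"),
        (f"<{EX}b>", f"<{EX}knows>", f"<{EX}a>", f"<{EX}g1>"),
    ]
    v2 = [v1[0], (f"<{EX}a>", f"<{EX}name>", '"Alice"', f"<{EX}g1>")]
    enc1 = encode_quads(v1, terms)
    enc2 = encode_quads(v2, terms)
    print(enc1)  # [(0, 1, 2, 3), (2, 1, 0, 3)]
    print(enc2)  # [(0, 1, 2, 3), (0, 4, 5, 3)]
    print(quad_delta(set(enc1), set(enc2)))  # ({(0, 4, 5, 3)}, {(2, 1, 0, 3)})
```

In this example the terms shared by both versions are stored once, and the change between versions is one added and one removed integer tuple rather than re-sent text.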
>>
>> My main interest is in solving the user interface / visualization / mental model problem for A) a much better experience when
>> working with all kinds of large/complex knowledge and B) interfacing to / representing / creating organized semantic / linked
>> data. I'm working on a Knowledge Browser and related paradigms to complement the web browser and search paradigms. My goal is to
>> improve knowledge organization and access for everyone, from neophytes to advanced knowledge-based workers.
>>
>> Thanks,
>> Stephen
>>
>> On 2/18/14 12:53 AM, Ross Horne wrote:
>>> Hi Michel,
>>>
>>> I think your point is worth considering. Google make heavy use of
>>> zippy, rather than gzip, simply to reduce the latency of reading and
>>> sending large amounts of data, see [1]. (Of course, storage is not a
>>> limitation.)
>>>
>>> Could zippy have a role in Linked Data protocols?
>>>
>>> Regards,
>>>
>>> Ross
>>>
>>> [1] Dean, Jeff. "Designs, lessons and advice from building large
>>> distributed systems." Keynote from LADIS (2009).
>>> http://www.lamsade.dauphine.fr/~litwin/cours98/CoursBD/doc/dean-keynote-ladis2009_scalable_distributed_google_system.pdf
>>>
>>>
>>> On 18 February 2014 10:42, Michel Dumontier <michel.dumontier@gmail.com> wrote:
>>>> Hi Tim,
>>>> That folder contains 350GB of compressed RDF. I'm not about to unzip it
>>>> because a crawler can't decompress it on the fly. Honestly, it worries me
>>>> that people aren't considering the practicalities of storing, indexing, and
>>>> presenting all this data.
>>>> Nevertheless, Bio2RDF does provide VoID definitions, URI resolution, and
>>>> access to SPARQL endpoints. I can only hope our data gets discovered.
>>>>
>>>> m.
>>>>
>>>> Michel Dumontier
>>>> Associate Professor of Medicine (Biomedical Informatics), Stanford
>>>> University
>>>> Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
>>>> http://dumontierlab.com
>>>>
>>>>
>>>> On Sat, Feb 15, 2014 at 10:31 PM, Tim Berners-Lee <timbl@w3.org> wrote:
>>>>> On 2014-02-14, at 09:46, Michel Dumontier wrote:
>>>>>
>>>>> Andreas,
>>>>>
>>>>> I'd like to help by getting bio2rdf data into the crawl, really, but we
>>>>> gzip all of our files, and they are in n-quads format.
>>>>>
>>>>> http://download.bio2rdf.org/release/3/
>>>>>
>>>>> think you can add gzip/bzip2 support?
>>>>>
>>>>> m.
>>>>>
>>>>> Michel Dumontier
>>>>> Associate Professor of Medicine (Biomedical Informatics), Stanford
>>>>> University
>>>>> Chair, W3C Semantic Web for Health Care and the Life Sciences Interest
>>>>> Group
>>>>> http://dumontierlab.com
>>>>>
>>>>>
>>>>> And on 2014-02-15, at 18:00, Hugh Glaser wrote:
>>>>>
>>>>> Hi Andreas and Tobias.
>>>>> Good luck!
>>>>> Actually, I think essentially ignoring dumps and doing a "real" crawl is
>>>>> a feature, rather than a bug.
>>>>>
>>>>>
>>>>>
>>>>> Michel,
>>>>>
>>>>> Agree with Hugh. I would encourage you to unzip the data files on your own
>>>>> servers
>>>>> so the URIs will work and your data is really Linked Data.
>>>>> There are lots of advantages to the community to be compatible.
>>>>>
>>>>> Tim
>>>>>
>>>>>
>>
>>
>> --
>> Stephen D. Williams sdw@lig.net stephendwilliams@gmail.com LinkedIn: http://sdw.st/in
>> V:650-450-UNIX (8649) V:866.SDW.UNIX V:703.371.9362 F:703.995.0407
>> AIM:sdw Skype:StephenDWilliams Yahoo:sdwlignet Resume: http://sdw.st/gres
>> Personal: http://sdw.st facebook.com/sdwlig twitter.com/scienteer
>

--
Stephen D. Williams sdw@lig.net stephendwilliams@gmail.com LinkedIn: http://sdw.st/in
V:650-450-UNIX (8649) V:866.SDW.UNIX V:703.371.9362 F:703.995.0407
AIM:sdw Skype:StephenDWilliams Yahoo:sdwlignet Resume: http://sdw.st/gres
Personal: http://sdw.st facebook.com/sdwlig twitter.com/scienteer
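On the crawler question in the thread (gzip/bzip2-compressed N-Quads dumps): a compressed dump can be read by decompressing it as a stream, without unzipping it to disk first. Below is a minimal sketch using Python's standard `gzip` and `bz2` modules. The helper `iter_nquads_lines` and its dispatch on file extension are assumptions for illustration. It yields raw statement lines rather than parsed terms, since literals may contain spaces. A real crawler would pass each line to a proper N-Quads parser.

```python
import bz2
import gzip
from typing import Iterator


def iter_nquads_lines(path: str) -> Iterator[str]:
    """Yield N-Quads statement lines from a .gz, .bz2, or uncompressed file.

    Hypothetical helper: the file is decompressed as a stream, so nothing is
    unpacked to disk and the whole file is never held in memory at once.
    """
    if path.endswith(".gz"):
        opener = gzip.open
    elif path.endswith(".bz2"):
        opener = bz2.open
    else:
        opener = open
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comment lines; each remaining line is one statement.
            if line and not line.startswith("#"):
                yield line


if __name__ == "__main__":
    import os
    import tempfile

    path = os.path.join(tempfile.mkdtemp(), "sample.nq.gz")
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write('<http://example.org/a> <http://example.org/p> "x y" <http://example.org/g> .\n')
        f.write("# a comment\n\n")
    for statement in iter_nquads_lines(path):
        print(statement)
```

Because the decompressor is wrapped in a text-mode file object, the same loop works for gzip, bzip2, or plain files, and the caller never needs to know which one it was given.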
Received on Friday, 21 February 2014 10:20:25 UTC