- From: Brent Shambaugh <brent.shambaugh@gmail.com>
- Date: Tue, 3 Feb 2015 12:52:54 -0600
- To: carmen <_@whats-your.name>
- Cc: public-rww <public-rww@w3.org>
- Message-ID: <CACvcBVoTBwXf=T1RW28aZWXFTR4kuiDWKNZXpubNhq4B67bZng@mail.gmail.com>
Carmen,

I may not be there yet. You're talking about sorting things to put them into an index, as described in Chapter 1 (Data Mining) of Mining of Massive Datasets (http://infolab.stanford.edu/~ullman/mmds/ch1.pdf), and using Unix tools to help you get there?

-Brent Shambaugh

Website: bshambaugh.org

On Thu, Jan 22, 2015 at 8:40 PM, carmen <_@whats-your.name> wrote:

> >>> by only indexing the names of linked data containers
>
> all basic UNIX beard stuff, but..
> when possible, construct "cool URIs" for resources which allow for
>
> [] sortability via simple tools, like SORT(1)
> http://man.whats-your.name/sort.n3
> - further optimizations might incl incrementing date or integer slugs,
> so in append-only >> uris.txt scenarios, it's already sorted
> - with grep, head, tail, you can chew through megabytes in a fraction of
> a second these days,
> basic range-queries SQLite or SPARQL also do but with higher
> configuration-complexity/overhead
> - a MIME type description for lists of URIs formalized:
> http://amundsen.com/hypermedia/urilist/
> - it is very trivial to write a RDF "parser" for text/uri-list MIME,
> yielding <#uri> a <http://www.w3.org/2000/01/rdf-schema#Resource>; or
> something along those lines
>
> [] have a reasonably balanced tree-structure in the hierpart
> - identified contents fetched/rendered in ~1s max, per-hour dirs for
> news-aggre, per-month for blogposts?
>
> [] put crucial identifying/tag/keyword bits in too
> - can grep on the uri-list, faster than entire file
> - can GNU find on the pathnames, also fast-like-grep on flash/SSD
> - opportunity for data-reduction/summarizing/grouping
> - "graceful degradation", not
> http://site/post/00ea1da4192a2030f9ae023de3b3143ed6,
> requiring site and its search-engines to be up/used to find anything
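The point about sortable URIs can be made concrete: once a URI list is sorted (by `LC_ALL=C sort`, which orders by byte value, or because date slugs were appended in increasing order), a range query is just two binary searches. The sketch below is illustrative only; the `http://example.org/post/YYYY/MM/DD/slug` layout and the helper names `prefix_range` and `between` are hypothetical, and it assumes plain-ASCII URIs so that Python's string ordering matches `sort(1)` under the C locale.

```python
import bisect
from typing import List


def prefix_range(sorted_uris: List[str], prefix: str) -> List[str]:
    """Return every URI in sorted_uris that starts with prefix.

    Assumes sorted_uris is already in lexicographic (C-locale) order and
    that the URIs are ASCII, so prefix + U+10FFFF sorts after all of them.
    """
    lo = bisect.bisect_left(sorted_uris, prefix)
    hi = bisect.bisect_left(sorted_uris, prefix + "\U0010ffff")
    return sorted_uris[lo:hi]


def between(sorted_uris: List[str], start: str, end: str) -> List[str]:
    """Half-open range query: URIs u with start <= u < end."""
    lo = bisect.bisect_left(sorted_uris, start)
    hi = bisect.bisect_left(sorted_uris, end)
    return sorted_uris[lo:hi]


if __name__ == "__main__":
    # Hypothetical blog URIs with date-based slugs.
    uris = sorted([
        "http://example.org/post/2015/01/22/cool-uris",
        "http://example.org/post/2015/01/05/sort-tips",
        "http://example.org/post/2014/12/30/year-end",
        "http://example.org/post/2015/02/03/reply",
    ])
    # All January 2015 posts, found without scanning the whole list.
    for uri in prefix_range(uris, "http://example.org/post/2015/01/"):
        print(uri)
```

Because zero-padded dates sort the same way lexicographically and chronologically, the same list also answers date-range questions through `between`.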
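The "trivial RDF parser" for `text/uri-list` mentioned above might look something like the following sketch. It follows the `text/uri-list` conventions (one URI per line, CRLF line endings, lines starting with `#` are comments) and emits one N-Triples statement per URI typing it as `rdfs:Resource`. The function name `uri_list_to_ntriples` is hypothetical, and the sketch does no URI validation or escaping, so it assumes well-formed input.

```python
from typing import List

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_RESOURCE = "http://www.w3.org/2000/01/rdf-schema#Resource"


def uri_list_to_ntriples(text: str) -> str:
    """Turn a text/uri-list body into N-Triples, one rdfs:Resource per URI.

    Assumes each non-comment line is already a valid absolute URI;
    nothing is validated or escaped here.
    """
    triples: List[str] = []
    for line in text.splitlines():  # splitlines() handles CRLF and LF
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        triples.append(f"<{line}> <{RDF_TYPE}> <{RDFS_RESOURCE}> .")
    return "\n".join(triples)


if __name__ == "__main__":
    body = "# a hypothetical uri-list\r\nhttp://example.org/a\r\nhttp://example.org/b\r\n"
    print(uri_list_to_ntriples(body))
```

N-Triples is chosen here only because it is line-oriented, so its output stays friendly to the same grep/sort/head tooling.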
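The suggestion of a reasonably balanced hierpart (per-hour directories for news aggregation, per-month for blog posts) with a human-readable slug at the end could be sketched as below. The `hier_path` helper, the granularity names, and the example.org bases are all hypothetical; the design choice is that zero-padded `strftime` fields keep the resulting paths sortable and greppable, while the keyword slug avoids opaque IDs like the hash-style URL quoted above.

```python
from datetime import datetime

# Hypothetical granularities: coarser for low-volume content, finer for busy feeds.
FORMATS = {
    "month": "%Y/%m",       # e.g. blog posts
    "day": "%Y/%m/%d",
    "hour": "%Y/%m/%d/%H",  # e.g. news aggregation
}


def hier_path(base: str, when: datetime, slug: str, granularity: str = "month") -> str:
    """Build a date-bucketed URI ending in a readable keyword slug."""
    bucket = when.strftime(FORMATS[granularity])  # zero-padded, so sortable
    return f"{base.rstrip('/')}/{bucket}/{slug}"


if __name__ == "__main__":
    t = datetime(2015, 1, 22, 20, 40)
    print(hier_path("http://example.org/blog", t, "cool-uris"))
    print(hier_path("http://example.org/news/", t, "cool-uris", "hour"))
```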
Received on Tuesday, 3 February 2015 18:53:24 UTC