Re: triple Indexing for Apps like Cimba from Brent Shambaugh on 2015-02-03 (public-rww@w3.org from February 2015)

From: Brent Shambaugh <brent.shambaugh@gmail.com>
Date: Tue, 3 Feb 2015 12:52:54 -0600
To: carmen <_@whats-your.name>
Cc: public-rww <public-rww@w3.org>
Message-ID: <CACvcBVoTBwXf=T1RW28aZWXFTR4kuiDWKNZXpubNhq4B67bZng@mail.gmail.com>

Carmen,

I may not be there yet.  Are you talking about sorting things so they can
go into an index, as described in Ch. 1, "Data Mining", of Mining of Massive
Datasets (http://infolab.stanford.edu/~ullman/mmds/ch1.pdf), and using Unix
tools to help you get there?
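
To make the question concrete, here is a rough sketch of how I read the
idea, in Python rather than sort/grep. The example.org URIs and the
prefix_range helper are made up for illustration, and I am assuming the
date sits at the front of the path so that sorted order is also date order.

```python
from bisect import bisect_left

# Hypothetical URIs with date-based slugs; once sorted, lexical order
# matches chronological order because the date leads the path.
uris = sorted([
    "http://example.org/2015/01/22/20-rww-indexing",
    "http://example.org/2015/01/23/09-cimba-notes",
    "http://example.org/2015/02/03/12-sort-and-grep",
    "http://example.org/2014/12/31/23-year-end",
])

def prefix_range(sorted_uris, prefix):
    """Return all URIs in a sorted list that start with the given prefix."""
    lo = bisect_left(sorted_uris, prefix)
    # Upper bound: the prefix followed by the highest code point, which
    # sorts after any ASCII URI that shares the prefix (an assumption).
    hi = bisect_left(sorted_uris, prefix + "\U0010ffff")
    return sorted_uris[lo:hi]

# Range query for January 2015 using two binary searches, no full scan.
january = prefix_range(uris, "http://example.org/2015/01/")
```

If that is the idea, the "index" is just the sorted list itself, and a
range query reduces to two binary searches over it.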

-Brent Shambaugh

Website: bshambaugh.org

On Thu, Jan 22, 2015 at 8:40 PM, carmen <_@whats-your.name> wrote:

> >>> by only indexing the names of linked data containers
>
> all basic UNIX beard stuff, but..
> when possible, construct "cool URIs" for resources which allow for
>
>  [] sortability via simple tools, like SORT(1)
> http://man.whats-your.name/sort.n3
>   - further optimizations might include incrementing date or integer slugs,
>     so in append-only (>> uris.txt) scenarios the list is already sorted
>   - with grep, head, and tail you can chew through megabytes in a fraction
>     of a second these days; SQLite or SPARQL also do basic range queries,
>     but with higher configuration complexity/overhead
>   - a MIME type description for lists of URIs has been formalized:
> http://amundsen.com/hypermedia/urilist/
>   - it is trivial to write an RDF "parser" for the text/uri-list MIME type,
>     yielding <#uri> a <http://www.w3.org/2000/01/rdf-schema#Resource>; or
>     something along those lines
>
>  [] have a reasonably balanced tree-structure in the hierpart
>   - identified contents fetched/rendered in ~1s max: per-hour dirs for
>     news aggregation, per-month for blog posts?
>
>  [] put crucial identifying/tag/keyword bits in too
>   - can grep on the uri-list, which is faster than grepping entire files
>   - can GNU find on the pathnames, also fast-like-grep on flash/SSD
>   - opportunity for data-reduction/summarizing/grouping
>   - "graceful degradation", unlike
> http://site/post/00ea1da4192a2030f9ae023de3b3143ed6,
>      which requires the site and its search engines to be up/used to find
>      anything
>
>
>
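
The text/uri-list "parser" and the grep-on-URIs idea from the quoted list
can be sketched in a few lines of Python. This is only an illustration
under stated assumptions: the sample list and the function names are
hypothetical, lines starting with "#" are treated as comments as the
text/uri-list format describes, and the URIs are assumed to be valid
already, so nothing is escaped when writing N-Triples.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_RESOURCE = "http://www.w3.org/2000/01/rdf-schema#Resource"

# Hypothetical text/uri-list body: CRLF line endings, "#" comment lines.
sample = (
    "# hypothetical feed index\r\n"
    "http://example.org/2015/01/22/rww-indexing\r\n"
    "http://example.org/2015/02/03/sort-and-grep\r\n"
)

def parse_uri_list(text):
    """Return the URIs in a text/uri-list body, skipping comments and blanks."""
    uris = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            uris.append(line)
    return uris

def to_ntriples(uris):
    """Emit one '<uri> a rdfs:Resource' statement per URI, as N-Triples."""
    return [f"<{u}> <{RDF_TYPE}> <{RDFS_RESOURCE}> ." for u in uris]

def grep_uris(uris, keyword):
    """Keep URIs containing the keyword, like grep run over the URI list."""
    return [u for u in uris if keyword in u]

parsed = parse_uri_list(sample)
triples = to_ntriples(parsed)
matches = grep_uris(parsed, "grep")
```

Because the keywords live in the URIs themselves, the filter only ever
touches the URI list and never has to open the documents they point to.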

Received on Tuesday, 3 February 2015 18:53:24 UTC