database-free 'pagination' of resource and triple-patterns from carmen on 2010-06-21 (public-semweb-ui@w3.org from June 2010)

From: carmen <_@whats-your.name>
Date: Mon, 21 Jun 2010 12:20:59 -0700
To: public-semweb-ui@w3.org
Message-ID: <20100621192059.GA3696@myhost.Belkin>
 '15 recent-most edits',
 'get all comments belonging to this post',
 'show me stuff from tag "Lenovo" starting march 2010'

these things shouldn't require a big SQL engine and 437K of PHP code

why delve into writing code??
(Redland backends had poor SQL and tons of roundtrips. find(s p o) returned too much stuff, and wrapping it all in Ruby just nuked the interpreter with OOM errors. Virtuoso is too big, the Java stuff requires a JVM and mucho RAM, 4store didn't exist, and now that it does i can't get GCC quite happy on my phone, which is my cloud-server)

so i need some primitive, understandable functions (easily implementable in shell scripts, or any language) to enable basic access to specific data

<aside> 'secret sauce' reddit-type 'hype' algorithms, or the things that elevate a web app above commoditized/has-500-php-implementations, are usually not SPARQL-expressible. can one recursively walk a path based on some predicates? i strive for as little overhead as possible per pattern lookup to get the data into the memory of a turing-complete language </aside>

essentially path generation and a stat() and/or readdir()

* resources, triples, tuples, literals 
   ^ convert to/from v
* URIs
   ^ convert to/from v
* paths


(S _ _), (S P _), (S P O) are direct, each a child of the previous in the tree

(_ P _), and (_ P O) are in base index - rotate to (Pred Obj Subj), and use already-have-it infrastructure

(_ _ O) isn't so interesting, as you've lost the typing. it is the domain of full-text search or semantic-extraction
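
A minimal sketch of the pattern-to-path mapping described above, in Python rather than the shell or Ruby mentioned in this mail. The function names, the "spo"/"pos" index labels, and the percent-encoding of each term are assumptions made for illustration; the mail does not say how terms are encoded into path segments.

```python
from urllib.parse import quote

def seg(term):
    """Percent-encode one term so it fits in a single path segment."""
    return quote(term, safe="")

def pattern_path(s=None, p=None, o=None):
    """Map a triple pattern to (index_name, path); None stands for the wildcard _."""
    if s is not None:
        # (S _ _), (S P _), (S P O): each one is a child of the previous.
        if p is None and o is not None:
            raise ValueError("(S _ O) is not covered by this sketch")
        index, parts = "spo", [s, p, o]
    elif p is not None:
        # (_ P _), (_ P O): rotate to (Pred Obj Subj) and reuse the same layout.
        index, parts = "pos", [p, o]
    else:
        raise ValueError("(_ _ O) and (_ _ _) are left to full-text search")
    return index, "/".join(seg(t) for t in parts if t is not None)
```

Listing the directory at the returned path (a readdir()) would then enumerate the terms that complete the pattern.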

but i don't really want _ (find all matches), rather offsets/ranges in a b-tree (courtesy of btrfs)
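
As a rough illustration of "a range instead of all matches", here is a sketch that uses a sorted Python list and the standard bisect module as a stand-in for the b-tree. The function name and the start/count parameters are assumptions for the example, not part of any existing code.

```python
import bisect

def range_from(keys, start, count):
    """Return up to `count` keys that sort at or after `start`.

    `keys` must already be sorted, as the entries of a b-tree would be.
    """
    i = bisect.bisect_left(keys, start)  # seek to the first key >= start
    return keys[i:i + count]
```

With ISO-style date keys, string order matches date order, so a start date plus a count is enough to describe one window of results.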


GET already has one URI, at most we have to specify another URI, and an offset

eg GET /don@dada.org?,=sioc:from&o=1984-03-13&count=13

such as danbri's recent mails:

http://i574.photobucket.com/albums/ss187/ix9/hyper/2010-06-09-221605_1368x768_scrot.png


/posts/page/3 is such an awful namefail, because it's constantly changing, and meaningless
why not count up from the oldest material so number identifiers are semi-reliable?
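
A small sketch of counting pages up from the oldest material. The names page and latest_page_number and the default page size of 10 are made up for the example; the point is only that a full page keeps the same contents when newer items arrive.

```python
def page(items_oldest_first, number, size=10):
    """Return page `number` (1-based); page 1 holds the oldest `size` items."""
    start = (number - 1) * size
    return items_oldest_first[start:start + size]

def latest_page_number(total, size=10):
    """Number of the newest page (the only one still growing)."""
    return max(1, -(-total // size))  # ceiling division, at least page 1
```

Appending new posts only ever changes the last page, so a link to /posts/page/3 keeps pointing at the same posts once that page is full.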

blogger gets it right,
but i don't have google infrastructure so i've written some code (element on cz/gitorious/rubyforge)

Twitter decided they need their identifiers sortable

it's definitely cool, as your pattern results are additionally subsorted by another property, like date

(which is why my procmailrc files mail under a strftime() path, ~/.mail/`date +%Y/%m/%d/%h`, and symlinks it to the message-ID's path so it can be found both ways)
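
The same idea outside procmail, as a hedged Python sketch: store a message under a date path and symlink it from a path derived from its Message-ID. The function name, the "id" directory, the percent-encoded filename, and the hour-level date directory are all assumptions for illustration and not a description of the actual procmailrc.

```python
import os
import time
from urllib.parse import quote

def file_message(root, message_id, body, when=None):
    """Write `body` under a date path and symlink it from a Message-ID path."""
    when = time.gmtime() if when is None else when
    # Date directory such as 2010/06/21/19 (hour granularity is an assumption).
    date_dir = os.path.join(root, time.strftime("%Y/%m/%d/%H", when))
    os.makedirs(date_dir, exist_ok=True)
    name = quote(message_id, safe="")  # keep '<', '@' and '/' out of filenames
    target = os.path.join(date_dir, name)
    with open(target, "w") as f:
        f.write(body)
    id_dir = os.path.join(root, "id")
    os.makedirs(id_dir, exist_ok=True)
    link = os.path.join(id_dir, name)
    if not os.path.lexists(link):
        # Relative link, so the whole tree can be moved or rsynced intact.
        os.symlink(os.path.relpath(target, id_dir), link)
    return target, link
```

The date path gives chronological listing for free, and the symlink gives direct lookup by identifier.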

 http://i574.photobucket.com/albums/ss187/ix9/hyper/boing.png  <tag>

http://i574.photobucket.com/albums/ss187/ix9/hyper/2010-06-09-211435_1368x768_scrot.png http://i574.photobucket.com/albums/ss187/ix9/hyper/2010-06-09-205209_1368x768_scrot.png 

all this stuff is instantaneous on my slow Atom using an SD card as the database. probably some CPU left to try 4store again on 'research' queries

sometimes you dont want dates in URIs, AND people use tag: URIs for posts (evil!?)  <#tag>

typed literals have custom URI-izers, and end up in paths like /date/2010/07/03, and then a depth-first range on the index works on arbitrarily id'd resources
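
A sketch of a date "URI-izer" plus a depth-first range over a directory tree laid out as date/YYYY/MM/DD/resource. The names date_path and resources_between and the on-disk layout are assumptions for illustration, and the paths are kept relative to a root directory rather than starting with a slash. This version walks the whole tree and filters, where a real index would seek straight to the first day.

```python
import os
from datetime import date

def date_path(d):
    """Turn a date literal into a relative path such as date/2010/07/03."""
    return "date/%04d/%02d/%02d" % (d.year, d.month, d.day)

def resources_between(root, first, last):
    """List resources filed under days from `first` to `last`, oldest first."""
    lo, hi = date_path(first), date_path(last)
    found = []
    for dirpath, dirnames, filenames in os.walk(os.path.join(root, "date")):
        dirnames.sort()  # sorted descent: depth-first order is date order
        day = os.path.relpath(dirpath, root).replace(os.sep, "/")
        # Only day-level directories (date/YYYY/MM/DD) hold resources.
        if len(day.split("/")) == 4 and lo <= day <= hi:
            found.extend(day + "/" + name for name in sorted(filenames))
    return found
```

Zero-padded path segments are what make plain string comparison agree with date order, whatever the resource ids look like.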


always interested in thoughts on URI schemes, your favorite Ceph/NFS/rsync cloud-persistence schemes, organic tarball decomposition, etc.
Received on Monday, 21 June 2010 19:21:40 UTC