- From: carmen <_@whats-your.name>
- Date: Tue, 8 Feb 2011 12:07:32 +0000
- To: public-semweb-ui@w3.org
_babel
Java exceptions, server down, and currently this:

  { "items" : [ { "TDATE" : "0:00:00", "MOD" : "D",
    "STATION" : [ "EWTN (WEWN)", "WEWN", "WEWN EWTN Catholic R.",
                  "Radio Free Asia", "CNR1 Jammer", "IBB", "R.FARDA", "Radio Farda" ],

(wrong, as there's only one STATION per row. Suppose it's open source and I could install a Babel locally and try to figure it out.)

_google-refine
Latest snapshot: unreported parse errors, visible as entire lines or even the rest of the document appearing in single facet fieldnames.

So I wrote a TSV parser that works on the xls2txt (http://wizard.ae.krakow.pl/%7Ejb/xls2txt/) output of an XLS file from hfskeds (http://www.hfskeds.com/skeds/):

  def csv
    open(node).readlines.map{|l| l.chomp.split(/,/)}.do{|t|
      t[0].do{|x|                        # first row holds the column names
        t[1..-1].each_with_index{|r,ow|
          r.each_with_index{|v,i|
            yield '#r'+ow.to_s, x[i], v }}}}   # subject = row id, predicate = column, object = cell
  end

This is turned into an in-memory RDF/JSON graph:

  # fromStream :: Graph -> tripleSource -> Graph
  def fromStream m,*i
    send(*i) do |s,p,o|
      m[s] ||= {'uri'=>s}
      m[s][p] ||= []
      m[s][p].push o
    end; m
  end

and finally into Exhibit JSON via:

  fn Render+'application/json+exhibit',->d,e{
    fields=e.q['f'].do{|f|f.split /,/}
    {items: d.values.map{|r|
       r.keys.-(['uri']).map{|k|
         f=k.frag.do{|f|(f.gsub /\W/,'').downcase} # alphanumeric id restriction
         if !fields || (fields.member? f)
           r[f]=r[k][0].to_s       # rename fieldnames, unwrap value
           r.delete k unless f==k  # cleanup unless id same as before
         else
           r.delete k
         end}
       r[:label]=r.delete 'uri'    # requires label only
       r }}.to_json}

The reason we massage the fieldnames is elucidated in this message:
http://www.mail-archive.com/general@simile.mit.edu/msg01052.html

All of this is integrated into http://gitorious.org/element : drop a .tsv file in a directory, add ?view=exhibit to the querystring, get an exhibit.

That brought me to the next problem: the browser freezing up for 90 seconds as Exhibit did something (DOM generation and facet statistics, I guess).

I forget exactly what happened next, but I was already using dynamic stylesheets in a mail app: each replied-to line is wrapped in class=quote, and span.quote {display:none} is added to the document to hide them. It was pretty obvious this would be faster than

  document.getElementsByClassName('quote').forEach(function(){this.hide})

so I decided to take the same approach to faceted filtering in the browser. I have no idea if my choices are the fastest, but they work, and I will probably do further experiments (e.g. situating common facet values as innermost or outermost, a la the SPARQL trick of using the smallest pattern first).

Change the querystring from view=exhibit to view=e. If a= (a comma-separated list of predicate URIs) isn't specified, you are presented with a list like:

  http://www.w3.org/1999/02/22-rdf-syntax-ns#type
  http://rdfs.org/sioc/ns#addressed_to
  http://rdfs.org/sioc/ns#has_creator
  http://purl.org/rss/1.0/category
  [Go]

Click the ones you want, then [Go], at which point the left side is filled with facet-selector panes.

Custom views are selected with ev=board, following a convention of

  view/board/base
  view/board/item

where base is handed a function that it calls to put the items, wrapped in special divs that the CSS will use to filter. Example, a music player, where /item draws a single playlist row:
http://blog.whats-your.name/public/smiths.png

Figuring out the result set is only half the battle for the browser: excessive use of floats, relative sizes
and so on become noticeable in huge data sets.

hfskeds is 30K rows by 22 columns, or 0.66 million triples: roughly the upper bound of what I'd want to use on a netbook. It takes about 5 seconds to load a doc and 0.8 seconds to redraw after a filter change. You can squeeze out a faster redraw with <pre>, fixed heights/widths and absolute positioning.

Shortwave schedules were the main dataset, so let's get into some of those:
http://blog.whats-your.name/public/25m.html

  #!/bin/sh
  curl 'http://m/a.tsv?view=e&ev=sw&a=LANGUAGE,STATION&min=2200&minP=kc/s&maxP=kc/s&max=2500' > 120m.html
  curl 'http://m/a.tsv?view=e&ev=sw&a=LANGUAGE,STATION&min=3100&minP=kc/s&maxP=kc/s&max=3450' > 90m.html
  curl 'http://m/a.tsv?view=e&ev=sw&a=LANGUAGE,STATION&min=3890&minP=kc/s&maxP=kc/s&max=4000' > 75m.html
  curl 'http://m/a.tsv?view=e&ev=sw&a=LANGUAGE,STATION&min=4740&minP=kc/s&maxP=kc/s&max=5125' > 60m.html
  curl 'http://m/a.tsv?view=e&ev=sw&a=LANGUAGE,STATION&min=5800&minP=kc/s&maxP=kc/s&max=6300' > 49m.html
  curl 'http://m/a.tsv?view=e&ev=sw&a=LANGUAGE,STATION&min=7200&minP=kc/s&maxP=kc/s&max=7600' > 40m.html
  curl 'http://m/a.tsv?view=e&ev=sw&a=LANGUAGE,STATION&min=9400&minP=kc/s&maxP=kc/s&max=9999' > 31m.html
  curl 'http://m/a.tsv?view=e&ev=sw&a=LANGUAGE,STATION&min=11500&minP=kc/s&maxP=kc/s&max=12160' > 25m.html
  curl 'http://m/a.tsv?view=e&ev=sw&a=LANGUAGE,STATION&min=13500&minP=kc/s&maxP=kc/s&max=13900' > 22m.html
  curl 'http://m/a.tsv?view=e&ev=sw&a=LANGUAGE,STATION&min=15100&minP=kc/s&maxP=kc/s&max=15900' > 19m.html
  curl 'http://m/a.tsv?view=e&ev=sw&a=LANGUAGE,STATION&min=17500&minP=kc/s&maxP=kc/s&max=17900' > 16m.html

I created an HTML file for each band and uploaded them to the webserver.
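
The per-band curl script above repeats one URL template with different min/max values, so it could also be generated from a table. A minimal Ruby sketch follows; BANDS, band_url and band_commands are names made up for this illustration (they are not part of element), while the host, parameters and band edges are copied from the script.

```ruby
# Band name => [min, max] in kc/s, copied from the curl script above.
BANDS = {
  '120m' => [2200, 2500],
  '90m'  => [3100, 3450],
  '75m'  => [3890, 4000],
  '60m'  => [4740, 5125],
  '49m'  => [5800, 6300],
  '40m'  => [7200, 7600],
  '31m'  => [9400, 9999],
  '25m'  => [11500, 12160],
  '22m'  => [13500, 13900],
  '19m'  => [15100, 15900],
  '16m'  => [17500, 17900],
}

# Hypothetical helper: builds the same query string the script uses.
def band_url(min, max, base = 'http://m/a.tsv')
  "#{base}?view=e&ev=sw&a=LANGUAGE,STATION&min=#{min}&minP=kc/s&maxP=kc/s&max=#{max}"
end

# Hypothetical helper: one curl command line per band.
def band_commands(bands = BANDS)
  bands.map { |name, (min, max)| "curl '#{band_url(min, max)}' > #{name}.html" }
end

puts band_commands
```

Piping the output to sh would reproduce the script; adding a band then only means adding one row to the table.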
As you can see, a default filter exists with maxP and minP (matchP too), which is handy for common uses.

Custom filters, activated via the querystring (comma-separated list), can be written. For example, here is an excerpt of a sort of natural-language one, built on the realization that any time an int < 2400 appears in an email it is probably referring to a time, and > 2400 to a frequency (minus a few false positives for phone numbers and years):

  m[u]={'uri' => u, 'big'=>l.scan(/\b[A-Z][A-Z][A-Z]+\b/), Content=>l}
  l.scan(/\d{4,}/){|d|
    d=d.to_i
    if (d > 2400) && (d < 30000)
      m[u]['kc/s']=[d]                      # looks like a frequency
    else
      m[u]['BTIM']=[d];m[u]['ETIM']=[d+30]  # looks like a time
    end}
  m.delete u unless m[u].has_keys ['BTIM','kc/s'] )}

A filter mutates the request-time JSON model however it sees fit, adding new properties and so on:
http://blog.whats-your.name/public/GlenDoes31.html

I did a few more of these, Eibi L and H:
  http://blog.whats-your.name/public/eibiL.html (this is the largest one up now, data-wise)
  http://blog.whats-your.name/public/bbc.html BBC

Onto some other examples.

/t is a lifestream (http://www.cs.yale.edu/homes/freeman/dissertation/etf.pdf) serving a time-range of resources (with options for start/end, direction (ascending/descending) and count), here filtered by source:
http://i574.photobucket.com/albums/ss187/ix9/hyper/2011-01-16-203039_1366x768_scrot.png

Always add a sioc:addressed_to and sioc:has_creator to triple-izers for this usage.

/search: examine shows us the top poster is Cory Doctorow (no surprise there):
http://i574.photobucket.com/albums/ss187/ix9/hyper/to.png
I imported all boingboing posts for this one; that's discussed at http://blog.whats-your.name/public/bb.html

A couple of possibilities:

hash URIs for filters - I will wait for Exhibit 3.0 to come up with their convention and use that, or just something like facet=val,val2&facet2=val3,val4

visible set - jQuery has a :visible meta-selector, which I have not tried to see how fast it is. It would be useful if you want to reserialize a document, deleting all invisible (filtered) elements.
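
For the simpler facet=val,val2&facet2=val3,val4 convention floated above, a parsing sketch in Ruby might look like the following. parse_facets and visible? are hypothetical names, the item layout (field => array of values) follows the in-memory model used earlier in this message, and the matching rule (OR within a facet, AND across facets) is an assumption of this sketch, not whatever Exhibit 3.0 settles on. The sample rows are made up.

```ruby
require 'uri'

# Hypothetical: turn "facet=val,val2&facet2=val3,val4" into
# {"facet"=>["val","val2"], "facet2"=>["val3","val4"]}.
def parse_facets(query)
  URI.decode_www_form(query).each_with_object({}) do |(facet, vals), h|
    (h[facet] ||= []).concat(vals.split(','))
  end
end

# Assumed rule: an item passes when, for every selected facet, at least
# one of its values is among the selected ones.
def visible?(item, selection)
  selection.all? { |facet, wanted| (Array(item[facet]) & wanted).any? }
end

# Made-up sample data in the field => [values] shape of the model.
sel  = parse_facets('LANGUAGE=English,Spanish&STATION=BBC')
rows = [
  {'uri' => '#r0', 'LANGUAGE' => ['English'], 'STATION' => ['BBC']},
  {'uri' => '#r1', 'LANGUAGE' => ['Farsi'],   'STATION' => ['BBC']},
]
p rows.select { |r| visible?(r, sel) }.map { |r| r['uri'] }
```

The same selection hash could in principle drive the server-side "visible set" reserialization as well, without asking the DOM which elements are hidden.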
Probably we should make noise about adding this right to CSS, as it likely has the feature already (e.g. Ctrl-F only searches visible elements).

"just publish RDFa" - would be cool: some JS that introspects a DOM and adds the appropriate facet wrappers.

-c
Received on Tuesday, 8 February 2011 12:08:34 UTC