Re: schema search-engines from Melvin Carvalho on 2014-05-07 (public-webize@w3.org from May 2014)

From: Melvin Carvalho <melvincarvalho@gmail.com>
Date: Wed, 7 May 2014 17:41:56 +0200
To: cr <_@whats-your.name>
Cc: public-webize@w3.org
Message-ID: <CAKaEYhJn2b76Mj_8SmF364XHuxvDVWB+TRXm5rQ9yYyQm6gdSg@mail.gmail.com>
On 7 May 2014 04:09, cr <_@whats-your.name> wrote:

> apologies in advance for paradox of the need to "Webize" things that are
> on the web
>
> another 11-stars of X509 certified, WebID authenticated, WAC authorized,
> ACL rewriting, LDP platforming, PATCH pushing, real-time-streaming,
> distributed-federating, OWL-reasoning, runtime-consensusing
> reputation-voting-web-of-trusting
>
> no just looking for predicates to use.. found this
> http://www.w3.org/wiki/VocabularyMarket
>
>  http://ws.nju.edu.cn/falcons/ = 503 Service Temporarily Unavailable
>  http://www.schemaweb.info/ = fallen off face of earth, Talis listed in
> DNS
>  http://schemacache.test.talis.com/Schemas/ likewise dead
>
>  first name, seems a suitably simple test query, knowing FOAF should show
> up near the top
>
> http://watson.kmi.open.ac.uk/WatsonWUI/ , fiddling w/ checkboxes re
> properties
>
> just random nonsense. the first 2 URLs returned are:
>  http://www.example.com/ns/1.0#first
>  http://www.example.com/ns/1.0#Name  hello? unresolvable
>
> http://lov.okfn.org/dataset/lov/search/?s=first%20name#s=first%20name
>
> relevance wise, this one is really good. weighted on usage metrics +
> favoring well-named URIs. but it takes 10 seconds to return and can't
> figure out how to get RDF of the results
>
> i am using curl, shell/ruby/perl-scripts, no full DOM/JS environment - LOV
> doesn't work w/o JS enabled. maybe we can just do what the JS does?
>
> it POSTs this
> 7|0|7|
> http://lov.okfn.org/dataset/lov/search/|2C60FC0C7A6CD06AB87BC12B0015575F|com.pyv.lovsearch.client.rdfRepositoryService.RdfRepositoryService|search|java.lang.String/2004016611|I|firstname|1|2|3|4|6|5|6|6|5|5|5|7|0|-1|0|0|0|
>
> what? those long hashes are maybe temporary request-ID or auth tokens that
> are going to expire .. response is similarly confusing
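
For what it's worth, that POST body looks like a GWT-RPC request, and the pipe-separated fields can be unpacked by hand. A minimal Python sketch, assuming the usual layout (version, flags, string-table size, the strings, then 1-based indexes into that table, with 0 meaning null). The name decode_gwt_rpc and the handling of only String and int arguments are my own simplifications, not anything taken from LOV's code:

```python
# The POST body quoted above, joined into one string.
PAYLOAD = (
    "7|0|7|http://lov.okfn.org/dataset/lov/search/|"
    "2C60FC0C7A6CD06AB87BC12B0015575F|"
    "com.pyv.lovsearch.client.rdfRepositoryService.RdfRepositoryService|"
    "search|java.lang.String/2004016611|I|firstname|"
    "1|2|3|4|6|5|6|6|5|5|5|7|0|-1|0|0|0|"
)


def decode_gwt_rpc(payload):
    """Split a GWT-RPC style request into its parts (assumed layout)."""
    fields = payload.split("|")
    if fields and fields[-1] == "":
        fields.pop()  # the payload ends with a trailing '|'

    version, flags, table_size = (int(f) for f in fields[:3])
    strings = fields[3:3 + table_size]
    data = [int(f) for f in fields[3 + table_size:]]

    def lookup(index):
        # String-table references are 1-based; 0 stands for null.
        return strings[index - 1] if index > 0 else None

    base_url, policy, service, method = (lookup(i) for i in data[:4])
    argc = data[4]
    arg_types = [lookup(i) for i in data[5:5 + argc]]
    raw_values = data[5 + argc:5 + 2 * argc]
    # Strings are table references; ints ("I") are stored literally.
    args = [
        lookup(v) if t.startswith("java.lang.String") else v
        for t, v in zip(arg_types, raw_values)
    ]
    return {
        "version": version,
        "flags": flags,
        "base_url": base_url,
        "policy": policy,
        "service": service,
        "method": method,
        "arg_types": arg_types,
        "args": args,
    }


decoded = decode_gwt_rpc(PAYLOAD)
print(decoded["method"], decoded["args"])
```

If that reading is right, the long hex string is just the second entry of the string table, which GWT generally uses to name the serialization policy rather than as a per-request token, so it would more likely change when the app is redeployed than per session. Treat that as an assumption until checked against a live request.
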
>
> good start but can 2.0 be made fast and usable w/o the
> GWT-autogenerated-UI or ?
>
> http://sindice.com/search?q=first+name  FOAF does appear, but not until
> #6 and where's SIOC? Schema.org?
>
> that's w/ generic frontpage search. the 'predicate' and 'ontology' boxes
> on sidebar, which i'd think a better bet, just say:
>
> Wrong response from search API http://api.sindice.com/v3/search , or
> Your search is too broad, trying narrowing the terms
>
> there's a sidebar post about how Sindice is being abandoned by its
> founders, in case i was thinking about depending on this service
>
>
> http://swoogle.umbc.edu/index.php?option=com_frontpage&service=search&queryType=search_swt&searchStart=1&searchString=first%20name
>
> best yet. foaf/sioc at the top. even has an RDF version of the results, which
> unfortunately is served with a bad MIME type so tabulator doesn't notice it.
>
> at least that's some other implementation to compare against with basic
> scripts that override the bad MIME though... progress
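
Overriding a bad MIME type from a script mostly means guessing the serialization from the body instead of trusting Content-Type. A rough Python sketch of that idea; the function name sniff_rdf_format and the two markers it checks are assumptions for illustration, and a real client would need more cases:

```python
def sniff_rdf_format(body, declared=None):
    """Guess an RDF media type from the start of the body.

    Falls back to the declared Content-Type when nothing is recognised.
    This is a heuristic, not a parser.
    """
    head = body.lstrip()[:512]
    if head.startswith("<?xml") or "<rdf:RDF" in head:
        return "application/rdf+xml"
    if head.startswith("@prefix") or head.startswith("@base"):
        return "text/turtle"
    return declared


rdfxml = '<?xml version="1.0"?>\n<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
turtle = "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n"

# A server that labels RDF/XML as text/html would be corrected here.
print(sniff_rdf_format(rdfxml, declared="text/html"))
print(sniff_rdf_format(turtle, declared="text/plain"))
```

The guessed type can then be handed to whatever RDF parser the script already uses.
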
>
> http://schema.org shows only its own schemas, search only works w/ JS,
> and no idea how to get RDF of the results or resources on Accept: if
> possible at all
>
> if you wrote one, let me know as it wasn't omitted on purpose, just don't
> know about it. and you probably didn't and aren't on this list anyways
>
> chances are everyone's chasing after SEO by doing whatever schema.org
> tells them and as a curated-collection of mainstream propertynames it seems
> decent, content-negotiation issues aside
>
> i'll toss another schema search on the scrap-heap. sure to break sooner
> than swoogle as i've already ruined 3-4 prior iterations w/ no intentions
> of resurrecting
>
> v3 used a CSV file, and grep. it took about 0.5s to return on a 4MB file,
> and was weighted on http://gromgull.net/2010/09/btc2010data/
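
The CSV-plus-grep approach is easy to picture in a few lines. A small Python sketch, assuming a three-column layout (uri, label, usage_count) that is purely hypothetical; the actual v3 file format isn't described in the mail, and the counts below are placeholders rather than BTC2010 figures:

```python
import csv
import io

# Placeholder rows; the usage counts are invented for illustration only.
SAMPLE_CSV = """uri,label,usage_count
http://xmlns.com/foaf/0.1/firstName,first name,300
http://xmlns.com/foaf/0.1/name,name,900
http://example.org/ns#firstNameOfPet,first name of pet,2
"""


def search(csv_text, query):
    """Return URIs whose URI or label contains every query term,
    most-used first (a grep-like substring match, case-insensitive)."""
    terms = query.lower().split()
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        haystack = (row["uri"] + " " + row["label"]).lower()
        if all(term in haystack for term in terms):
            hits.append((int(row["usage_count"]), row["uri"]))
    hits.sort(key=lambda hit: -hit[0])
    return [uri for _, uri in hits]


results = search(SAMPLE_CSV, "first name")
print(results)
```

A linear scan like this is roughly what grep does, which is why the response time grows with file size and why an index becomes attractive.
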
>
> that 0.5s, + 200ms of network/connection/roundtrip-etc latency, maybe
> 100-200ms of ruby serialization latency and about the same for tabulator's JS
> to parse it made maybe 1.5s (using mid-00s laptops with single-core 1GHz
> CPUs)
>  just annoying enough of a lag to try a v4 with:
>  Groonga for search <http://groonga.org/> just a mmap() file or so,
> without extra SQL deps and good Ruby integration
>  RDF libraries for Ruby <http://rdf.greggkellogg.net/yard/index.html>
>  Prefix.cc to bootstrap <http://prefix.cc/>
>  thin as a webserver <http://code.macournoyer.com/thin/>
>  ww to add linked-data stars <http://src.whats-your.name/ww/ruby>
>  tabulator to render+browse <https://github.com/linkeddata/tabulator>
>
> if http://data.whats-your.name/ opens in tabulator, well i don't know how
> to add a searchbox. XMLLiteral of <form> and hope it works? you could
> instead request
>    http://data.whats-your.name/index.html and enter your terms, or add to
> URL like http://data.whats-your.name/first+name
>
> should your search-term or path match a known prefix, you're forwarded
> right to the entire ontology. additionally tabulator is enabled if you're
> on HTML view (on the other page-types, like toplevel schema index or
> search-results, add ?view=tabulate to querystring, or ?rdfa for RDFa)
>
> http://data.whats-your.name/angstrom.ttl - besides .html other
> explicit-format suffixes within reason can be used to override conneg
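
The suffix-beats-conneg rule could look something like the Python sketch below. The suffix table and the name pick_format are assumptions for illustration; the mail only mentions .html and .ttl, and this version ignores q-values in the Accept header for brevity:

```python
# Hypothetical suffix table; only .html and .ttl are mentioned in the mail.
SUFFIX_TYPES = {
    ".html": "text/html",
    ".ttl": "text/turtle",
    ".rdf": "application/rdf+xml",
    ".nt": "application/n-triples",
}


def pick_format(path, accept="", default="text/html"):
    """Return (resource_path, media_type).

    An explicit suffix wins; otherwise the first supported type listed
    in the Accept header is used; otherwise the default.
    """
    for suffix, mime in SUFFIX_TYPES.items():
        if path.endswith(suffix):
            return path[:-len(suffix)], mime
    for part in accept.split(","):
        mime = part.split(";")[0].strip()
        if mime in SUFFIX_TYPES.values():
            return path, mime
    return path, default


print(pick_format("/angstrom.ttl", accept="text/html"))
print(pick_format("/angstrom", accept="application/rdf+xml, text/html;q=0.5"))
```

Checking the suffix first keeps plain links and curl one-liners usable without setting headers.
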
>
> then you'll get to find out how many servers are missing CORS headers..
>

I used to use swoogle a lot, but I think LOV is the go-to place for vocabs
now.  Sindice I think is shutting down.

prefix.cc is also nice ...
Received on Wednesday, 7 May 2014 15:42:27 UTC