Re: Semantic Web User Agent Conformance

On Nov 22, 2007 9:38 PM, Sandro Hawke wrote:

> The first time I implemented this bit of code I called it a "pool".  [...]
> The most recent time I implemented this bit of code I called it a
> "harvester". [...] In the SPARQL world, it's a kind of proxy. [...] I think
> maybe the right term for this bit of code is 'Semantic Web Engine'.

The module I've written for this is called "web" because then you can
do "web.GET" for HTTP and so on and it looks pretty and
self-documenting.

The objects that web.GET returns model HTTP responses, so they're
instances of a Response class in the API, though of course this varies
by protocol: a file:// URI will just return a File object.

What I'm doing, though, is decorating them with a method called
"format" which I can use to get the discovered type of the document
for passing along to the appropriate RDF parser or parsers.

>>> from trio import web
>>> resp = web.GET('http://inamidst.com/sbp/foaf.rdf')
>>> resp.format()
'rdfxml'
>>>
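
One way a method like format could work is sketched below, assuming
detection from the Content-Type header alone. The Response class shown
here, the MEDIA_TYPES table, and every format name other than 'rdfxml'
are hypothetical illustrations, not the actual trio code:

```python
# Hypothetical sketch; not the actual trio.web implementation.

# Assumed mapping from media type to a short parser name.
MEDIA_TYPES = {
    'application/rdf+xml': 'rdfxml',
    'text/rdf+n3': 'n3',   # assumed name
    'text/n3': 'n3',       # assumed name
}

class Response:
    """Models an HTTP response, decorated with a format detector."""

    def __init__(self, headers, body=b''):
        # Header names are case-insensitive, so normalise them.
        self.headers = {k.lower(): v for k, v in headers.items()}
        self.body = body

    def format(self):
        """Return the discovered format name, or None if unknown."""
        content_type = self.headers.get('content-type', '')
        # Drop parameters such as "; charset=utf-8".
        media_type = content_type.split(';', 1)[0].strip().lower()
        return MEDIA_TYPES.get(media_type)
```

With that in place, Response({'Content-Type': 'application/rdf+xml'})
would report 'rdfxml', and an unrecognised or missing media type would
report None so that the caller can decide what to try next.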

In fact I'm not entirely sure yet what I want to do about GRDDL and
RDFa, because you can't tell whether either is present unless you
actually parse the document according to all of the GRDDL and RDFa
procedures.
Unless, that is, this conformance thread amounts to something! As I've
noted, I've already asked the RDFa folk to make it so that conforming
RDFa documents SHOULD have an @profile (to make RDFa documents easy to
detect) and to state that a conforming RDFa user agent MUST parse
@profile-including documents but only MAY process others.

If that were so, resp.format() could return 'rdfa' after only having
to peek so far as the <head> element in HTML.
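
A minimal sketch of that kind of peek, assuming the document arrives
in text chunks and that RDFa is signalled by a known URI in the
profile attribute of <head>. The RDFA_PROFILE value is a placeholder,
and has_rdfa_profile and _HeadPeeker are hypothetical names; nothing
here is defined by RDFa itself:

```python
from html.parser import HTMLParser

# Placeholder profile URI, assumed purely for illustration.
RDFA_PROFILE = 'http://example.org/rdfa-profile'

class _HeadPeeker(HTMLParser):
    """Records the profile URIs on <head>, then flags that it is done."""

    def __init__(self):
        super().__init__()
        self.profiles = []
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == 'head':
            # The attribute may be absent or valueless; treat both as empty.
            profile = dict(attrs).get('profile') or ''
            # @profile is a whitespace-separated list of URIs.
            self.profiles = profile.split()
            self.done = True
        elif tag == 'body':
            # No <head> start tag was seen before the body; give up.
            self.done = True

def has_rdfa_profile(chunks):
    """Feed text chunks only until the <head> start tag has been seen."""
    peeker = _HeadPeeker()
    for chunk in chunks:
        peeker.feed(chunk)
        if peeker.done:
            break
    return RDFA_PROFILE in peeker.profiles
```

Because the loop stops once the <head> start tag has been handled,
later chunks of a large document are never requested from the
iterable.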

I want this kind of efficiency for three reasons. First, it's just
good design. Second, if I have to parse, say, lots of my friends'
travel schedules and I don't know what wacky formats they're all using
today, I don't want to have to slow things down by parsing them using
every possible parser before finding out what their travel schedules
are. Third, both GRDDL and RDFa are currently in a state where you
can't do detection without a full parse. What happens if more and more
formats come out that do the same? The trend has to stop with RDFa.

Anyway, I'm not sure what you call my "web" module... it *is* a World
Wide Web user agent, just augmented by that single Semantic Web
method, the format detector. Is that what you'd call a Semantic Web
Engine?

> I occasionally imagine I'll implement a good one and call it "Alfred"
> (after Alfred Horn, Alfred North Whitehead, and Bruce Wayne's butler,
> since it has elements of each).

Heh! +1 for calling them Alfreds.

-- 
Sean B. Palmer, http://inamidst.com/sbp/

Received on Friday, 23 November 2007 09:13:39 UTC