Re: Experimental RDFa extractor in JS from Gregg Kellogg on 2012-04-20 (public-rdfa-wg@w3.org from April 2012)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Fri, 20 Apr 2012 09:56:50 -0400
To: Niklas Lindström <lindstream@gmail.com>
CC: Ivan Herman <ivan@w3.org>, public-rdfa-wg <public-rdfa-wg@w3.org>
Message-ID: <4A7F2381-520B-4F73-8989-C76AEFA3C2A1@kellogg-assoc.com>
On Apr 20, 2012, at 3:52 AM, "Niklas Lindström" <lindstream@gmail.com> wrote:

> Hi Ivan,
> 
> 2012/4/20 Ivan Herman <ivan@w3.org>:
>> Niklas,
>> 
>> I think this is a great idea and I am very excited to see it. I think that a system that returns JSON to application developers is the best possible choice for now, as we do not have any RDF API. And, maybe, that is all that WebApp developers need.
> 
> Thanks! I agree. It felt like a valuable way forward with fairly
> little effort (though piggybacking on the JSON-LD design work of
> course). It'll be very interesting to evaluate the usability of the
> resulting data in various scenarios.
> 
> 
>> I think the first goal should be (if you can) to cover the whole of Lite, plus possibly fully cover @about. That would be a major first step. Then it could be completed.
> 
> Definitely. I think Lite is basically covered already. It's the
> interplay of many attributes in the same element that I haven't got to
> yet (to e.g. fully cover @about). Right now I'm short on time, but I
> hope to continue down this path some time next week.

I've made my own fork, and I might try to improve coverage as well, and possibly make it able to run through the test cases without a distiller.

>> What kind of JSON-LD do you produce? For pyRdfa I tried to push as much as I could into @context; mainly in the case of @vocab usage that meant that the rest of the JSON part really looked very simple. That is a major plus for WebApp developers.
> 
> Yes, that's what I do too, for exactly those reasons. The shape of the
> output is entirely based on the form of the input, i.e. using the same
> terms and CURIEs (populating @context as needed). One thing I haven't
> yet done, but plan to, is to merge descriptions about the same
> resource even if they're dispersed throughout the page.

Note that you can leave such merging to JSON-LD framing, which does this anyway.
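As a rough illustration of the merging discussed above (this is not the JSON-LD framing algorithm itself), the following sketch combines node objects that share an @id. The helper name mergeById and the sample data are assumptions made up for this example.

```javascript
// Hypothetical helper: merge node objects that share the same "@id".
// Property values are collected into arrays and exact duplicates are dropped.
// Nodes without an "@id" cannot be matched, so they are passed through as-is.
function mergeById(nodes) {
  const idMap = new Map();
  const anonymous = [];
  for (const node of nodes) {
    const id = node["@id"];
    if (id === undefined) {
      anonymous.push(node);
      continue;
    }
    if (!idMap.has(id)) {
      idMap.set(id, { "@id": id });
    }
    const target = idMap.get(id);
    for (const [key, value] of Object.entries(node)) {
      if (key === "@id") continue;
      const incoming = Array.isArray(value) ? value : [value];
      const existing = target[key] || [];
      for (const item of incoming) {
        // Naive equality check; sensitive to key order inside objects.
        const seen = existing.some(
          (e) => JSON.stringify(e) === JSON.stringify(item)
        );
        if (!seen) existing.push(item);
      }
      target[key] = existing;
    }
  }
  return [...idMap.values(), ...anonymous];
}

// Sample data (invented): one resource described in three places on a page.
const merged = mergeById([
  { "@id": "http://example.org/#me", "foaf:name": "Alice" },
  { "@id": "http://example.org/#me", "foaf:homepage": { "@id": "http://example.org/" } },
  { "@id": "http://example.org/#me", "foaf:name": "Alice" },
]);
console.log(JSON.stringify(merged, null, 2));
```

The sketch always stores values as arrays, which keeps the merge logic simple at the cost of a slightly more verbose result.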

> While that
> does deviate from the actual shape in the source page, it is so much
> better for consumption, and I think is to be expected. Another thing I
> don't do is any kind of coercion. Literals with datatype or deviating
> from any given @language are represented in expanded JSON-LD form.
> I've yet to decide whether to change that or make it configurable.

This might also be left to JSON-LD API methods. For instance, the "automatic" flag to compaction could generate the best context for you to use, and coerce your data for you. It can be expensive, though, and for any real application, a JSON-LD context matching the data could be provided to compact or frame.
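To make the idea of context-driven coercion concrete, here is a minimal sketch of how an expanded literal could be reduced to a plain string when a context already implies its datatype or language. It is a simplification written for this example, not the JSON-LD compaction algorithm; the function name compactValue, the sample context, and the restriction to string literals are all assumptions.

```javascript
// Hypothetical context: a default language plus one type-coerced term.
const context = {
  "@language": "en",
  "dc:date": { "@type": "xsd:date" },
};

// Return a plain string when the context already implies the literal's
// datatype or language; otherwise keep the expanded form unchanged.
// Only string literals are considered in this sketch.
function compactValue(term, value, ctx) {
  if (typeof value !== "object" || value === null || !("@value" in value)) {
    return value; // not an expanded literal, leave untouched
  }
  const definition = ctx[term];
  const coercedType =
    definition && typeof definition === "object" ? definition["@type"] : undefined;
  if ("@type" in value) {
    // Typed literal: safe to shorten only if the term coerces to that type.
    return value["@type"] === coercedType ? value["@value"] : value;
  }
  if ("@language" in value) {
    // Language-tagged literal: safe only if the term has no type coercion
    // and the tag equals the context's default language.
    const matches =
      coercedType === undefined && value["@language"] === ctx["@language"];
    return matches ? value["@value"] : value;
  }
  // Plain literal: a bare string would pick up the default language or the
  // coerced type, so it can only be shortened when neither is present.
  const neutral = coercedType === undefined && ctx["@language"] === undefined;
  return neutral ? value["@value"] : value;
}

const date = compactValue("dc:date", { "@value": "2012-04-20", "@type": "xsd:date" }, context);
const english = compactValue("dc:title", { "@value": "Hello", "@language": "en" }, context);
const swedish = compactValue("dc:title", { "@value": "Hej", "@language": "sv" }, context);
console.log(date, english, swedish);
```

Literals that deviate from the context, such as the Swedish title here, stay in expanded form, which matches the behaviour Niklas describes.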

> (You should really try out the bookmarklet [1] in a Firefox (ideally
> with the JSONView [2] plugin installed). :) I tried it on your own
> FOAF page for instance, which is rich in data and really interesting
> to examine this way. (Note that @xmlns:* aren't captured yet though,
> so the result here isn't really correct.))

Running in-browser, access to xmlns* might be challenging.

> It should be noted that, of course, graph cycles aren't possible to
> follow directly in a tree. So any time an already created resource
> description (i.e. a JSON object @id'd with the resource IRI) is
> referenced, I just put a link there (an object with
> just the @id).

Perfect! This is what framing is for, to turn such references into object embeds.
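The effect of turning references into embeds can be sketched in a few lines. The code below is only an illustration under stated assumptions (an idMap keyed by resource IRI, and a hypothetical helper named embed); it is not the framing algorithm. It leaves a plain reference wherever embedding would re-enter a node that is already being embedded, which is how it copes with cycles.

```javascript
// Hypothetical helper: replace reference objects (objects holding only an
// "@id") with the full description from idMap. A reference is kept as-is
// when the resource is unknown or when embedding it would create a cycle.
function embed(value, idMap, path = new Set()) {
  if (Array.isArray(value)) {
    return value.map((item) => embed(item, idMap, path));
  }
  if (typeof value !== "object" || value === null) {
    return value;
  }
  const id = value["@id"];
  const isReference = id !== undefined && Object.keys(value).length === 1;
  if (id !== undefined && path.has(id)) {
    return { "@id": id }; // already being embedded further up: keep a link
  }
  if (isReference && !idMap.has(id)) {
    return value; // nothing known about this resource
  }
  const node = isReference ? idMap.get(id) : value;
  const nextPath = id !== undefined ? new Set(path).add(id) : path;
  const result = {};
  for (const [key, child] of Object.entries(node)) {
    result[key] = key === "@id" ? child : embed(child, idMap, nextPath);
  }
  return result;
}

// Sample data (invented): two resources that refer to each other.
const idMap = new Map([
  ["http://example.org/a", { "@id": "http://example.org/a", "foaf:knows": { "@id": "http://example.org/b" } }],
  ["http://example.org/b", { "@id": "http://example.org/b", "foaf:knows": { "@id": "http://example.org/a" } }],
]);
const tree = embed(idMap.get("http://example.org/a"), idMap);
console.log(JSON.stringify(tree, null, 2));
```

In the sample, b is embedded inside a, while the link from b back to a remains an object with just the @id.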

> While I plan to expose the idMap I'll use for the
> aforementioned dispersed resource merging, trying to solve this in
> general means veering into the API design again.

If you do this at all, you might just automatically create a frame matching the existing document structure.

> While I have many
> ideas for how to get there from here eventually, I'll focus on the
> basic JSON-LD tree for now. Hopefully we'll see how valuable that
> becomes in itself in various scenarios. (As we know and have seen
> before, there are many intricate tradeoffs possible regarding e.g.
> graph vs. tree and data details vs. strings.)

>> The only, though insignificant, issue is that you won't be able to run the official test suite directly. Nevertheless, I think it would be hugely important to have whatever you have be part of the official report (via a manually edited EARL file, for example).
> 
> Absolutely. Actually, I think I'll manage to set up an extractor
> service for this eventually. I'm already using Node to run it on the
> command-line against test files, so it should be straightforward. The
> remaining thing then is whether the test runner accepts JSON-LD (I
> actually think it might – Gregg?), or if I should plug this into
> Antonio Garrote's rdfstore-js [3]. Either way it should be quite
> doable.

The distiller does accept JSON-LD, but probably needs a small update. You could also use jsonld.js and use the toRDF method to get n-triples out of it in the page.
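For a sense of what getting N-Triples out of such data involves, here is a hand-rolled sketch. It does not use or reproduce the jsonld.js API; the function names are hypothetical, and it assumes flat node objects in which every @id, property key, and datatype is already an absolute IRI, with no blank nodes and no nested descriptions.

```javascript
const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

// Escape a string for use inside an N-Triples literal.
function escapeLiteral(text) {
  return text
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
}

// Serialize one object value: an IRI reference or a literal.
function objectTerm(value) {
  if (typeof value === "object" && value !== null && "@id" in value) {
    return `<${value["@id"]}>`;
  }
  if (typeof value === "object" && value !== null && "@value" in value) {
    const literal = `"${escapeLiteral(String(value["@value"]))}"`;
    if (value["@language"]) return `${literal}@${value["@language"]}`;
    if (value["@type"]) return `${literal}^^<${value["@type"]}>`;
    return literal;
  }
  return `"${escapeLiteral(String(value))}"`;
}

// Hypothetical helper: emit one N-Triples line per property value.
function toNTriples(nodes) {
  const lines = [];
  for (const node of nodes) {
    const subject = `<${node["@id"]}>`;
    for (const [property, values] of Object.entries(node)) {
      if (property === "@id") continue;
      for (const value of [].concat(values)) {
        if (property === "@type") {
          // "@type" values are IRIs and map to rdf:type.
          lines.push(`${subject} <${RDF_TYPE}> <${value}> .`);
        } else {
          lines.push(`${subject} <${property}> ${objectTerm(value)} .`);
        }
      }
    }
  }
  return lines;
}

// Sample data (invented).
const lines = toNTriples([
  {
    "@id": "http://example.org/#me",
    "http://xmlns.com/foaf/0.1/name": "Alice",
    "http://xmlns.com/foaf/0.1/homepage": { "@id": "http://example.org/" },
  },
]);
console.log(lines.join("\n"));
```

A real extractor would first have to expand terms and CURIEs against the @context, which is the part a JSON-LD library takes care of.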

>> Niklas, this could be very important...

Agreed!

Gregg

>> Thanks
> 
> Thanks for the positive feedback!
> 
> Best regards,
> Niklas
> 
> [1]: http://niklasl.github.com/rdfa-lab/
> [2]: http://jsonview.com/
> [3]: https://github.com/antoniogarrote/rdfstore-js
> 
> 
>> Ivan
>> 
>> 
>> On Apr 20, 2012, at 01:58 , Niklas Lindström wrote:
>> 
>>> Hi all!
>>> 
>>> The last couple of days I've been experimenting with a different kind
>>> of approach to implementing an RDFa extractor. The result so far is a
>>> draft with admittedly rather partial coverage. However, I hope some
>>> aspects of it will be of interest even at this stage:
>>> 
>>> 1. It is implemented in pure JavaScript. (Well, actually, in some 170
>>> lines of CoffeeScript, but the generated result is the same.)
>>> 2. It runs both in the browser and on Node (used with jsdom).
>>> 3. It does not produce triples. It directly creates a JSON-LD extract
>>> (corresponding in shape to the RDFa). This is the difference, and the
>>> fun part.
>>> 
>>> Now, it really doesn't handle anything but the most simple RDFa 1.1.
>>> Possibly all of Lite, plus @datatype, @rel (including hanging),
>>> @inlist, @rev and perhaps one or two more. It only copes with @about
>>> if it's alone, it doesn't handle combinations of @rel and @property,
>>> and so on. I'll strive to make it a lot more compliant given time of
>>> course.
>>> 
>>> - You can check out the code at: https://github.com/niklasl/rdfa-lab
>>> - Or enjoy the bookmarklet (only tested in Firefox), available at:
>>> http://niklasl.github.com/rdfa-lab/
>>> 
>>> (Just add the latter to your bookmarks and apply on any page
>>> containing RDFa. I recommend the JSONView [1] browser add-on for a
>>> good experience.)
>>> 
>>> I hope you'll enjoy the little things it can do. (For one, using the
>>> resulting JSON-LD directly in a JS application should prove
>>> interesting.)
>>> 
>>> Best regards,
>>> Niklas
>>> 
>>> [1]: http://jsonview.com/
>>> 
>> 
>> 
>> ----
>> Ivan Herman, W3C Semantic Web Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> FOAF: http://www.ivan-herman.net/foaf.rdf
>> 
>> 
>> 
>> 
>> 
>
Received on Friday, 20 April 2012 13:56:30 UTC