Re: Experimental RDFa extractor in JS from Gregg Kellogg on 2012-04-20 (public-rdfa-wg@w3.org from April 2012)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Fri, 20 Apr 2012 13:40:01 -0400
To: Niklas Lindström <lindstream@gmail.com>
CC: Ivan Herman <ivan@w3.org>, public-rdfa-wg <public-rdfa-wg@w3.org>
Message-ID: <51F05E07-A587-4A55-956F-148E6084E213@greggkellogg.net>
On Apr 20, 2012, at 9:31 AM, Niklas Lindström wrote:

> Hi!
> 
> 2012/4/20 Gregg Kellogg <gregg@greggkellogg.net>:
>> On Apr 20, 2012, at 8:23 AM, Niklas Lindström wrote:
>> 
>>> Hi Gregg,
>>> 
>>> 2012/4/20 Gregg Kellogg <gregg@greggkellogg.net>:
>>>> On Apr 20, 2012, at 3:52 AM, "Niklas Lindström" <lindstream@gmail.com> wrote:
>>> [...]
>>>>> Yes, that's what I do too, for exactly those reasons. The shape of the
>>>>> output is entirely based on the form of the input, i.e. using the same
>>>>> terms and CURIEs (populating @context as needed). One thing I haven't
>>>>> yet done, but plan to, is to merge descriptions about the same
>>>>> resource even if they're dispersed throughout the page.
>>>> 
>>>> Note that you can leave such merging to JSON-LD framing, which does this anyway.
>>>> 
>>>>> While that
>>>>> does deviate from the actual shape in the source page, it is so much
>>>>> better for consumption, and I think is to be expected. Another thing I
>>>>> don't do is any kind of coercion. Literals with datatype or deviating
>>>>> from any given @language are represented in expanded JSON-LD form.
>>>>> I've yet to decide whether to change that or make it configurable.
>>>> 
>>>> This might also be left to JSON-LD API methods. For instance, the "automatic" flag to compaction could generate the best context for you to use, and coerce your data for you. It can be expensive, though, and for any real application, a JSON-LD context matching the data could be provided to compact or frame.
>>> 
>>> At this point I'd like to stick to a strict and very simple solution,
>>> with one predictable result tree (based on the source RDFa structure,
>>> but merging anything dispersed). I'd like this to be lightweight and
>>> simple, with close to no API. The fact that this solution produces
>>> JSON-LD is a benefit, but it is basically skimmed data, mainly usable
>>> for simple things. I think of it mostly as an RDFa equivalent to the
>>> microdata-to-JSON approach. (And the merging I speak of is roughly
>>> corresponding to how that handles the @itemref stuff.)
>> 
>> For some reason, my point is being confused. I think the approach you're taking is just great. My point was that if anything more complicated needs to be done, it can be left to JSON-LD tools. I'm all for keeping your implementation as simple as possible, and close to the form of the document you've produced.
>> 
>> If a developer wants to do more with the data than what you produce, the JSON-LD API has a number of useful tools. There shouldn't be any direct dependencies between your tool and the JSON-LD implementations; leave that up to the developer.
> 
> Ah, sorry; then all is good and well, and we're definitely on the same
> page! For my part, I suspect that my argumentation here was equally
> directed towards myself, since I've actually had impulses to embark on
> the very path I argue against. :) That is, instead of making decisions
> to end up with a basic simplicity, I've actually pondered on whether
> to adopt JSON-LD API flags for controlling the resulting shape, which,
> as we all agree, is better done in separate steps, with separate
> libraries.
> 
> The main thing yet to decide then is whether to merge or keep
> disparate description objects of the same resource. Do you find any
> merit in my reasoning there, regarding expectations, usability, and in
> the "microdata-using-@itemref to JSON" parallel?

If the subjects are split across the HTML page, I think there's value in maintaining the separate definitions. Like I said, a developer can always use framing to merge them back together. Once they're merged, you do lose the context of where data was asserted, which might be useful to maintain.
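
To make the trade-off concrete, here is a minimal sketch of what merging by subject could look like in plain JavaScript. It is not taken from the extractor under discussion or from any JSON-LD library: the mergeById helper, the sample data, and the restriction to a flat array of node objects with simple string values are all assumptions made for illustration, and @context is omitted for brevity.

```javascript
// Hypothetical helper: merge node objects that share an @id.
// Assumes a flat array of node objects whose property values are
// strings or arrays of strings (no nested node objects).
function mergeById(nodes) {
  const byId = new Map();
  const out = [];
  for (const node of nodes) {
    const id = node['@id'];
    if (id === undefined) {
      // Nodes without an @id cannot be matched up, so keep them as they are.
      out.push(node);
      continue;
    }
    let target = byId.get(id);
    if (!target) {
      target = { '@id': id };
      byId.set(id, target);
      out.push(target);
    }
    for (const [key, value] of Object.entries(node)) {
      if (key === '@id') continue;
      // Collect existing and new values, skipping exact duplicates.
      const values = key in target ? [].concat(target[key]) : [];
      for (const v of [].concat(value)) {
        if (!values.includes(v)) values.push(v);
      }
      target[key] = values.length === 1 ? values[0] : values;
    }
  }
  return out;
}

// Two descriptions of the same subject, as they might be found in
// two separate places on a page (made-up example data).
const extracted = [
  { '@id': 'http://example.org/#me', 'name': 'Alice' },
  { '@id': 'http://example.org/#other', 'name': 'Bob' },
  { '@id': 'http://example.org/#me', 'homepage': 'http://example.org/' }
];

const result = mergeById(extracted);
console.log(JSON.stringify(result, null, 2));
```

After the merge, nothing in the result says that the name and the homepage came from different parts of the page, which is the context that gets lost.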

> To some extent, I
> already diverge from the exact "frame" of the source when it comes to
> @rev handling.

I guess I wouldn't worry about @rev. Focus on the RDFa 1.1 Lite cases.

> And as mentioned, when dealing with multiple references
> to the same resource. So I think a certain amount of "normalization"
> is inevitable. Although I believe it best to keep nesting in general,
> rather than to move all resource objects to the top-level @graph. But
> that's also debatable. We know one size won't fit all, so the goal
> here is to establish some kind of "reasonable expectation based on the
> source", if possible...

That's the way to go. Keep it simple.
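
For comparison, this is roughly what the alternative of moving everything to a top-level @graph would involve. It is only a sketch under stated assumptions: flattenToGraph is a hypothetical name, every nested node object is assumed to carry an @id (no blank node labelling), duplicates are not merged, and @context is left out. It is not the JSON-LD API's flattening algorithm.

```javascript
// Hypothetical helper: move nested node objects to a top-level @graph,
// leaving {"@id": ...} references in their place.
// Assumes every nested node object has an @id.
function flattenToGraph(root) {
  const graph = [];

  function visit(node) {
    const flat = { '@id': node['@id'] };
    graph.push(flat);
    for (const [key, value] of Object.entries(node)) {
      if (key === '@id') continue;
      const values = Array.isArray(value) ? value : [value];
      const out = values.map((v) => {
        if (v !== null && typeof v === 'object' && '@id' in v) {
          // A node object with more than just @id has its own description.
          if (Object.keys(v).length > 1) visit(v);
          return { '@id': v['@id'] };
        }
        return v; // plain literals and value objects are kept as they are
      });
      flat[key] = Array.isArray(value) ? out : out[0];
    }
  }

  visit(root);
  return { '@graph': graph };
}

// Nested form, close to how the markup might be structured (made-up data).
const nested = {
  '@id': 'http://example.org/#me',
  'name': 'Alice',
  'knows': { '@id': 'http://example.org/#bob', 'name': 'Bob' }
};

const flat = flattenToGraph(nested);
console.log(JSON.stringify(flat, null, 2));
```

The nested form mirrors the source, while the flattened form needs a lookup by @id before "knows" can be followed, which is one reason to keep nesting as the default.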

> (By the way, I've pushed the changes I mentioned. Still quite a moving
> target, but it may be an improvement.)

Thanks. I'm not sure when I'll have time to jump in, but I thought I'd see about creating a structure to easily run through the test suite on your development machine. I do this for my Ruby implementation, and it is much faster than using the RDFa Test Suite application, even on my own machine. Although it may be that rdfstore-js is the way to go.

I'll leave the logic of the parser to you, for the time being.

Gregg


> Best regards,
> Niklas
Received on Friday, 20 April 2012 17:44:31 UTC