Re: [ANN] JSON Datasets / any HTML to RDF from Martin McEvoy on 2010-10-14 (semantic-web@w3.org from October 2010)

From: Martin McEvoy <martin@weborganics.co.uk>
Date: Thu, 14 Oct 2010 21:40:26 +0100
To: Toby Inkster <tai@g5n.co.uk>
CC: Semantic Web <semantic-web@w3.org>
Message-ID: <4CB76ABA.5030707@weborganics.co.uk>
  Hello Toby,

On 13/10/2010 16:09, Toby Inkster wrote:
> On Wed, 2010-10-13 at 12:15 +0100, Martin McEvoy wrote:
>> I am pleased to announce JSON Datasets[1] a way to extract RDF from
>> any HTML document using JSON.
> Hi Martin,
>
> I seem to remember reading some of your early work on this concept some
> months ago. Can't remember how I stumbled upon it.

I think the topic came up in the RDFa WG at the end of last year, during
a discussion of alternative methods of prefix mapping.

> Anyhow, it's an interesting idea. It seems to me that it's quite
> GRDDL-like, in that an HTML file can link to a file that contains a set
> of rules which, once applied to the original file, produce RDF as
> output.
>
> I know that you're quite the XSLT transformation guru, and are probably
> quite familiar with GRDDL. Are you also aware that GRDDL allows
> transformations to be written using languages other than XSLT? Your
> JSON-based language seems like it would make a good transformation
> language for GRDDL.
>
> Essentially all that would need to be done to make JSON Datasets
> conformant to GRDDL would be to replace this method of linking:
>
>  <link rel="dataset"
>        href="http://example.com/my-dataset.json"
>        type="application/json">
>
> With the GRDDL methods of linking to a transformation. There are two
> such methods that are relevant to HTML (there are another two which are
> XML-based) - firstly a direct link from the document to the JSON file:
>
>  <link rel="transformation"
>        href="http://example.com/my-dataset.json"
>        type="application/json">
>
> And secondly, an indirect link from the document to a profile document
>
>  <head profile="http://example.com/profile">
>
> Where the profile document contains a link to the JSON:
>
>  <link rel="profileTransformation"
>        href="http://example.com/my-dataset.json"
>        type="application/json">
>
> If those were used by JSON Datasets instead of rel="dataset" then you
> might find that your approach receives wider support. For example, my
> Perl implementation of GRDDL supports pluggable transformation
> languages; adding support for your JSON-based format would not be
> especially tough. Adding support for rel="dataset" though, I would
> consider to be out of scope for the project.

I have no problem re-using rel=transformation or profileTransformation.
I had the same thought as you, but until now I didn't know GRDDL could
use other languages.
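
As a rough illustration of the direct-link method quoted above, here is a minimal Python sketch of how a consumer might discover the JSON file from rel="transformation" links. The class and function names are hypothetical, and the sketch is deliberately incomplete: it does not inspect the head profile attribute, follow profile documents, or resolve relative URLs.

```python
from html.parser import HTMLParser


class TransformationLinkFinder(HTMLParser):
    """Collect href values of <link> elements whose rel contains a given token.

    Hypothetical helper for illustration; not part of any GRDDL library.
    """

    def __init__(self, rel_token="transformation"):
        super().__init__()
        self.rel_token = rel_token.lower()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel is a space-separated token list; compare case-insensitively.
        rels = (attr.get("rel") or "").lower().split()
        if self.rel_token in rels and attr.get("href"):
            self.hrefs.append(attr["href"])


def find_transformations(html, rel_token="transformation"):
    """Return the hrefs of all matching <link> elements, in document order."""
    finder = TransformationLinkFinder(rel_token)
    finder.feed(html)
    return finder.hrefs
```

Passing "profileTransformation" as rel_token would apply the same scan to a profile document.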

> Some critiques of the JSON format itself:
>
> The use of the term "where" is a little confusing. The terminology of
> the query syntax seems to borrow from SQL and SPARQL, but the behaviour
> of "where" seems totally different. In SQL and SPARQL, "where" is
> essentially used to perform joins, and to narrow down criteria. In your
> language it seems to be a mapping from one structure (a graph) to
> another (RDF triples) - that seems to be more similar to SQL's "SELECT
> foo AS bar". Perhaps this:
>
> {
>    "select":  {
>      "from":"http://example.com/",
>      "prefix": {
>        "dc":"http://purl.org/dc/elements/1.1/"
>      },
>      "where": {
>        "title": {  "label": "dc:title" }
>      }
>    }
> }
>
> Might be better expressed as:
>
> {
>    "prefix": {
>      "dc":"http://purl.org/dc/elements/1.1/"
>    },
>    "select":  {
>      "title": {  "label": "dc:title" }
>    },
>    "from":"http://example.com/"
> }

... "select" is a little confusing when you put it like that :) I like
your example though; it looks cleaner ...

> And actually, "label" might be better if called "as":
>
> {
>    "prefix": {
>      "dc":"http://purl.org/dc/elements/1.1/"
>    },
>    "select":  {
>      "title": {  "as": "dc:title" }
>    },
>    "from":"http://example.com/"
> }

... and I like the above too ...
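
To make the difference between the two layouts concrete, the following Python sketch converts a query in the current nested form into the flatter form proposed above, renaming "label" to "as" along the way. The function name to_flat_layout is hypothetical, and the flat layout is only Toby's suggestion at this point, not part of the spec.

```python
def to_flat_layout(old):
    """Convert {"select": {"from", "prefix", "where"}} into the proposed
    top-level {"prefix", "select", "from"} layout (illustrative only)."""
    inner = old["select"]
    # Each "where" entry maps a selector to a rule; rename "label" to "as".
    select = {
        selector: {("as" if key == "label" else key): value
                   for key, value in rule.items()}
        for selector, rule in inner.get("where", {}).items()
    }
    new = {}
    if "prefix" in inner:
        new["prefix"] = inner["prefix"]
    new["select"] = select
    if "from" in inner:
        new["from"] = inner["from"]
    return new
```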

> Are prefixes required, or just a shortcut? Could the above be written as
> the following?
>
> {
>    "select":  {
>      "title": {  "as":"http://purl.org/dc/elements/1.1/title"  }
>    },
>    "from":"http://example.com/"
> }

Prefixes are required at the moment. As you may know, I am not a huge
fan of typing out long URLs instead of keywords. Having said that, I have
no problem implementing it, as you are the second person to bring it up.
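
One way optional prefixes could work is sketched below in Python: a term is expanded only when its prefix is declared, and anything else is passed through as a full URI. The function name expand_term is hypothetical, and this is an assumption about possible behaviour, not what the current implementation does.

```python
def expand_term(term, prefixes):
    """Expand "dc:title" using the prefix map; leave other terms untouched."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    # No declared prefix matched, so treat the term as an absolute URI.
    return term
```

With this rule, "dc:title" and the full Dublin Core URI name the same property, provided nobody declares a prefix called "http".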

> It's not clear whether selectors may be combined. "h1", ".example" and
> "#heading" are all valid selectors, but what about "h1.example" and
> "#heading h1.example"? If you're going to use a subset of CSS, you need
> to be awfully clear about what subset you're specifying, otherwise
> people coming to your spec, knowing CSS already, are going to say,
> "well, it's like CSS, so I must be able to do foo."

You can only use one selector at a time, I'm afraid. Selectors are
CSS-"like" in appearance, but really that's where the similarity ends; I
should perhaps make more of a point about that. Having said that, I will
have a go (if I have the time) over the weekend at implementing combined
selectors, as I can see they may be useful.

> You might consider switching to, or at least allowing, XPath for
> selectors. It's mighty powerful, and should be able to handle useful
> idioms like class=fn which is inside class=vcard, but not inside a
> nested class=vcard.

XPath is mighty powerful indeed, but complex for the average author.
There is value in implementing both and seeing how it goes.
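
To show what the nested-vcard idiom asks for, here is a small Python sketch that finds class=fn elements belonging to a given class=vcard while skipping any nested class=vcard. It is not XPath: it is a plain tree walk over well-formed markup using the standard library's ElementTree, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def has_class(element, name):
    """True if the element's class attribute contains the given token."""
    return name in (element.get("class") or "").split()


def direct_fns(vcard):
    """Return fn elements under vcard that are not inside a nested vcard."""
    found = []

    def walk(node):
        for child in node:
            if has_class(child, "fn"):
                found.append(child)
            if has_class(child, "vcard"):
                continue  # a nested vcard owns its own fn; do not descend
            walk(child)

    walk(vcard)
    return found


markup = (
    '<div class="vcard"><span class="fn">Alice</span>'
    '<div class="agent vcard"><span class="fn">Bob</span></div></div>'
)
names = [element.text for element in direct_fns(ET.fromstring(markup))]
```

A single-token selector such as ".fn" cannot express the "but not inside a nested vcard" part, which is the gap XPath or combined selectors would need to fill.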

> Lastly in your spec, you use a lot of XML terminology when describing
> the output. Personally I found that quite confusing. You might want to
> consider explaining how the output is constructed in terms of the
> abstract triples, or if you want to describe it in more concrete terms,
> in terms of N-Triples.

Ah yes, it does use a lot of XML terminology, sorry about that. I will
update the spec to use N-Triples; again, this is a point that someone
has mentioned before.
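
As a hedged sketch of describing output in N-Triples terms, the Python function below serialises one extracted value as a single triple with a plain literal object. The function name ntriple is hypothetical, only the basic string escapes are handled, and choosing the document URL as the subject is an assumption made for the example.

```python
def ntriple(subject, predicate, literal):
    """Format one triple with a plain literal object as an N-Triples line."""
    escaped = (
        literal.replace("\\", "\\\\")  # escape backslashes first
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'<{subject}> <{predicate}> "{escaped}" .'


line = ntriple(
    "http://example.com/",
    "http://purl.org/dc/elements/1.1/title",
    "Example",
)
```

For the dc:title query discussed earlier, each matched title would yield one such line.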
> I think if you did that, it might even help clarify the format in your
> own mind and further improve it - for example, you may not have noticed,
> but because you've defined the "label" property in XML terms, you've
> ended up with a property which sometimes ends up setting an RDF
> property, and at other times an RDF class, as in the case of
> <http://weborganics.co.uk/dataset/#query-rev>  where it's used to set a
> class of "Person". How it sometimes sets one and sometimes sets the
> other seems to happen via magic (perhaps using the same rule as RDF/XML
> where the same also happens, and is similarly confusing).

Thanks for some great feedback, Toby. It's been valuable.

Best wishes

-- 
Martin McEvoy
Received on Thursday, 14 October 2010 20:41:13 UTC