- From: Martin McEvoy <martin@weborganics.co.uk>
- Date: Thu, 14 Oct 2010 21:40:26 +0100
- To: Toby Inkster <tai@g5n.co.uk>
- CC: Semantic Web <semantic-web@w3.org>
Hello Toby,
On 13/10/2010 16:09, Toby Inkster wrote:
> On Wed, 2010-10-13 at 12:15 +0100, Martin McEvoy wrote:
>> I am pleased to announce JSON Datasets[1] a way to extract RDF from
>> any HTML document using JSON.
> Hi Martin,
>
> I seem to remember reading some of your early work on this concept some
> months ago. Can't remember how I stumbled upon it.
I think the topic came up in the RDFa WG at the end of last year, when
we were discussing alternative methods of prefix mapping.
> Anyhow, it's an interesting idea. It seems to me that it's quite
> GRDDL-like, in that an HTML file can link to a file that contains a set
> of rules which, once applied to the original file, produce RDF as
> output.
>
> I know that you're quite the XSLT transformation guru, and are probably
> quite familiar with GRDDL. Are you also aware that GRDDL allows
> transformations to be written using languages other than XSLT? Your
> JSON-based language seems like it would make a good transformation
> language for GRDDL.
>
> Essentially all that would need to be done to make JSON Datasets
> conformant to GRDDL would be to replace this method of linking:
>
> <link rel="dataset"
>       href="http://example.com/my-dataset.json"
>       type="application/json">
>
> With the GRDDL methods of linking to a transformation. There are two
> such methods that are relevant to HTML (there are another two which are
> XML-based) - firstly a direct link from the document to the JSON file:
>
> <link rel="transformation"
>       href="http://example.com/my-dataset.json"
>       type="application/json">
>
> And secondly, an indirect link from the document to a profile document:
>
> <head profile="http://example.com/profile">
>
> Where the profile document contains a link to the JSON:
>
> <link rel="profileTransformation"
>       href="http://example.com/my-dataset.json"
>       type="application/json">
>
> If those were used by JSON Datasets instead of rel="dataset" then you
> might find that your approach receives wider support. For example, my
> Perl implementation of GRDDL supports pluggable transformation
> languages; adding support for your JSON-based format would not be
> especially tough. Adding support for rel="dataset" though, I would
> consider to be out of scope for the project.
I have no problem re-using rel=transformation or profileTransformation.
I had the same thought as you, but until now I didn't know GRDDL could
use languages other than XSLT.
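
For illustration only, here is a minimal Python sketch of how a consumer
might discover the rel="transformation" and rel="profileTransformation"
links shown above. The names TransformationLinkFinder and
find_transformations are hypothetical, and the sketch only collects href
values; it does not attempt full GRDDL processing.

```python
from html.parser import HTMLParser


class TransformationLinkFinder(HTMLParser):
    """Collect href values of <link> elements whose rel contains a token."""

    def __init__(self, rel_token="transformation"):
        super().__init__()
        self.rel_token = rel_token
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel is a space-separated token list; compare case-insensitively.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if self.rel_token.lower() in rel_tokens and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def find_transformations(html_text, rel_token="transformation"):
    """Return the hrefs of all matching <link> elements, in document order."""
    finder = TransformationLinkFinder(rel_token)
    finder.feed(html_text)
    return finder.hrefs
```

Calling find_transformations(page) would cover the direct link, and
find_transformations(profile, "profileTransformation") the link inside a
profile document.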
> Some critiques of the JSON format itself:
>
> The use of the term "where" is a little confusing. The terminology of
> the query syntax seems to borrow from SQL and SPARQL, but the behaviour
> of "where" seems totally different. In SQL and SPARQL, "where" is
> essentially used to perform joins, and to narrow down criteria. In your
> language it seems to be a mapping from one structure (a graph) to
> another (RDF triples) - that seems to be more similar to SQL's "SELECT
> foo AS bar". Perhaps this:
>
> {
>   "select": {
>     "from": "http://example.com/",
>     "prefix": {
>       "dc": "http://purl.org/dc/elements/1.1/"
>     },
>     "where": {
>       "title": { "label": "dc:title" }
>     }
>   }
> }
>
> Might be better expressed as:
>
> {
>   "prefix": {
>     "dc": "http://purl.org/dc/elements/1.1/"
>   },
>   "select": {
>     "title": { "label": "dc:title" }
>   },
>   "from": "http://example.com/"
> }
... "select" is a little confusing when you put it like that :) I like
your example though; it looks cleaner ...
> And actually, "label" might be better if called "as":
>
> {
>   "prefix": {
>     "dc": "http://purl.org/dc/elements/1.1/"
>   },
>   "select": {
>     "title": { "as": "dc:title" }
>   },
>   "from": "http://example.com/"
> }
... and I like the above too ...
> Are prefixes required, or just a shortcut? Could the above be written as
> the following?
>
> {
>   "select": {
>     "title": { "as": "http://purl.org/dc/elements/1.1/title" }
>   },
>   "from": "http://example.com/"
> }
Prefixes are required at the moment. You may know I am not a huge fan of
typing out long URLs instead of keywords. Having said that, I have no
problem implementing it, as you are the second person to bring it up.
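
As a rough sketch of what accepting both forms could look like (this is
not how JSON Datasets currently behaves; expand_term and its fallback
rule are assumptions made for illustration), a term whose prefix is
declared gets expanded, and anything else is passed through as a full
URI:

```python
def expand_term(term, prefixes):
    """Expand a prefixed name such as 'dc:title'; pass full URIs through."""
    prefix, separator, local_name = term.partition(":")
    if separator and prefix in prefixes:
        return prefixes[prefix] + local_name
    # Assumed rule: a term with no declared prefix is already a full URI.
    return term
```

With the "dc" prefix declared, both "dc:title" and
"http://purl.org/dc/elements/1.1/title" resolve to the same URI. One
caveat of this rule is that declaring a prefix named "http" would shadow
full URIs.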
> It's not clear whether selectors may be combined. "h1", ".example" and
> "#heading" are all valid selectors, but what about "h1.example" and
> "#heading h1.example"? If you're going to use a subset of CSS, you need
> to be awfully clear about what subset you're specifying, otherwise
> people coming to your spec, knowing CSS already, are going to say,
> "well, it's like CSS, so I must be able to do foo."
You can only use one selector at a time, I'm afraid. The selectors are
CSS-like in appearance, but that is really where the similarity ends; I
should perhaps make more of a point about that. Having said that, I will
have a go over the weekend (if I have the time) at implementing combined
selectors, as I can see they may be useful.
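
To make the idea concrete, here is a small Python sketch of how combined
selectors such as "h1.example" and "#heading h1.example" might be broken
down. It is only an assumption about one possible approach: the function
names are hypothetical, and it handles just tag, id and class parts
joined by whitespace (the descendant combinator), nothing else from CSS.

```python
import re

# One part of a compound selector: optional '#' or '.' sigil, then a name.
_PART = re.compile(r"([#.]?)([A-Za-z_][\w-]*)")


def parse_compound(selector):
    """Split a compound selector such as 'h1.example' into its parts."""
    parsed = {"tag": None, "id": None, "classes": []}
    for sigil, name in _PART.findall(selector):
        if sigil == "#":
            parsed["id"] = name
        elif sigil == ".":
            parsed["classes"].append(name)
        else:
            parsed["tag"] = name
    return parsed


def parse_selector(selector):
    """Split on whitespace (descendant combinator) and parse each part."""
    return [parse_compound(part) for part in selector.split()]
```

Each entry in the returned list describes one element in the ancestor
chain, outermost first, so a matcher would walk the document looking for
the last entry beneath the earlier ones.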
> You might consider switching to, or at least allowing XPath for
> selectors. It's mighty powerful, and should be able to handle useful
> idioms like class=fn which is inside class=vcard, but not inside a
> nested class=vcard.
XPath is mighty powerful indeed, but complex for the average author.
There is value in implementing both and seeing how it goes.
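
The class=fn idiom you mention can be sketched without XPath as a plain
tree walk. The Python below is only an illustration under assumptions:
it uses the standard library's ElementTree (which supports just a subset
of XPath), the input is well-formed XML, and the helper names has_class
and direct_fn are hypothetical.

```python
import xml.etree.ElementTree as ET


def has_class(element, name):
    """True if name is one of the space-separated tokens in @class."""
    return name in element.get("class", "").split()


def direct_fn(vcard):
    """Return class=fn descendants of vcard, skipping nested vcards."""
    found = []

    def walk(node):
        for child in node:
            if has_class(child, "vcard"):
                continue  # nested vcard: do not descend into its subtree
            if has_class(child, "fn"):
                found.append(child)
            walk(child)

    walk(vcard)
    return found
```

Given a vcard that contains its own fn plus a nested "agent vcard" with
another fn, only the outer fn is returned.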
> Lastly in your spec, you use a lot of XML terminology when describing
> the output. Personally I found that quite confusing. You might want to
> consider explaining how the output is constructed in terms of the
> abstract triples, or if you want to describe it in more concrete terms,
> in terms of N-Triples.
Ah yes, it does use a lot of XML terminology, sorry about that. I will
update the spec to use N-Triples; again, this is a point that someone
has mentioned before.
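
As a sketch of what describing the output in N-Triples terms might look
like, the Python below serialises one triple with a plain literal
object. The function names and the "Example Page" title are assumptions
for illustration; the escaping covers only backslash, double quote and
line breaks, and non-ASCII characters are left untouched.

```python
def ntriples_literal(value):
    """Quote a string as a plain N-Triples literal with basic escaping."""
    escaped = (
        value.replace("\\", "\\\\")  # backslash first, so later escapes survive
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def ntriple(subject, predicate, literal):
    """Format one subject/predicate/literal statement as an N-Triples line."""
    return "<%s> <%s> %s ." % (subject, predicate, ntriples_literal(literal))


print(ntriple(
    "http://example.com/",
    "http://purl.org/dc/elements/1.1/title",
    "Example Page",  # hypothetical <title> text of the source page
))
```

For the dc:title mapping discussed earlier, this would print a single
line with http://example.com/ as subject, dc:title (expanded) as
predicate and the page title as a literal.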
> I think if you did that, it might even help clarify the format in your
> own mind and further improve it - for example, you may not have noticed,
> but because you've defined the "label" property in XML terms, you've
> ended up with a property which sometimes ends up setting an RDF
> property, and at other times an RDF class, as in the case of
> <http://weborganics.co.uk/dataset/#query-rev> where it's used to set a
> class of "Person". How it sometimes sets one and sometimes sets the
> other seems to happen via magic (perhaps using the same rule as RDF/XML
> where the same also happens, and is similarly confusing).
Thanks for some great feedback, Toby. It's been valuable.
Best wishes
--
Martin McEvoy
Received on Thursday, 14 October 2010 20:41:13 UTC