- From: Martin McEvoy <martin@weborganics.co.uk>
- Date: Thu, 14 Oct 2010 21:40:26 +0100
- To: Toby Inkster <tai@g5n.co.uk>
- CC: Semantic Web <semantic-web@w3.org>
Hello Toby,

On 13/10/2010 16:09, Toby Inkster wrote:
> On Wed, 2010-10-13 at 12:15 +0100, Martin McEvoy wrote:
>> I am pleased to announce JSON Datasets[1] a way to extract RDF from
>> any HTML document using JSON.
>
> Hi Martin,
>
> I seem to remember reading some of your early work on this concept some
> months ago. Can't remember how I stumbled upon it.

I think the topic came up on the RDFa WG at the end of last year, when
discussing alternative methods of prefix mappings.

> Anyhow, it's an interesting idea. It seems to be that it's quite
> GRDDL-like, in that an HTML file can link to a file that contains a set
> of rules which, once applied to the original file, produce RDF as
> output.
>
> I know that you're quite the XSLT transformation guru, and are probably
> quite familiar with GRDDL. Are you also aware that GRDDL allows
> transformations to be written using languages other than XSLT? Your
> JSON-based language seems like it would make a good transformation
> language for GRDDL.
>
> Essentially all that would need to be done to make JSON Datasets
> conformant to GRDDL would be to replace this method of linking:
>
>   <link rel="dataset"
>         href="http://example.com/my-dataset.json"
>         type="application/json">
>
> With the GRDDL methods of linking to a transformation. There are two
> such methods that are relevant to HTML (there are another two which are
> XML-based) - firstly a direct link from the document to the JSON file:
>
>   <link rel="transformation"
>         href="http://example.com/my-dataset.json"
>         type="application/json">
>
> And secondly, an indirect link from the document to a profile document:
>
>   <head profile="http://example.com/profile">
>
> Where the profile document contains a link to the JSON:
>
>   <link rel="profileTransformation"
>         href="http://example.com/my-dataset.json"
>         type="application/json">
>
> If those were used by JSON datasets instead of rel="dataset" then you
> might find that your approach receives wider support.
> For example, my Perl implementation of GRDDL supports pluggable
> transformation languages; adding support for your JSON-based format
> would not be especially tough. Adding support for rel="dataset" though,
> I would consider to be out of scope for the project.

I have no problem re-using rel=transformation or profileTransformation.
I had the same thought as you, but until now I didn't know GRDDL could
use other languages.

> Some critiques of the JSON format itself:
>
> The use of the term "where" is a little confusing. The terminology of
> the query syntax seems to borrow from SQL and SPARQL, but the behaviour
> of "where" seems totally different. In SQL and SPARQL, "where" is
> essentially used to perform joins, and to narrow down criteria. In your
> language it seems to be a mapping from one structure (a graph) to
> another (RDF triples) - that seems to be more similar to SQL's "SELECT
> foo AS bar". Perhaps this:
>
>   {
>     "select": {
>       "from": "http://example.com/",
>       "prefix": {
>         "dc": "http://purl.org/dc/elements/1.1/"
>       },
>       "where": {
>         "title": { "label": "dc:title" }
>       }
>     }
>   }
>
> Might be better expressed as:
>
>   {
>     "prefix": {
>       "dc": "http://purl.org/dc/elements/1.1/"
>     },
>     "select": {
>       "title": { "label": "dc:title" }
>     },
>     "from": "http://example.com/"
>   }

... "select" is a little confusing when you put it like that :) I like
your example though, it looks cleaner ...

> And actually, "label" might be better if called "as":
>
>   {
>     "prefix": {
>       "dc": "http://purl.org/dc/elements/1.1/"
>     },
>     "select": {
>       "title": { "as": "dc:title" }
>     },
>     "from": "http://example.com/"
>   }

... and I like the above too ...

> Are prefixes required, or just a shortcut? Could the above be written as
> the following?
>
>   {
>     "select": {
>       "title": { "as": "http://purl.org/dc/elements/1.1/title" }
>     },
>     "from": "http://example.com/"
>   }

Prefixes are required at the moment. You may know I am not a huge fan of
typing out long URLs instead of keywords...
having said that, I have no problem implementing it, as you are the
second person to bring it up.

> It's not clear whether selectors may be combined. "h1", ".example" and
> "#heading" are all valid selectors, but what about "h1.example" and
> "#heading h1.example". If you're going to use a subset of CSS, you need
> to be awfully clear about what subset you're specifying, otherwise
> people coming to your spec, knowing CSS already, are going to say,
> "well, it's like CSS, so I must be able to do foo."

You can only use one selector at a time, I'm afraid. Selectors are CSS
"like" in appearance, but really that's where the similarity ends; I
should perhaps make more of a point about that. Having said that, I will
have a go (if I have the time) over the weekend at implementing combined
selectors, as I can see it may be useful.

...

> You might consider switching to, or at least allowing XPath for
> selectors. It's mighty powerful, and should be able to handle useful
> idioms like class=fn which is inside class=vcard, but not inside a
> nested class=vcard.

XPath is mighty powerful indeed, but complex for the average author.
There is value in implementing both and seeing how it goes.

> Lastly in your spec, you use a lot of XML terminology when describing
> the output. Personally I found that quite confusing. You might want to
> consider explaining how the output is constructed in terms of the
> abstract triples, or if you want to describe it in more concrete terms,
> in terms of N-Triples.

Ah yes, it does use a lot of XML terminology, sorry about that. I will
update the spec to use N-Triples; again, this is a point that has been
mentioned before by someone.

> I think if you did that, it might even help clarify the format in your
> own mind and further improve it - for example, you may not have noticed,
> but because you've defined the "label" property in XML terms, you've
> ended up with a property which sometimes ends up setting an RDF
> property, and at other times an RDF class, as in the case of
> <http://weborganics.co.uk/dataset/#query-rev> where it's used to set a
> class of "Person". How it sometimes sets one and sometimes sets the
> other seems to happen via magic (perhaps using the same rule as RDF/XML
> where the same also happens, and is similarly confusing).

Thanks for some great feedback, Toby. It's been valuable.

Best wishes
--
Martin McEvoy

Received on Thursday, 14 October 2010 20:41:13 UTC