- From: Toby Inkster <tai@g5n.co.uk>
- Date: Wed, 13 Oct 2010 16:09:06 +0100
- To: martin@weborganics.co.uk
- Cc: Semantic Web <semantic-web@w3.org>
On Wed, 2010-10-13 at 12:15 +0100, Martin McEvoy wrote:
> I am pleased to announce JSON Datasets[1] a way to extract RDF from
> any HTML document using JSON.

Hi Martin,

I seem to remember reading some of your early work on this concept some
months ago. Can't remember how I stumbled upon it. Anyhow, it's an
interesting idea.

It seems to me that it's quite GRDDL-like, in that an HTML file can link
to a file that contains a set of rules which, once applied to the
original file, produce RDF as output.

I know that you're quite the XSLT transformation guru, and are probably
quite familiar with GRDDL. Are you also aware that GRDDL allows
transformations to be written using languages other than XSLT? Your
JSON-based language seems like it would make a good transformation
language for GRDDL.

Essentially all that would need to be done to make JSON Datasets
conformant to GRDDL would be to replace this method of linking:

    <link rel="dataset"
          href="http://example.com/my-dataset.json"
          type="application/json">

With the GRDDL methods of linking to a transformation. There are two
such methods that are relevant to HTML (there are another two which are
XML-based). Firstly, a direct link from the document to the JSON file:

    <link rel="transformation"
          href="http://example.com/my-dataset.json"
          type="application/json">

And secondly, an indirect link from the document to a profile document:

    <head profile="http://example.com/profile">

Where the profile document contains a link to the JSON:

    <link rel="profileTransformation"
          href="http://example.com/my-dataset.json"
          type="application/json">

If those were used by JSON Datasets instead of rel="dataset" then you
might find that your approach receives wider support. For example, my
Perl implementation of GRDDL supports pluggable transformation
languages; adding support for your JSON-based format would not be
especially tough. Adding support for rel="dataset" though, I would
consider to be out of scope for the project.
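As a rough illustration of the two HTML linking methods above, here is a minimal Python sketch of the discovery step only. The class and function names are made up for this example, it uses nothing beyond the standard library's html.parser, and it does not try to apply every condition in the GRDDL specification: it just collects rel="transformation" links and any profile URIs named on <head>.

```python
from html.parser import HTMLParser


class TransformationLinkFinder(HTMLParser):
    """Collects candidate GRDDL links from an HTML document (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.transformations = []  # hrefs of <link rel="transformation">
        self.profiles = []         # URIs listed in <head profile="...">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link":
            # rel is a space-separated list of tokens, compared case-insensitively
            rels = (attrs.get("rel") or "").lower().split()
            if "transformation" in rels and attrs.get("href"):
                self.transformations.append(attrs["href"])
        elif tag == "head":
            # profile may also list several URIs separated by spaces
            self.profiles.extend((attrs.get("profile") or "").split())


def find_grddl_links(html):
    """Return (direct transformation hrefs, profile URIs) found in the markup."""
    finder = TransformationLinkFinder()
    finder.feed(html)
    return finder.transformations, finder.profiles
```

A processor would then fetch each profile document and look for rel="profileTransformation" links in it; that second step is left out here. Note that a document using only rel="dataset" yields no transformations at all under this scheme, which is the point about wider support.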
Some critiques of the JSON format itself:

The use of the term "where" is a little confusing. The terminology of
the query syntax seems to borrow from SQL and SPARQL, but the behaviour
of "where" seems totally different. In SQL and SPARQL, "where" is
essentially used to perform joins, and to narrow down criteria. In your
language it seems to be a mapping from one structure (a graph) to
another (RDF triples), which seems more similar to SQL's
"SELECT foo AS bar". Perhaps this:

    {
      "select": {
        "from": "http://example.com/",
        "prefix": { "dc": "http://purl.org/dc/elements/1.1/" },
        "where": {
          "title": { "label": "dc:title" }
        }
      }
    }

Might be better expressed as:

    {
      "prefix": { "dc": "http://purl.org/dc/elements/1.1/" },
      "select": {
        "title": { "label": "dc:title" }
      },
      "from": "http://example.com/"
    }

And actually, "label" might be better if called "as":

    {
      "prefix": { "dc": "http://purl.org/dc/elements/1.1/" },
      "select": {
        "title": { "as": "dc:title" }
      },
      "from": "http://example.com/"
    }

Are prefixes required, or just a shortcut? Could the above be written as
the following?

    {
      "select": {
        "title": { "as": "http://purl.org/dc/elements/1.1/title" }
      },
      "from": "http://example.com/"
    }

It's not clear whether selectors may be combined. "h1", ".example" and
"#heading" are all valid selectors, but what about "h1.example" and
"#heading h1.example"? If you're going to use a subset of CSS, you need
to be awfully clear about what subset you're specifying, otherwise
people coming to your spec, knowing CSS already, are going to say,
"well, it's like CSS, so I must be able to do foo."

You might consider switching to, or at least allowing, XPath for
selectors. It's mighty powerful, and should be able to handle useful
idioms like class=fn which is inside class=vcard, but not inside a
nested class=vcard.

Lastly in your spec, you use a lot of XML terminology when describing
the output. Personally I found that quite confusing.
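Going back to the prefix question for a moment: if prefixes are only a shortcut, then the last two queries above should produce exactly the same triple. The following Python sketch shows one possible reading, not anything the JSON Datasets spec defines. It assumes the "as" key proposed above, assumes the "from" URL is the subject of each triple, and uses a hypothetical matched_text dictionary as a stand-in for whatever a real selector engine would extract from the page.

```python
def expand(name, prefixes):
    """Expand "dc:title" via the prefix map; leave anything else untouched."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return name  # already a full URI, or an undeclared prefix


def ntriple(subject, predicate, literal):
    """Format one triple with a plain literal object as an N-Triples line."""
    escaped = (literal.replace("\\", "\\\\")
                      .replace('"', '\\"')
                      .replace("\n", "\\n"))
    return f'<{subject}> <{predicate}> "{escaped}" .'


def triples_for(query, matched_text):
    """Yield one triple per selector that matched.

    matched_text maps selector -> extracted text (assumed input; no HTML
    is parsed here).
    """
    prefixes = query.get("prefix", {})
    for selector, rule in query["select"].items():
        if selector in matched_text:
            predicate = expand(rule["as"], prefixes)
            yield ntriple(query["from"], predicate, matched_text[selector])
```

Fed the prefixed query and the full-URI query with the same matched_text of {"title": "Example Domain"}, both yield the single line `<http://example.com/> <http://purl.org/dc/elements/1.1/title> "Example Domain" .`, which is what "just a shortcut" would mean in practice.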
You might want to consider explaining how the output is constructed in
terms of the abstract triples, or if you want to describe it in more
concrete terms, in terms of N-Triples.

I think if you did that, it might even help clarify the format in your
own mind and further improve it. For example, you may not have noticed,
but because you've defined the "label" property in XML terms, you've
ended up with a property which sometimes ends up setting an RDF
property, and at other times an RDF class, as in the case of
<http://weborganics.co.uk/dataset/#query-rev> where it's used to set a
class of "Person". How it sometimes sets one and sometimes sets the
other seems to happen via magic (perhaps using the same rule as RDF/XML
where the same also happens, and is similarly confusing).

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>
Received on Wednesday, 13 October 2010 15:10:01 UTC