RE: JSON-LD serialization and linked data support from Timothy Cole on 2015-08-13 (public-annotation@w3.org from August 2015)

From: Timothy Cole <t-cole3@illinois.edu>
Date: Thu, 13 Aug 2015 00:03:35 -0500
To: "'Robert Sanderson'" <azaroth42@gmail.com>, "'Frederick Hirsch'" <w3c@fjhirsch.com>
CC: "'W3C Public Annotation List'" <public-annotation@w3.org>
Message-ID: <009201d0d585$696b9810$3c42c830$@illinois.edu>
Given that we as a group have not created and reached concurrence on any 'plain JSON' annotation examples that reflect the changes we collectively want to make to our data model to accommodate differentiating roles in multi-Body/Target annotations, I think talk on today's call of a second JSON serialization (let alone an HTML serialization or whether JSON-LD is inherently useful for us or not) is premature.

 

It is a mistake, I think, to assume that leaving out the LD will greatly simplify the JSON required to meet the dictates of our annotation data model. How much of the current impasse is the LD and how much is the model complexity? I actually think that the LD part is relatively small. Mostly I think we have a difference about the model, about what each of us feels is important (and logical) to express when describing an annotation, RDF or no RDF. I am loath to reopen model issues already discussed, but better that than to think this is all about the differences between plain JSON and JSON-LD if that's really not the problem.

 

Case in point is Rob's example from his -1 to Multiple JSON serialization email. It may be valid JSON, but it's still bad JSON, or at least it's bad as measured against our current data model (no offense, Rob). This is illustrated by adding one more name-value pair to the JSON.

 

Serialization 1 (Plain JSON):

{
    "comment": "There's a typo for 'squirrel'",
    "change": "squirrel",
    "describe": "https://en.wikipedia.org/wiki/Squirrel",
    "target": "http://cnn.com/"
}

 

This serialization is simple. But it's not at all natural to me. Rather, in the context of our data model, I find it terribly ambiguous. Do I now have an annotation with 2 bodies and 2 targets or an annotation with 3 bodies and 1 target? And if it's two targets, which target is the one the "change" body applies to? There's no way to tell. One can argue that it doesn't matter how many targets or bodies, but our model says it does, and it says so not due to any RDF requirement, but because when we wrote the data model we felt there were potential use cases where it is important to know what's part of the body of the annotation and what's part of the target. So to maximize interoperability and minimize ambiguity, our data model distinguishes between targets and bodies, and we really need a JSON serialization that satisfies this requirement. (Or we need to change the Data Model.) The following plain JSON would at least allow me to know how many bodies and how many targets I have:

 

Serialization 2 (Plain JSON):

{
    "bodies": [
        {
            "comment": "There's a typo for 'squirrel'"
        },
        {
            "change": "squirrel"
        },
        {
            "describe": "https://en.wikipedia.org/wiki/Squirrel"
        }
    ],
    "targets": [
        "http://cnn.com/"
    ]
}
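
To make the difference between the two plain JSON forms concrete, here is a minimal Python sketch of how a consumer might try to count bodies and targets. It is an illustration only, not part of any proposal: the function name `count_parts` is hypothetical, and the keys `bodies` and `targets` are simply the ones used in Serialization 2. For the flat form the sketch gives up, since nothing in the JSON itself marks a key as a body or a target.

```python
import json

SERIALIZATION_1 = """
{
    "comment": "There's a typo for 'squirrel'",
    "change": "squirrel",
    "describe": "https://en.wikipedia.org/wiki/Squirrel",
    "target": "http://cnn.com/"
}
"""

SERIALIZATION_2 = """
{
    "bodies": [
        {"comment": "There's a typo for 'squirrel'"},
        {"change": "squirrel"},
        {"describe": "https://en.wikipedia.org/wiki/Squirrel"}
    ],
    "targets": ["http://cnn.com/"]
}
"""


def count_parts(annotation):
    """Return (body_count, target_count), or None if the JSON does not say.

    Hypothetical helper: it only trusts explicit "bodies"/"targets" arrays.
    """
    if "bodies" in annotation and "targets" in annotation:
        return len(annotation["bodies"]), len(annotation["targets"])
    # Flat form: a consumer cannot tell whether "describe" is a body or a
    # second target without out-of-band knowledge, so no count is returned.
    return None


flat = count_parts(json.loads(SERIALIZATION_1))
grouped = count_parts(json.loads(SERIALIZATION_2))
print("Serialization 1:", flat)
print("Serialization 2:", grouped)
```

A consumer could of course hard-code which flat keys are bodies, but that knowledge would then live outside the serialization, which is the ambiguity described above.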

 

I'm open to simpler plain JSON that differentiates between Targets and Bodies if you have it, but this seems pretty simple while still meeting the requirement to segregate bodies and targets. It's not JSON-LD, of course, but it's not that far removed either. With proper context, an equivalent JSON-LD (sans @context reference) could look like this:

 

Serialization 3 (JSON-LD):

{
    "bodies": [
        {
            "comment": "There's a typo for 'squirrel'"
        },
        {
            "change": "squirrel"
        },
        {
            "describe": { "id": "https://en.wikipedia.org/wiki/Squirrel" }
        }
    ],
    "targets": [
        "http://cnn.com/"
    ]
}

 

Assuming the JSON-LD context maps comment, change, and describe to appropriate oa predicates (similar to hasBody but more specific to better express narrower role in the annotation), this JSON-LD translates to N-Triples, Turtle, RDF/XML as you would expect -- 1 target and 3 blank-nodes, each related to a text string or a resource through an appropriate predicate. Nonetheless, an argument can be made that this JSON-LD does not meet the requirements of our Data Model -- the blank nodes are not oa:SpecificResources as such nor are they strings or retrievable Web resources, and therefore, one could argue they are not really Annotation Bodies as defined by our data model. But if Serialization 3 (JSON-LD) doesn't meet the data model requirements, then (I would argue), neither does the plain JSON Serialization 2 for the same reasons -- none of which have anything to do with Plain JSON vs. JSON-LD. It has to do with the requirements of our data model.
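
As a rough illustration of the translation described above, the following Python sketch walks Serialization 3 by hand and emits N-Triples-like lines. It is not a JSON-LD processor and does not read any @context. The `http://example.org/roles#` namespace and the predicates derived from the keys `comment`, `change`, and `describe` are placeholders invented for this sketch, and the mapping of `bodies` and `targets` to oa:hasBody and oa:hasTarget is likewise an assumption about what a suitable context would do. The annotation itself has no identifier in the example, so it is given the label `_:anno` here.

```python
import json

OA = "http://www.w3.org/ns/oa#"
EX = "http://example.org/roles#"  # hypothetical namespace for role predicates

annotation = {
    "bodies": [
        {"comment": "There's a typo for 'squirrel'"},
        {"change": "squirrel"},
        {"describe": {"id": "https://en.wikipedia.org/wiki/Squirrel"}},
    ],
    "targets": ["http://cnn.com/"],
}


def to_triples(anno):
    """Hand-rolled expansion into (subject, predicate, object) strings."""
    triples = []
    subject = "_:anno"  # the annotation has no id, so it is a blank node too
    for i, body in enumerate(anno["bodies"]):
        node = f"_:b{i}"  # one blank node per body
        triples.append((subject, f"<{OA}hasBody>", node))
        for key, value in body.items():
            if isinstance(value, dict):
                obj = f"<{value['id']}>"  # {"id": ...} denotes a resource
            else:
                obj = json.dumps(value)   # plain string becomes a literal
            triples.append((node, f"<{EX}{key}>", obj))
    for target in anno["targets"]:
        triples.append((subject, f"<{OA}hasTarget>", f"<{target}>"))
    return triples


triples = to_triples(annotation)
for s, p, o in triples:
    print(s, p, o, ".")
```

The same walk applied to Serialization 2 would differ only in treating the `describe` value as a string rather than a resource, which is one way to see how small the LD-specific part of the example is.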

 

My point is that we are conflating two issues. Some of the complexities of the JSON-LD examples we've been sharing the last couple of weeks do reflect the LD part of JSON-LD, i.e., the nuances of RDF. But some, maybe even most, of the concerns raised seem to me less about plain JSON versus JSON-LD and more about different interpretations of the annotation data model itself and in particular how we want to handle expressing roles of individual bodies and targets in a multi-Body/Target annotation, i.e., what's required to fully, clearly and unambiguously describe such an annotation.

 

To get to the simplicity of Serialization 1, if that were the goal, the problem isn't RDF, it’s the fundamental assumptions in the data model. For example, if we wanted to agree that annotations can only ever have 1 target, that targets (since you can only have 1 per annotation) don't have roles, and make some other substantive changes in our model, then Serialization 1 above is good JSON and we could have correspondingly simpler JSON-LD.  But this would be a rather radical re-writing of our data model. 
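
The dependence on the model can be sketched in a few lines of Python. The hypothetical `flatten` function below collapses the bodies/targets form of Serialization 2 into the flat form of Serialization 1, and it only succeeds under restrictions of the kind described above: exactly one target, and no two bodies using the same key. Both the function and the choice of `target` as the flat key name are assumptions made for illustration.

```python
def flatten(annotation):
    """Collapse the bodies/targets form into the flat form.

    Only possible under a restricted model: exactly one target, and
    no key collisions between bodies (or with the "target" key).
    """
    targets = annotation["targets"]
    if len(targets) != 1:
        raise ValueError("flat form can only express exactly one target")
    flat = {}
    for body in annotation["bodies"]:
        for key, value in body.items():
            if key in flat or key == "target":
                raise ValueError(f"key {key!r} would collide in the flat form")
            flat[key] = value
    flat["target"] = targets[0]
    return flat


grouped = {
    "bodies": [
        {"comment": "There's a typo for 'squirrel'"},
        {"change": "squirrel"},
        {"describe": "https://en.wikipedia.org/wiki/Squirrel"},
    ],
    "targets": ["http://cnn.com/"],
}

flat = flatten(grouped)
print(flat)
```

Note that even when the flattening succeeds, the flat result no longer records which keys were bodies, so the reverse direction needs knowledge that the flat JSON does not carry.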

 

A number of us who have a bias towards JSON-LD and/or RDF have proposed various solutions to the individual roles in multi-Body/Target problem that was posed a while back. It's been suggested that none of the proposals would feel natural to plain JSON developers (apologies for the moniker to all who find themselves in that category – no disrespect intended). So what would feel natural and still meet our model? Is the problem LD or is it the model? 

 

Can we resurface some examples of plain JSON proposed to meet this need and maybe augment with a few new examples proposed by plain JSON developers? And then I think we will have a better chance to separate out the data model issues and the RDF issues. 

 

Like Rob, I would prefer not to have two JSON serializations, especially if it turns out that the difference between plain JSON and JSON-LD isn't the crux of the issue.

 

Thanks,

 

Tim Cole

 

 

From: Robert Sanderson [mailto:azaroth42@gmail.com] 
Sent: Wednesday, August 12, 2015 5:34 PM
To: Frederick Hirsch <w3c@fjhirsch.com>
Cc: W3C Public Annotation List <public-annotation@w3.org>
Subject: Re: JSON-LD serialization and linked data support

 

 

Annotations are embedded in IIIF descriptions, which are in turn Linked Data via JSON-LD:

    http://iiif.io/api/presentation/2.0/#image-resources

 

They can be managed by linked data or JSON-based systems, or just flat files on disk. CATCH is one endpoint that we use in the Mirador IIIF client, and the expectation is that both will upgrade to the output of this working group (both Protocol and Model/Vocab/Serialization) as soon as it stabilizes.

 

We've been discussing search of annotations:

    http://search.iiif.io/api/search/0.9/

 

JSON-LD lets us integrate with the Activity Streams work in the Social Web WG.  It would work with the CSV on the Web WG.  (Note that both use @id and @type in their JSON-LD). We need it for LDP.  _Google_ uses JSON-LD (with @id and @type) and describes it as "JSON-LD is an easy-to-use JSON-based linked data format" at: 

  https://developers.google.com/schemas/formats/json-ld?hl=en

 

If Google think that it's ready and easy for developers ... 

 

Rob

 

 

On Wed, Aug 12, 2015 at 3:15 PM, Frederick Hirsch <w3c@fjhirsch.com> wrote:

On today's call the topic of serializations came up and a question seemed to be raised over whether JSON-LD should be used (perhaps I heard incorrectly).

There are some strong reasons to continue to require JSON-LD as a mandatory serialization, the abstract argument being the value of linked data on the back end.

A specific concrete example of the value of linked data in combination with annotations might be "CATCH: Common Annotation, Tagging, and Citation at Harvard"

[[

It is designed to interoperate with third-party annotation tools to aggregate and associate contextualized annotation metadata from various pedagogical and research tools with reference to persistent digital media in repositories, such as the Harvard Library DRS. - See more at: https://osc.hul.harvard.edu/liblab/projects/catch-common-annotation-tagging-and-citation-harvard#sthash.fr7L4qa3.dpuf

]]

Do we have other concrete examples of how the linked data aspect of the Open Annotation model adds value to annotations? Pointers would be welcome.

I'm concerned about specifying multiple serializations, as we would have to be more careful about interoperability in that case. Specifically, is round-tripping between serializations without information loss a potential issue? More serializations also mean more testing.

In a related thought, is directly embedding JSON-LD in HTML ( http://www.w3.org/TR/json-ld/#embedding-json-ld-in-html-documents ) a viable option? What is the status of browser support for this? If it is supported (or is in progress) what is the case for HTML serialization as an alternative? Would it be more productive to focus on generic support for JSON-LD in browsers rather than a specific annotation serialization?

The fundamental issue I heard us discuss is that even with all our efforts to simplify the JSON-LD serialization, there will remain some aspects that do not appear 'natural' to JSON developers.  The next question I have is whether these aspects can be managed with suitable libraries etc.

Thanks

regards, Frederick

Frederick Hirsch

www.fjhirsch.com
@fjhirsch







 

-- 

Rob Sanderson

Information Standards Advocate

Digital Library Systems and Services

Stanford, CA 94305
Received on Thursday, 13 August 2015 05:04:11 UTC