embed behavior makes .frame's results hard to work with

https://github.com/json-ld/json-ld.org/issues/119
(Issue reproduced below.)

*Executive summary*

The framing algorithm's approach to "multiple embeds" makes it hard for
developers to work with framed results.

*Background*

Developers want to frame JSON-LD payloads in ways that make them simple to
work with. For example:

   - discover subjects of interest
   - loop over these subjects
   - resolve nested data with consistent paths

But in the current framing algorithm, the machinery for avoiding circularity
and verbose output introduces complexity for developers. This is easiest to
understand with an example.

*Example*

I'll illustrate with MedicationLists that have Medications that have
DrugCodes with titles and identifiers:

*Framing Problem: example in Playground <http://tinyurl.com/7sgetlj>*

How developers want framing to work:

jsonld.frame(raw_data, medframe, function(err, response) {
    response['@graph'].forEach(function(medlist) {
        // each medication list holds its medications under hasMedication
        medlist.hasMedication.forEach(function(med) {
            console.log("Drug: " + med.drugCode.title + "::" +
                med.drugCode.identifier);
        });
    });
});

... but in the example above, when we hit
['@graph'][0].hasMedication[2].drugCode we find a *reference, not an embed*!
It takes severely defensive programming to avoid this.

How developers need to work around the current framing behavior:

Since framed results don't reliably re-embed resources, developers need to
check at each step whether an object is a reference or an embed. This means
first creating a hash of known embeds, and then looking up values in this
hash at every step through the framed result.

jsonld.frame(raw_data, medframe, function(err, response) {

    // identify an embed for each subject to resolve references
    var subjects = {};
    findSubjects(subjects, response['@graph']);

    response['@graph'].forEach(function(medlist) {
        medlist.hasMedication.forEach(function(med) {

            // need to ensure drugCode is an embed, not a reference
            var drugCode = subjects[med.drugCode['@id']];

            console.log("Drug code: " + drugCode.title + "::" +
                drugCode.identifier);
        });
    });
});

// pseudocode for finding subject embeds in framed results
// (_isArray, _isEmbed and _isObject are helpers left undefined here)
function findSubjects(subjects, subtree) {
    if (_isArray(subtree)) {
        subtree.forEach(function(elt) {
            findSubjects(subjects, elt);
        });

        return;
    }

    if (_isEmbed(subtree)) {
        subjects[subtree['@id']] = subtree;
    }

    if (_isObject(subtree)) {
        for (var k in subtree) {
            findSubjects(subjects, subtree[k]);
        }
    }
}

And the workaround isn't complete

This workaround presents limitations. For instance:

   - How to deal with subjects that are *supposed* to be framed in
   different ways?
   - How to properly implement _isEmbed?
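
To make the second question concrete, here is one possible heuristic, offered only as a sketch. It assumes that a reference appears in framed output as an object whose only key is '@id', and that an embed carries at least one other key. The names isReference and isEmbed and the sample data are made up for illustration. The heuristic cannot tell a reference apart from a subject that genuinely has no properties, and it does nothing for subjects that are meant to be framed in different ways.

```javascript
// Hypothetical helper: true for a plain (non-array) object.
function isPlainObject(node) {
    return node !== null && typeof node === 'object' && !Array.isArray(node);
}

// Assumption: a reference is an object whose only key is '@id'.
function isReference(node) {
    return isPlainObject(node) &&
        typeof node['@id'] === 'string' &&
        Object.keys(node).length === 1;
}

// Assumption: an embed has an '@id' plus at least one other key.
function isEmbed(node) {
    return isPlainObject(node) &&
        typeof node['@id'] === 'string' &&
        Object.keys(node).length > 1;
}

// Made-up sample data
var embed = {'@id': 'ex:code1', title: 'Example drug', identifier: 'X-1'};
var reference = {'@id': 'ex:code1'};

console.log(isEmbed(embed), isEmbed(reference));         // true false
console.log(isReference(embed), isReference(reference)); // false true
```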

*Proposal: aggressive re-embedding*

I'd recommend re-embedding resources aggressively -- right up to (but not
crossing) the point of creating circular references. There are some risks
here, including an explosion in the framing output size for graphs rich in
bidirectional links. Does anyone have ideas for mitigating this explosion?
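
As a rough illustration of "re-embed up to the point of circularity", the sketch below post-processes an already framed result rather than changing the framing algorithm itself. It assumes a reference is an object whose only key is '@id', and it keeps a bare reference only when that '@id' already appears among the ancestors of the current node. The function names collectEmbeds and reEmbed and the sample data are hypothetical.

```javascript
function has(obj, key) {
    return Object.prototype.hasOwnProperty.call(obj, key);
}

// Collect one full embed per '@id' (the first one encountered wins).
function collectEmbeds(node, subjects) {
    if (node === null || typeof node !== 'object') { return subjects; }
    if (!Array.isArray(node)) {
        var id = node['@id'];
        if (typeof id === 'string' && Object.keys(node).length > 1 && !has(subjects, id)) {
            subjects[id] = node;
        }
    }
    Object.keys(node).forEach(function (k) { collectEmbeds(node[k], subjects); });
    return subjects;
}

// Return a copy of `node` in which every bare reference is replaced by a copy
// of its embed, unless that '@id' is already an ancestor on the current path.
function reEmbed(node, subjects, path) {
    if (Array.isArray(node)) {
        return node.map(function (elt) { return reEmbed(elt, subjects, path); });
    }
    if (node === null || typeof node !== 'object') { return node; }
    var id = node['@id'];
    if (typeof id === 'string') {
        if (path.indexOf(id) !== -1) {
            return {'@id': id}; // embedding here would be circular
        }
        if (Object.keys(node).length === 1 && has(subjects, id)) {
            node = subjects[id]; // swap the bare reference for its embed
        }
        path = path.concat([id]);
    }
    var copy = {};
    Object.keys(node).forEach(function (k) {
        copy[k] = reEmbed(node[k], subjects, path);
    });
    return copy;
}

// Made-up framed result: the second drugCode is only a reference.
var framed = {'@graph': [{
    '@id': 'ex:list1',
    hasMedication: [
        {'@id': 'ex:med1', drugCode: {'@id': 'ex:code1', title: 'Example drug', identifier: 'X-1'}},
        {'@id': 'ex:med2', drugCode: {'@id': 'ex:code1'}}
    ]
}]};

var result = reEmbed(framed, collectEmbeds(framed, {}), []);
console.log(result['@graph'][0].hasMedication[1].drugCode.title); // Example drug
```

Because each '@id' can appear at most once on any path, nesting depth is bounded by the number of distinct subjects, but the output can still grow quickly for densely linked graphs, which is the size explosion mentioned above.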

(One alternative approach is to allow a mode of operation that doesn't
produce a serializable framing output, but instead produces an in-memory
structure with potential circularity. For many applications, this
in-memory, potentially circular structure is a very natural fit for
developers' goals. This could be separate from framing, if there were a
simple, consistent way to take a serialized framed result and convert to an
appropriate in-memory structure.)
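
The conversion described in the parenthetical could look roughly like the following. This is only a sketch under the same assumption as before (a reference is an object whose only key is '@id'); linkInMemory is a hypothetical name, not part of any library, and the partOf property in the sample data is invented to create a back-link.

```javascript
// Mutates a parsed framed result so that every bare reference points at the
// one shared object for that subject. The result may contain cycles.
function linkInMemory(root) {
    var nodes = {}; // '@id' -> first full embed found
    function known(id) {
        return typeof id === 'string' && Object.prototype.hasOwnProperty.call(nodes, id);
    }

    // Pass 1: remember the first full embed of each subject.
    (function collect(node) {
        if (node === null || typeof node !== 'object') { return; }
        if (!Array.isArray(node) && typeof node['@id'] === 'string' &&
                Object.keys(node).length > 1 && !known(node['@id'])) {
            nodes[node['@id']] = node;
        }
        Object.keys(node).forEach(function (k) { collect(node[k]); });
    })(root);

    // Pass 2: replace each bare reference with the shared object.
    var seen = [];
    (function resolve(node) {
        if (node === null || typeof node !== 'object' || seen.indexOf(node) !== -1) { return; }
        seen.push(node);
        Object.keys(node).forEach(function (k) {
            var v = node[k];
            if (v !== null && typeof v === 'object' && !Array.isArray(v) &&
                    Object.keys(v).length === 1 && known(v['@id'])) {
                node[k] = nodes[v['@id']];
            } else {
                resolve(v);
            }
        });
    })(root);

    return root;
}

// Made-up data: the medication points back at its list by reference.
var framedLists = {'@graph': [{
    '@id': 'ex:list1',
    hasMedication: [{'@id': 'ex:med1', partOf: {'@id': 'ex:list1'}}]
}]};

var list = linkInMemory(framedLists)['@graph'][0];
console.log(list.hasMedication[0].partOf === list); // true
```

The linked structure can no longer be passed to JSON.stringify once it contains a cycle, which matches the trade-off described above: convenient traversal in memory, at the cost of serializability.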

Received on Sunday, 13 May 2012 20:39:40 UTC