Re: Why Framing and Normalization

On 09/02/2011 07:22 PM, Danny Ayers wrote:
> On 2 September 2011 23:42, Dave Longley <dlongley@digitalbazaar.com> wrote:
>> On 09/02/2011 12:33 PM, Danny Ayers wrote:
>>> On 2 September 2011 18:04, Dave Longley <dlongley@digitalbazaar.com> wrote:
>>>
>>>> The publisher can publish JSON-LD in whatever format they want to. It is
>>>> the applications that must do whatever framing is appropriate for their
>>>> application. Applications simply expect an incoming graph and are
>>>> agnostic about its JSON structure. All of the code for the application
>>>> will be written according to an expected particular structure; that
>>>> application simply frames incoming data according to it.
>>> This is what I don't get. Let's say the publisher publishes data in an
>>> arbitrary format. The consumer-developer will have to write custom
>>> code to make the data match their application requirements.
>> Consumer-developers must always write custom code to make incoming data
>> match their application requirements. Their starting point could be reading
>> documentation on how the data is published and then writing whatever code is
>> necessary to make it conform to their application requirements. They may
>> also have to stay up to date with how the publisher's data structure
>> changes. *Or* they could skip that step entirely, receive the data as a
>> graph in any structural format whatsoever, and then make it conform to the
>> structure that best suits their application by using standard JSON-LD
>> framing.
> I will be very impressed if typical developers don't need knowledge
> about the data that's been delivered (it's feasible in the RDF world,
> but I've not seen anything comparable elsewhere).

What I mean is that they don't need knowledge about the structure of how 
the data is delivered so long as they can understand it as a graph. 
Perhaps I should say "epistructure". I mean structure that is "above" or 
"in addition to" the graph itself. Namely, structure that is only part 
of the serialization of the graph.

This epistructure is of vital importance to JSON developers because it 
is effectively the API they use to work with the data. If they know that 
they can shape an incoming graph into whatever epistructure they desire, 
then they don't have to worry about the epistructure with which it was 
published. All they need to know is that the graph contains the 
information that they want; JSON-LD framing will handle the epistructure 
for them. This also means that they can easily import data from other 
serializations into JSON-LD and use the same framing API to create the 
appropriate epistructure for their application.
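
As a rough illustration of the graph/epistructure distinction, the sketch
below reads the same two statements out of a nested document and a flat
one. The helper names (to_triples, graph_of) and the example.com IRIs are
hypothetical, and this is a deliberately simplified reading of the data,
not the JSON-LD processing algorithm: it has no context handling and does
not distinguish literal strings from IRIs.

```python
def to_triples(node, triples):
    """Add (subject, property, value) triples for one node object to `triples`."""
    subject = node["@id"]
    for prop, value in node.items():
        if prop == "@id":
            continue
        values = value if isinstance(value, list) else [value]
        for v in values:
            if isinstance(v, dict):
                # An embedded node or a bare reference: link by @id, then recurse.
                triples.add((subject, prop, v["@id"]))
                to_triples(v, triples)
            else:
                triples.add((subject, prop, v))


def graph_of(document):
    """Return the set of triples described by a node object or a list of them."""
    nodes = document if isinstance(document, list) else [document]
    triples = set()
    for node in nodes:
        to_triples(node, triples)
    return triples


# One epistructure: the book is embedded inside the library.
nested = {
    "@id": "http://example.com/library",
    "contains": {
        "@id": "http://example.com/book",
        "title": "The Republic",
    },
}

# Another epistructure: a flat list of nodes that refer to each other by @id.
flat = [
    {"@id": "http://example.com/book", "title": "The Republic"},
    {"@id": "http://example.com/library",
     "contains": {"@id": "http://example.com/book"}},
]

same_graph = graph_of(nested) == graph_of(flat)
```

Both documents describe the same graph; only the shape a developer would
navigate in code differs.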

>> All they need to do is write the frame that matches the structure
>> of the objects they want to work with -- which is not a difficult task. So
>> long as the input is a directed Linked Data graph, JSON-LD framing can shape
>> the data into the appropriate structure for the consumer-developer's code.
> Ok, but there you're replacing custom code (which will presumably
> be written using libraries and code with which the developer is
> familiar) with the JSON-LD framing system, with which (at least in the
> first instance) they won't be familiar.

I don't think that the syntax of a frame is too foreign to a JSON 
developer. A frame looks like the objects that JSON developers are 
already using. I also think the learning curve for the idea of filling 
out a skeleton object with the information from a graph is low. Perhaps 
the learning curve for understanding a graph model vs. a tree model is 
relatively steep, but that is a prerequisite for any solution. Of 
course, if a developer is really uncomfortable with framing for some 
reason, then they can just use the normalized form of JSON-LD, which 
defines a very specific epistructure.
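
To make "filling out a skeleton object" concrete, here is a toy sketch.
The function name frame, the frame layout, and the example.com IRIs are
assumptions made for illustration; this is not the JSON-LD framing
algorithm, only a minimal stand-in that selects nodes by "@type" and
embeds referenced nodes for the properties the frame lists.

```python
import copy


def frame(nodes, frame_spec):
    """Toy framing: pick nodes whose @type matches the frame, then fill in
    only the properties the frame lists, embedding referenced nodes."""
    by_id = {n["@id"]: n for n in nodes}

    def fill(node, spec):
        out = {"@id": node["@id"]}
        if "@type" in node:
            out["@type"] = node["@type"]
        for prop, sub in spec.items():
            if prop.startswith("@") or prop not in node:
                continue
            value = node[prop]
            is_ref = isinstance(value, dict) and value.get("@id") in by_id
            if is_ref:
                # Replace the reference with the referenced node, shaped by
                # the nested part of the frame.
                sub_spec = sub if isinstance(sub, dict) else {}
                out[prop] = fill(by_id[value["@id"]], sub_spec)
            else:
                out[prop] = copy.deepcopy(value)
        return out

    wanted = frame_spec.get("@type")
    return [fill(n, frame_spec) for n in nodes if n.get("@type") == wanted]


# Flat input, in whatever order the publisher happened to emit it.
flat = [
    {"@id": "http://example.com/book", "@type": "Book",
     "title": "The Republic"},
    {"@id": "http://example.com/library", "@type": "Library",
     "contains": {"@id": "http://example.com/book"}},
]

# The frame looks like the object the application code wants to receive.
library_frame = {
    "@type": "Library",
    "contains": {"@type": "Book", "title": {}},
}

framed = frame(flat, library_frame)
```

With this input, framed[0]["contains"]["title"] can be read directly,
regardless of how the nodes were arranged when they arrived.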

>> Something to keep in mind with JSON-based development is that your code
>> essentially works with the serialized form of your objects. This is what is
>> so attractive about JSON; the serialization feels just like the objects that
>> you are interacting with. In fact, languages like JavaScript and Python let
>> you use the serialization directly in your code -- as a native part of the
>> language. We don't want people to lose that feeling when working with
>> JSON-LD. We just want to add support for Linked Data so that we can link
>> together all of the now disparate objects that are floating around on the
>> web.
> Right, I totally agree with that. But the current approach seems to
> have drifted quite a long way from that. Before they can interact with
> the data as simple objects they have to transform it into simple
> objects.

Have to? No, but I would expect them to prefer to do it. We expect JSON
developers to keep working with whatever JSON structures they choose; we
only want to "annotate" what is already there with a Linked Data
context. If we force them to change their existing structures whenever
they want to use Linked Data, then I think we've made a mistake.

>>> With the
>>> current JSON-LD, the format is relatively complex but the developer
>>> can make it match their requirements using framing. To do that they'll
>>> have to learn how to use framing - a new little language which really
>>> requires some knowledge of the RDF model. I don't see how this system
>>> makes things easier for the developer.
>> What knowledge of the RDF model, specifically, would they have to know that
>> you consider a burden?
> I wouldn't highlight any specific aspect, in the same way I wouldn't
> highlight any specific aspect of working with RDBs or XML as a burden.
> But if your objective is to build an application, you are more likely
> to favour data sources that don't involve learning a new technology.
>
>> Consumer-developers are going to be working with
>> objects that have properties which point to more objects. Frames simply list
>> out the structure of those objects; the same structure that
>> consumer-developers will be relying on in the code that they write.
> If they really are the same structures they are relying on in the code
> they write, why can't they use their existing techniques?

They can. I'm not sure which techniques you're talking about that they 
can no longer use. They can use whatever the publisher puts out there, 
or they can use normalized JSON-LD which defines a specific structure, 
or they can use framing to structure the data in a way that they
feel best suits their application and design patterns. I would expect 
them to prefer framing over being forced to always use a flat structure.

>> To me,
>> it seems very natural. In my experience, it has also been natural in
>> practice so far. In the current PaySwarm implementation we parse RDFa from
>> web pages into flat JSON-LD, and then we frame the results according to the
>> objects that we use in our code. There's no "structure" at all (other than
>> the graph) in the published RDFa, but we don't have to care about that as a
>> consumer. If we hit a web page that does offer us JSON-LD instead of RDFa,
>> and they can frame the data the way we want it, that's great. But our code
>> works and uses a single code path regardless.
> That's great, but does rely on having framing in the code path. I'm
> not convinced that step is necessary (a less complex format would
> allow more direct access to the object structures) and suspect it may
> put developers off.

As Manu argued earlier, either there can be only one way to express
graphs in JSON, or we can allow developers to choose the way they would
like to express them. The latter, I believe, allows them to keep working
in whatever way is natural to them. The former forces them into a box
they may feel uncomfortable with. Furthermore, it likely means that they
will have to change all of their existing data structures.

I think of it this way: There are already many, many JSON-based APIs out 
on the web. The way the data is structured in these APIs is specific to 
the type of data that the APIs are for. In other words, structure 
matters in JSON and it is context-specific. If we go and tell JSON 
developers that they now can only use one specific structure if they 
want to turn their data into Linked Data, then I expect us to fail.

This epistructure stuff doesn't matter in the RDF graph world -- so all 
you need if you want to simply serialize triples in JSON is a single, 
simple structure that is used almost exclusively for serialization, not
object-oriented programming. That is *not* the goal of JSON-LD. JSON-LD 
aims to bring Linked Data into an existing JSON world with as little 
disruption as possible. JSON-LD wants to *update* existing JSON. 
Something like J-Triples is meant to simply use JSON to serialize 
triples. There is a significant difference.

> Contrast with XML format data. If you want to turn that into objects
> in your favourite language you could use SAX, DOM or any of the many
> variants, transform it with XSLT into another format you can already
> receive, or even just use regular expressions. But you get to choose
> how to do that mapping to your objects. How would you feel if you
> *had* to use XSLT?

No one *has* to use framing. I don't think this analogy applies. If 
someone wants to map inputs in a different way then the existence of 
another tool to do it for them won't stand in their way.

Developers are always going to have to get incoming data into the 
objects that their system works with. This means writing some kind of 
custom code when the data comes in or rewriting the rest of your system 
to work with the incoming data as is. Most people pick the 
transformation step; JSON-LD framing provides a simple way to do this 
for them. They can do something else if they so desire. One difference 
with JSON-LD is that we don't also require publishers to perform similar 
custom transformations. If a publisher has JSON data that they are 
already publishing and that they now want to publish as Linked Data,
all they have to do is indicate what the context is.
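
A small sketch of what "indicating the context" could look like for a
publisher. The helper names (annotate, expand_keys) and the sample
object are hypothetical; the property IRIs are from the FOAF vocabulary.
The expansion step is a toy lookup, not the JSON-LD expansion algorithm.

```python
def annotate(existing, context):
    """Return a copy of an existing JSON object with a context attached.
    The object's own keys and structure are left unchanged."""
    return {"@context": context, **existing}


def expand_keys(document):
    """Toy expansion: replace each key with the IRI the context maps it to.
    Keys without a mapping are kept as they are."""
    context = document.get("@context", {})
    return {
        context.get(key, key): value
        for key, value in document.items()
        if key != "@context"
    }


# What the publisher already serves today.
existing = {"name": "Jane Doe", "homepage": "http://example.com/jane"}

# The only addition: a mapping from the existing keys to IRIs.
context = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": "http://xmlns.com/foaf/0.1/homepage",
}

published = annotate(existing, context)
expanded = expand_keys(published)
```

Existing consumers can keep reading published["name"] as before, while
Linked Data consumers can resolve each key to an unambiguous IRI.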

> I'm sure whatever JSON-LD ends up like people will
> develop other ways of accessing the data, but what bothers me is the
> design of the format is being heavily influenced by this one
> particular method: it doesn't matter what goes over the wire because
> it can be made simple with framing.

We want to make it easy for JSON publishers to adopt JSON-LD, so we 
don't want them to have to change the structure of their objects. I 
expect that they will prefer this.

-- 
Dave Longley
CTO
Digital Bazaar, Inc.

Received on Saturday, 3 September 2011 19:06:51 UTC