Re: JSON-LD Telecon Minutes for 2012-09-18 from Gregg Kellogg on 2012-09-18 (public-linked-json@w3.org from September 2012)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Tue, 18 Sep 2012 13:54:04 -0400
To: Manu Sporny <msporny@digitalbazaar.com>
CC: Linked JSON <public-linked-json@w3.org>, RDF WG <public-rdf-wg@w3.org>
Message-ID: <97291C40-F222-4FAE-86ED-C7BC1843E697@kellogg-assoc.com>
For the record, I did describe the original IRI compaction algorithm, but I believe it was re-specified by Markus (I can't check just now). I'm on record as suggesting that we dramatically simplify it, most likely by ignoring the data range issues and simply using defined terms, followed by Compact IRIs in lexicographical order. Anything more complicated should use custom logic, and there should be provision for a callback that lets the user define that custom logic.
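A minimal Python sketch of the simplified rule, assuming a context that maps each term or prefix straight to a full IRI. It prefers an exactly matching defined term, then compact IRIs in lexicographical order, then the full IRI, and a `choose` callback stands in for the proposed user hook. The names `compact_iri` and `choose` are hypothetical, and treating every context entry as a possible prefix is a simplification, not the spec's behavior.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical hook: receives the IRI and the ordered candidates and may
# return a preferred candidate, or None to accept the default choice.
Chooser = Callable[[str, List[str]], Optional[str]]


def compact_iri(iri: str, context: Dict[str, str],
                choose: Optional[Chooser] = None) -> str:
    # 1. Defined terms that map exactly to the IRI. Datatype and container
    #    ("data range") information is ignored entirely.
    terms = sorted(t for t, mapped in context.items() if mapped == iri)
    # 2. Compact IRIs (prefix:suffix): every entry whose IRI is a proper
    #    prefix of the IRI being compacted is treated as a usable prefix.
    curies = sorted(
        f"{prefix}:{iri[len(base):]}"
        for prefix, base in context.items()
        if iri.startswith(base) and iri != base
    )
    candidates = terms + curies
    # 3. Let the caller's custom logic override the default, if provided.
    if choose is not None:
        picked = choose(iri, candidates)
        if picked is not None:
            return picked
    # 4. Default: first candidate, falling back to the full IRI.
    return candidates[0] if candidates else iri
```

With `{"dc": "http://purl.org/dc/terms/", "created": "http://purl.org/dc/terms/created"}`, compacting `http://purl.org/dc/terms/created` yields `created`, while a callback that prefers CURIEs can return `dc:created` instead.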

Gregg Kellogg

On Sep 18, 2012, at 11:26 AM, "Manu Sporny" <msporny@digitalbazaar.com> wrote:

> The minutes from today's call are now available here:
>
> http://json-ld.org/minutes/2012-09-18/
>
> Full text of the discussion follows including a link to the audio
> transcript:
>
> --------------------
> JSON-LD Community Group Telecon Minutes for 2012-09-18
>
> Agenda:
>   http://lists.w3.org/Archives/Public/public-linked-json/2012Sep/0006.html
> Topics:
>   1. ISSUE-113: IRI compaction algorithm
>   2. ISSUE-140: Consider objectify/link API method
>   3. Timeframe?
> Chair:
>   Manu Sporny
> Scribe:
>   Manu Sporny
> Present:
>   Manu Sporny, Markus Lanthaler, Niklas Lindström, David I. Lehn
> Audio:
>   http://json-ld.org/minutes/2012-09-18/audio.ogg
>
> Manu Sporny is scribing.
> Manu Sporny:  Gregg and Francois are not here today. Dave Lehn
>   will be here shortly. Let's discuss the approach for solving
>   these issues.
> Manu Sporny:  Any additions/changes to the Agenda?
> No changes
>
> Topic: ISSUE-113: IRI compaction algorithm
>
> https://github.com/json-ld/json-ld.org/issues/113
> Markus Lanthaler:  The problem was that we never defined how
>   we were going to do IRI compaction. That has since been
>   corrected, though not the way we ideally wanted it to be.
> Markus Lanthaler:  Gregg updated the spec - currently, there is
>   an algorithm that is not understandable without implementing it.
>   It isn't explained how the numbers were generated. If you don't
>   implement it, you have a difficult time understanding what the
>   algorithm is doing.
> Markus Lanthaler:  It's just a very difficult algorithm to
>   understand. That makes it quite difficult to explain to people
>   what compaction does. It's kind of a black box at the moment.
> Manu Sporny:  So, what's the plan here? Make the language
>   simpler?
> Markus Lanthaler:  We should consider both the IRI compaction
>   algorithm and the term ranking algorithm when simplifying.
> Markus Lanthaler:  Pseudo-code in the issue is easier to
>   understand.
> Markus Lanthaler:  Gregg disagrees, and Dave needs more time to
>   look at it.
> Manu Sporny:  Dave Longley's concern is that all the algorithms,
>   because we're focused on corner cases, are getting difficult to
>   understand. Perhaps what we should do is simplify greatly, and
>   ignore corner cases. One way we could do this is say that if
>   there is ever a term conflict, that we should just throw an error
>   and have the error callback handle the selection of the proper
>   term. The problem with that approach is that developers may
>   choose the wrong way to select the term (or at the very least,
>   it's non-interoperable - or they have to publish their
>   algorithm). To get around that, we could publish the "proper"
>   term matching algorithm along with the JSON-LD API and make it
>   the default error handler for .compact().
>   The problem with that is that we end up having the same amount of
>   complexity in there that we do today.
> Manu Sporny:  The other option is that we can explain the
>   algorithm better, but that doesn't remove the complexity of the
>   algorithm. [scribe assist by Niklas Lindström]
> Manu Sporny:  We could explain the algorithm like this: the
>   algorithm picks the most specific term, but there are
>   complications for this in the edge cases. [scribe assist by
>   Niklas Lindström]
> Manu Sporny:  So, should we simplify it, or can we settle for
>   explaining it better? [scribe assist by Niklas Lindström]
> Markus Lanthaler:  What do you mean by conflict?
> Manu Sporny:  Two terms that have the same IRI, but one of them
>   has a datatype - which one is picked?
> Niklas Lindström:  I haven't had time recently to grasp the
>   current algorithm. I hope that we could simplify it to some
>   extent.
> Niklas Lindström:  There are many edge cases, are there test
>   cases?
> Manu Sporny:  Yes, lots of test cases.
> Niklas Lindström:  Perhaps having different terms for date vs.
>   datetime, or for author name (dc:creator with a string) vs.
>   author with a URI reference. Those would be good to keep.
> Niklas Lindström:  Not having spent too much time on this
>   recently, I hope that we could make some sort of binary check -
>   either it's a perfect match or it isn't; if there is a term for
>   it, use that. That way, we don't have multiple steps of checking
>   (to see if there is something matching).
> Manu Sporny:  The current algorithm is a multistep process; it
>   ranks the terms. We do have test cases for them. [scribe assist
>   by Niklas Lindström]
> Manu Sporny:  There are multiple ways of implementing it. The
>   selection algorithm is very complex because it deals with all the
>   corner cases. [scribe assist by Niklas Lindström]
> Manu Sporny:  Dave Longley proposes to deal with fewer corner
>   cases, and raise an error if there's a corner case conflict. That
>   has advantages and disadvantages. [scribe assist by Niklas
>   Lindström]
> Manu Sporny:  The big issue is figuring out, when there is a
>   corner case, which term gets picked.
> Niklas Lindström:  If I have a property 'age' and a value that is
>   an integer, the term would be straightforward to pick. Given
>   that property and three terms for it, if one of them was coerced
>   to an integer, that one would be picked. If a term was coerced to
>   a list, it wouldn't be picked.
> Manu Sporny:  The issue is that the algorithm to do that is
>   complex.
> Niklas Lindström:  I haven't actually implemented that algorithm
>   yet - I'm about to.
> Niklas Lindström:  I'd map the property IRI to an object that
>   itself has a type dictionary, a container dictionary, or a
>   default property IRI mapping.
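Niklas's structure could look roughly like the following sketch, which assumes term definitions of the form `{"@id": ..., "@type": ..., "@container": ...}` with full IRIs. `build_inverse` and `select_term` are hypothetical names, and the lookup order (requested container, then datatype, then default term) is one possible reading, not the spec's algorithm.

```python
from typing import Any, Dict, Optional

Definition = Dict[str, str]  # {"@id": ..., optional "@type" / "@container"}


def build_inverse(context: Dict[str, Definition]) -> Dict[str, Dict[str, Any]]:
    """Map each property IRI to a type dictionary, a container dictionary
    and a default (uncoerced) term."""
    inverse: Dict[str, Dict[str, Any]] = {}
    # Visiting terms in sorted order means the lexicographically least
    # term wins whenever two terms land in the same slot.
    for term in sorted(context):
        definition = context[term]
        entry = inverse.setdefault(
            definition["@id"], {"types": {}, "containers": {}, "default": None})
        if "@container" in definition:
            # Simplification: a term with both a container and a type is
            # filed under its container only.
            entry["containers"].setdefault(definition["@container"], term)
        elif "@type" in definition:
            entry["types"].setdefault(definition["@type"], term)
        elif entry["default"] is None:
            entry["default"] = term
    return inverse


def select_term(inverse: Dict[str, Dict[str, Any]], iri: str,
                value: Dict[str, Any], container: Optional[str] = None) -> Optional[str]:
    entry = inverse.get(iri)
    if entry is None:
        return None
    if container is not None:
        return entry["containers"].get(container)
    datatype = value.get("@type")
    if datatype in entry["types"]:
        return entry["types"][datatype]
    return entry["default"]
```

With terms `age`, `ageInt` (coerced to `xsd:integer`) and `ageList` (an `@list` container) all mapped to the same property, an integer value selects `ageInt`, a plain string selects `age`, and only a request for an `@list` container selects `ageList`.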
> Niklas Lindström:  I can see there is a certain complexity
>   involved if you are looking for something that is both coerced to
>   a datatype and has a certain container (i.e., it has multiple
>   values)... I don't understand why you need to rank items.
> Markus Lanthaler:  Gregg wrote it, so he'd know best.
> Markus Lanthaler:  I didn't implement it as it is in the spec; I
>   couldn't figure out how to implement it from the spec. The idea
>   is that you have a number of terms or compact IRIs
>   (prefix:suffix), or even the full IRI, and you assign a number
>   to each of them (for the IRI/value pair) which expresses how well
>   it matches.
> Markus Lanthaler:  So, for example, if you have just one term
>   with that IRI, it gets a 1, but if you have a term that has a
>   datatype and it matches, that gets a value of 2 and wins, etc.
> Markus Lanthaler:
>   http://json-ld.org/spec/latest/json-ld-api/#term-rank-algorithm
> 0 and term is ... you don't know how the numbers were created.
>   It's difficult to understand what's going on by looking at the
>   numbers.
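To make the numeric idea concrete, here is a toy version of the scoring Markus describes: 1 for a plain IRI match, 2 for a match whose datatype coercion also fits, 0 for a term that cannot hold the value. These scores are illustrative only; they are not the numbers used in the spec's term rank algorithm, `rank` and `pick_by_rank` are hypothetical names, and language and container coercions are ignored.

```python
from typing import Any, Dict, Optional

Definition = Dict[str, str]


def rank(definition: Definition, value: Dict[str, Any]) -> int:
    """Illustrative score: 0 = term cannot hold the value,
    1 = plain IRI match, 2 = IRI match whose @type coercion also matches."""
    coerced = definition.get("@type")
    if coerced is None:
        return 1
    return 2 if coerced == value.get("@type") else 0


def pick_by_rank(context: Dict[str, Definition], iri: str,
                 value: Dict[str, Any]) -> Optional[str]:
    best: Optional[str] = None
    best_score = 0
    # Sorted iteration plus a strict ">" keeps the lexicographically least
    # term when two terms tie on score.
    for term in sorted(context):
        definition = context[term]
        if definition["@id"] != iri:
            continue
        score = rank(definition, value)
        if score > best_score:
            best, best_score = term, score
    return best
```

Even in this tiny form, a reader has to trace the scores to see why a term wins, which is the explainability problem raised above.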
> Markus Lanthaler: My proposal is this...
>   https://github.com/json-ld/json-ld.org/issues/113#issuecomment-5567976
> Manu Sporny:  I think we should try to remove all the numbers in
>   the term ranking algorithm as a way of simplifying the way it is
>   explained. Perhaps we need to implement it as a map-reduce step
>   that always results in 0 or 1 term being picked. So, you give
>   the algorithm a list of potential terms that can be matched,
>   and a value that is being considered for a match against all the
>   terms. The algorithm then whittles the list of terms down to 1 (if
>   a term matched) or 0 (if none of the terms match). This way,
>   there is no weirdness like rank = rank - 2.
> Niklas Lindström: If you have this - [] dc:created
>   "2012-01-01T00:00:00"^^xsd:dateTime
> Niklas Lindström: and this term: "created": "dc:created"
> Niklas Lindström:  Let me see if I understand this correctly...
> Niklas Lindström: and this term: "dc:created": {"@type":
>   "xsd:dateTime"}
> Niklas Lindström:  What if we order the list so that you just go
>   down and ignore each item in the list until a selection is made?
> Niklas Lindström: "createdTimeSet": {"@type": "xsd:dateTime",
>   "@container": "@set"}
> Niklas Lindström:  So, we could simplify by throwing out choices
>   that we don't want to make... like, given the choice between
>   terms and CURIEs, throw out all the CURIEs from the decision
>   before you make the decision?
> Manu Sporny:  The issue is that people might be surprised by
>   this, because the more accurate term wouldn't be selected.
> Niklas Lindström:  Then they should only use terms, or only use
>   CURIEs.
> Niklas Lindström:  If you don't want the terms to be picked, you
>   should be able to manage your own context in that scenario,
>   anyway.
> Niklas Lindström:  If we try to support that use case, I'm not
>   really sure if we're supporting that usage of @context anyway -
>   it's a complex usage of terms and CURIEs.
> Manu Sporny:  Perhaps we can do this map-reduce in 3 iterations
>   instead? The first filters on @set/@list, the second matches
>   against datatype/language, and the third picks by lexicographical
>   value. That may be easier for folks to understand?
> Markus Lanthaler:  Maybe we pick @set/@list first, then
>   @datatype/@language, then last step checks lexicographical/prefix
>   value?
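A sketch of the three-pass whittling in Python, assuming expanded value objects and term definitions keyed by full IRIs. The pass order follows the discussion (containers, then datatype/language, then lexicographical order); how each pass decides what counts as an "imperfect match" is an assumption, and `select_term`, `fits` and `is_coerced` are hypothetical names.

```python
from typing import Any, Dict, List, Optional

Definition = Dict[str, str]  # {"@id": ..., optional "@type"/"@language"/"@container"}
Value = Dict[str, Any]       # expanded value object, e.g. {"@value": ..., "@type": ...}


def is_coerced(definition: Definition) -> bool:
    return "@type" in definition or "@language" in definition


def fits(definition: Definition, value: Value) -> bool:
    # A coerced term only fits a value with exactly the same datatype and language.
    return (definition.get("@type") == value.get("@type")
            and definition.get("@language") == value.get("@language"))


def select_term(context: Dict[str, Definition], iri: str,
                values: List[Value], is_list: bool = False) -> Optional[str]:
    candidates = [t for t, d in context.items() if d["@id"] == iri]
    # Pass 1 (container): a list needs an @list term; other values must not
    # get one. @set terms are treated like plain terms here.
    candidates = [t for t in candidates
                  if (context[t].get("@container") == "@list") == is_list]
    # Pass 2 (datatype/language): keep coerced terms that fit *every* value;
    # if none do, keep only uncoerced terms, which can hold any value.
    exact = [t for t in candidates
             if is_coerced(context[t]) and all(fits(context[t], v) for v in values)]
    candidates = exact or [t for t in candidates if not is_coerced(context[t])]
    # Pass 3 (lexicographical): at most one term survives.
    return min(candidates) if candidates else None
```

For a mixed-type list (say items typed A, B and C, where the only `@list` terms are coerced to A or B), pass 2 removes every candidate and the function returns `None`, so the list is not compacted to a term whose coercion fits only some of its items.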
> Markus Lanthaler:  Maybe it's enough to specify how the internal
>   inverse-context is sorted? Then we just go down the list of
>   internal inverse-context values and pick an item or skip it?
> Niklas Lindström:  Maybe we should investigate that - we cover
>   most of the needs - it's more direct/natural.
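The alternative, sorting the candidates and walking down the list until one fits, could be sketched as follows. The sort key (more constraints first, then lexicographical order) is an assumption about how such an inverse context might be ordered, not something the spec defines; `ordered_terms`, `accepts` and `first_match` are hypothetical names.

```python
from typing import Any, Dict, List, Optional, Tuple

Definition = Dict[str, str]


def specificity(definition: Definition) -> int:
    # Number of constraints a term adds on top of its IRI mapping.
    return sum(key in definition for key in ("@type", "@language", "@container"))


def ordered_terms(context: Dict[str, Definition], iri: str) -> List[Tuple[str, Definition]]:
    # Most specific terms first; ties broken by the lexicographically least term.
    matches = [(t, d) for t, d in context.items() if d["@id"] == iri]
    return sorted(matches, key=lambda item: (-specificity(item[1]), item[0]))


def accepts(definition: Definition, value: Dict[str, Any],
            container: Optional[str]) -> bool:
    if definition.get("@container") != container:
        return False
    # Any coercion the term declares must match the value exactly.
    return all(definition[key] == value.get(key)
               for key in ("@type", "@language") if key in definition)


def first_match(context: Dict[str, Definition], iri: str, value: Dict[str, Any],
                container: Optional[str] = None) -> Optional[str]:
    # Walk down the sorted list, skipping terms until one accepts the value.
    for term, definition in ordered_terms(context, iri):
        if accepts(definition, value, container):
            return term
    return None
```

With `created`, `createdTime` (coerced to `xsd:dateTime`) and `createdTimeSet` (also an `@set`), a dateTime value selects `createdTime`, the same value requested as a set selects `createdTimeSet`, and an untyped string falls through to `created`. Because the sort order does the work, there are no rank numbers to explain.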
> Manu Sporny:  Okay, so loose consensus - we have a function that
>   takes in a list of terms and a value to match... the function
>   whittles down the list to one item by the end. The whittling
>   could be performed in 3 iterations, where each iteration removes
>   imperfect matches, leaving 1 or 0 matches at the end. The other
>   way to whittle it down is to sort the list of potential term
>   matches in some way, and then search for an "exact" match.
> Markus Lanthaler: termA: @list, typeA | termB: @list, typeB -->
>   list: val1/typeA, val2/typeB, val3/typeC
> Markus Lanthaler: I would say this should choose typeA
>   (lexicographically least)
> Markus Lanthaler: for list: val1/typeA, val2/typeB, val3/type
> Manu Sporny:  So, the approach could be less cognitively complex
>   and more algorithmically complex?
> Niklas Lindström:  Yeah, but only because we need to be more
>   accurate than we are now.
> Manu Sporny:  Dave Longley is concerned that choosing the word
>   'compact' was the wrong decision. The reason is that people think
>   it's supposed to produce the smallest number of bytes for the
>   document. In reality, it's supposed to give back an easy-to-use
>   data structure for developers. So, when compacting, we should
>   ensure that we don't compact something that shouldn't really be
>   compacted. For example, compacting a list with mixed values
>   using a term whose datatype is xsd:integer would be the wrong
>   thing to do.
> Niklas Lindström:  Yes, for lists, it either matches exactly
>   (every item in the list), or there is no match.
> Niklas Lindström:  It should always be crystal clear when
>   something applies...
> Manu Sporny:  The issue with corner cases is that they make things
>   too complex. The choice is - don't deal with the corner cases, or
>   deal with them. Dealing with the corner cases leads to very
>   complex algorithms. Not dealing with the corner cases has two
>   possible outcomes: 1) Interoperability problems with data that
>   falls into the corner cases - people might think JSON-LD sucks
>   because it gives back bad data when you .compact(), 2) Forcing
>   people to mark their data up in a specific way, which removes
>   corner cases from JSON-LD data because that data doesn't work
>   well with the API. The first is bad, the second is good. No idea
>   which one will happen if we choose to ignore corner cases.
> Niklas Lindström:  Irregular data, where you have mixed types
>   with the same terms, is not compactable unless you have different
>   terms for the different types used. It's obvious from looking at
>   the context that it is written for irregular data.
> Manu Sporny:  Okay - maybe Markus and I need to write the
>   pseudocode for what we've discussed today, then we look at it as
>   a group, then decide what we want to go with and include it in
>   the spec.
>
> Topic: ISSUE-140: Consider objectify/link API method
>
> https://github.com/json-ld/json-ld.org/issues/140
> Manu Sporny:  This issue is about whether or not we should add a
>   .link() / .graphify() method to the API.
> Manu Sporny:  I'm concerned that we don't have an algorithm to do
>   this yet... time issue for 1.0
> Niklas Lindström:  I'm concerned about timing - need to write
>   something in the wiki about this - perhaps I should collaborate
>   with Gregg and write this in a sibling specification.
> Manu Sporny:  I agree, I don't think we have the time to put this
>   in 1.0, but we should start working on it immediately.
> Niklas Lindström:  I took your jsonld.js implementation and took
>   out the framing part - needed a smaller code size - and I don't
>   think we need to do anything in the spec. It should be possible
>   to add things later on in a simple way. I don't think we have to
>   add anything in the API document for that.
> Niklas Lindström:  The .link() / .graphify() mechanism could be
>   extended in the same way browsers are extended - you just
>   extend as needed via an 'add-on' API.
> Niklas Lindström:  We have had a bunch of different names for
>   this - I've been using .connect() recently. I think we all agree
>   that .objectify() wasn't working... .graphify() might be a little
>   too odd.
> Manu Sporny:  I don't think we need to pick the name now... we
>   can wait until the spec goes to LC, even.
> Niklas Lindström:  We might want to add some sort of "indexing"
>   mechanism - something that allows you to index JSON-LD documents.
> Manu Sporny:  Something like a .view() call that is dynamically
>   updated.
> Manu Sporny:  There is a lot of potential for .graphify() /
>   .connect() and .index() / .view() - but the ideas are floating
>   out there right now... not finalized.
> Niklas Lindström:  There are a bunch of these sorts of libraries
>   for RDF - they all use the Class mechanism to define short names
>   bound to IRIs/coercions, which is exactly what the JSON-LD
>   context does in a language-agnostic way.
> Niklas Lindström:  The idea is to use a @context as a "lens" to
>   access a live RDF graph as if it were an object in memory (the
>   graph could come from a database backend over the Web/WebSockets).
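One way to read the "lens" idea is a thin view that resolves terms through the context on every access. The `Lens` class below is hypothetical; it assumes a single expanded node and a plain term-to-IRI map, and it reads straight from the underlying dictionary, so changes to the data are visible on the next lookup.

```python
from typing import Any, Dict, List


class Lens:
    """Read-only view of an expanded JSON-LD node through a term -> IRI map.

    The lens keeps a reference to the node rather than a copy, so later
    changes to the underlying data show up on the next lookup.
    """

    def __init__(self, node: Dict[str, Any], context: Dict[str, str]) -> None:
        self._node = node
        self._context = context

    def __getitem__(self, term: str) -> List[Any]:
        iri = self._context.get(term, term)  # unknown terms are treated as IRIs
        values = self._node.get(iri, [])
        # Unwrap literal values; node references ({"@id": ...}) stay as-is.
        return [v["@value"] if "@value" in v else v for v in values]
```

Swapping the dictionary for an object backed by a remote store would keep the same term-based access, which is what would make such a view convenient to hand to a templating system.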
> Niklas Lindström:  It makes it much easier to throw RDF into
>   arbitrary templating systems.
> Manu Sporny:  I think we're saying that all of these things are
>   important, but we can't do them for JSON-LD 1.0.
> Markus Lanthaler:  I'm concerned that if we don't have .frame() /
>   .objectify(), people can't process these documents in an
>   arbitrary way.
> Manu Sporny:  Well they can, it just won't be 'standardized' -
>   jsonld.js still has .frame(), so does the Ruby implementation.
> Niklas Lindström:  Can we include a basic .graphify() in 1.0
>   that could evolve in 1.1?
> Manu Sporny:  I'm concerned that we don't have any idea how these
>   APIs are going to evolve.
> Niklas Lindström:  We could always implement the core - then we
>   could add more indexes in the future? Maybe have a callback to do
>   your own indexes.
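A minimal sketch of what such a core might look like: link flattened nodes into an in-memory object graph, return the `@id` index, and call an optional hook for each node so users can build their own indexes. `connect` and `on_node` are hypothetical names (.connect() is only one of the names floated in this discussion), and the sketch assumes flattened, expanded input where references are `{"@id": ...}` objects.

```python
from typing import Any, Callable, Dict, List, Optional

Node = Dict[str, Any]


def connect(nodes: List[Node],
            on_node: Optional[Callable[[Node], None]] = None) -> Dict[str, Node]:
    """Link flattened JSON-LD nodes into an in-memory object graph.

    Returns the core index (@id -> node). Each reference of the form
    {"@id": ...} that points at a known node is replaced, in place, by
    that node object; references to unknown nodes are left unchanged.
    """
    by_id = {node["@id"]: node for node in nodes if "@id" in node}
    for node in nodes:
        for key, values in list(node.items()):
            # Skip keywords such as @id/@type and non-list entries.
            if key.startswith("@") or not isinstance(values, list):
                continue
            node[key] = [
                by_id.get(v["@id"], v)
                if isinstance(v, dict) and set(v) == {"@id"} else v
                for v in values
            ]
        if on_node is not None:
            on_node(node)  # hook for user-defined indexes
    return by_id
```

Keeping the core to a single `@id` index and pushing everything else into the hook is one way to ship something small in 1.0 without committing to how richer indexes or views evolve later.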
> Manu Sporny:  I think somebody needs to volunteer to write the
>   .graphify() / .index() spec - that will ensure that we know what
>   we're getting into if we have a stripped down version of the call
>   in the JSON-LD 1.0 API spec.
>
> Topic: Timeframe?
>
> Markus Lanthaler:  Is there a timeframe for publication?
> Niklas Lindström:
>   https://github.com/json-ld/json-ld.org/tree/master/spec/latest
> Manu Sporny:  Technically, we have to publish every 3-6 months.
>   The RDF WG charter ends in January 2013 - so, ideally, we'd be at
>   REC in that time frame.
> David I. Lehn:  That is going to be very difficult to do.
> Manu Sporny:  I'll talk to the chairs about it.
>
> -- manu
>
> --
> Manu Sporny (skype: msporny, twitter: manusporny)
> President/CEO - Digital Bazaar, Inc.
> blog: HTML5 and RDFa 1.1
> http://manu.sporny.org/2012/html5-and-rdfa/
>
Received on Tuesday, 18 September 2012 17:54:40 UTC