JSON-LD Telecon Minutes for 2012-09-18

The minutes from today's call are now available here:

http://json-ld.org/minutes/2012-09-18/

Full text of the discussion follows, including a link to the audio
transcript:

--------------------
JSON-LD Community Group Telecon Minutes for 2012-09-18

Agenda:
   http://lists.w3.org/Archives/Public/public-linked-json/2012Sep/0006.html
Topics:
   1. ISSUE-113: IRI compaction algorithm
   2. ISSUE-140: Consider objectify/link API method
   3. Timeframe?
Chair:
   Manu Sporny
Scribe:
   Manu Sporny
Present:
   Manu Sporny, Markus Lanthaler, Niklas Lindström, David I. Lehn
Audio:
   http://json-ld.org/minutes/2012-09-18/audio.ogg

Manu Sporny is scribing.
Manu Sporny:  Gregg and Francois are not here today. Dave Lehn will
   be here shortly. Let's discuss the approach for solving these
   issues.
Manu Sporny:  Any additions/changes to the Agenda?
No changes

Topic: ISSUE-113: IRI compaction algorithm

https://github.com/json-ld/json-ld.org/issues/113
Markus Lanthaler:  The problem was that we never defined how
   we're going to do IRI compaction; that has since been
   corrected, though not in the way we ideally wanted.
Markus Lanthaler:  Gregg updated the spec; the current algorithm
   is not understandable without implementing it. It isn't
   explained how the numbers were generated, so if you don't
   implement it, you have a difficult time understanding what the
   algorithm is doing.
Markus Lanthaler:  It's just a very difficult algorithm to
   understand, which makes it quite hard to explain to people what
   compaction does. It's kind of a black box at the moment.
Manu Sporny:  So, what's the plan here? Make the language
   simpler?
Markus Lanthaler:  We should consider both the IRI compaction
   algorithm and the term ranking algorithm when simplifying.
Markus Lanthaler:  Pseudo-code in the issue is easier to
   understand.
Markus Lanthaler:  Gregg disagrees, and Dave needs more time to
   look at it.
Manu Sporny:  Dave Longley's concern is that, because we're
   focused on corner cases, all the algorithms are getting difficult
   to understand. Perhaps we should simplify greatly and ignore
   corner cases. One way to do this is to say that if there is ever
   a term conflict, we just throw an error and have the error
   callback select the proper term. The problem with that approach
   is that developers may choose the wrong way to select the term
   (or at the very least, it's non-interoperable unless they publish
   their algorithm). To get around that, we could publish the
   "proper" term matching algorithm along with the JSON-LD API and
   make it the default error handler for .compact(). The problem
   with that is that we end up with the same amount of complexity
   that we have today.
Manu Sporny:  The other option is that we can explain the
   algorithm better, but that doesn't remove the complexity of the
   algorithm. [scribe assist by Niklas Lindström]
Manu Sporny:  We could explain the algorithm like this: it picks
   the most specific term, but there are complications in the edge
   cases. [scribe assist by Niklas Lindström]
Manu Sporny:  So, should we simplify it, or can we settle for
   explaining it better? [scribe assist by Niklas Lindström]
Markus Lanthaler:  What do you mean by conflict?
Manu Sporny:  Two terms that have the same IRI, but one of them
   has a datatype - which one is picked?
Niklas Lindström:  I haven't had time recently to grasp the
   current algorithm. I hope that we could simplify it to some
   extent.
Niklas Lindström:  There are many edge cases; are there test
   cases?
Manu Sporny:  Yes, lots of test cases.
Niklas Lindström:  For example, having different terms for date
   vs. datetime, or for an author name (dc:creator with a string)
   vs. an author given as a URI reference. Those would be good to
   keep.
Niklas Lindström:  Not having spent too much time on this
   recently, I hope that we could make some sort of binary check:
   either it's a perfect match, or, if there is a term for it, use
   that. That way, we don't have multiple steps of checking (to see
   if there is something matching).
Manu Sporny:  The current algorithm is a multistep process; it
   ranks the terms. We do have test cases for them. [scribe assist
   by Niklas Lindström]
Manu Sporny:  There are multiple ways of implementing it. The
   selection algorithm is very complex because it deals with all the
   corner cases. [scribe assist by Niklas Lindström]
Manu Sporny:  Dave Longley proposes dealing with fewer corner
   cases, and raising an error if there's a corner-case conflict.
   That has advantages and disadvantages. [scribe assist by Niklas
   Lindström]
Manu Sporny:  The big issue is figuring out, when there is a
   corner case, which term gets picked.
Niklas Lindström:  If I have a property 'age' with an integer
   value, and three terms for that property, picking would be
   straightforward: if one of them was coerced to an integer, that
   one would be picked. If a term was coerced to a list, it wouldn't
   be picked.
Manu Sporny:  The issue is that the algorithm to do that is
   complex.
Niklas Lindström:  I haven't actually implemented that algorithm
   yet - I'm about to.
Niklas Lindström:  I'd map the property IRI to an object that
   itself has a type dictionary, a container dictionary, or a
   default property IRI mapping.
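The inverse-context idea described above can be sketched in Python. This is a minimal illustration, not the spec's algorithm: it assumes term definitions already use absolute IRIs (no prefix expansion), and the function name `build_inverse_context` and the per-IRI keys `type`, `language`, `container` and `default` are hypothetical.

```python
DC_CREATED = "http://purl.org/dc/terms/created"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def build_inverse_context(context):
    """Map each property IRI to the terms that can express it.

    Hypothetical layout: IRI -> {"type": {datatype: [terms]},
    "language": {lang: [terms]}, "container": {container: [terms]},
    "default": first plain term or None}.
    """
    inverse = {}
    # Iterate in lexicographic order so every term list comes out sorted.
    for term, definition in sorted(context.items()):
        if term.startswith("@"):
            continue  # skip keywords such as @vocab in this sketch
        if isinstance(definition, str):
            definition = {"@id": definition}
        entry = inverse.setdefault(definition["@id"], {
            "type": {}, "language": {}, "container": {}, "default": None,
        })
        if "@container" in definition:
            entry["container"].setdefault(definition["@container"], []).append(term)
        if "@type" in definition:
            entry["type"].setdefault(definition["@type"], []).append(term)
        if "@language" in definition:
            entry["language"].setdefault(definition["@language"], []).append(term)
        plain = not any(k in definition for k in ("@container", "@type", "@language"))
        if plain and entry["default"] is None:
            entry["default"] = term  # lexicographically first plain term
    return inverse


context = {
    "created": DC_CREATED,
    "createdDateTime": {"@id": DC_CREATED, "@type": XSD_DATETIME},
    "createdTimeSet": {"@id": DC_CREATED, "@type": XSD_DATETIME, "@container": "@set"},
}
inverse = build_inverse_context(context)
print(inverse[DC_CREATED]["type"][XSD_DATETIME])  # ['createdDateTime', 'createdTimeSet']
```

A term with both a datatype and a container is listed under both dictionaries here, so a lookup can start from whichever constraint the value imposes.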
Niklas Lindström:  I can see there is a certain complexity
   involved if you are looking for something that is both coerced to
   a datatype and has a certain container (i.e., it has multiple
   values)... I don't understand why you need to rank items.
Markus Lanthaler:  Gregg wrote it, so he'd know best.
Markus Lanthaler:  I didn't implement it as it is in the spec; I
   couldn't figure out how to implement it from the spec. The idea
   is that you have a number of terms or compact IRIs
   (prefix:suffix), or even the full IRI, and you assign a number
   to each of them (for the IRI/value pair) which expresses how well
   it matches.
Markus Lanthaler:  So, for example, a term that maps only to the
   IRI gets a 1, but a term that also has a datatype that matches
   gets a 2 and wins, etc.
Markus Lanthaler:
   http://json-ld.org/spec/latest/json-ld-api/#term-rank-algorithm
0 and term is ... you don't know how the numbers were created.
   It's difficult to understand what's going on by looking at the
   numbers.
Markus Lanthaler: My proposal is this...
   https://github.com/json-ld/json-ld.org/issues/113#issuecomment-5567976
Manu Sporny:  I think we should try to remove all the numbers in
   the term ranking algorithm as a way of simplifying how it is
   explained. Perhaps we need to implement it as a map-reduce step
   that always results in 0 or 1 term being picked. So, you give
   the algorithm a list of potential terms and a value to match
   against all of them. The algorithm then whittles the list of
   terms down to 1 (if a term matched) or 0 (if none of the terms
   match). This way, there is no weirdness like rank = rank - 2.
Niklas Lindström: If you have this - [] dc:created
   "2012-01-01T00:00:00"^^xsd:dateTime
Niklas Lindström: and this term: "created": "dc:created"
Niklas Lindström:  Let me see if I understand this correctly...
Niklas Lindström: and this term: "dc:created": {"@type":
   "xsd:dateTime"}
Niklas Lindström:  What if we order the list so that you just go
   down it and ignore each item until a selection is made?
Niklas Lindström: "createdTimeSet": {"@type": "xsd:dateTime",
   "@container": "@set"}
Niklas Lindström:  So, we could simplify by throwing out choices
   that we don't want to make... like, given the choice between
   terms and CURIEs, throwing out all the CURIEs before making the
   decision?
Manu Sporny:  The issue is that people might be surprised by
   this, because the more accurate term wouldn't be selected.
Niklas Lindström:  Then they should only use terms, or only use
   CURIEs.
Niklas Lindström:  If you don't want the terms to be picked, you
   should be able to manage your own context in that scenario,
   anyway.
Niklas Lindström:  If we try to support that use case, I'm not
   really sure if we're supporting that usage of @context anyway -
   it's a complex usage of terms and CURIEs.
Manu Sporny:  Perhaps we can do this map-reduce in 3 iterations,
   instead? First removes @set/@list, second matches against
   datatype/language, third picks by lexicographical value. That may
   be easier for folks to understand?
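The three-pass whittling proposed above might look like the following Python sketch. It assumes candidates are normalized term definitions for a single property IRI and values are in expanded form (`{"@value": ..., "@type": ...}` or `{"@list": [...]}`); the function names are hypothetical, and pass 2 deliberately requires an exact datatype/language match, so an uncoerced term is not used as a fallback.

```python
DC_CREATED = "http://purl.org/dc/terms/created"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def coercion_matches(definition, item):
    """True only for an exact datatype/language match (no fallback)."""
    if "@type" in definition:
        return item.get("@type") == definition["@type"]
    if "@language" in definition:
        return item.get("@language") == definition["@language"]
    # An uncoerced term only matches values with no datatype or language.
    return "@type" not in item and "@language" not in item


def select_term(candidates, value):
    """Whittle the candidate terms for one property IRI down to 1 or 0."""
    # Pass 1: container. A @list value needs a @list term, and vice versa.
    is_list = "@list" in value
    remaining = {t: d for t, d in candidates.items()
                 if (d.get("@container") == "@list") == is_list}
    # Pass 2: every item must match the term's datatype/language exactly.
    # (An empty list matches any @list term, since all() of nothing is True.)
    items = value["@list"] if is_list else [value]
    remaining = {t: d for t, d in remaining.items()
                 if all(coercion_matches(d, item) for item in items)}
    # Pass 3: break any remaining tie by lexicographic order of the term.
    return min(remaining) if remaining else None


candidates = {
    "created": {"@id": DC_CREATED},
    "createdDateTime": {"@id": DC_CREATED, "@type": XSD_DATETIME},
}
print(select_term(candidates, {"@value": "2012-01-01T00:00:00", "@type": XSD_DATETIME}))
print(select_term(candidates, {"@value": "2012-01-01", "@type": XSD_DATE}))
```

The first call prints `createdDateTime`; the second prints `None`, because neither term is an exact match for an xsd:date value. For Markus's mixed-list example (termA and termB both @list, values of typeA, typeB and typeC), this sketch also returns None, following the "every item or no match" reading of lists rather than picking the lexicographically least term.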
Markus Lanthaler:  Maybe we pick @set/@list first, then
   @datatype/@language, then last step checks lexicographical/prefix
   value?
Markus Lanthaler:  Maybe it's enough to specify how the internal
   inverse-context is sorted? Then we just go down the list of
   internal inverse-context values and pick an item or skip it?
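The sort-then-scan alternative above can be sketched the same way. The ordering used here (coerced terms before plain terms, ties broken lexicographically), the rule that a plain term accepts any value of the right container kind, and the function names are all assumptions for illustration, not what the spec defines.

```python
DC_CREATED = "http://purl.org/dc/terms/created"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def accepts(definition, value):
    """Can this term express the value? Uncoerced terms act as a fallback."""
    is_list = "@list" in value
    if (definition.get("@container") == "@list") != is_list:
        return False
    items = value["@list"] if is_list else [value]
    for item in items:
        if "@type" in definition and item.get("@type") != definition["@type"]:
            return False
        if "@language" in definition and item.get("@language") != definition["@language"]:
            return False
    return True


def sort_key(entry):
    """Coerced terms first, then plain terms; ties broken lexicographically."""
    term, definition = entry
    coerced = "@type" in definition or "@language" in definition
    return (0 if coerced else 1, term)


def select_term_by_scan(candidates, value):
    """Go down the sorted list and return the first term that accepts the value."""
    for term, definition in sorted(candidates.items(), key=sort_key):
        if accepts(definition, value):
            return term
    return None


candidates = {
    "created": {"@id": DC_CREATED},
    "createdDateTime": {"@id": DC_CREATED, "@type": XSD_DATETIME},
}
print(select_term_by_scan(candidates, {"@value": "2012-01-01", "@type": XSD_DATE}))
```

With this ordering a plain term acts as a fallback: the xsd:date value skips `createdDateTime` and lands on `created` (printed above) rather than producing no match, while an xsd:dateTime value still selects `createdDateTime` first.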
Niklas Lindström:  Maybe we should investigate that - we cover
   most of the needs - it's more direct/natural.
Manu Sporny:  Okay, so loose consensus - we have a function that
   takes in a list of terms and a value to match... the function
   whittles down the list to one item by the end. The whittling
   could be performed in 3 iterations, where each iteration removes
   imperfect matches, leaving 1 or 0 matches at the end. The other
   way to whittle it down is to sort the list of potential term
   matches in some way, and then search for an "exact" match.
Markus Lanthaler: termA: @list, typeA | termB: @list, typeB -->
   list: val1/typeA, val2/typeB, val3/typeC
Markus Lanthaler: I would say this should choose typeA
   (lexicographically least)
Markus Lanthaler: for list: val1/typeA, val2/typeB, val3/type
Manu Sporny:  So, the approach could be less cognitively complex
   and more algorithmically complex?
Niklas Lindström:  Yeah, but only because we need to be more
   accurate than we are now.
Manu Sporny:  Dave Longley is concerned that choosing the word
   'compact' was the wrong decision. The reason is that people think
   it's supposed to produce the smallest possible number of bytes
   for the document. In reality, it's supposed to give back an
   easy-to-use data structure for developers. So, when compacting,
   we should ensure that we don't compact something that shouldn't
   really be compacted. For example, compacting a list with mixed
   values using a term whose @type is xsd:integer would be the wrong
   thing to do.
Niklas Lindström:  Yes, for lists, it either matches exactly
   (every item in the list), or there is no match.
Niklas Lindström:  It should always be crystal clear when
   something applies...
Manu Sporny:  The issue with corner cases is that they make things
   too complex. The choice is: don't deal with the corner cases, or
   deal with them. Dealing with the corner cases leads to very
   complex algorithms. Not dealing with them has two possible
   outcomes: 1) interoperability problems for data that falls into
   the corner cases (people might think JSON-LD sucks because it
   gives back bad data when you .compact()), or 2) people being
   forced to mark their data up in a specific way, which removes
   corner cases from JSON-LD data because such data doesn't work
   well with the API. The first is bad, the second is good. No idea
   which one will happen if we choose to ignore corner cases.
Niklas Lindström:  Irregular data where you have mixed types with
   the same terms is not compactable, unless you have different
   terms for the different types used. It's obvious from looking at
   such a context that it was written for irregular data.
Manu Sporny:  Okay - maybe Markus and I need to write the
   pseudocode for what we've discussed today, then we look at it as
   a group, then decide what we want to go with and include it in
   the spec.

Topic: ISSUE-140: Consider objectify/link API method

https://github.com/json-ld/json-ld.org/issues/140
Manu Sporny:  This issue is about whether or not we should add a
   .link() or .graphify() method to the API.
Manu Sporny:  I'm concerned that we don't have an algorithm to do
   this yet... time issue for 1.0
Niklas Lindström:  I'm concerned about timing - need to write
   something in the wiki about this - perhaps I should collaborate
   with Gregg and write this in a sibling specification.
Manu Sporny:  I agree, I don't think we have the time to put this
   in 1.0, but we should start working on it immediately.
Niklas Lindström:  I took your jsonld.js implementation and took
   out the framing part - needed a smaller code size - and I don't
   think we need to do anything in the spec. It should be possible
   to add things later on in a simple way. I don't think we have to
   add anything in the API document for that.
Niklas Lindström:  The .link() / .graphify() mechanism could be
   extended in the same way browsers are extended - you just extend
   as needed via an 'add-on' API.
Niklas Lindström:  We have had a bunch of different names for
   this - I've been using .connect() recently. I think we all agree
   that .objectify() wasn't working... .graphify() might be a little
   too odd.
Manu Sporny:  I don't think we need to pick the name now... we
   can wait until the spec goes to LC, even.
Niklas Lindström:  We might want to add some sort of "indexing"
   mechanism - something that allows you to index JSON-LD documents.
Manu Sporny:  Something like a .view() call that is dynamically
   updated.
Manu Sporny:  There is a lot of potential for .graphify() /
   .connect() and .index() / .view() - but the ideas are floating
   out there right now... not finalized.
Niklas Lindström:  There are a bunch of these sorts of libraries
   for RDF - they all use the Class mechanism to define short names
   bound to IRIs/coercions, which is exactly what the JSON-LD
   context does in a language-agnostic way.
Niklas Lindström:  The idea is to use a @context as a "lens" onto
   a live RDF graph, so that it acts as if it were a live object in
   memory (it could come from a database backend over the
   Web/WebSockets).
Niklas Lindström:  It makes it much easier to throw RDF into an
   arbitrary templating system.
Manu Sporny:  I think we're saying that all of these things are
   important, but we can't do it by JSON-LD 1.0.
Markus Lanthaler:  I'm concerned that if we don't have .frame() /
   .objectify() that people can't process these documents in an
   arbitrary way.
Manu Sporny:  Well they can, it just won't be 'standardized' -
   jsonld.js still has .frame(), so does the Ruby implementation.
Niklas Lindström:  Can we include a separate .graphify() in 1.0
   that could evolve in 1.1?
Manu Sporny:  I'm concerned that we don't have any idea how these
   APIs are going to evolve.
Niklas Lindström:  We could always implement the core - then we
   could add more indexes in the future? Maybe have a callback to do
   your own indexes.
Manu Sporny:  I think somebody needs to volunteer to write the
   .graphify() / .index() spec - that will ensure that we know what
   we're getting into if we have a stripped down version of the call
   in the JSON-LD 1.0 API spec.

Topic: Timeframe?

Markus Lanthaler:  Is there a timeframe for publication?
Niklas Lindström:
   https://github.com/json-ld/json-ld.org/tree/master/spec/latest
Manu Sporny:  Technically, we have to publish every 3-6 months.
   RDF WG charter ends in January 2013 - so, ideally, we'd be at REC
   in that time frame.
David I. Lehn:  That is going to be very difficult to do.
Manu Sporny:  I'll talk to the chairs about it.

-- manu

-- 
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: HTML5 and RDFa 1.1
http://manu.sporny.org/2012/html5-and-rdfa/

Received on Tuesday, 18 September 2012 17:26:33 UTC