- From: Manu Sporny <msporny@digitalbazaar.com>
- Date: Tue, 18 Sep 2012 13:26:06 -0400
- To: Linked JSON <public-linked-json@w3.org>
- CC: RDF WG <public-rdf-wg@w3.org>
The minutes from today's call are now available here:
http://json-ld.org/minutes/2012-09-18/
Full text of the discussion follows including a link to the audio
transcript:
--------------------
JSON-LD Community Group Telecon Minutes for 2012-09-18
Agenda:
http://lists.w3.org/Archives/Public/public-linked-json/2012Sep/0006.html
Topics:
1. ISSUE-113: IRI compaction algorithm
2. ISSUE-140: Consider objectify/link API method
3. Timeframe?
Chair:
Manu Sporny
Scribe:
Manu Sporny
Present:
Manu Sporny, Markus Lanthaler, Niklas Lindström, David I. Lehn
Audio:
http://json-ld.org/minutes/2012-09-18/audio.ogg
Manu Sporny is scribing.
Manu Sporny: Gregg, Francois not here today. Dave Lehn will be
here shortly. Let's discuss the approach for solving these
issues.
Manu Sporny: Any additions/changes to the Agenda?
No changes
Topic: ISSUE-113: IRI compaction algorithm
https://github.com/json-ld/json-ld.org/issues/113
Markus Lanthaler: The problem was that we never defined how
we're going to do IRI compaction, but that has since been
corrected, though not ideally the way we wanted it to be.
Markus Lanthaler: Gregg updated the spec - currently, there is
an algorithm that is not understandable without implementing it.
It isn't explained how the numbers were generated. If you don't
implement it, you have a difficult time understanding what the
algorithm is doing.
Markus Lanthaler: It's just a very difficult to understand
algorithm. It makes it quite difficult to explain to people what
compaction does. It's kind of a black box at the moment.
Manu Sporny: So, what's the plan here? Make the language
simpler?
Markus Lanthaler: We should consider IRI compaction algorithm
and term ranking algorithm when simplifying.
Markus Lanthaler: Pseudo-code in the issue is easier to
understand.
Markus Lanthaler: Gregg disagrees, and Dave needs more time to
look at it.
Manu Sporny: Dave Longley's concern is that all the algorithms,
because we're focused on corner cases, are getting difficult to
understand. Perhaps what we should do is simplify greatly, and
ignore corner cases. One way we could do this is say that if
there is ever a term conflict, that we should just throw an error
and have the error callback handle the selection of the proper
term. The problem with that approach is that developers may
choose the wrong way to select the term (or at the very least,
it's non-interoperable - or they have to publish their
algorithm). To get around that, we could publish the "proper"
term matching algorithm along with the JSON-LD API and that can
be the default for the .compact() option for the error handler.
The problem with that is that we end up having the same amount of
complexity in there that we do today.
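A rough sketch of the error-callback idea, for illustration only. The names select_term, TermConflict, on_conflict and default_resolver are hypothetical and do not come from the JSON-LD API; the "pick the lexicographically least term" default is likewise just an assumption standing in for whatever "proper" algorithm would be published.

```python
class TermConflict(Exception):
    """Raised when several terms could compact the same IRI (hypothetical)."""


def select_term(iri, matching_terms, on_conflict=None):
    """Pick a term for an IRI, delegating conflicts to a callback."""
    if not matching_terms:
        return iri                      # nothing matches: keep the full IRI
    if len(matching_terms) == 1:
        return matching_terms[0]        # unambiguous
    if on_conflict is None:
        raise TermConflict(f"{len(matching_terms)} terms match {iri}")
    return on_conflict(iri, matching_terms)


def default_resolver(iri, matching_terms):
    """Assumed default: lexicographically least term name."""
    return min(matching_terms)


chosen = select_term(
    "http://example.org/age", ["ageInt", "age"], on_conflict=default_resolver
)
```

Even in this toy form, the selection logic still has to live somewhere (here, in default_resolver), which is the complexity concern raised above.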
Manu Sporny: The other option is that we can explain the
algorithm better, but that doesn't remove the complexity of the
algorithm. [scribe assist by Niklas Lindström]
Manu Sporny: we could explain the algorithm like this: the
algorithm picks the most specific term; but there are
complications for this in the edge cases. [scribe assist by
Niklas Lindström]
Manu Sporny: so should we simplify it, or can we settle for
explaining it better? [scribe assist by Niklas Lindström]
Markus Lanthaler: What do you mean by conflict?
Manu Sporny: Two terms that have the same IRI, but one of them
has a datatype - which one is picked?
Niklas Lindström: I haven't had time recently to grasp the
current algorithm. I hope that we could simplify it to some
extent.
Niklas Lindström: There are many edge cases, are there test
cases?
Manu Sporny: yes, lots of test cases.
Niklas Lindström: Perhaps having different terms for date vs.
datetime. Author name (dc:creator with a string) vs. with a URI
reference. Those would be good to keep.
Niklas Lindström: Not having spent too much time on this
recently, I hope that we could make some sort of binary check -
either it's a perfect match, or if there is a term for it, use
that. So, we don't have multiple steps for checking (to see if
there is something matching)
Manu Sporny: the current algorithm is a multistep process; it
ranks the terms. We do have test cases for them. [scribe assist
by Niklas Lindström]
Manu Sporny: There are multiple ways of implementing it. The
selection algorithm is very complex because it deals with all the
corner cases. [scribe assist by Niklas Lindström]
Manu Sporny: Dave Longley proposes to deal with fewer corner
cases, and raise an error if there's a corner case conflict. That
has advantages and disadvantages. [scribe assist by Niklas
Lindström]
Manu Sporny: The big issue is figuring out, when there is a
corner case, which term gets picked.
Niklas Lindström: If I have a property 'age' and a value that is
an integer, that would be straightforward to pick. Given that
property and three terms, if one of them was coerced to an
integer, that one would be picked. If a term was coerced to a
list, it wouldn't be picked.
Manu Sporny: The issue is that the algorithm to do that is
complex.
Niklas Lindström: I haven't actually implemented that algorithm
yet - I'm about to.
Niklas Lindström: I'd map the property IRI to an object that
itself has a type dictionary, a container dictionary, or a
default property IRI mapping.
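The mapping Niklas describes might look roughly like the sketch below. It assumes a simplified context in which every term definition is either an IRI string or a dict with "@id" and optional "@type" / "@container"; the function name build_inverse_index and the example.org IRIs are hypothetical.

```python
def build_inverse_index(context):
    """Map each property IRI to type, container and default term lookups.

    Assumes a simplified context: each value is an IRI string or a dict
    with "@id" and optional "@type" / "@container".
    """
    index = {}
    for term, definition in context.items():
        if isinstance(definition, str):
            definition = {"@id": definition}
        entry = index.setdefault(
            definition["@id"],
            {"types": {}, "containers": {}, "default": None},
        )
        if "@type" in definition:
            entry["types"][definition["@type"]] = term
        if "@container" in definition:
            entry["containers"][definition["@container"]] = term
        if "@type" not in definition and "@container" not in definition:
            entry["default"] = term     # plain term with no coercion
    return index


# Hypothetical context; the IRIs are illustrative only.
context = {
    "age": "http://example.org/age",
    "ageInt": {"@id": "http://example.org/age", "@type": "xsd:integer"},
    "ageList": {"@id": "http://example.org/age", "@container": "@list"},
}
index = build_inverse_index(context)
```

With this shape, an integer value for http://example.org/age can be looked up directly in the "types" dictionary. A term that sets both a type and a container is registered in both dictionaries, so this sketch does not settle how such combined cases should be resolved.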
Niklas Lindström: I can see there is a certain complexity
involved if you are looking for something that is both coerced to
a datatype and it has a certain container (ie: has multiple
values)... I don't understand why you need to rank items.
Markus Lanthaler: Gregg wrote it, so he'd know best.
Markus Lanthaler: I didn't implement it as it is in the spec, I
couldn't figure out how to implement it from the spec. The idea
is that you have a number of terms or complex IRIs
(prefix/suffixes), or even the full IRI, and you assign a number
to them (to the IRI/value pair) which expresses how well it
matches.
Markus Lanthaler: So, for example, if you have just one term
with one IRI, that gets a 1, but if you have something that has a
datatype and it matches, that gets a value of 2 and wins, etc.
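The ranking idea can be illustrated with a small sketch. The scores 0/1/2 follow the informal example in the discussion, not the numbers in the spec's term rank algorithm; rank_term, pick_by_rank, the term names and the tie-break rule are all assumptions made for illustration.

```python
def rank_term(definition, value_type):
    """Score how well a term definition fits a value (higher is better)."""
    term_type = definition.get("@type")
    if term_type is None:
        return 1        # plain term: only the IRI matches
    if term_type == value_type:
        return 2        # datatype matches too, so it is more specific
    return 0            # datatype conflicts: not usable


def pick_by_rank(candidates, value_type):
    """Return the best-ranked term name, or None if no term is usable."""
    usable = [
        (rank_term(definition, value_type), name)
        for name, definition in candidates.items()
        if rank_term(definition, value_type) > 0
    ]
    if not usable:
        return None
    best = max(score for score, _ in usable)
    # Assumed tie-break: lexicographically least term name.
    return min(name for score, name in usable if score == best)


# Hypothetical candidate terms that all map to the same IRI.
candidates = {
    "created": {},
    "createdAt": {"@type": "xsd:dateTime"},
}
```

Here pick_by_rank(candidates, "xsd:dateTime") prefers "createdAt", while a value of any other type falls back to the plain term "created".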
Markus Lanthaler:
http://json-ld.org/spec/latest/json-ld-api/#term-rank-algorithm
0 and term is ... you don't know how the numbers were created.
It's difficult to understand what's going on by looking at the
numbers.
Markus Lanthaler: My proposal is this...
https://github.com/json-ld/json-ld.org/issues/113#issuecomment-5567976
Manu Sporny: I think we should try and remove all the numbers in
the term ranking algorithm as a way of simplifying the way it is
explained. Perhaps we need to implement it as a map-reduce step
that always results in 0 or 1 term picked as a result. So, you
give the algorithm a list of potential terms that can be matched,
and a value that is being considered for match against all the
terms. The algorithm then whittles the list of IRIs down to 1 (if
a term matched) or 0 (if none of the terms match). This way,
there is no weirdness like rank = rank - 2.
Niklas Lindström: If you have this - [] dc:created
"2012-01-01T00:00:00"^^xsd:dateTime
Niklas Lindström: and this term: "created": "dc:created"
Niklas Lindström: Let me see if I understand this correctly...
Niklas Lindström: and this term: "dc:created": {"@type":
"xsd:dateTime"}
Niklas Lindström: What if we order the list so that you just go
down and ignore each item in the list until a selection is made?
Niklas Lindström: "createdTimeSet": {"@type": "xsd:dateTime",
"@container": "@set"}
Niklas Lindström: So, we could simplify by throwing out choices
that we don't want to make.... like given the choice between
terms and curies, throw out all the curies from the decision
before you make the decision?
Manu Sporny: The issue is that people might be surprised by
this, because the more accurate term wouldn't be selected.
Niklas Lindström: Then they should only use terms, or only use
CURIEs.
Niklas Lindström: If you don't want the terms to be picked, you
should be able to manage your own context in that scenario,
anyway.
Niklas Lindström: If we try to support that use case, I'm not
really sure if we're supporting that usage of @context anyway -
it's a complex usage of terms and CURIEs.
Manu Sporny: Perhaps we can do this map-reduce in 3 iterations,
instead? First removes @set/@list, second matches against
datatype/language, third picks by lexicographical value. That may
be easier for folks to understand?
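One possible reading of the three-iteration idea, as a sketch only. It assumes candidate terms are plain dicts with optional "@container", "@type" and "@language" keys, and that each pass keeps exact matches and otherwise falls back to terms that leave that key unset. The names keep_best and whittle are hypothetical; the term definitions reuse the ones Niklas typed into the channel.

```python
def keep_best(names, candidates, key, wanted):
    """Keep exact matches on `key`; else keep terms that leave `key` unset."""
    exact = [
        n for n in names
        if wanted is not None and candidates[n].get(key) == wanted
    ]
    if exact:
        return exact
    return [n for n in names if key not in candidates[n]]


def whittle(candidates, value):
    """Reduce the candidate terms to one name, or None, in three passes."""
    names = list(candidates)
    # Pass 1: container (@set / @list).
    names = keep_best(names, candidates, "@container", value.get("@container"))
    # Pass 2: datatype, then language.
    names = keep_best(names, candidates, "@type", value.get("@type"))
    names = keep_best(names, candidates, "@language", value.get("@language"))
    # Pass 3: lexicographically least of whatever is left.
    return min(names) if names else None


terms = {
    "created": {},
    "dc:created": {"@type": "xsd:dateTime"},
    "createdTimeSet": {"@type": "xsd:dateTime", "@container": "@set"},
}
```

Under these assumptions a single xsd:dateTime value whittles down to "dc:created", the same value inside a @set selects "createdTimeSet", and a value of an unrelated type falls back to "created". No rank arithmetic is involved.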
Markus Lanthaler: Maybe we pick @set/@list first, then
@datatype/@language, then last step checks lexicographical/prefix
value?
Markus Lanthaler: Maybe it's enough to specify how the internal
inverse-context is sorted? Then we just go down the list of
internal inverse-context values and pick an item or skip it?
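The sorted-list alternative could be sketched as follows. The sort order used here (terms with a container first, then terms with a type, then by name) is an assumption chosen for illustration, not something agreed on the call; specificity_key and first_match are hypothetical names.

```python
def specificity_key(item):
    """Sort more specific term definitions first, then by term name."""
    name, definition = item
    return ("@container" not in definition, "@type" not in definition, name)


def first_match(candidates, value_type=None, value_container=None):
    """Walk the sorted candidates and return the first term that fits."""
    for name, definition in sorted(candidates.items(), key=specificity_key):
        if definition.get("@container") not in (None, value_container):
            continue    # container set on the term but not on the value
        if definition.get("@type") not in (None, value_type):
            continue    # type set on the term but different from the value
        return name
    return None


terms = {
    "created": {},
    "dc:created": {"@type": "xsd:dateTime"},
    "createdTimeSet": {"@type": "xsd:dateTime", "@container": "@set"},
}
```

All of the decision-making sits in the sort key, so specifying that ordering would be the main thing the spec has to explain.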
Niklas Lindström: Maybe we should investigate that - we cover
most of the needs - it's more direct/natural.
Manu Sporny: Okay, so loose consensus - we have a function that
takes in a list of terms and a value to match... the function
whittles down the list to one item by the end. The way it
whittles could be performed in 3 iterations, where each iteration
removes imperfect matches leaving 1 or 0 matches at the end. The
other way it could be whittled down is to sort the list of
potential term matches in some way, and then searches for an
"exact" match.
Markus Lanthaler: termA: @list, typeA | termB: @list, typeB -->
list: val1/typeA, val2/typeB, val3/typeC
Markus Lanthaler: I would say this should choose typeA
(lexicographically least)
Markus Lanthaler: for list: val1/typeA, val2/typeB, val3/type
Manu Sporny: So, the approach could be less cognitively complex
and more algorithmically complex?
Niklas Lindström: Yeah, but only because we need to be more
accurate than we are now.
Manu Sporny: Dave Longley is concerned that when we chose the
word 'compact' that it was the wrong decision. The reason is that
people think it's supposed to end up with the least number of
bytes for the document. In reality, it's supposed to give back an
easy-to-use data structure for developers to use. So, when
compacting, we should ensure that we don't compact something that
shouldn't really be compacted. Compacting a list with mixed
values to a list whose @datatype is xsd:integer, for example,
would be the wrong thing to do.
Niklas Lindström: Yes, for lists, it either matches exactly
(every item in the list), or there is no match.
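The all-or-nothing rule for lists could be sketched like this, using the termA/termB example from the channel. The function name list_term_for and the dict shapes are assumptions for illustration.

```python
def list_term_for(candidates, items):
    """Return a @list term only if every item has that term's type."""
    item_types = {item.get("@type") for item in items}
    if len(item_types) != 1:
        return None     # empty or mixed-type list: no term applies
    (only_type,) = item_types
    matches = [
        name for name, definition in candidates.items()
        if definition.get("@container") == "@list"
        and definition.get("@type") == only_type
    ]
    # Assumed tie-break: lexicographically least term name.
    return min(matches) if matches else None


candidates = {
    "termA": {"@container": "@list", "@type": "typeA"},
    "termB": {"@container": "@list", "@type": "typeB"},
}
mixed = [
    {"@value": "val1", "@type": "typeA"},
    {"@value": "val2", "@type": "typeB"},
    {"@value": "val3", "@type": "typeC"},
]
uniform = [
    {"@value": "val1", "@type": "typeA"},
    {"@value": "val2", "@type": "typeA"},
]
```

Under this rule the mixed list matches neither term and would stay uncompacted, while the uniform list compacts with termA.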
Niklas Lindström: It should always be crystal clear when
something applies...
Manu Sporny: The issue with corner cases is that they make it too
complex. The choice is - don't deal with the corner cases, or
deal with them. Dealing with the corner cases leads to very
complex algorithms. Not dealing with the corner cases has two
possible outcomes: 1) Interoperability problems for documents that
contain data in the corner cases - people might think JSON-LD
sucks because it gives back bad data when you .compact(), 2)
Forcing people to mark their data up in a specific way, which
removes corner cases from JSON-LD data because that data doesn't
work well with the API. The first is bad, the second is good. No
idea which one will happen if we choose to ignore corner cases.
Niklas Lindström: Irregular data where you have mixed types with
the same terms are not compact-able, unless you have different
terms for different types used. It's obvious from looking at the
context that the context is written for irregular data.
Manu Sporny: Okay - maybe Markus and I need to write the
pseudocode for what we've discussed today, then we look at it as
a group, then decide what we want to go with and include it in
the spec.
Topic: ISSUE-140: Consider objectify/link API method
https://github.com/json-ld/json-ld.org/issues/140
Manu Sporny: This issue is about whether or not we should add a
.link() / .graphify() method to the API
Manu Sporny: I'm concerned that we don't have an algorithm to do
this yet... time issue for 1.0
Niklas Lindström: I'm concerned about timing - need to write
something in the wiki about this - perhaps I should collaborate
with Gregg and write this in a sibling specification.
Manu Sporny: I agree, I don't think we have the time to put this
in 1.0, but we should start working on it immediately.
Niklas Lindström: I took your jsonld.js implementation and took
out the framing part - needed a smaller code size - and I don't
think we need to do anything in the spec. It should be possible
to add things later on in a simple way. I don't think we have to
add anything in the API document for that.
Niklas Lindström: The .link() / .graphify() mechanism could be
extended in the same way browsers are extended - you just
extend as needed via an 'add-on' API.
Niklas Lindström: We have had a bunch of different names for
this - I've been using .connect() recently. I think we all agree
that .objectify() wasn't working... .graphify() might be a little
too odd.
Manu Sporny: I don't think we need to pick the name now... we
can wait until the spec goes to LC, even.
Niklas Lindström: We might want to add some sort of "indexing"
mechanism - something that allows you to index JSON-LD documents.
Manu Sporny: Something like a .view() call that is dynamically
updated.
Manu Sporny: There is a lot of potential for .graphify() /
.connect() and .index() / .view() - but the ideas are floating
out there right now... not finalized.
Niklas Lindström: There are a bunch of these sorts of libraries
for RDF - they all use the Class mechanism to define short names
bound to IRIs/coercions, which is exactly what the JSON-LD
context does in a language-agnostic way.
Niklas Lindström: To use a @context as a "lens" to access a live
RDF graph to act as if it is something live in memory (it could
come from a database backend over the Web/WebSockets)
Niklas Lindström: It makes it much easier to throw RDF into
arbitrary templating systems.
Manu Sporny: I think we're saying that all of these things are
important, but we can't do it by JSON-LD 1.0.
Markus Lanthaler: I'm concerned that if we don't have .frame() /
.objectify() that people can't process these documents in an
arbitrary way.
Manu Sporny: Well they can, it just won't be 'standardized' -
jsonld.js still has .frame(), so does the Ruby implementation.
Niklas Lindström: Can we include a separate .graphify() 1.0,
that in 1.1 could evolve?
Manu Sporny: I'm concerned that we don't have any idea how these
APIs are going to evolve.
Niklas Lindström: We could always implement the core - then we
could add more indexes in the future? Maybe have a callback to do
your own indexes.
Manu Sporny: I think somebody needs to volunteer to write the
.graphify() / .index() spec - that will ensure that we know what
we're getting into if we have a stripped down version of the call
in the JSON-LD 1.0 API spec.
Topic: Timeframe?
Markus Lanthaler: Is there a timeframe for publication?
Niklas Lindström:
https://github.com/json-ld/json-ld.org/tree/master/spec/latest
Manu Sporny: Technically, we have to publish every 3-6 months.
RDF WG charter ends in January 2013 - so, ideally, we'd be at REC
in that time frame.
David I. Lehn: That is going to be very difficult to do.
Manu Sporny: I'll talk to the chairs about it.
-- manu
--
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: HTML5 and RDFa 1.1
http://manu.sporny.org/2012/html5-and-rdfa/
Received on Tuesday, 18 September 2012 17:26:33 UTC