JSON-LD Telecon Minutes for 2012-09-11

Thanks to François for scribing! The minutes from last week's call are
now available here:

http://json-ld.org/minutes/2012-09-11/

Full text of the discussion follows including a link to the audio
transcript:

--------------------
JSON-LD Community Group Telecon Minutes for 2012-09-11

Agenda:
   http://lists.w3.org/Archives/Public/public-linked-json/2012Sep/0004.html
Topics:
   1. ISSUE-159: Add specifying @language to expanded form
Chair:
   Manu Sporny
Scribe:
   François Daoust
Present:
   François Daoust, Manu Sporny, Markus Lanthaler, Niklas Lindström,
   Stéphane Corlosquet, Lin Clark
Audio:
   http://json-ld.org/minutes/2012-09-11/audio.ogg

François Daoust: [Manu going through the agenda. A couple of
   issues may not be resolved today as there are too many proposals
   on the table]
François Daoust is scribing.

Topic: ISSUE-159: Add specifying @language to expanded form

Manu Sporny: https://github.com/json-ld/json-ld.org/issues/159
Manu Sporny:  Issue has to do with round-tripping language-map
   stuff.
   ... We added support for Drupal community and Wikidata
   community.
   ... No context in expanded form, otherwise we'd have to
   interpret this in very weird ways.
   ... Question I asked the Wikidata community was "Why not work
   in compact form?"
   ... Having languages as keys gives direct access to data
   ... The problem is now to define how the expanded form is
   generated from the compact form so that we can get back to the
   compact form afterwards.
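
Editor's sketch (not part of the call): the round-tripping question above can be illustrated with a small Python example. It assumes that each entry of a language map expands to an object with "@value" and "@language" keys; the function names are hypothetical and this is not the algorithm from the specification. Note that the expanded list carries no record that the values came from a language map, which is the information the group is trying to preserve.

```python
def expand_language_map(language_map):
    """Flatten {"en": "Foo", "de": "Bar"} into a list of
    {"@value": ..., "@language": ...} objects (assumed expanded shape)."""
    expanded = []
    for language in sorted(language_map):
        values = language_map[language]
        if not isinstance(values, list):
            values = [values]
        for value in values:
            expanded.append({"@value": value, "@language": language})
    return expanded


def compact_to_language_map(expanded_values):
    """Rebuild a language map, assuming every value is language-tagged."""
    language_map = {}
    for item in expanded_values:
        language_map.setdefault(item["@language"], []).append(item["@value"])
    # Unwrap single-element lists so simple maps come back unchanged.
    return {
        language: values[0] if len(values) == 1 else values
        for language, values in language_map.items()
    }


original = {"en": "Foo", "de": "Bar"}
expanded = expand_language_map(original)
round_tripped = compact_to_language_map(expanded)
```

The round trip only works here because the compaction step is told to build a language map; nothing in the expanded data says so.
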
Markus Lanthaler:
   https://github.com/json-ld/json-ld.org/issues/159#issuecomment-8455585
Markus Lanthaler:  If you have @language in expanded form, there
   might be collisions with @language that are already there or with
   properties that are of other types and do not accept @language.
   ... See comment in github issue
   ... One option to solve this would be to keep a @context in
   expanded form, but not what we'd like to have.
Niklas Lindström:  Precedence is good in any case. Even in
   compact form.
Manu Sporny:  Yes. If we have precedence, does it address your
   concern Markus?
Stéphane Corlosquet: are you guys saying that in any case, any
   typed value could not have a language?
Markus Lanthaler:  for a plain literal, it wouldn't because you
   cannot add @language to a plain literal.
Niklas Lindström:  we understand we're diverging from RDF here
   [scribe assist by Stéphane Corlosquet]
   ... It's strange to have language information in expanded
   form. The only way to describe this in RDF is to have a named
   graph.
   ... (scribe missed details)
Manu Sporny: "term": { "@language": {"en": ..., "de": ...}}
Manu Sporny: "http://foo.bar/vocab#term": { "@language": {"en":
   ..., "de": ...}}
Manu Sporny:  wondering if we could do something like the snippet
   I just pasted
Markus Lanthaler:  The problem is that we're trying to express
   data that is not there. It's metadata.
Niklas Lindström:  The expanded form is an abstract triple
   representation and what we do with language maps (and id-maps for
   that matter) is just reify indexing.
   ... Only if we stay within JSON-LD and expand/compact would
   you get round-tripping.
Manu Sporny:  The concern in the Drupal community is that you
   could get something different out.
Niklas Lindström:  The only thing expanded are terms. That's the
   only expansion we've talked about. Perhaps that's a good concept.
Manu Sporny:  I don't know if it ends up becoming a different type
   of form for JSON-LD.
Stéphane Corlosquet:  Niklas, you were talking about
   round-tripping in RDF.
   ... It wouldn't be a concern in Drupal because it's never used
   internally.
   ... Our goal is not necessarily to output RDF in the end.
   ... What we'd like to do is use the compact form, expand it
   and process it.
   ... We just want to have the language in the expanded form.
   ... Getting the same data from compaction is not exactly our
   use case.
   ... You guys may want to recompact it again and get the same
   data, but not exactly what we need in practice.
Niklas Lindström:  I can understand your use case. I touched upon
   it during a RDFa to JSON-LD workshop.
   ... If we want to support it, we should do it via the notion
   of term expansion, not full expansion.
Manu Sporny:  Just a quick explanation about the Drupal use case.
   Every Drupal site has a slightly different context.
   ... Tags can have different information associated with them
   across Drupal sites.
Stéphane Corlosquet: can be anything, 'tags' is just an example
Manu Sporny:  Those tags are kind of b-nodes.
   ... When two Drupal sites share data, one of them is going to
   export data as JSON-LD, using its context, probably expanding it.
   ... The targeted Drupal site will process the received data,
   using the expanded form as input and compacting using the target
   context.
   ... The idea that we need to reconstruct the language map is a
   pretty strong requirement.
   ... I also think that both Niklas and Markus have very strong
   points.
Manu Sporny: "http://foo.bar/vocab#term": { "@language": {"en":
   ..., "de": ...}}
   ... The only solution that I can see working that doesn't have
   the issue Markus raised in the beginning is the idea I shared on
   IRC
   ... I don't see any issue with this, but I may miss something.
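
Editor's sketch (not part of the call): one possible reading of the 1-to-1 proposal pasted on IRC, in Python. It assumes the term expands to its IRI and the language map is kept intact under an "@language" wrapper, so compaction can simply unwrap it. The helper names, the TERM_IRIS table and the "Foo"/"Bar" values are illustrative assumptions; this is the idea under discussion, not adopted JSON-LD syntax.

```python
TERM_IRIS = {"term": "http://foo.bar/vocab#term"}


def expand_term_language_map(compact_doc, term_iris):
    """Replace each term with its IRI and wrap its language map."""
    return {
        term_iris[term]: {"@language": dict(language_map)}
        for term, language_map in compact_doc.items()
    }


def compact_term_language_map(expanded_doc, term_iris):
    """Invert the mapping: IRI back to term, wrapper back to a bare map."""
    iri_terms = {iri: term for term, iri in term_iris.items()}
    return {
        iri_terms[iri]: dict(wrapper["@language"])
        for iri, wrapper in expanded_doc.items()
    }


compact_doc = {"term": {"en": "Foo", "de": "Bar"}}
expanded_doc = expand_term_language_map(compact_doc, TERM_IRIS)
restored_doc = compact_term_language_map(expanded_doc, TERM_IRIS)
```

Because the map survives expansion untouched, no ranking or guessing is needed to rebuild it.
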
Markus Lanthaler: alternative: { "@context": { "langmap":
   "example.com/vocab/term#" }, "langmap:de": ..., "langmap:en": ...
   }
Markus Lanthaler: perhaps additionally define "langmap:de": {
   "@language": "de" } in context or add context inline
Markus Lanthaler:  I don't see an issue with that but proposing
   another alternative on IRC for Stéphane.
Lin Clark: hey all, I'm on call now as well
Markus Lanthaler: langmap:de - example.com/vocab/term#de
Markus Lanthaler: langmap:it - example.com/vocab/term#it
Markus Lanthaler: example.com/vocab/term#it
Markus Lanthaler:  basically, you'd have different terms for
   different properties.
Stéphane Corlosquet:  How would you re-compact this in the end?
Markus Lanthaler: { "@context": { "langmap":
   "example.com/vocab/term#" }
Markus Lanthaler: langmap:LANGUAGE
Markus Lanthaler:  with the context just pasted on IRC, you would
   just re-generate the initial data
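
Editor's sketch (not part of the call): the prefix-based alternative can be pictured in Python as plain prefix expansion and its inverse. The CONTEXT value is copied from the IRC paste; the function names are hypothetical and the code ignores everything else a real JSON-LD processor does.

```python
CONTEXT = {"langmap": "example.com/vocab/term#"}


def expand_key(key, context):
    """Expand "prefix:suffix" when the prefix is defined in the context."""
    prefix, separator, suffix = key.partition(":")
    if separator and prefix in context:
        return context[prefix] + suffix
    return key


def compact_key(iri, context):
    """Shorten an IRI back to "prefix:suffix" when a prefix matches."""
    for prefix, base in context.items():
        if iri.startswith(base):
            return prefix + ":" + iri[len(base):]
    return iri


expanded_key = expand_key("langmap:de", CONTEXT)
compacted_key = compact_key(expanded_key, CONTEXT)
```

Each language becomes its own property IRI, so re-compaction is ordinary prefix handling and needs no language-map support.
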
Lin Clark:  That sounds a lot like the proposal Manu had made
   initially
Manu Sporny:  There's a downside (missed by scribe) to that that
   explains why we had left the idea in the end.
   ... The only reason why we want it in expanded form is to be
   able to recompact it in lossless form.
   ... This idea of being able to tell whether something came
   from a language map is to reconstruct the same structure in the
   end.
   ... There may be times that you express values in expanded
   form where you didn't want them to be necessarily put back in
   language maps.
Niklas Lindström:  The question is whether data coming from
   language-based data can be reconstructed. Any deviation from
   that should not use language maps to compact because that would
   always give weird results.
   ... If you start mixing from various sources, you may have
   titles in English but description in Italian, then properties
   would fall in different buckets if you use language maps.
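
Editor's sketch (not part of the call): one way to read the "different buckets" remark, in Python. It assumes merged expanded data where every value is an object with "@value" and "@language", and groups properties by language. The property names, sample strings and function name are made up for illustration.

```python
from collections import defaultdict


def group_by_language(expanded_node):
    """Group {property: [tagged values]} into {language: {property: [values]}}."""
    buckets = defaultdict(dict)
    for prop, values in expanded_node.items():
        for item in values:
            buckets[item["@language"]].setdefault(prop, []).append(item["@value"])
    return dict(buckets)


# Data merged from two hypothetical sources: an English title, an Italian description.
merged_node = {
    "title": [{"@value": "Hello", "@language": "en"}],
    "description": [{"@value": "Ciao a tutti", "@language": "it"}],
}
buckets = group_by_language(merged_node)
```

The title and the description of the same resource end up under different language keys, which is the split Niklas describes.
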
Manu Sporny: I'm proposing this: "http://foo.bar/vocab#term": {
   "@language": {"en": ..., "de": ...}} because... 1-to-1 mapping
   term
[Markus and Manu discussing examples of expansion/compaction]
Niklas Lindström:  I wonder if the expanded form you're proposing
   here would solve the problem of combining two sources.
   ... It seems to require things from the compaction algorithm.
Manu Sporny:  Let's say you have two documents that use the same
   IRI term and you expand.
   ... Without a flag and with the rank algorithm that we have,
   there wouldn't be any problem.
   ... The term with the language map would be separated from the
   term without the language map.
   ... That's for when we don't flatten.
   ... If we do flatten, (scribe missed that), that would address
   the issue.
Niklas Lindström:  I'd rather we put information in the different
   buckets in expanded form so that compaction can be done
   deterministically
Manu Sporny:  and it's a fairly expensive operation when the data
   gets bigger. I agree with you Niklas. If we could simplify, we
   should.
   ... It turns out that, each time we need to look into details,
   we end up with things that are fairly complex. The ranking
   algorithm is a good example of this. It becomes impossible to
   know what will happen without understanding the algorithm itself.
   ... All that to say that I agree in principle, but I'm worried
   that the algorithm will become more complex than expressing a
   1-to-1 mapping with language maps.
Niklas Lindström:  The problem is that we're trying to express
   something that we cannot even express in our data model.
Lin Clark:  Are there differences between RDF data model and
   JSON-LD data model?
   ... I saw discussions from Gregg
Manu Sporny:  This is kind of a corner case. We don't make use of
   the differences for the time being, although there is a tiny
   difference, indeed.
   ... We just have to be very careful if we say JSON-LD uses RDF
   data model since that's not entirely true.
Niklas Lindström: {'@language': 'en', '@id':
   'http://example.com/tags/foo', 'label': ' Foo'}
Niklas Lindström: {'@language': 'de', '@id':
   'http://example.com/tags/baz', 'label': ' Baz'}
Niklas Lindström:  Example on IRC. Different resources because
   different IDs.
   ... The nodes themselves do not have, in RDF terms, any language
   expressed.
Niklas Lindström: {'@language': 'en', '@id':
   'http://example.com/tags/foo', 'label': ' Foo'}
Niklas Lindström: {'@language': 'de', '@id':
   'http://example.com/tags/foo', 'label': ' Foo'}
   ... You can infer that "Foo" seems to be in English.
   ... but that's all.
   ... Now consider the second example, where IDs are the same.
Niklas Lindström: {'dc:language': 'en', '@id':
   'http://example.com/tags/foo', 'label': ' Foo'}
   ... We have a problem here because it's not clear whether we
   want to reify the language. If we want to say that the node is
   somehow intrinsically associated with English, then you should
   use 'dc:language'.
Manu Sporny: "term": {"en": "Foo", "de": "Bar"}
   ... That is quite different from saying that there is an English
   label for this.
Manu Sporny:  On the other hand, we need to account for very simple
   examples such as the one I just pasted.
Niklas Lindström: {'dc:language': 'en', '@id':
   'http://example.com/tags/foo', 'label': ' Foo'}
Niklas Lindström: {'@language': 'en', '@id':
   'http://example.com/tags/foo', 'label': ' Foo'}
Niklas Lindström: {'@language': 'de', '@id':
   'http://example.com/tags/foo', 'label': ' Foo'}
Niklas Lindström:  actually, that's simple and straightforward.
Stéphane Corlosquet:  I just wanted to jump on Niklas's comments.
   ... When you use 'dc:language', you say that the resource is
   in English
   ... (scribe missed description because of noise)
Niklas Lindström:  you have two different resources, one being a
   translation of the other.
Lin Clark:  No, they don't want to have separate graphs.
   ... Different properties in different languages.
   ... You would have an author field on the node. That field
   could point to Stéphane for the French version and to myself in
   the English version.
   ... I understand that in the RDF model, it would be understood
   as two different graphs.
   ... If we start to introduce complex syntax, people will get
   lost, and it's just not worth it for the 2-3 people that
   understand this.
Manu Sporny:  We understand the need to have simple ways of
   accessing the data.
Niklas Lindström:  I object to this. This has nothing to do with
   simplicity of accessing the data, but with simplicity of modeling
   the data.
Manu Sporny:  I don't think it applies to the Drupal use case. I
   don't think they should have to change data modeling for this.
Niklas Lindström:  I am not a fundamentalist here, we have to
   find a pragmatic solution to the issue.
Lin Clark:  The translation to us is not a different resource.
Niklas Lindström:  but there are two different translations.
Manu Sporny: aside - "http://foo.bar/vocab#term": { "@language":
   {"en": ..., "de": ...}} also allows the Drupal folks to work w/
   expanded form, if they need to.
Niklas Lindström: ... {"@id": "/resource", "translation": {"en":
   {"author": {"@id": "/lin"}}, "de": {"author": {"@id":
   "/stephane"}}}}
Niklas Lindström:  you don't have to describe the translation in
   any more detail than in the code I just pasted.
Markus Lanthaler: alternative {"@id": "/resource", "en":
   {"author": {"@id": "/lin"}}, "de": {"author": {"@id":
   "/stephane"}}}
Markus Lanthaler: where "en" is a property like
   example.com/vocab/languages/en
   ... You can have a property that combines translations
Markus Lanthaler:  along the same lines as Niklas
Lin Clark:  I actually suggested that to our multilingual
   initiative, but they put so much work in it and it's already
   almost done that I don't think that we can or we should change
   our data model at this point.
Niklas Lindström:  From an implementation perspective, it's more
   or less the same.
Lin Clark:  They're doing a lot of stuff in the multilingual
   initiative that I'm not involved with, so I can't speak
   particularly to all the details.
   ... I don't think we can convince everyone that it's worth it
   because of JSON-LD.
Markus Lanthaler:  how would it help to turn the structure
   around? (assuming we could) [scribe assist by Stéphane Corlosquet]
Markus Lanthaler:  but you wouldn't mind if "en" would not expand
   to full IRIs in expanded form.
Markus Lanthaler: {"@id": "/resource", "author": {"en": {"@id":
   "/lin"}, "de": {"@id": "/stephane"}}}
Markus Lanthaler: {"@id": "/resource", "author": {
   "http://example.com/en": {"@id": "/lin"},
   "http://example.com/de": {"@id": "/stephane"}}}
Markus Lanthaler:  Something like this would work for you, right?
   ... No big deal if it becomes something like this in expanded
   form, right?
Lin Clark:  Then can it compact back to the other form?
Markus Lanthaler:  yes, you wouldn't even need a language map for
   that.
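
Editor's sketch (not part of the call): treating each language key as an ordinary term that expands to an IRI makes the round trip a plain key renaming, as this Python example suggests. The TERMS table reuses the http://example.com/en and http://example.com/de IRIs from the IRC paste; "author" is left unexpanded as in that paste, and the function name is hypothetical.

```python
TERMS = {"en": "http://example.com/en", "de": "http://example.com/de"}


def rename_keys(node, mapping):
    """Recursively rename dictionary keys found in the mapping."""
    if isinstance(node, dict):
        return {
            mapping.get(key, key): rename_keys(value, mapping)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rename_keys(item, mapping) for item in node]
    return node


compact_resource = {
    "@id": "/resource",
    "author": {"en": {"@id": "/lin"}, "de": {"@id": "/stephane"}},
}
expanded_resource = rename_keys(compact_resource, TERMS)

# Compaction is the same renaming with the mapping inverted.
INVERSE_TERMS = {iri: term for term, iri in TERMS.items()}
recompacted_resource = rename_keys(expanded_resource, INVERSE_TERMS)
```

No language-map machinery is involved; the languages are just predicates, which is also the modeling concern raised next.
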
Manu Sporny:  The one concern is that we're going to have terms
   for each language.
Markus Lanthaler:  is that really an issue?
Manu Sporny:  If you're expressing languages as predicates, the
   data is jammed.
Markus Lanthaler:  right, but that's what you have. It's a predicate,
   not a language.
Manu Sporny:  My only concern is that if Drupal wants to move to
   RDF in the future, then that direction might be problematic
   longer term.
Stéphane Corlosquet:  probably not a real concern for the time
   being.
Markus Lanthaler: Here's how the example could work today -
   completely round-trippable: http://bit.ly/P8i7h7
Lin Clark:  When we come to that, we could update what's needed
   to move things to the RDF data modeling
Lin Clark: I got bumped
Manu Sporny:  OK, it definitely works. I don't know if it's good
   to model data in that way. I feel uneasy about it.
   ... The other concern I have is that if it works for Drupal
   folks, and if that works as well for Wikidata folks, then there's
   a question about supporting language maps in the end.
Manu Sporny:  it didn't pick up first try [scribe assist by Lin
   Clark]
Markus Lanthaler:  I wonder if language map couldn't be
   restricted to simple values such as "title.en" resolves to the
   English title
Lin Clark: now it is busy
Markus Lanthaler: Wikidata apparently just uses simple language
   maps: http://meta.wikimedia.org/wiki/Wikidata/Data_model_in_JSON
Manu Sporny:  Ok, we spent an hour on this. We should step back
   and think a bit more about it.
Lin Clark: dialing in
   ... We have two proposals on the table.
   ... 1) Languages become IRIs, 2) 1-to-1 between
   compact/expanded form for language maps.
Niklas Lindström:  I wonder where the title would end up in the
   example Markus wrote up. Would there be a similar map for each
   thing or would we want to group them in language buckets?
Markus Lanthaler:  that's what I suggested initially but Lin
   suggested they would rather have properties before languages.
Manu Sporny:  any objection to moving on to the next issue and
   tracking this in GitHub comments?
Markus Lanthaler:  Do we want to support complex language maps?
Manu Sporny:  My gut feeling is that, if we're going to support
   language maps, we need to support all of Drupal's needs. I don't
   know if it's worth the complexity to add language maps for
   literal values only.
   ... We could associate the language with a term in the
   context. If we go with the approach Markus proposed, I don't
   think we need language maps in the end. In the context, you would
   have term definitions for languages.
Manu Sporny: "en": {"@id": "http://purl.org/bcp47#en",
   "@language": "en"}
   ... That would expand to:
Manu Sporny: "en": "Foo" - "http://purl.org/bcp47#en": {"@value":
   "Foo", "@language": "en"}
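
Editor's sketch (not part of the call): the expansion Manu describes, where a term definition carries both an "@id" and an "@language", could look like this in Python. The CONTEXT entry is copied from the IRC paste; the function name is hypothetical and only plain string values are handled.

```python
CONTEXT = {"en": {"@id": "http://purl.org/bcp47#en", "@language": "en"}}


def expand_with_term_language(document, context):
    """Expand string values whose term definition declares a language."""
    expanded = {}
    for key, value in document.items():
        definition = context.get(key)
        if (
            isinstance(definition, dict)
            and "@language" in definition
            and isinstance(value, str)
        ):
            expanded[definition["@id"]] = {
                "@value": value,
                "@language": definition["@language"],
            }
        else:
            # Anything without a language-bearing definition passes through.
            expanded[key] = value
    return expanded


expanded_label = expand_with_term_language({"en": "Foo"}, CONTEXT)
```

With definitions like this for every language, the context alone does the work a language map would otherwise do.
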
Manu Sporny:  The way we're modeling this does not really map to
   RDF, that's what I'm concerned about.
Niklas Lindström:  I do think that things such as freebase may
   benefit from data exported by Drupal sites
Stéphane Corlosquet:  I don't think we should be blocking things
   here. We could create IRIs for each translation and so on if we
   really need to.
Lin Clark: hmm, I can't hear what was said but Crell specifically
   requested we not create IRIs for each translation
Lin Clark:  was talking about how to handle things in RDF [scribe
   assist by Stéphane Corlosquet]
Stéphane Corlosquet: not in JSON-LD
Lin Clark: what we've discussed before is that you lose the
   language handling for objects that are resources
Lin Clark: I don't think we want to have different subject IRIs
   between JSON-LD and other RDF formats
Manu Sporny:  I'm not convinced that we need to model the data in
   the way Markus and Niklas are proposing. It works for Drupal
   folks but I don't think it's the right way to model it as RDF.
Lin Clark:  yes - we said we could discuss this outside the call
   [scribe assist by Stéphane Corlosquet]
Stéphane Corlosquet: we haven't decided or changed anything since
   when you dropped
   ... The other concern that I have is that JSON-LD should be
   able to cope with data as modeled, especially in cases such as
   Drupal when it's difficult to identify a right/wrong way of
   modeling data.
Lin Clark: yeah, I was able to get back in now
Manu Sporny: I think these are the options available to us right
   now:
Manu Sporny: 1) Ask Drupal to change the data model
   (non-starter),
Manu Sporny: 2) Adopt a 1-to-1 mapping between compact/expanded
   form for language maps, (adds complexity to syntax)
Manu Sporny: 3) Adopt a complex algorithm to reconstruct language
   maps from expanded form, (adds complexity to API, and may be
   non-deterministic)
Manu Sporny: 4) Model the data using BCP47 language code IRIs.
   (problematic from an RDF data model standpoint)
Manu Sporny: each has annoying down-sides.

-- manu

-- 
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: Which is better - RDFa Lite or Microdata?
http://manu.sporny.org/2012/mythical-differences/

Received on Tuesday, 18 September 2012 17:24:46 UTC