JSON-LD Telecon Minutes for 2012-12-18 from Manu Sporny on 2012-12-18 (public-rdf-wg@w3.org from December 2012)

From: Manu Sporny <msporny@digitalbazaar.com>
Date: Tue, 18 Dec 2012 13:18:03 -0500
To: Linked JSON <public-linked-json@w3.org>
CC: RDF WG <public-rdf-wg@w3.org>
Message-ID: <50D0B35B.7040901@digitalbazaar.com>
The minutes from today's telecon are now available. Thanks to Niklas for
scribing!

http://json-ld.org/minutes/2012-12-18/

Full text of the discussion follows including a link to the audio
transcript:

--------------------
JSON-LD Community Group Telecon Minutes for 2012-12-18

Agenda:
   http://lists.w3.org/Archives/Public/public-linked-json/2012Dec/0030.html
Topics:
   1. Schedule for telecons and publication
   2. JSON-LD Test Suite
   3. Renaming of blank nodes
   4. ISSUE-203: Validate IRIs and language tags
   5. ISSUE-109: Add flatten() method to JSON-LD API
   6. ISSUE-206: Clarify that the algorithms operate on a copy of
      the input
Resolutions:
   1. Rename all blank node identifiers when doing expansion.
   2. JSON-LD Processors MAY issue validation warnings for
      malformed IRIs and BCP47 language strings, but they MUST NOT
      attempt to correct validation errors.
   3. Add a .flatten() method to the JSON-LD API, which returns
      all data in flattened, compact form. Remove the flatten flag from
      the .expand() and .compact() methods. Ensure that the .flatten()
      method preserves data in named graphs.
   4. Any input to JSON-LD API methods MUST NOT be modified.
Chair:
   Manu Sporny
Scribe:
   Niklas Lindström
Present:
   Niklas Lindström, Manu Sporny, Gregg Kellogg, Markus Lanthaler,
   Dave Longley, David I. Lehn
Audio:
   http://json-ld.org/minutes/2012-12-18/audio.ogg

Niklas Lindström is scribing.

Two more Agenda items suggested by Gregg: the test suite and
   pervasive renaming of bnodes

Topic: Schedule for telecons and publication

Manu Sporny:  next two telecons cancelled due to holidays. At
   least one telecon before Last Call at the end of January
   … when we go to last call, we must include text to specify
   that the algorithms may need to change to address bugs and those
   changes may be significant based on the severity of the issue. We
   want to do this to ensure that an annoying corner-case bug won't
   make us have to go through another Last Call. We're fairly
   certain what these algorithms should be doing, and no matter how
   many times we've reviewed them, we'll find issues that we have to
   fix w/ the algorithm through LC and CR.

Topic: JSON-LD Test Suite

Gregg Kellogg:  a couple of things to do: how to deal with
   options and callback behavior
   … e.g. option for RDF to use native types
   … options for context given in option for use in expansion,
   etc.
   … more concerning: the granularity of tests
   … each test tests some particular aspect of an algorithm, but
   does so in many parallel ways
   … add as small a test as possible, to make it easy to detect
   what causes an error
Manu Sporny:  I agree
   … same problems in the early days of the RDFa test suite
   … we may want two different suites, one for the syntax, one for
   the api
   … the latter may benefit from a real JS test runner
   ... or else we may end up with a meta language to control
   flags etc.
   ... so we should simplify the tests to make them more atomic
Gregg Kellogg:  for the RDFa tests we used (e.g.) query
   parameters to set options/flags
   ... we might be able to use that
   ... problem with js test framework is that it only works for
   js
Markus Lanthaler: ok.. just a sec
Markus Lanthaler:  I agree that we should define the tests to be
   independent of the implementation language
   ... we could use JSON to set options
   … we should have minimal tests, but we also need some complex
   input data to test corner cases
   … sometimes things work in separation, but certain things only
   happen when combined
Gregg Kellogg:  yes, there is a need for those complex things as
   well. We might be able to separate them within the numbering of
   tests
   … if someone passes all the simple tests, we should attempt to
   find the smallest example which triggers a problem with
   combinations
   … we could put all the complex tests starting with 1000
Manu Sporny:  so more than one feature is an integration test,
   starting at 1000
Gregg Kellogg:  even one feature, like IRI expansion, needs to
   test many variants
   … but we need to find the simplest possible input data for
   those as well
Markus Lanthaler:  what are the requirements from W3C regarding
   tests?
Manu Sporny:  what's needed is an implementation report showing
   at least two independent interoperable implementations
   … but test suites make that much simpler to measure
Gregg Kellogg:  not always though; automated test runners aren't
   always the best; it's useful to have independent test runners
   generating EARL reports, which we can collate and put into the
   report
   … I'm not against it, but it was a complicated setup for RDFa
Discussion about the balance between test suite runner
   implementation difficulties vs. getting reports from
   implementations in general.
Manu Sporny:  main reason for automated test runner is not to
   block us when implementations develop and need to be verified
Manu Sporny:  It would be very good to have it running completely
   in the browser
   … I'll make an attempt in the coming month
Gregg Kellogg:  leveraging the rdfa runner may be feasible
Manu Sporny:  so, we want to make the test suite more atomic, and
   separate unit and integration tests (starting on 1000)
   … and attempt to make an online test runner
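
A minimal sketch of the conventions discussed above, assuming a hypothetical manifest layout: the field names `id`, `input`, and `options`, the file names, and the `useNativeTypes` option name are illustrative and not taken from the test suite. Options are carried as plain JSON so that the tests stay independent of the implementation language, and tests numbered 1000 and up are treated as integration tests.

```python
import json

# Hypothetical manifest entries; all field and file names are assumptions.
MANIFEST = json.loads("""
[
  {"id": 1,    "input": "expand-0001-in.jsonld", "options": {}},
  {"id": 2,    "input": "expand-0002-in.jsonld", "options": {"useNativeTypes": true}},
  {"id": 1000, "input": "expand-1000-in.jsonld", "options": {}}
]
""")

INTEGRATION_START = 1000  # per the discussion: complex tests start at 1000


def kind(test):
    """Classify a test as 'unit' or 'integration' by its number."""
    return "integration" if test["id"] >= INTEGRATION_START else "unit"


def split(manifest):
    """Group test ids by kind so the atomic tests can be run first."""
    groups = {"unit": [], "integration": []}
    for test in manifest:
        groups[kind(test)].append(test["id"])
    return groups
```

A runner in any language could read the same manifest and pass each `options` object to its own API, which is the point of keeping the options in JSON rather than in a JS-specific test framework.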

Topic: Renaming of blank nodes

Gregg Kellogg:  Markus wrote a test to verify that blank nodes
   are renamed.
   … the use of bnodes in expansion, for e.g. property generators
   and node defs not containing an id; so that duplication doesn't
   create a new node.
Markus Lanthaler: Discussion about this was here:
   https://github.com/json-ld/json-ld.org/issues/160#issuecomment-11046185
   … problem is that if you pick a bnode identifier it mustn't
   collide with an existing one. one solution is to rename all of
   them.
   … but that may create problems for implementations, e.g. it
   happens right now for the wikimedia stuff
Manu Sporny:  so you propose that we use a very unique prefix,
   which hopefully doesn't collide with an existing bnode id?
Gregg Kellogg:  that, or scan through existing use and then pick
   something unique
Manu Sporny:  the scan through prevents stream-based processing,
   although that may already be out
Niklas Lindström:  reserving bnode id prefixes causes problems
   when expansion has been run; the input would use those at that
   point
Markus Lanthaler:  not sure what the problem here is? bnode id:s
   are local/internal, so we should be able to change them if we
   want to.
Gregg Kellogg:  so far, we try to keep the json form consistent
   with what is written; so that bnode id:s use some internal
   pattern
   … while it's formally very bad (especially from an RDF
   perspective), this can be useful for handling JSON
   … I wouldn't vote against renaming if it's necessary; but do
   we always need to do it?
   … it's a big change fairly late in the process
Markus Lanthaler:  could you use other identifiers?
Gregg Kellogg:  a bit tricky with deployed code right now.
   … previously, we didn't change bnode ids on
   expansion/compaction
Manu Sporny:  why can't we instead track already used bnodes, and
   ensure that generated ones aren't used?
   ... of course, subsequently encountered ones are problematic
Manu Sporny:  keep track of both generated and encountered
   bnodes, and if an overlap occurs, start renaming only those that
   are already encountered/generated
Markus Lanthaler:  is this the final code for wikia?
Gregg Kellogg:  the plan is to use URIs, but the scrum process
   hasn't gotten there; we currently use article id:s locally
Markus Lanthaler:  the flag for property generators could also
   disable bnode renaming
Manu Sporny:  if we can ensure that only if property generators
   are used, renaming occurs..
Markus Lanthaler:  the property generators could be used for DoS
   attacks, I will support such a flag
Gregg Kellogg:  I'd prefer to avoid renaming if property
   generators aren't used
Markus Lanthaler:  I still think bnodes are dangerous to
   preserve, since they should not be used
Manu Sporny:  it's a good point. But some users don't want to
   change the raw data.
   … it is a large change, but it's still before LC, and makes a
   good point

PROPOSAL: Rename all blank node identifiers when doing expansion.

Markus Lanthaler: +1
Gregg Kellogg: +0.1
Manu Sporny: +1
Niklas Lindström:  +0.5
Dave Longley:  +0.3 [scribe assist by Manu Sporny]
David I. Lehn: +0

RESOLUTION: Rename all blank node identifiers when doing
   expansion.

Markus Lanthaler: filed the resolution under ISSUE-160
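
The resolution can be illustrated with a rough sketch, which is not the specification's algorithm. It assumes that blank node identifiers appear only as string values of "@id" starting with "_:", and the function name `rename_bnodes` and the `_:t` prefix are hypothetical. Because every identifier is relabeled, a generated label cannot collide with one already present in the input, and no prior scan of the document is needed.

```python
def rename_bnodes(data, prefix="_:t"):
    """Return a relabeled copy of `data` plus the old-to-new label mapping.

    The same input label always maps to the same output label, so
    references between nodes stay connected. The input is not modified.
    """
    mapping = {}

    def relabel(label):
        # Issue labels in order of first encounter: _:t0, _:t1, ...
        if label not in mapping:
            mapping[label] = f"{prefix}{len(mapping)}"
        return mapping[label]

    def walk(value):
        if isinstance(value, list):
            return [walk(v) for v in value]
        if isinstance(value, dict):
            out = {}
            for key, val in value.items():
                if key == "@id" and isinstance(val, str) and val.startswith("_:"):
                    out[key] = relabel(val)
                else:
                    out[key] = walk(val)
            return out
        return value  # strings, numbers, booleans, None

    return walk(data), mapping
```

Note that an input label such as `_:t0` is itself renamed like any other, which is what makes the scheme collision-free.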

Topic: ISSUE-203: Validate IRIs and language tags

Manu Sporny: https://github.com/json-ld/json-ld.org/issues/203
Markus Lanthaler:  the question is whether processors should
   validate IRIs and language tags fully, or just assume they work
   … Richard made the point that language tags have to be
   normalized, and validated(?)
Gregg Kellogg:  compare to Turtle, it includes a simplest form of
   BCP 47. A full validation needs much more logic.
   … it's complicated to get it exactly correct.
   … And normalization doesn't require full validation. Same
   thing with URIs. Most libraries detect simple problems, but full
   checking requires much more complexity.
   … it's better to not include in the core algorithm. As
   François said, there's a difference between a processor and a
   validator
Manu Sporny:  I agree with all of that.
Markus Lanthaler:  so, what do we say specifically?
Manu Sporny:  we don't say should/must not; all we say is that
   it's not required to do full validation
Manu Sporny:  it's strange to have a discussion about this but
   not say anything in the spec
Niklas Lindström:  Could we say something to the effect of "users
   of processors should not expect that all processors are fully
   validating processors"? That is, invalid input data might
   lead to different results depending on the level of validation
   for the processor. [scribe assist by Manu Sporny]
Niklas Lindström:  So, basically - the output may not be the same
   for corner cases. [scribe assist by Manu Sporny]
Manu Sporny:  or we could say that processors may issue warnings
   about data which is not valid, but processors must not modify
   data to correct it
Niklas Lindström:  I think that might work. [scribe assist by
   Manu Sporny]
Markus Lanthaler:  I agree, no validation. And not include any
   language about it in the spec..
   … we already say that algorithms are only specified for
   well-formed input
Gregg Kellogg:  we do say that to be valid, these must be valid
   BCP 47 tags / IRIs
Manu Sporny: What about this for a proposal? JSON-LD Processors
   MAY issue validation warnings for malformed IRIs and BCP47
   language strings, but they MUST NOT attempt to correct validation
   errors and MUST only perform normalization on IRIs and BCP47
   language strings.
   … we shouldn't say whether processors should tolerate invalid
   values for that.. We need to compare with e.g. Turtle spec.

PROPOSAL: JSON-LD Processors MAY issue validation warnings for
   malformed IRIs and BCP47 language strings, but they MUST NOT
   attempt to correct validation errors.

Manu Sporny: +1
Gregg Kellogg: +1
Markus Lanthaler: +0.5 (would also be fine with being silent
   about it)
Niklas Lindström:  +0.9 (unless something much different is done
   in e.g. the turtle spec)

RESOLUTION: JSON-LD Processors MAY issue validation warnings for
   malformed IRIs and BCP47 language strings, but they MUST NOT
   attempt to correct validation errors.
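
A minimal sketch of what "warn but do not correct" could look like for language tags. The pattern is a simplified check in the spirit of the language tag production in Turtle, not full BCP 47 validation, and the function name `check_language` is hypothetical.

```python
import re
import warnings

# Simplified well-formedness pattern; deliberately not full BCP 47.
LANGTAG = re.compile(r"[a-zA-Z]+(-[a-zA-Z0-9]+)*")


def check_language(tag):
    """Warn about a malformed language tag, then return it unchanged."""
    if not LANGTAG.fullmatch(tag):
        warnings.warn(f"malformed language tag: {tag!r}")
    return tag  # the value is never corrected
```

A processor that skips the check entirely would produce the same output, which matches the intent that validation is optional and output does not depend on it.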

Topic: ISSUE-109: Add flatten() method to JSON-LD API

Manu Sporny: https://github.com/json-ld/json-ld.org/issues/109
Manu Sporny:  we're signaling that there's an easy way of
   flattening (apart from the flag)
Markus Lanthaler:  I'd suggest to drop the flags then
   … and that it should also return all the graphs (currently
   just the default graph?)
   … i.e. drop the 'merged'/'default' and return all graphs
Manu Sporny:  yes, we don't want lossy algorithms
Markus Lanthaler: so the signature would be flatten(input,
   context, callback, options)

PROPOSAL: Add a .flatten() method to the JSON-LD API, which
   returns all data in flattened, compact form. Remove the flatten
   flag from the .expand() and .compact() methods. Ensure that the
   .flatten() method preserves data in named graphs.

Manu Sporny: +1
Markus Lanthaler: +1
Gregg Kellogg: +1
Niklas Lindström:  +0.75 (not entirely sure about how people not
   knowing this stuff in detail will get the meaning of "flatten")

RESOLUTION: Add a .flatten() method to the JSON-LD API, which
   returns all data in flattened, compact form. Remove the flatten
   flag from the .expand() and .compact() methods. Ensure that the
   .flatten() method preserves data in named graphs.
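
To make "flattened" and "preserves data in named graphs" concrete, here is a toy sketch that is not the API algorithm and does not perform the compaction step. It assumes already-expanded input in which every node object has an "@id" and every property value is a list. The name `toy_flatten` and the returned shape (one table of nodes per graph name) are assumptions for illustration only.

```python
def toy_flatten(nodes, graph="@default", graphs=None):
    """Collect every node object by "@id", keeping one table per graph."""
    if graphs is None:
        graphs = {}
    table = graphs.setdefault(graph, {})
    for node in nodes:
        node_id = node["@id"]
        entry = table.setdefault(node_id, {"@id": node_id})
        for key, values in node.items():
            if key == "@id":
                continue
            if key == "@graph":
                # Named graph: its nodes go into a table under the graph's name.
                toy_flatten(values, node_id, graphs)
                continue
            for value in values:
                if isinstance(value, dict) and "@id" in value and len(value) > 1:
                    # Nested node object: hoist it and leave a reference behind.
                    toy_flatten([value], graph, graphs)
                    value = {"@id": value["@id"]}
                entry.setdefault(key, []).append(value)
    return graphs
```

Nothing is merged into the default graph here, so the result is not lossy with respect to graph membership.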

Topic: ISSUE-206: Clarify that the algorithms operate on a copy of the input

Manu Sporny:  we want to clarify that implementors mustn't modify
   the input data in-place
Gregg Kellogg:  the fact that the algorithms speak of
   serializations does imply that there is no modification. It may
   be good to say that the algorithms operate on a live data
   structure, and hence need to create copies.
Gregg Kellogg:  implementations MAY operate on native data
   structures, and if so, they must generate new data structures

PROPOSAL: Any input to JSON-LD API methods MUST NOT be modified.

Markus Lanthaler: +1
Manu Sporny: +1
Niklas Lindström: +1
Gregg Kellogg: +1

RESOLUTION: Any input to JSON-LD API methods MUST NOT be
   modified.
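
One simple way for an implementation working on native data structures to honor this is to deep-copy the input before doing anything else. The sketch below assumes a hypothetical method named `process`; dropping "@context" is only a stand-in for real processing.

```python
import copy


def process(input_doc):
    """Hypothetical API-style method that works on a private copy."""
    work = copy.deepcopy(input_doc)
    work.pop("@context", None)  # mutate only the copy, never the input
    return work
```

A deep copy is the bluntest option. An implementation that builds fresh output structures as it goes would satisfy the same requirement without copying up front.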

-- manu

-- 
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: HTML5 and RDFa 1.1
http://manu.sporny.org/2012/html5-and-rdfa/
Received on Tuesday, 18 December 2012 18:18:30 UTC