
JSON-LD Algorithms

From: Markus Lanthaler <markus.lanthaler@gmx.net>
Date: Thu, 17 Jan 2013 18:57:40 +0100
To: <public-linked-json@w3.org>
Message-ID: <01d901cdf4dc$25c36fc0$714a4f40$@lanthaler@gmx.net>
I've spent some more time thinking about our algorithms and how we should
describe them in the spec. I do agree with Gregg that the algorithms
should be more descriptive so that they can be understood without actually
implementing them. My intention when introducing the notion of an inverse
context was just that. Unfortunately, it seems that the inverse context
failed to make IRI compaction easier to understand, and at least I still
believe that the Term Ranking algorithm is not much better in that regard.
In fact, I had problems understanding it myself without actually implementing
it (the original version; the current one is probably even more complex due
to property generators).

I think the main problem with both algorithms is the level of abstraction we
have chosen. It might be possible to describe the algorithm in a much
simpler way by slightly increasing the level of abstraction. So let's first
look at the problem we are trying to solve to ensure we are all on the same
page.


--- Problem Description ---

IRI compaction tries to find the best matching term or compact IRI in an
active context for a) just an IRI or b) an IRI-value pair.

Case a) is used only when compacting values of @type (previously also for
@id). The result will be either a term, a compact IRI, a @vocab-abbreviated
IRI (abbreviated against @vocab, not made relative to it), or, if nothing
matches, the original IRI (we don't want to compact this to relative IRIs,
do we?).

Case b) is a bit more complex as, apart from the IRI itself, a term's
definition also has to correspond to the passed value (container,
type/language). The result will be either a single term, compact IRI, or
relative/absolute IRI, or, if a property generator matches, a list of
candidate property generators plus a single term, compact IRI, or
relative/absolute IRI to fall back to if it turns out that no property
generator matches.


--- Proposed Algorithm ---

Let's again consider a) and b) separately.

Case a) is rather trivial:

- Filter the active context's term definitions by IRI; if the
  result is non-empty, choose the shortest and lexicographically
  least term

- Otherwise, for every term in the active context, check whether
  its IRI partially matches the target IRI; if so, construct
  a compact IRI. Out of all potential compact IRIs, choose the
  shortest and lexicographically least

- Otherwise, if @vocab is set, use it to shorten the IRI if possible

- Otherwise, return the IRI as is
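
The steps above might be sketched roughly like this (a hypothetical
illustration, not spec text; it assumes term definitions are a simple
dict of term -> IRI and ignores many corner cases):

```python
def compact_iri(active_context, vocab, iri):
    """Sketch of case a): compact a lone IRI (e.g. an @type value)."""
    # 1. Terms whose definition maps exactly to the target IRI;
    #    pick the shortest, then lexicographically least.
    exact = [t for t, mapped in active_context.items() if mapped == iri]
    if exact:
        return sorted(exact, key=lambda t: (len(t), t))[0]

    # 2. Otherwise try to build a compact IRI (prefix:suffix) from
    #    terms whose IRI is a proper prefix of the target IRI.
    candidates = []
    for term, mapped in active_context.items():
        if iri.startswith(mapped) and len(iri) > len(mapped):
            candidates.append(term + ":" + iri[len(mapped):])
    if candidates:
        return sorted(candidates, key=lambda c: (len(c), c))[0]

    # 3. Otherwise fall back to @vocab abbreviation, if possible.
    if vocab and iri.startswith(vocab) and len(iri) > len(vocab):
        return iri[len(vocab):]

    # 4. Nothing matched: return the IRI as is.
    return iri
```

For example, with a context mapping "foaf" to http://xmlns.com/foaf/0.1/,
the IRI http://xmlns.com/foaf/0.1/knows would come out as foaf:knows.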


Case b) is a bit more complex:

- Look at the value and extract the target container and type/language
  mapping. I'll spare you the details. The result should be something like
  container=@list, type=http://... or container=@set, language=en

Then we have to filter the active context multiple times: first looking for
perfect matches, then falling back to weaker matches using relaxed filter
criteria:

- Filter the active context by container
   - Filter the active context by type or language, if set
     - If property generators are in the result set,
       order them (shortest and lexicographically least first)
       and append them to the candidates
     - If terms which aren't property generators are in the result set,
       append the shortest and lexicographically least to the candidates
       and return all candidates (if no term was found, continue: we
       need to keep collecting potential property generators until a
       term is found)
     - Retry, but this time look for terms without a type/language
       mapping
   - Relax the container criteria and try again

- If no candidates have been found, try to create a compact IRI or use
  @vocab, just as in case a)
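
The nested filtering could look roughly like this (again a hypothetical
sketch: it assumes the caller has already computed the ordered list of
containers to try, strongest first, and that each term definition is a
dict with "container", "features" (the type/language mapping, or None),
and "property_generator" keys; all names are illustrative):

```python
def select_candidates(active_context, container_chain, features):
    """Sketch of case b): collect property generators plus one fallback term."""
    candidates = []
    for container in container_chain:      # strongest container match first
        for feat in (features, None):      # exact type/language, then none
            matching = [(t, d) for t, d in active_context.items()
                        if d.get("container") == container
                        and d.get("features") == feat]
            # Property generators are collected but don't stop the search.
            gens = sorted((t for t, d in matching if d.get("property_generator")),
                          key=lambda t: (len(t), t))
            candidates.extend(gens)
            # The first plain term found is the fallback; it ends the search.
            plain = sorted((t for t, d in matching if not d.get("property_generator")),
                           key=lambda t: (len(t), t))
            if plain:
                candidates.append(plain[0])
                return candidates
    return candidates  # empty: fall back to compact IRI / @vocab as in case a)
```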

Relaxing the container criteria means to use the following container
fallback rules:

@list -> no container
@language -> @set
@annotation -> @set
@set -> no container
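
One possible encoding of these fallback rules (hypothetical; None stands
for "no container", and the names are illustrative):

```python
# @list and @set fall straight through to "no container" (None);
# @language and @annotation first weaken to @set.
FALLBACK = {"@list": None, "@language": "@set", "@annotation": "@set", "@set": None}

def container_fallback_chain(container):
    """Expand e.g. "@language" into ["@language", "@set", None]."""
    chain = [container]
    while chain[-1] is not None:
        chain.append(FALLBACK.get(chain[-1]))
    return chain
```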

------


Is this easier to understand than the two current algorithms we have?
Does anyone have another idea how to describe these algorithms in simple
"prose"?
Are there other algorithms that are currently difficult to understand?


Thanks,
Markus



--
Markus Lanthaler
@markuslanthaler
Received on Thursday, 17 January 2013 17:58:11 UTC
