RE: JSON-LD Algorithms

On Thursday, January 17, 2013 9:15 PM, Gregg Kellogg wrote:

> On Jan 17, 2013, at 6:57 PM, Markus Lanthaler wrote:
>
> > --- Proposed Algorithm ---
> >
> > Let's again consider a) and b) separately.
> >
> > Case a) is rather trivial:
> >
> > - Filter the active context's term definitions by IRI and
> >  if the result contains more than one term
> >    - Choose the shortest and lexicographically least
> >
> > - Otherwise, for every term in the active context, check if
> >  its IRI partially matches the target IRI; if so, construct
> >  a compact IRI. Out of all potential compact IRIs, choose the
> >  shortest and lexicographically least
> >
> > - Otherwise, if @vocab is set, use it to shorten the IRI if possible
> 
> Meaning, that applies if the result doesn't match a term in the active
> context.

Right. As soon as you find a match you return it (that's what I meant by
"choose...").

 
> > - Otherwise, return the IRI as is
> >
> >
> > Case b) is a bit more complex:
> >
> > - Look at value and extract the target container and type/language
> >  mapping. I'll spare you the details. The result should be something
> >  like container=@list, type=http://... or container=@set, language=en
> >
> > Then we have to filter the active context multiple times. First looking
> > for perfect matches, then falling back to weaker matches using relaxed
> > filter criteria:
> >
> > - Filter the active context by container
> 
> By this, I presume you mean filter terms in the active context whose
> container exactly matches, or is compatible with, that of the value.

Yes, filter entries in the active context where the container matches
*exactly*. You fall back to weaker matches in the next round (see fallback
rules below).
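
In code, the exact container filter (and the fallback table from further
down this mail) could look roughly like this (again, just a sketch with
assumed data shapes):

  # Container fallback rules as listed below.
  CONTAINER_FALLBACK = {
      '@list': None,          # @list -> no container
      '@language': '@set',    # @language -> @set
      '@annotation': '@set',  # @annotation -> @set
      '@set': None,           # @set -> no container
  }

  def filter_by_container(terms, container):
      # Keep only terms whose container mapping matches *exactly*;
      # weaker matches are handled by retrying with a relaxed container.
      return {t: d for t, d in terms.items()
              if d.get('container') == container}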


> >   - Filter the active context by type or language if set
> 
> By this, I presume you mean filter terms which exactly match either the
> type or the language.

Right


> Need to take into consideration the default
> language too, I suppose. Presumably, this has some weight for scalar
> values too.

Yes, that's something that I realized just after pushing the send button. It
adds some complexity (terms with no language mapping vs. terms with language
mapping set to null) but hopefully not too much.
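
Something along these lines, I imagine (hypothetical helper; a term with
no language mapping inherits the default language, while an explicit null
mapping opts out of it):

  _ABSENT = object()  # sentinel: the term has no language mapping at all

  def matches_language(term_def, value_language, default_language):
      lang = term_def.get('language', _ABSENT)
      if lang is _ABSENT:
          # No mapping at all: the default language applies.
          lang = default_language
      # lang may be None here, i.e. explicitly "no language".
      return lang == value_language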


> >     - If property generators are in the result set,
> >       order them (shortest and lexicographically least first),
> >       and append them to the candidates
> 
> By result set, I presume you mean the list of terms that have so far
> passed through the filter.

Exactly


> This would be the first time we've added anything to candidates.

Right. 


> >     - If terms which aren't property generators are in the result set,
> >       append the shortest and lexicographically least to the candidates
> >       and return all candidates (if no term was found, continue; we
> >       need to add all potential property generators until a term is
> >       found)
> 
> All potential property generators, or all potential terms, including
> property generator terms?

We first append all potential property generators (sorted; if any) and then
a single term (the shortest and lexicographically least, if there is one).
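
Roughly like this (sketch; the property generator test and the sort key
are assumptions on my side):

  def select_candidates(matching):  # matching: {term: definition}
      def is_pgen(d):
          # Assumed shape: a property generator maps one term to several IRIs.
          return isinstance(d.get('iri'), list)

      # All matching property generators, shortest and
      # lexicographically least first.
      candidates = sorted((t for t, d in matching.items() if is_pgen(d)),
                          key=lambda t: (len(t), t))

      # Then at most one plain term; once we have one we can stop.
      plain = [t for t, d in matching.items() if not is_pgen(d)]
      if plain:
          candidates.append(min(plain, key=lambda t: (len(t), t)))
          return candidates, True   # term found, return the candidates
      return candidates, False      # no term yet, continue with weaker filters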


> >     - Retry, but this time look for terms without type/language
> >  - Relax container criteria and try again
> >
> > - If no candidates have been found, try to create a compact IRI or use
> > @vocab just as for case a)
> >
> > Relaxing the container criteria means to use the following container
> > fallback rules:
> >
> > @list -> no container
> > @language -> @set
> > @annotation -> @set
> > @set -> no container
> >
> > ------
> >
> >
> > Is this easier to understand than the two current algorithms we have?
> > Does anyone have another idea how to describe those algorithms in
> > simple "prose"?
> > Are there other algorithms that are currently difficult to understand?
> 
> I think this is a promising direction.

Great!
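
To tie the retry and relaxation steps quoted above together, the driver
might read like this (sketch only, reusing the helpers from my earlier
snippets; filter_by_type_language is another assumed helper that keeps
terms whose type/language mapping matches the given criteria, or terms
without any such mapping when the criteria is None):

  def find_candidates(active_context, container, type_or_lang):
      candidates = []
      while True:
          # First pass: exact type/language match; second pass: terms
          # without any type/language mapping.
          for criteria in (type_or_lang, None):
              matching = filter_by_container(active_context, container)
              matching = filter_by_type_language(matching, criteria)
              found, term_found = select_candidates(matching)
              candidates.extend(found)
              if term_found:
                  return candidates
          if container is None:
              return candidates  # nothing left to relax
          # Relax the container per the fallback rules and try again.
          container = CONTAINER_FALLBACK.get(container)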


> Of course, we've made the
> problem particularly difficult by anticipating that there may be more
> than one term that matches a property IRI. If this wasn't the case, the
> whole algorithm would be trivial. I wonder how important it is that we
> have such complex procedures to deal with multiple matches
> deterministically, where we could instead just prohibit this behavior,
> or create simpler rules to deal with duplicates. For example, if we
> simply took the first term based on length and lexical order and used
> that, it would be simple and consistent. I don't know if these
> multiple-term matches are real use cases, or just ones we've come up
> with ourselves.

I don't know, but I think being able to use two different terms for two
different types/languages has some value; the most trivial example being
label_de vs. label_en. What definitely adds a lot of complexity are property
generators, since you also have to check whether the duplicates exist.
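
Just to illustrate with a hypothetical context (Python shapes as in my
sketches above; the rdfs:label IRI is only an example):

  active_context = {
      'label_de': {'iri': 'http://www.w3.org/2000/01/rdf-schema#label',
                   'language': 'de'},
      'label_en': {'iri': 'http://www.w3.org/2000/01/rdf-schema#label',
                   'language': 'en'},
  }
  # Both terms expand to the same IRI; only the language mapping decides
  # whether a value tagged "de" compacts to label_de or label_en. Picking
  # purely by length/lexical order would collapse that distinction.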


> Of the other algorithms, the context expansion is difficult, and as a
> result so is the IRI expansion, due to their potentially recursive nature.
> We could simplify this by being more vague about how to resolve to
> absolute IRIs and advise against looping changes.

Agreed. At least the IRI expansion algorithm is a bit complex due to its
recursive nature, as you state. The context expansion algorithm is long, but
not very complex I think. I'm a bit worried about being more vague in these
two algorithms because they are the most fundamental algorithms we have.
Basically everything depends on them.
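
For what it's worth, the recursion itself can be kept safe with a simple
cycle guard; something like this sketch (simplified, not the spec
algorithm):

  def expand_iri(active_context, value, seen=None):
      seen = seen or set()
      if value in seen:
          raise ValueError('cyclic IRI mapping: ' + value)
      seen.add(value)

      term_def = active_context.get(value)
      if term_def is not None:
          # A term's IRI mapping may itself be a term or a compact IRI,
          # hence the recursion (and the guard above).
          return expand_iri(active_context, term_def['iri'], seen)

      if ':' in value:
          prefix, suffix = value.split(':', 1)
          if prefix in active_context:
              return expand_iri(active_context, prefix, seen) + suffix
      return value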



--
Markus Lanthaler
@markuslanthaler

Received on Saturday, 19 January 2013 14:37:34 UTC