- From: Markus Lanthaler <markus.lanthaler@gmx.net>
- Date: Thu, 17 Jan 2013 18:57:40 +0100
- To: <public-linked-json@w3.org>
I've spend some more time thinking about our algorithms and how we should describe them in the spec. I do a agree with Gregg that the algorithms should be more descriptive so that they can be understood without actually implementing them. My intention when introducing the notion of a inverse context was just that. Unfortunately it seems that the inverse context failed to make IRI compaction easier to understand and at least I still believe that the Term Ranking algorithm is not much better in that regard. In fact I had problems understanding it myself without actually implementing it (the original version; the current one is probably even more complex due property generators). I think the main problem with both algorithms is the level of abstraction we have chosen. It might be possible to describe the algorithm in a much simpler way by slightly increasing the level of abstraction. So let's first look at the problem we are trying to solve to ensure we are all on the same page. --- Problem Description --- IRI compaction tries to find the best matching term or compact IRI in an active context for a) just an IRI or b) an IRI-value pair. Case a) is just used when compacting values of @type (previously also for @id). The result will either be a term, a compact IRI or @vocab-abbreviated IRI (it's not @vocab relative), or, if nothing matches, the original IRI (we don't want to compact this to relative IRIs, do we?) Case b) is a bit more complex as, apart from the IRI itself, a term's definition also has to correspond to the passed value (container, type/language). The result will either be a single term, compact IRI, or relative/absolute IRI or if a property generator matches, a list of candidate property generators with a single a single term, compact IRI, or relative/absolute IRI to fall back to if it turns out that no property generator matches. --- Proposed Algorithm --- Lets again consider a) and b) separately. Case a) is rather trivial: - Filter the active context's term definitions by IRI and if the result contains more than one term - Choose the shortest and lexicographically least - Otherwise, for every term in the active context, check if it's IRI partially matches the target IRI, if so, construct a compact IRI. Out of all potential compact IRIs, choose the shortest and lexicographically least - Otherwise, if @vocab is set, use it to shorten the IRI if possible - Otherwise, return the IRI as is Case b) is a bit more complex: - Look at value and extract the target container and type/language mapping. I won't spare you the details. The result should be something like container=@list, type=http://... or container=@set, language=en Then we have to filter the active context multiple times. First looking for perfect matches, then falling back to weaker matches using relaxed filter criterions: - Filter the active context by container - Filter the active context by type or language if set - If property generators are in the result set, order them (shortest and lexicographically least first), and append them to the candidates - If terms which aren't property generators are in the result set, append shortest and lexicographically least to the candidates and return all candidates (if no term was found, continue we need to add all potential property generators till a term is found) - Retry, but this time look for terms without type/language - Relax container criteria and try again - If no candidates have been found, try to create a compact IRI or use @vocab just as for case a) Relaxing the container criteria means to use the following container fallback rules: @list -> no container @language -> @set @annotation -> @set @set -> no container ------ Is this easier to understand then the two current algorithms we have? Does someone has another idea how to describe those algorithms in simple "prose"? Are there other algorithms that are currently difficult to understand? Thanks, Markus -- Markus Lanthaler @markuslanthaler
Received on Thursday, 17 January 2013 17:58:11 UTC