From: Markus Lanthaler <markus.lanthaler@gmx.net>
Date: Sat, 19 Jan 2013 15:37:03 +0100
To: "'Gregg Kellogg'" <gregg@greggkellogg.net>
Cc: <public-linked-json@w3.org>
On Thursday, January 17, 2013 9:15 PM, Gregg Kellogg wrote:
> On Jan 17, 2013, at 6:57 PM, Markus Lanthaler wrote:
>
> > --- Proposed Algorithm ---
> >
> > Let's again consider a) and b) separately.
> >
> > Case a) is rather trivial:
> >
> > - Filter the active context's term definitions by IRI and,
> >   if the result contains more than one term,
> >   choose the shortest and lexicographically least
> >
> > - Otherwise, for every term in the active context, check whether
> >   its IRI partially matches the target IRI; if so, construct
> >   a compact IRI. Out of all potential compact IRIs, choose the
> >   shortest and lexicographically least
> >
> > - Otherwise, if @vocab is set, use it to shorten the IRI if possible
>
> Meaning, that if the result doesn't match a term in the active context.

Right. As soon as you find a match you return it (that's what I meant
with "choose...").

> > - Otherwise, return the IRI as is
> >
> >
> > Case b) is a bit more complex:
> >
> > - Look at the value and extract the target container and
> >   type/language mapping. I'll spare you the details. The result
> >   should be something like container=@list, type=http://... or
> >   container=@set, language=en
> >
> > Then we have to filter the active context multiple times, first
> > looking for perfect matches and then falling back to weaker matches
> > using relaxed filter criteria:
> >
> > - Filter the active context by container
>
> By this, I presume you mean filter terms in the active context where
> the container is exactly, or is compatible with, that of the value.

Yes, filter entries in the active context where the container matches
*exactly*. You fall back to weaker matches in the next round (see the
fallback rules below).

> > - Filter the active context by type or language if set
>
> By this, I presume you mean filter terms which exactly match either
> the type or the language.

Right.

> Need to take into consideration the default language too, I suppose.
> Presumably, this has some weight for scalar values too.

Yes, that's something I realized just after pushing the send button.
It adds some complexity (terms with no language mapping vs. terms with
a language mapping set to null) but hopefully not too much.

> > - If property generators are in the result set,
> >   order them (shortest and lexicographically least first)
> >   and append them to the candidates
>
> By result set, I presume you mean the list of terms that have so far
> passed through the filter.

Exactly.

> This would be the first time we've added anything to candidates.

Right.

> > - If terms which aren't property generators are in the result set,
> >   append the shortest and lexicographically least to the candidates
> >   and return all candidates (if no term was found, we continue; we
> >   need to add all potential property generators till a term is
> >   found)
>
> All potential property generators, or all potential terms, including
> property generator terms?

We first append all potential property generators (sorted, if any) and
then a single term (the shortest and lexicographically least, if
there's one).
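To make that last point concrete, here is a rough, completely
non-normative Python sketch of one filtering round. The term-definition
shape and the property_generator flag are inventions for illustration,
not spec text:

    def shortest_least(terms):
        # shortest first, ties broken lexicographically
        return sorted(terms, key=lambda t: (len(t), t))

    def collect_round(filtered, defs, candidates):
        # 'filtered' holds the terms that survived the container and
        # type/language filters for this round; 'defs' maps each term
        # to its (invented) definition dict
        gens = [t for t in filtered if defs[t].get('property_generator')]
        plain = [t for t in filtered if not defs[t].get('property_generator')]

        # all potential property generators are appended, sorted
        candidates.extend(shortest_least(gens))

        if plain:
            # a single plain term ends the search
            candidates.append(shortest_least(plain)[0])
            return candidates, True   # done
        return candidates, False      # keep relaxing the filters

Appending the generators first mirrors the point that the compaction
code still has to verify the duplicates exist before committing to a
property generator.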
> > - Retry, but this time look for terms without type/language
> > - Relax the container criteria and try again
> >
> > - If no candidates have been found, try to create a compact IRI or
> >   use @vocab just as for case a)
> >
> > Relaxing the container criteria means to use the following container
> > fallback rules:
> >
> >   @list       -> no container
> >   @language   -> @set
> >   @annotation -> @set
> >   @set        -> no container
> >
> > ------
> >
> >
> > Is this easier to understand than the two current algorithms we
> > have? Does someone have another idea how to describe those
> > algorithms in simple "prose"? Are there other algorithms that are
> > currently difficult to understand?
>
> I think this is a promising direction.

Great!

> Of course, we've made the problem particularly difficult by
> anticipating that there may be more than one term that matches a
> property IRI. If this wasn't the case, the whole algorithm would be
> trivial. I wonder how important it is that we have such complex
> procedures to deal with multiple matches deterministically, where we
> could instead just prohibit this behavior, or create simpler rules to
> deal with duplicates. For example, if we simply took the first term
> based on length and lexical order and used that, it would be simple
> and consistent. I don't know if these multiple-term matches are real
> use cases, or just ones we've come up with ourselves.

I don't know, but I think being able to use two different terms for two
different types/languages has some value, the most trivial example
being label_de vs. label_en. What definitely adds a lot of complexity
are property generators, since you also have to check whether the
duplicates exist.

> Of the other algorithms, the context expansion is difficult, and as a
> result the IRI expansion, due to the potential recursive nature. We
> could simplify this by being more vague about how to resolve to
> absolute IRIs and advise against looping changes.

Agreed, at least the IRI expansion algorithm is a bit complex due to
its recursive nature, as you state. The context expansion algorithm is
long, but not very complex I think. I'm a bit worried about being more
vague in these two algorithms because they are the most fundamental
algorithms we have; basically everything depends on them.
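Pulling the whole proposal together before signing off, here is an
equally rough end-to-end sketch, reusing shortest_least and
collect_round from above. The context layout, the helper names, and the
matching details are all assumptions, and property-generator matching
(one term mapping to several IRIs, plus the duplicate check) is
deliberately glossed over:

    CONTAINER_FALLBACK = {
        '@list': None,          # @list       -> no container
        '@language': '@set',    # @language   -> @set
        '@annotation': '@set',  # @annotation -> @set
        '@set': None,           # @set        -> no container
    }

    def compact(defs, vocab, iri, container=None, type_or_lang=None):
        candidates = []
        while True:
            matches = [t for t, d in defs.items()
                       if d['iri'] == iri and d.get('container') == container]
            # first pass wants an exact type/language match; the retry
            # accepts terms without any type/language mapping
            for relax in (False, True):
                filtered = [t for t in matches
                            if (defs[t].get('type_or_lang') is None if relax
                                else defs[t].get('type_or_lang') == type_or_lang)]
                candidates, done = collect_round(filtered, defs, candidates)
                if done:
                    return candidates
            if container not in CONTAINER_FALLBACK:
                break
            container = CONTAINER_FALLBACK[container]  # relax and retry

        if candidates:  # only property generators matched
            return candidates

        # no candidates at all: fall back to a compact IRI or @vocab,
        # just as for case a)
        curies = [t + ':' + iri[len(d['iri']):] for t, d in defs.items()
                  if iri.startswith(d['iri']) and len(iri) > len(d['iri'])]
        if curies:
            return [shortest_least(curies)[0]]
        if vocab and iri.startswith(vocab):
            suffix = iri[len(vocab):]
            if suffix and suffix not in defs:  # must not clash with a term
                return [suffix]
        return [iri]

For a value with container=@set and language=en, for example, the loop
would try @set exactly and then fall back to no container, just as the
table above prescribes.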
--
Markus Lanthaler
@markuslanthaler


Received on Saturday, 19 January 2013 14:37:34 UTC