- From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
- Date: Thu, 7 Jan 2016 09:29:08 -0800
- To: Iovka Boneva <iovka.boneva@univ-lille1.fr>, public-data-shapes-wg@w3.org

Yes, this appears to work. I think that you can do better, though. The proof of Theorem 5 in [2] can be improved. The first condition for I(E*) can be removed if the second is replaced by [min(1,min I(E)), \inf] if I(E) /= \null Similarly I(E+) can be improved. I think that SORBEs can be augmented to allow arbitrary cardinalities and still have the same complexity, using a combination of the rules for I(E*), I(E+), and I(a[n,m]). I think that this would eliminate the need to expand out cardinalities in the complexity proof for ShEx. peter On 01/07/2016 05:04 AM, Iovka Boneva wrote: > Dear all, > > this mail is very technical, and might be interesting only for very few people. > > Le 04/01/2016 19:40, Peter F. Patel-Schneider a écrit : >> On 01/04/2016 08:13 AM, Iovka Boneva wrote: >>> Indeed, I've answered too quickly: representing the numbers in unary is not >>> enough to capture the size increase due to the unfolding of cardinality >>> expressions. The two things I called equivalent are actually not equivalent. >>> >>> Back to your question: for showing that validation is in NP, we need to >>> consider that a cardinality expression is a syntactic shortcut for the >>> corresponding unfolded expression (a syntactic shortcut that can be >>> exponentially smaller than the actual unfolded expression). We define the size >>> of the expression based on the unfolded one, and validation for ShEx w/o >>> arbitrary cardinalities (only *, +, ?) is in NP. I think that you agree with >>> this last statement. >> I think that this needs to be shown. I also think that you need [1,1] as a >> cardinality. > Sure, but if you need [1,1], then you won't use the repetition construct at all. >> >> Consider something like >> >> { sub1, sub2 } * >> >> the size of the obvious guess could be arbitrarily large in the size of the >> expression as the number of repetitions of the subexpressions could be very >> large. > What do you mean by the "obvious guess" ? > Maybe you mean the guess of the candidate solution in the NP algorithm ? If > yes, then actually we do not guess the number of repetitions. > > Follows a sketch of an NP algorithm of validation. > I think you agree that the hard part is checking whether the neighbourhood of > the focus node satisfies "locally" the shape expression, disregarding whether > the references to other shapes (@ S) are satisfied. So, we show that, given N > a set of triples that is the neighbourhood of the focus node, and E a shape > expression, checking whether N locally satisfies E is in NP. > > Consider the ShEx expression as an (unordered) regular expression on the > alphabet of the triple constraints that appear in the expression. Note that we > consider "pure" regular expressions, in the sense of the textbooks on regular > languages. The allowed constructs are epsilon, letters of the alphabet, > disjunction, concatenation, and Kleene star. Remark that a ShEX expression w/o > arbitrary repetitions (only +, ?, *) can be transformed in such "pure" regular > expression which size remains polynomial in the size of the initial expression. > > For example: > ((:a . | :b .), :a .) * > has 3 triple constraints (c1) :a . (c2) :b . (c3) :a . > Note that the two occurrences of :a . are considered distinct. > The corresponding (unordered) regular expression is then > ((c1 | c2), c3)* where the comma stands for concatenation. > Note that such expression is single-occurrence (SORBE) as defined in [2], and > in that same paper we showed that the membership problem for SORBE expressions > is in NP. I'll give some details on it in the sequel. > > Now, for the NP algorithm, a candidate solution is: for every triple in N, we > guess the triple constraint to which it is mapped. The size of such candidate > solution is polynomial in the size of N and E. > > For example, consider these triples and the mapping (i.e. the triple > constraint among c1, c2, c3 to which they were mapped). > <n> :a 1 . -> c1 > <n> :b 2 . -> c2 > <n> :b 3 . -> c2 > <n> :a 4 . -> c3 > <n> :a 5 . -> c3 > <n> :a 6 . -> c3 > > Note that such mapping yields a bag of symbols, in the above example it is {{ > c1, c2, c2, c3, c3, c3}}. > > Then, using the SORBE algorithm from [2, Figure 4], we can check in polynomial > time whether the candidate solution is indeed a solution. That is, check > whether the bag of symbols belongs to the language of the unordered regular > expression E. Few words about this SORBE algorithm: it traverses the > structure of E and constructs an interval (which is technical) based on the > expression and on the bag which is being tested for membership. Note that the > numbers that appear in this interval and in all the intermediate computations > are bounded by the size of N. The algorithm works in a single traversal of E > because, intuitively, the single-occurrence criterion discards the need of > non-deterministic choices. > > Does this sound convincing to you ? > > Alternatively, for the NP proof we can use a result from [1], that shows that > the membership problem for a vector (bag) to a semilinear set (Parikh set of a > regular expression, which is basically an alternative representation of an > unordered regular expression) is in NP, but for that proof I would need to > argue that the size of the Parikh set is polynomial in the size of the > expression, probably using arguments on solving linear equations, so the above > sketch seems easier to explain. > > [1] E. Kopczynski and A. To. Parikh images of grammars: Complexity and > applications. In > LICS, pages 80–89, 2010. > > [2] S. Staworko, I. Boneva, J. E. Labra Gayo, S. Hym, E. G. Prud’hom- > meaux, and H. Solbrig. Complexity and Expressiveness of ShEx for RDF. > In International Conference on Database Theory (ICDT) 2015. > >>> The proof for NP-completeness is indeed made for the case w/o arbitrary >>> cardinalities (except on triple constraints, so no folding of arbitrary >>> cardinalities). >>> >>> Once again, we are talking here about a border line case, which would probably >>> never occur in practice. I am personally not concerned about the high >>> complexity of validation for this particular case. >> I am concerned. Unexpected jumps in complexity are often indications that >> algorithms are incorrect. >> >>> Iovka >> peter >> >> >>> Le 04/01/2016 14:31, Peter F. Patel-Schneider a écrit : >>>> I am not convinced, particularly because of your explanation. >>>> >>>> The unfolding will result in an exponential size increase if numbers are >>>> represented in unary. >>>> >>>> Consider >>>> >>>> { ... { p @ a [n,n] } ... } [n,n] >>>> >>>> where the embedding depth is m. >>>> >>>> The expansion of this would need to either represent the number n^m or have >>>> n^m pieces. Both of these would need size exponential in the size of the >>>> expression above. >>>> >>>> The ability to embed cardinalities inside cardinalities provides a way of >>>> efficiently representing large numbers even if numbers in expressions have to >>>> be written in unary. >>>> >>>> peter >>>> >>>> >>>> On 01/04/2016 04:16 AM, Iovka Boneva wrote: >>>>> Le 18/12/2015 18:00, Peter F. Patel-Schneider a écrit : >>>>>> With this construct I'm having some difficulty determining whether ShEx >>>>>> validation is in NP. It seems to me that the outer numbers multiply the >>>>>> effort required to make the inner partitions, leading to an exponential >>>>>> number >>>>>> of choices. Maybe there is some way to do this bottom up, but it >>>>>> certainly >>>>>> doesn't seem obvious to me. >>>>>> >>>>>> Is there an NP algorithm for ShEx validation with this feature? >>>>> Indeed, very good question. And a technical answer. >>>>> There is an NP algorithm if we consider that the min and max bounds >>>>> (1,2,3,4,6,7 in your example) are coded in unary (and not in binary) for >>>>> determining the size of the schema. >>>>> From a complexity point of view, it is equivalent to consider that >>>>> p@a [1,2] >>>>> is a syntactic shortcut for >>>>> p@a , p@a? >>>>> i.e. equivalent to "unfolding" the multiplicities. >>>>> >>>>> Does it answer your question ? >>>>> >>>>> As Eric explained, such cases are not supposed to occur very often. >>>>> >>>>> Iovka >>>>> >>>>>> peter >>>>>> >>>>>> >>>>>> On 12/18/2015 01:12 AM, Eric Prud'hommeaux wrote: >>>>>>> * Peter F. Patel-Schneider <pfpschneider@gmail.com> [2015-12-17 >>>>>>> 22:20-0800] >>>>>>>> It appears to me that ShEx allows embeddings of cardinalities, as in >>>>>>>> >>>>>>>> twofoureight { >>>>>>>> { { p @ a [2,2] } [2,2] } [2,2] >>>>>>>> } >>>>>>>> >>>>>>>> or >>>>>>>> >>>>>>>> lots { >>>>>>>> { p @a [2,4], p @b [1,3] } [6,7] >>>>>>>> } >>>>>>>> >>>>>>>> is this correct? >>>>>>> In principle, yes, but apart from optional groups, it's rare that one >>>>>>> wants to make sure that there are as many a's as there are b's. Optional >>>>>>> is quite reasonable, e.g. >>>>>>> >>>>>>> <X> { >>>>>>> :submitter @<UserShape>, :submissionDate xsd:dateTime, >>>>>>> (:resolvedBy @<HackerShape>, :resolutionDate xsd:dateTime)? >>>>>>> } >>>>>>> >>>>>>> The other place this frequently comes up is in repetitions of choices: >>>>>>> >>>>>>> <Y> { (:observation .|:analysis .|:diagnosis .)+ } >>>>>>> >>>>>>> which can be compiled to >>>>>>> >>>>>>> <Y> { >>>>>>> (:observation .|:analysis .|:diagnosis .), >>>>>>> :observation .*, >>>>>>> :analysis .*, >>>>>>> :diagnosis .* >>>>>>> } >>>>>>> >>>>>>> with either a partion or a greedy semantics. >>>>>>> >>>>>>> >>>>>>>> peter >>>>>>>> >>>>>>>> >>>>>>>> >>> > >

Received on Thursday, 7 January 2016 17:29:41 UTC