Re: question about ShEx from Peter F. Patel-Schneider on 2016-01-04 (public-data-shapes-wg@w3.org from January 2016)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Mon, 4 Jan 2016 10:40:11 -0800
To: Iovka Boneva <iovka.boneva@univ-lille1.fr>, public-data-shapes-wg@w3.org
Message-ID: <568ABC8B.6020104@gmail.com>
On 01/04/2016 08:13 AM, Iovka Boneva wrote:
> Indeed, I've answered too quickly: representing the numbers in unary is not
> enough to capture the size increase due to the unfolding of cardinality
> expressions. The two things I called equivalent are actually not equivalent.
> 
> Back to your question: for showing that validation is in NP, we need to
> consider that a cardinality expression is a syntactic shortcut for the
> corresponding unfolded expression (a syntactic shortcut that can be
> exponentially smaller than the actual unfolded expression). We define the size
> of the expression based on the unfolded one, and validation for ShEx w/o
> arbitrary cardinalities (only *, +, ?) is in NP. I think that you agree with
> this last statement.

I think that this needs to be shown.  I also think that you need [1,1] as a
cardinality.


Consider something like

{ sub1, sub2 } *

the size of the obvious guess could be arbitrarily large in the size of the
expression as the number of repetitions of the subexpressions could be very large.

> The proof for NP-completeness is indeed made for the case w/o arbitrary
> cardinalities (except on triple constraints, so no folding of arbitrary
> cardinalities).
> 
> Once again, we are talking here about a border line case, which would probably
> never occur in practice. I am personally not concerned about the high
> complexity of validation for this particular case.

I am concerned.  Unexpected jumps in complexity are often indications that
algorithms are incorrect.

> 
> Iovka

peter


> 
> Le 04/01/2016 14:31, Peter F. Patel-Schneider a écrit :
>> I am not convinced, particularly because of your explanation.
>>
>> The unfolding will result in an exponential size increase if numbers are
>> represented in unary.
>>
>> Consider
>>
>> { ... { p @ a [n,n] } ... } [n,n]
>>
>> where the embedding depth is m.
>>
>> The expansion of this would need to either represent the number n^m or have
>> n^m pieces.  Both of these would need size exponential in the size of the
>> expression above.
>>
>> The ability to embed cardinalities inside cardinalities provides a way of
>> efficiently representing large numbers even if numbers in expressions have to
>> be written in unary.
>>
>> peter
>>
>>
>> On 01/04/2016 04:16 AM, Iovka Boneva wrote:
>>> Le 18/12/2015 18:00, Peter F. Patel-Schneider a écrit :
>>>> With this construct I'm having some difficulty determining whether ShEx
>>>> validation is in NP.  It seems to me that the outer numbers multiply the
>>>> effort required to make the inner partitions, leading to an exponential
>>>> number
>>>> of choices.   Maybe there is some way to do this bottom up, but it certainly
>>>> doesn't seem obvious to me.
>>>>
>>>> Is there an NP algorithm for ShEx validation with this feature?
>>> Indeed, very good question. And a technical answer.
>>> There is an NP algorithm if we consider that the min and max bounds
>>> (1,2,3,4,6,7 in your example) are coded in unary (and not in binary) for
>>> determining the size of the schema.
>>>  From a complexity point of view, it is equivalent to consider that
>>> p@a [1,2]
>>> is a syntactic shortcut for
>>> p@a , p@a?
>>> i.e. equivalent to "unfolding" the multiplicities.
>>>
>>> Does it answer your question ?
>>>
>>> As Eric explained, such cases are not supposed to occur very often.
>>>
>>> Iovka
>>>
>>>> peter
>>>>
>>>>
>>>> On 12/18/2015 01:12 AM, Eric Prud'hommeaux wrote:
>>>>> * Peter F. Patel-Schneider <pfpschneider@gmail.com> [2015-12-17 22:20-0800]
>>>>>> It appears to me that ShEx allows embeddings of cardinalities, as in
>>>>>>
>>>>>> twofoureight {
>>>>>>     { { p @ a [2,2] } [2,2] } [2,2]
>>>>>>    }
>>>>>>
>>>>>> or
>>>>>>
>>>>>> lots {
>>>>>>     { p @a [2,4], p @b [1,3] } [6,7]
>>>>>> }
>>>>>>
>>>>>> is this correct?
>>>>> In principle, yes, but apart from optional groups, it's rare that one
>>>>> wants to make sure that there are as many a's as there are b's. Optional
>>>>> is quite reasonable, e.g.
>>>>>
>>>>>     <X> {
>>>>>       :submitter @<UserShape>, :submissionDate xsd:dateTime,
>>>>>       (:resolvedBy @<HackerShape>, :resolutionDate xsd:dateTime)?
>>>>>     }
>>>>>
>>>>> The other place this frequently comes up is in repetitions of choices:
>>>>>
>>>>>     <Y> { (:observation .|:analysis .|:diagnosis .)+ }
>>>>>
>>>>> which can be compiled to
>>>>>
>>>>>     <Y> {
>>>>>       (:observation .|:analysis .|:diagnosis .),
>>>>>       :observation .*,
>>>>>       :analysis .*,
>>>>>       :diagnosis .*
>>>>>     }
>>>>>
>>>>> with either a partion or a greedy semantics.
>>>>>
>>>>>
>>>>>> peter
>>>>>>
>>>>>>
>>>>>>
>>>
> 
>
Received on Monday, 4 January 2016 18:40:42 UTC