Re: Thoughts about log:collectAllIn from Patrick Hochstenbach on 2022-11-29 (public-n3-dev@w3.org from November 2022)

From: Patrick Hochstenbach <Patrick.Hochstenbach@UGent.be>
Date: Tue, 29 Nov 2022 05:50:59 +0000
To: Doerthe Arndt <doerthe.arndt@tu-dresden.de>
CC: William Van Woensel <william.vanwoensel@gmail.com>, "public-n3-dev@w3.org" <public-n3-dev@w3.org>
Message-ID: <DB4PR09MB5943FCEF36E0E6659AF632C1EE129@DB4PR09MB5943.eurprd09.prod.outlook.com>
Hi Doerthe,

I agree completely. Just to clarify (though I think we are thinking the same): in both the ABC and ACB examples you provided, the data doesn't contain any lists. From the perspective of an RDF processor:
:a a :A.
:b a :B.
:c a :C.

and

:a a :A.
:c a :C.
:b a :B.

is the same data. From this perspective, it is not surprising that putting this data into a list creates a different list depending on the context or application. A program shouldn't be forced to parse triples in a particular order; any added ordering would be artificial.
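To make the point concrete, here is a minimal Python sketch (not N3, and not how any reasoner is actually implemented) that models each triple as a (subject, predicate, object) tuple. It assumes a naive collector that simply keeps the order in which a parser happened to read the triples; the helper name collect_in_read_order is hypothetical.

```python
# Triples as (subject, predicate, object) tuples, listed in the order a
# parser might read them from the two serializations above.
abc = [(":a", "a", ":A"), (":b", "a", ":B"), (":c", "a", ":C")]
acb = [(":a", "a", ":A"), (":c", "a", ":C"), (":b", "a", ":B")]

# Viewed as RDF graphs (sets of triples), both serializations are the same data.
assert set(abc) == set(acb)


def collect_in_read_order(triples):
    """Hypothetical naive collector: returns the objects in the order read."""
    return [obj for (_subj, _pred, obj) in triples]


print(collect_in_read_order(abc))  # [':A', ':B', ':C']
print(collect_in_read_order(acb))  # [':A', ':C', ':B']
```

Same graph, different lists: the only difference comes from the read order, which is exactly the artificial ordering described above.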

If log:collectAllIn and graph:list can be modelled as multisets (as Jos has added in EYE), we get the best of both worlds: they have set qualities, they are backwards compatible, and lists are very practical because N3 has lots of tools for working with lists.

Patrick

________________________________
From: Doerthe Arndt <doerthe.arndt@tu-dresden.de>
Sent: 28 November 2022 18:43
To: Patrick Hochstenbach <Patrick.Hochstenbach@UGent.be>
Cc: William Van Woensel <william.vanwoensel@gmail.com>; public-n3-dev@w3.org <public-n3-dev@w3.org>
Subject: Re: Thoughts about log:collectAllIn

Dear all,

Thank you all for your input. I will answer William and Patrick inline.

@Jos: so you'd suggest interpreting the output of collectAllIn as a multiset (even though it is in list format), and then postulating that different reasoners output multisets which are log:multisetEqualTo each other? I think that is what Prolog does as well: it claims that the output is a bag (multiset). This is maybe the most practical solution. I am not sure whether it is the best, but I could live with that. The documentation of the built-in should then point out that different reasoners may produce differently ordered outputs.
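A rough Python sketch of what a multiset comparison of two collectAllIn results would check, assuming the results are modelled as plain lists of terms. The function multiset_equal is a hypothetical illustration, not EYE's implementation of log:multisetEqualTo.

```python
from collections import Counter


def multiset_equal(xs, ys):
    """True if xs and ys hold the same elements with the same multiplicities,
    ignoring order (a sketch of a multiset comparison)."""
    return Counter(xs) == Counter(ys)


print(multiset_equal([":A", ":B", ":C"], [":A", ":C", ":B"]))  # True
print(multiset_equal([":A", ":B", ":B"], [":A", ":B"]))        # False: multiplicity differs
print([":A", ":B", ":C"] == [":A", ":C", ":B"])                # False: lists are ordered
```

Under this reading, two reasoners producing (:A :B :C) and (:A :C :B) would agree as multisets while still disagreeing as lists.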

On 26.11.2022 at 16:22, Patrick Hochstenbach <Patrick.Hochstenbach@UGent.be> wrote:

My main use cases for log:collectAllIn are cardinality checks (how often a triple pattern occurs in a context).

I see. I still think there could be use cases in which people need to do something with the output lists, or simply want to compare them. We should be explicit here.


I think in an RDF context I am already used to the fact that no order can be assumed unless the data explicitly provides a list. The order depends not only on indexing but also on the serialization of the input data.

Yes, but lists are ordered by nature. So yes, you should be used to the fact that triples have no order, but lists need to be understood as ordered objects, and in a sense we are misusing the list notation here. But I would also like to keep the list, as it is a good format for further processing the result of collectAllIn (for example, by counting).


Two N3 reasoners shouldn't assume that two input RDF documents are different just because their triples are ordered differently.

No, but two triples are only the same if their subjects, predicates, and objects are equal. So

:s :p (:o1 :o2).

is not the same as

:s :p (:o2 :o1).


The same discussion is also relevant for graph:list, I assume?

Good point, yes. Whatever we do with collectAllIn should also be done with graph:list. So either we are explicit that there could be different results, but that they need to be equal when compared as multisets, or we impose some ordering on the output itself, which we would then do for all multisets.


Patrick
________________________________
From: William Van Woensel <william.vanwoensel@gmail.com>
Sent: 26 November 2022 14:43
To: Doerthe Arndt <doerthe.arndt@tu-dresden.de>
Cc: public-n3-dev@w3.org <public-n3-dev@w3.org>
Subject: Re: Thoughts about log:collectAllIn

Hi Doerthe, everyone,

With non-deterministic, do you mean the Prolog definition (https://www.swi-prolog.org/pldoc/man?section=determinism) or the case where an algorithm may return different results for the same input?

Yes. In the case of collectAllIn, it would mean that the reasoner should output all permutations of our output list, which is not really desirable, although it would be the most accurate solution. As this quickly explodes and there is no real use for it, we should not go for that solution.
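To give a sense of how quickly "output all permutations" explodes: a collected list of n distinct items has n! orderings. A small Python sketch using only the standard library (the three-item list mirrors the ABC example; the larger sizes are arbitrary):

```python
import itertools
import math

collected = [":A", ":B", ":C"]

# Every ordering that a "return all permutations" semantics would have to produce.
all_orderings = list(itertools.permutations(collected))
print(len(all_orderings))  # 6

# The count grows factorially with the number of collected items.
for n in (5, 10, 15):
    print(n, math.factorial(n))
# 5 120
# 10 3628800
# 15 1307674368000
```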


Speaking for myself, none of my use cases rely on a pre-defined order of results. (For jen3, the order would depend not only on parsing order but also on how triples are indexed.) From that viewpoint, I'd rather propose defining the built-in as not imposing any particular result ordering.

In my opinion, that could lead to problems if we, for example, exchange proofs: a proof containing collectAllIn will come with an instantiated list, which might differ from the list the checker uses. So maybe we say that the solutions are all permutations, but we expect the reasoner to output only one (that is, in my example, :we :get (:A :C :B). and :we :get (:A :B :C). are both correct, but reasoning produces only one of them).

Or, more precisely: the same ordering guarantees apply as for rule triple patterns in general. For example, for basic graph patterns no ordering is expected, but for built-in statements (such as your list:in example), the list order would be followed.

OK, that makes it more complicated. Then we would need to define some order-preserving predicates, or some set of list properties or something similar, for which the order is fixed.


Do you (or Jos, or others) know any current use cases that rely on a parsing (or other) order dependency?

I think I could have some in which I compare the results of collectAllIn (but I would need to check); for comparison, however, log:multisetEqualTo would solve the problem.

Kind regards,
Dörthe



Regards,

William

On Nov 25, 2022, at 11:33 AM, Doerthe Arndt <doerthe.arndt@tu-dresden.de> wrote:

Dear all,

Motivated by our latest discussion on built-ins and their specification, I thought further about the nature of log:collectAllIn. Interestingly, in EYE the output of the built-in depends on the order in which the reasoner reads the triples. As an example, compare the following two examples in the editor:
ABC: http://ppr.cs.dal.ca:3002/n3/editor/s/eTcoBQb2
ACB: http://ppr.cs.dal.ca:3002/n3/editor/s/w2LhPbCZ

Through this parsing-order dependency we may also get an implementation dependency, and I think we should avoid that. One solution would be to simply make log:collectAllIn non-deterministic, but I think that would break many of our applications. An alternative would be to fix the order in some other way, which we then also communicate (for example, the string order of the full URI or literal). But if we do so, we should specify it clearly. As we already have a set predicate, we could say that we give the output as a set.
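A minimal Python sketch of the "fix the order by string order" idea. It assumes each collected term is represented by its full IRI string and that "string order" means plain code-point comparison, which is what Python's sorted does for str. The namespace http://example.org/# and the helper canonical_order are hypothetical, and how literals (with datatypes or language tags) would be ordered against IRIs is left open, as in the proposal.

```python
EX = "http://example.org/#"  # hypothetical namespace standing in for ":"


def canonical_order(terms):
    """Sort collected terms by their full string form (code-point order)."""
    return sorted(terms)


# The same results, collected in two different read orders.
read_abc = [EX + "A", EX + "B", EX + "C"]
read_acb = [EX + "A", EX + "C", EX + "B"]

# With a fixed ordering rule, every reasoner produces the same list.
assert canonical_order(read_abc) == canonical_order(read_acb)
print(canonical_order(read_acb))
# ['http://example.org/#A', 'http://example.org/#B', 'http://example.org/#C']
```

The trade-off of such a rule is that any meaningful order carried by an input list is discarded by the sort.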

The problem with that solution is that it is quite handy that, in the current implementation, the reasoner keeps, for example, the order of a list used as input. Consider the following: http://ppr.cs.dal.ca:3002/n3/editor/s/GfsbVdjC

So, should we introduce an extra predicate for use cases like the last one?

How would you define the built-in in order to guarantee interoperability?

Kind regards,
Dörthe
Received on Tuesday, 29 November 2022 05:52:30 UTC