- From: Aidan Hogan <aidhog@gmail.com>
- Date: Sun, 5 Jul 2020 00:43:37 -0400
- To: thomas lörtsch <tl@rat.io>
- Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>, Eric Prud'hommeaux <eric@w3.org>, semantic-web@w3.org
Hi Thomas,

On 2020-07-03 16:50, thomas lörtsch wrote:
> While of course I've followed the whole thread, I'd like to come back to this point. I really like the categories you introduce:
>
> situations
> implicits
> existentials
> characteristics
> locality
> existentiality
>
> I wonder whether the list of situations is reasonably complete, because if it is, the discussion might also take a different route, away from blaming blank nodes for our misery.
>
> In my understanding your "implicits" category covers all usages of blank nodes to encode structures that span more than one triple: "multi-part objects, n-ary relations and arrays", as David puts it. I would call them "structural". AFAICT this type of blank node probably works pretty well, as your example about graph union/merge below suggests. I like to think of those structural blank nodes as throwaway plastic bags: just enough identity to keep stuff connected but nothing more (fancy imprints, ecological considerations etc. notwithstanding). They don't need a label as long as syntactic sugar is provided, an essential convenience, and IMO Turtle provides the perfect balance. The skolemization algorithm that you present in a later mail can convert them to IRIs that are equally subdued (at least almost). It seems to me that problems with structural blank nodes mostly occur when SPARQL doesn't handle them with special care. How hard those problems are to solve I can't tell, but apart from that it seems to me that structural blank nodes do pretty well.
>
> The second situation, "existentials", seems to be the more problematic one. I guess it is rather niche in practice, but also not well understood and therefore even more annoying.
> There are easy solutions: one can always invent an IRI if need be, or just model differently. The example about which I expressed my initial frustration
> :Bob :has _:b1 .
> :Bob :has _:b2 .
> and where I insisted that _:b1 and _:b2 should indeed refer to different "somethings" could easily be rewritten to
> :Bob :has ( _:b1 _:b2 ) .
> and everything is fine.
> However, the problem wasn't just silly me not getting existentials in FOL and ignoring the NUNA (non-unique name assumption), although of course I would have asked differently, or not at all, had I been a little less ignorant. The deeper problem was that I did something which I guess is not that uncommon: I assumed a closed world.
> In my head the situation wasn't one of data being exchanged on the open web. Rather, I was imagining myself in my study at night, carefully crafting triples like text, like a poem even. In this situation one existential and another existential are indeed two very different existentials, pointing to two distinct and equally precious existences. And as silly and Biedermeier as that situation may sound, it is also very much the situation in which applications are developed: closed world, tightly controlled data, unique names, semantics buried in application code.

I understand this view. We did a straw poll on this issue:

http://aidanhogan.com/docs/blank_nodes_jws.pdf [Table 4]

Many agree with this idea that _:b1 and _:b2 might refer to two different things in an RDF graph (and many do not). I think that RDF semantics does not contradict this assumption: it states that Bob has at least one thing, which does not contradict him having two or more things. However, the semantics is weaker than what some might want (Bob has at least two things, or Bob has exactly two things). As you say, other semantics would give those stronger meanings ... but not without leading to other undesirable cases: if you parse the document twice, using different parsers or from different locations, you might now end up assuming that Bob has (at least) four things, six things, eight things (unless you now lean blank nodes). The semantics of RDF is very conservative and arguably weak: it assumes as little as possible.
This, I think, is the way to go for the Web. We conclude that Bob has at least one thing. Maybe not everyone is happy with that, but it should not fundamentally break anyone's stuff.

What if you want to see Bob as having two of __? Well, in RDF you can still say that his node (:Bob) is associated with two (blank) nodes (_:b1, _:b2) in the graph via the property :has. This is what SPARQL does: it counts nodes, not things. You still have that option with RDF; you don't have to lean blank nodes if you don't want to. So, in summary, in RDF you can have your blank nodes, and lean them too.

> RDF is not only used in the scenario for which it was developed and which necessitated some very unusual design decisions: the open world of the web. RDF is also prepared, consumed and processed in closed applications. RDF drives applications and services internally. It is used in all kinds of ways that are not open but decidedly closed. My guess is that better understanding this tension between two very different usage scenarios and its practical consequences, and making it explicit, is the key to understanding and resolving a lot of the frustrations with RDF that manifest themselves in recurring threads and megathreads on this list.

I just thought that these megathreads were the result of the members of the list enjoying an esoteric flame war or two. :)

> Maybe we should more seriously collect anecdotes like the one Holger gave, all the brain-dead situations and unpleasant surprises, and check them more thoroughly for underlying patterns. Just because they often have to do with blank nodes doesn't necessarily mean that blank nodes are the culprit. I'm not talking about usability studies; going through the last ten years of this list would probably be sufficient ;-) I'm quite convinced that many problems stem from applications working under a closed world assumption but running into RDF's open world mechanisms, or vice versa.
In the case of blank nodes, we did that exercise:

http://aidanhogan.com/docs/blank_nodes_jws.pdf

See "Blank nodes in the standards".

> I have that hammer, "Named Graphs with denotational semantics", which makes this particular problem look like another very pretty nail: just demarcate data that is to be interpreted under the OWA from "internal" data where names are unique, existentials behave like nominals and lists are always well-formed. Put them in different graphs with equally well-defined semantics (for the open world we already have that; for applications we should define a common set of properties), make sure everybody is aware (just a small matter of education), and transform irritation, even frustration, into elucidated enlightenment.

Interesting idea (if I understand correctly, interpret different graphs under different semantics?). I think it looks potentially very complex though, particularly when thinking about how to interpret combinations of graphs under different semantics. On the plus side, I can see several esoteric flame wars that might result from this for us to enjoy. :)

Best,
Aidan

>> On 30. Jun 2020, at 01:33, Aidan Hogan <aidhog@gmail.com> wrote:
>>
>> For what it is worth, we started working on the topic of blank nodes some time ago, similarly convinced that the RDF semantics of blank nodes was unintuitive and that a better semantics could be found. A couple of papers and several years later, I was/am more or less convinced that the semantics of blank nodes is as it should be in RDF.
>>
>> As a summary:
>>
>> Blank nodes are typically useful in two situations:
>>
>> (1) Implicit nodes: you don't have to name blank nodes; rather, blank node labels can be generated automatically. This allows for shortcuts like lists ":abc :has ( a b c ) ." in Turtle.
>>
>> (2) Existential variables: for example ":Bob :murderedBy _:b ."; we know Bob was murdered, but we don't know by whom.
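To make point (1) concrete, here is a minimal sketch, under a toy encoding, of how a parser might expand a Turtle collection into `rdf:first`/`rdf:rest` triples while inventing the blank node labels itself. Terms are plain strings, prefixed names like `rdf:first` are not expanded to full IRIs, and the members are written `:a :b :c` because Turtle only accepts the keyword `a` in predicate position. `expand_collection` and `label_generator` are hypothetical helpers for illustration, not any parser's API.

```python
from itertools import count

# Compact prefixed names stand in for full IRIs (a simplification).
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def label_generator(prefix):
    """Return a function yielding fresh blank node labels: _:<prefix>0, _:<prefix>1, ..."""
    counter = count()
    return lambda: f"_:{prefix}{next(counter)}"


def expand_collection(subject, predicate, items, new_bnode):
    """Expand `subject predicate ( item1 ... itemN ) .` into plain triples.

    One blank node is minted per list cell; the author never names them.
    """
    if not items:
        return [(subject, predicate, RDF_NIL)]  # the empty list is rdf:nil itself
    cells = [new_bnode() for _ in items]
    triples = [(subject, predicate, cells[0])]
    for i, (cell, item) in enumerate(zip(cells, items)):
        rest = cells[i + 1] if i + 1 < len(cells) else RDF_NIL
        triples.append((cell, RDF_FIRST, item))
        triples.append((cell, RDF_REST, rest))
    return triples


# Two "parsers" reading `:abc :has ( :a :b :c ) .` with different label schemes.
g1 = expand_collection(":abc", ":has", [":a", ":b", ":c"], label_generator("b"))
g2 = expand_collection(":abc", ":has", [":a", ":b", ":c"], label_generator("genid"))

for triple in g1:
    print(*triple)
```

Both calls yield seven triples of the same shape; only the invented labels differ (`_:b0` versus `_:genid0`). That is the situation point (B) below relies on.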
>>
>> How blank nodes are defined in RDF has two main characteristics:
>>
>> (A) Locality: _:b in document D1 is not the same as _:b in document D2.
>>
>> (B) Existentiality: the graph ":Bob :murderedBy _:b1 , _:b2 ." states "Bob was murdered by someone" and "Bob was murdered by someone", which is equivalent to ":Bob :murderedBy _:b1 ."
>>
>> Obviously for (2) we need to define blank nodes existentially, but I will try to argue that this is the best solution even just for (1). So, in the context of (1), we'll look at changing (A) locality and (B) existentiality, and see what happens.
>>
>> (A) We could think about removing locality and making blank nodes global, but then, in the context of (1), a parser has to take care of generating a term that is globally unique, which will require something like a "base IRI" and some non-trivial conventions regarding what to do when parsing something from the same base URI multiple times (also considering that the document might have changed). This would greatly complicate simply parsing an RDF document.
>>
>> (B) We could think about keeping locality and removing existentiality, but let's say we take two RDF graphs G1 and G2 parsed from the same Turtle document:
>>
>> ":abc :has ( a b c ) ."
>>
>> by two different parsers that generate different blank nodes to represent the list. If blank nodes are not existential, which is to say if blank nodes denote a resource in a similar manner to IRIs, then we lose the formal relation between G1 and G2, even though they represent the same data; more specifically, we would consider that G1 associates :abc with one list, and that G2 associates :abc with a *potentially* different list. To me, this behaviour is undesirable.
>>
>> Under existential semantics, I can say that G1 and G2 are formally equivalent.
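Points (A) and (B) can be tried out with a brute-force sketch. It assumes graphs are small Python sets of string triples, treats any term starting with `_:` as a blank node, ignores literals, and relies on the standard characterisation of simple entailment: G entails H if some mapping of H's blank nodes turns H into a subgraph of G. `merge` renames blank nodes apart before taking the union, which is what locality (A) requires. All function names are hypothetical, and the backtracking search is exponential in the worst case, so this is only meant for toy graphs.

```python
from itertools import count


def is_bnode(term):
    return term.startswith("_:")


def _bind(a, b, h):
    """Try to map term a onto term b, extending the partial mapping h in place."""
    if not is_bnode(a):
        return a == b            # IRIs can only map to themselves
    if a in h:
        return h[a] == b         # a blank node keeps a single image
    h[a] = b
    return True


def find_hom(src, dst):
    """Backtracking search for a blank node mapping h with h(src) a subset of dst."""
    src, dst = sorted(src), sorted(dst)

    def extend(i, h):
        if i == len(src):
            return h
        for target in dst:
            h2 = dict(h)
            if all(_bind(a, b, h2) for a, b in zip(src[i], target)):
                found = extend(i + 1, h2)
                if found is not None:
                    return found
        return None

    return extend(0, {})


def entails(g, h):
    """Simple entailment: g entails h iff h maps homomorphically into g."""
    return find_hom(h, g) is not None


def equivalent(g, h):
    return entails(g, h) and entails(h, g)


def merge(g, h):
    """Union after renaming h's blank nodes apart from g's (labels are document-local)."""
    used = {t for triple in g for t in triple if is_bnode(t)}
    fresh, counter = {}, count()

    def rename(t):
        if is_bnode(t) and t not in fresh:
            label = f"_:m{next(counter)}"
            while label in used:
                label = f"_:m{next(counter)}"
            fresh[t] = label
        return fresh.get(t, t)

    return set(g) | {tuple(rename(t) for t in triple) for triple in h}


# `:abc :has ( :a :b :c ) .` as two parsers might label it.
G1 = {(":abc", ":has", "_:b0"),
      ("_:b0", "rdf:first", ":a"), ("_:b0", "rdf:rest", "_:b1"),
      ("_:b1", "rdf:first", ":b"), ("_:b1", "rdf:rest", "_:b2"),
      ("_:b2", "rdf:first", ":c"), ("_:b2", "rdf:rest", "rdf:nil")}
G2 = {tuple(t.replace("_:b", "_:genid") for t in triple) for triple in G1}

print(G1 == G2)                   # False: different labels, different triple sets
print(equivalent(G1, G2))         # True: each simply entails the other
M = merge(G1, G2)
print(len(M), equivalent(M, G1))  # 14 True: two copies, still "one list" semantically
```

The merged graph holds two copies of the list, yet it is equivalent to G1 alone; that is the formal sense in which "there exists one list". Actually removing the redundant copy (leaning) remains an optional, separate step.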
>> If I union/merge G1 and G2, the existential semantics tells me that in the resulting graph there exists one list, not that :abc potentially has two lists, allowing me to "lean" the graph and keep just one list (if I want).
>>
>> Put more simply, perhaps: without an existential semantics of blank nodes, every time I parse a document and generate different blank nodes, I would be "creating" resources that are potentially different each time to serve as the referents of the blank nodes.
>>
>> If I'm not okay with blank nodes being existentials, there is still the option of simply (<- no pun intended) not leaning the data, and/or of using skolemisation to generate IRIs to replace blank nodes.
>>
>> Best,
>> Aidan
>>
>> On 2020-06-29 18:33, thomas lörtsch wrote:
>>>> On 29. Jun 2020, at 21:57, Antoine Zimmermann <antoine.zimmermann@emse.fr> wrote:
>>>>
>>>> On 29/06/2020 at 20:33, thomas lörtsch wrote:
>>>>>> On 23. Jun 2020, at 14:10, Eric Prud'hommeaux <eric@w3.org> wrote:
>>>>>>
>>>>>> On Tue, Jun 23, 2020 at 01:11:32PM +0200, Antoine Zimmermann wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 21/06/2020 at 15:35, angin scribe wrote:
>>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> Is the standard semantics of blank nodes in RDF still the same as
>>>>>>>> existentially quantified variables?
>>>>>>>
>>>>>>> Yes.
>>>>>>>
>>>>>>>
>>>>>>>> Let "_:b1" and "_:b2" be blank nodes. In the current standard semantics
>>>>>>>> of RDF, is it still true that the graph below does not necessarily mean
>>>>>>>> that Bob has two different things?
>>>>>>>>
>>>>>>>> Bob has _:b1
>>>>>>>> Bob has _:b2
>>>>>>>
>>>>>>> Indeed. This graph would be equivalent to saying:
>>>>>>>
>>>>>>> "Bob has something. Bob has something."
>>>>>>>
>>>>>>> We can't conclude that Bob has 2 things.
>>>>> I'm sorry, but this is so frustratingly counterintuitive that I'd like to ask for an explanation: what constraints in the semantics of RDF make it impossible to provide a tighter definition?
>>>>
>>>> The semantics of RDF is designed to make it possible to express the mere notion of existence, as in First Order Logic or other logics, by the use of blank nodes. Considering that they behave exactly like the hundred-year-old existential variables of said logics, they could be deemed rather intuitive. Logicians, mathematicians, philosophers, engineers and computer scientists have relied on FOL as the basis of tons of fundamental concepts.
>>> I definitely don't feel like smashing any tables in the temple of FOL. By any account FOL is very useful and expressive, but it requires skill and careful crafting, and not everybody finds that intuitive. It's well known that people struggle with Modus Tollens just as they struggle with understanding exponential growth rates, no matter how useful, old, etc. those concepts are.
>>> However, my main point is maybe a bit different. I think it has to do with the situation of the author, maybe the social dimension of writing down some triples.
>>>> What may seem unintuitive is the peculiar representation that RDF uses. While in a formula like:
>>>>
>>>> ∃x∃y (has(Bob,x) ∧ has(Bob,y))
>>>>
>>>> it is clear and explicit that x and y are existentially quantified, and what the scope of the quantification is,
>>>> in RDF, on the contrary, quantification is implicit, because bnodes can only be used existentially (not universally) and their scope is always that of the RDF graph under consideration.
>>>> If b1 and b2 are blank nodes, then the RDF graph:
>>>>
>>>> { (<Bob>, <has>, b1), (<Bob>, <has>, b2) }
>>>>
>>>> is exactly equivalent to a FOL formula:
>>>>
>>>> ∃x∃y (Triple(<Bob>,<has>,x) ∧ Triple(<Bob>,<has>,y))
>>> I had this line in mind, and it gives two existentials, x and y. To me, two existentials are intuitively not the same as one existential.
>>>> which is itself exactly FOL-equivalent to:
>>>>
>>>> ∃z (Triple(<Bob>,<has>,z))
>>> I didn't see that coming (and it's the core of my problem).
>>>>
>>>> which is also FOL-equivalent to:
>>>>
>>>> ∃a∃b∃c (Triple(<Bob>,<has>,a) ∧ Triple(<Bob>,<has>,b) ∧ Triple(<Bob>,<has>,c))
>>> Or even that. I have only a vague understanding of the reasons for such simplifications and the consequences they have. I'm sure they are very useful in logic, just as it can be useful in mathematics to get rid of variables. However...
>>> ...if I envision a situation where the initial statements "Bob has something (x)." and "Bob has something (y)." materialize out of the blue, without further attributions, leaning x and y to z sure makes sense and is indeed intuitive. It's like imagining somebody saying "Bob has something" over and over again, maybe absent-minded or even disturbed. All "Bob has something" utterances amount to the same single statement.
>>> However, a more realistic (or "social", if I may) scenario is that somebody authored those two statements and added them both to the graph to express that there are two things that Bob has and that are worth mentioning. Maybe other statements are to be added later, like the small and big attributes in your example below. Maybe the statements are just a beginning, a stub for more to come. That's the scenario I had in mind, and in that scenario it is not intuitive at all that the two blank nodes get leaned. It might make me yell at my computer: "What do you think I'm doing here? Do you think I typed those TWO statements just for fun?" etc. etc.
>>> The way that FOL uses existentials is not necessarily the only way they can be understood, and this is where intuition can very well break. I don't want to develop too many theories just yet, but I suspect one could argue that FOL presupposes some conditions that are not necessarily a given in normal human communication, or that even run counter to a normal authoring process, e.g. an unfolding text with placeholders and vague but distinct references.
>>> And then they become counterintuitive, no matter how logical and sound they are within the closed system of FOL. This is not to say that one way is more right than the other. Logic has certain powers, just as composing does, but they follow different rules, and they may clash when RDF is authored.
>>>> An existential variable (or a blank node) does not identify anything. It only mentions the existence of a thing. If I say that there exists a person who lived more than 10 years, I'm not referring to anyone in particular. I'm just stating the existence of such a thing.
>>>> Now, as in FOL, it is necessary to have infinitely many variables, because I can qualify more precisely the things of which I'm stating the existence. I may say:
>>>>
>>>> "Bob has something big. Bob has something small."
>>>>
>>>> which is not the same as saying:
>>>>
>>>> "Bob has something that is big and small."
>>>>
>>>> In RDF, compare:
>>>>
>>>> <Bob> <has> [ <is> <Small> ] .
>>>> <Bob> <has> [ <is> <Big> ] .
>>>>
>>>> and:
>>>>
>>>> <Bob> <has> [ <is> <Small>, <Big> ] .
>>>>
>>>> In the first case, I need two blank nodes because, although the second graph entails the first, they are not equivalent. According to the first graph, it is still possible that it describes a world where a small thing is never big and vice versa.
>>> Now imagine a scenario where the small/big attributions are not made _yet_. FOL will lean away what the author might have meant to merely hint at, or to explain in more detail later.
>>> I only now realized that in my initial mail I had made the implicit assumption that the example
>>>>>>>> Bob has _:b1
>>>>>>>> Bob has _:b2
>>> is just a starting point to which later statements like
>>> _:b1 is Small
>>> _:b2 is Big
>>> may be added.
>>> At that point RDF wouldn't lean those two bnodes into one anymore, right?
>>>> Note that in these latter examples, I do not even need a bnode identifier, because I merely state the existence of a thing; I do not identify anything.
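The FOL equivalences above, the Small/Big comparison and the question of whether the extended graph still gets leaned can all be checked with a small brute-force sketch. It assumes graphs are Python sets of string triples, with terms starting `_:` treated as blank nodes, and ignores literals. `lean` follows the textbook approach of repeatedly mapping the graph into itself minus one blank-node triple until no such mapping exists; the helper names are hypothetical and the search is only practical for tiny graphs.

```python
def is_bnode(term):
    return term.startswith("_:")


def _bind(a, b, h):
    """Extend the partial blank node mapping h so that term a maps to term b."""
    if not is_bnode(a):
        return a == b
    if a in h:
        return h[a] == b
    h[a] = b
    return True


def find_hom(src, dst):
    """Return a mapping h of src's blank nodes with h(src) a subset of dst, or None."""
    src, dst = sorted(src), sorted(dst)

    def extend(i, h):
        if i == len(src):
            return h
        for target in dst:
            h2 = dict(h)
            if all(_bind(a, b, h2) for a, b in zip(src[i], target)):
                found = extend(i + 1, h2)
                if found is not None:
                    return found
        return None

    return extend(0, {})


def entails(g, h):
    """Simple entailment: g entails h iff h maps homomorphically into g."""
    return find_hom(h, g) is not None


def has_bnode(triple):
    return any(is_bnode(t) for t in triple)


def is_lean(g):
    """Lean: g cannot be mapped into itself with some blank node triple left out."""
    g = set(g)
    return all(find_hom(g, g - {t}) is None for t in g if has_bnode(t))


def lean(g):
    """Shrink g to an equivalent lean subgraph (its core)."""
    g = set(g)
    while True:
        for t in sorted(g):
            h = find_hom(g, g - {t}) if has_bnode(t) else None
            if h is not None:
                g = {tuple(h.get(x, x) for x in triple) for triple in g}
                break  # g shrank; start scanning again
        else:
            return g   # no triple can be dropped: g is lean


bob2 = {(":Bob", ":has", "_:b1"), (":Bob", ":has", "_:b2")}
bob3 = {(":Bob", ":has", f"_:{v}") for v in "abc"}
print(lean(bob2), lean(bob3))  # each collapses to a single triple

small_and_big = {(":Bob", ":has", "_:x"), ("_:x", ":is", ":Small"),
                 (":Bob", ":has", "_:y"), ("_:y", ":is", ":Big")}
one_thing = {(":Bob", ":has", "_:z"), ("_:z", ":is", ":Small"), ("_:z", ":is", ":Big")}
print(entails(one_thing, small_and_big), entails(small_and_big, one_thing))  # True False

thomas = bob2 | {("_:b1", ":is", ":Small"), ("_:b2", ":is", ":Big")}
print(is_lean(bob2), is_lean(thomas))  # False True
```

The last line suggests the answer to the question above is yes: once `_:b1 is Small` and `_:b2 is Big` are added, no mapping can send one blank node onto the other, so the graph is lean and nothing is leaned away.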
>>>> But due to the limitations of digital representations, we have to serialise every graph as a string of characters, so that it becomes necessary, in some cases, to introduce back-references in the form of bnode identifiers. Bnode identifiers are not names for things. They are just tools that allow a linear representation of arbitrary graphs.
>>> I (think I) did and do understand what you say about bnodes.
>>>> If you can draw your graphs on surfaces, you can reuse the same symbol all the time for every blank node, such as an empty ellipse of constant size.
>>> The different x/y coordinates on that surface disambiguate the circles. Circles in different positions stand for different somethings. The position itself carries no meaning.
>>> Thank you for the very thorough explanation! I hope my counter-argument makes some sense now.
>>> Thomas
>>>> --AZ
>>>>
>>>>> My intuition is that two different identifiers point to two different things. I would rather translate the above to:
>>>>> "Bob has some x-thing. Bob has some y-thing."
>>>>> Sure, we don't know for certain that x and y are distinct until some statement to that effect is made, but by default different existentials should refer to different things. Otherwise, what's the point in having different existentials? Why not just one "something" symbol instead of indefinitely many blank nodes?
>>>>> Thomas
>>>>>> Reducing this to "Bob has _:b3" is called "graph leaning". This is a
>>>>>> behavior of RDF semantics, upon which RDFS and OWL are built, but
>>>>>> interestingly not the SPARQL 1.1 RDFS entailment regime.
>>>>>>
>>>>>> RDF 1.1 Semantics says "Blank nodes are treated as simply indicating
>>>>>> the existence of a thing, without using an IRI to identify any
>>>>>> particular thing." It follows from there that two statements:
>>>>>> [] a :Barn .
>>>>>> [] :color :red .
>>>>>> might be talking about the same thing. You can't know without some
>>>>>> inverse functional properties or other application logic.
>>>>>> AFAICT, while architects may bear this in mind when designing data
>>>>>> models, no tool uses RDF semantics on its own. There have been tool
>>>>>> chains that count on graph leaning, but the ones I saw were
>>>>>> stand-alone processing steps, not features intrinsic to generic RDF
>>>>>> processors.
>>>>>>
>>>>>> SPARQL is really a graph query language; it doesn't do any sort of
>>>>>> graph leaning. `SELECT * { <s> <p> ?o }` will give you two bindings:
>>>>>> ┌──────┐      ┌────────┐
>>>>>> │ ?o   │  or  │ ?o     │
>>>>>> │ _:b1 │      │ _:abcd │
>>>>>> │ _:b2 │      │ _:efgh │
>>>>>> └──────┘      └────────┘
>>>>>> (Those bindings may have any distinct labels; there's no assurance
>>>>>> that blank node labels are preserved.)
>>>>>>
>>>>>> You could argue that a carefully constructed SPARQL query could allow
>>>>>> you to deduce that the response you got could be leaned, but everyone
>>>>>> I know of who wants counting semantics treats them as distinct
>>>>>> individuals. I think this accounts for 95+% of the work done with RDF.
>>>>>>
>>>>>> RDFS only allows you to infer new stuff, so it can't do any sort of
>>>>>> leaning. OWL would allow you to specifically infer that they were the
>>>>>> same individual, but it can do that with IRIs as well, so there doesn't
>>>>>> seem to be much of an observable difference between them, other than
>>>>>> that some parts of OWL axioms require BNodes instead of IRIs to
>>>>>> eliminate the effects of coreferences.
>>>>>>
>>>>>> I guess you could characterize it this way:
>>>>>>
>>>>>> 1. Graph semantics treat BNodes as individuals.
>>>>>>    test: insert { <s> <p> _:a , _:b } and find two triples.
>>>>>>
>>>>>> 2. SPARQL (unextended) semantics likewise treat BNodes as individuals.
>>>>>>    test: SELECT * { <s> <p> ?o }
>>>>>>
>>>>>> 3. SPARQL RDF semantics still treat BNodes as individuals.
>>>>>>
>>>>>> 4. RDF Entailment implies lean-able graphs.
>>>>>>
>>>>>> 5. OWL can unify BNodes and IRIs.
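Points 1 and 2 can be illustrated with a toy evaluator for a single triple pattern. It is not a SPARQL engine: `match` and `relabel` are hypothetical helpers, variables are strings starting with `?`, and blank nodes are strings starting with `_:`. It shows that each stored triple yields its own solution, and that renaming blank nodes in the output, which a store is free to do, leaves the count unchanged.

```python
def is_var(term):
    return term.startswith("?")


def match(pattern, graph):
    """Evaluate one triple pattern; each matching triple yields one solution."""
    solutions = []
    for triple in sorted(graph):
        binding = {}
        for p, t in zip(pattern, triple):
            if is_var(p):
                if binding.setdefault(p, t) != t:
                    break        # same variable bound to two different terms
            elif p != t:
                break            # constant in the pattern does not match
        else:
            solutions.append(binding)
    return solutions


def relabel(solutions, prefix="_:r"):
    """Consistently rename blank nodes in the solutions, as a store may do on output."""
    fresh = {}

    def rename(term):
        if term.startswith("_:"):
            fresh.setdefault(term, f"{prefix}{len(fresh)}")
            return fresh[term]
        return term

    return [{var: rename(t) for var, t in s.items()} for s in solutions]


# Test 1: inserting <s> <p> _:a , _:b stores two distinct triples.
graph = {("<s>", "<p>", "_:a"), ("<s>", "<p>", "_:b")}
answers = match(("<s>", "<p>", "?o"), graph)  # SELECT * { <s> <p> ?o }
print(len(graph), len(answers))               # 2 2
print(relabel(answers))                       # labels change, the count does not
```

Counting answers gives 2 either way, i.e. the count is over nodes in the graph; under RDF entailment the same graph would nonetheless be equivalent to the single triple `<s> <p> _:a .`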
>>>>>>
>>>>>>
>>>>>>>> I.e., two syntactically different blank nodes do not necessarily mean
>>>>>>>> that they are two different entities.
>>>>>>>>
>>>>>>>> I know that there has been a lot of discussion on blank nodes in the
>>>>>>>> past, cf. [1, 2, 3]. I just want to make sure that there are no recent
>>>>>>>> changes to the semantics of blank nodes that I missed. Please let me
>>>>>>>> know if I have missed any recent updates in this area. Many thanks!
>>>>>>>
>>>>>>> In standardising Web technologies, the W3C is extremely cautious about
>>>>>>> backward compatibility. If something was defined in some way in one
>>>>>>> version of a W3C standard, it is likely to work the same in later
>>>>>>> versions. Sometimes features get deprecated, but they still work the
>>>>>>> same if used. Other times features get added, but they do not change
>>>>>>> the way prior features work. Obviously, there are exceptions, even in
>>>>>>> RDF. For instance, the way literals and datatypes work in RDF 1.1 is
>>>>>>> different from RDF 1.0, but the practical consequences are almost
>>>>>>> insignificant.
>>>>>>>
>>>>>>>
>>>>>>> --AZ
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> A
>>>>>>>>
>>>>>>>> [1] M. Arenas, M. Consens, A. Mallea. Revisiting Blank Nodes in RDF to
>>>>>>>> Avoid the Semantic Mismatch with SPARQL.
>>>>>>>> https://www.w3.org/2009/12/rdf-ws/papers/ws23
>>>>>>>>
>>>>>>>> [2] A. Hogan, M. Arenas, A. Mallea, A. Polleres. Everything You Always
>>>>>>>> Wanted to Know About Blank Nodes. Journal of Web Semantics, 2014.
>>>>>>>>
>>>>>>>> [3] A. Mallea, M. Arenas, A. Hogan, A. Polleres. On Blank Nodes. ISWC 2011.
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Antoine Zimmermann
>>>>>>> Institut Henri Fayol
>>>>>>> École des Mines de Saint-Étienne
>>>>>>> 158 cours Fauriel
>>>>>>> CS 62362
>>>>>>> 42023 Saint-Étienne Cedex 2
>>>>>>> France
>>>>>>> Tel: +33(0)4 77 42 66 03
>>>>>>> Fax: +33(0)4 77 42 66 66
>>>>>>> http://www.emse.fr/~zimmermann/
>>>>>>> Member of team Connected Intelligence, Laboratoire Hubert Curien
>>>>>>>
>>>>>>
>>>>
>>>>
>>>> --
>>>> Antoine Zimmermann
>>>> ISCOD / LSTI - Institut Henri Fayol
>>>> École Nationale Supérieure des Mines de Saint-Étienne
>>>> 158 cours Fauriel
>>>> 42023 Saint-Étienne Cedex 2
>>>> France
>>>> Tel: +33(0)4 77 42 66 03
>>>> Fax: +33(0)4 77 42 66 66
>>>> http://zimmer.aprilfoolsreview.com/
>>
>
Received on Sunday, 5 July 2020 04:43:56 UTC