- From: Aidan Hogan <aidhog@gmail.com>
- Date: Tue, 30 Jun 2020 19:20:47 -0400
- To: Jiří Procházka <ojirio@gmail.com>
- Cc: semantic-web <semantic-web@w3.org>
Hi Jiří,

On 2020-06-30 18:47, Jiří Procházka wrote:
> Aidan, thank you very much. For a long time I've just accepted blank
> nodes need to be in RDF semantics for reasons I don't understand as I
> (still) prefer using generated IRIs instead for most use cases, but
> you've clearly explained the reasons and the use cases where they are
> necessary.

Great to hear! :)

> Adding to the explanation I think I'm not being wrong in saying that for
> both situations the locality and existentiality of blank nodes give the
> information a sort of immutability, safety against modification of the
> intended meaning when merging graphs (external data). The other graphs
> cannot modify the list, or narrow down the specifics of who killed Bob,
> while erasing the information that at some point (in some graph) the
> specifics were unknown and we just knew, that someone killed Bob.

Interesting point, yes! This feature can indeed be a useful one, and I
*believe* it was one of the main reasons why in the RDF serialisation of
OWL (2) DL, they require using blank nodes in definitions that require
multiple triples: because, like you say, a well-formed DL definition
using blank nodes in RDF can be verified for a local document and will
remain well-formed when RDF documents are merged (aka. ontologies are
imported). For example, the list of classes in a union definition cannot
suddenly grow "branches" when merged with another RDF document/ontology.

(In the other direction, going from the other syntaxes of OWL to RDF,
blank nodes are of course useful to fill in those implicit nodes that
are needed for n-ary relations and lists.)

In fact, something I didn't think about before, but were it not for
locality, then the existential semantics would not be possible in the
same sense, because in the case of:

:Bob :murderedBy _:b1 , _:b2 .

If _:b1 and _:b2 were not local, this could not be equivalent to:

:Bob :murderedBy _:b1 .

As we could always later encounter an external document elsewhere with:

_:b1 :name "Alice" .
_:b2 :name "Carol" .

And clearly there is a difference between:

:Bob :murderedBy _:b1 , _:b2 .
_:b1 :name "Alice" .
_:b2 :name "Carol" .

And ...

:Bob :murderedBy _:b1 .
_:b1 :name "Alice" .
_:b2 :name "Carol" .

So non-local existential semantics becomes almost ill-defined.

Best,
Aidan

> On 6/30/20 1:33 AM, Aidan Hogan wrote:
>> For what it is worth, we started working on the topic of blank nodes
>> some time ago similarly convinced of the fact that the RDF semantics of
>> blank nodes was unintuitive, and that a better semantics could be found.
>> A couple of papers and several years later, I was/am more or less
>> convinced that the semantics of blank nodes is as it should be in RDF.
>>
>>
>> As a summary:
>>
>> Blank nodes are typically useful in two situations:
>>
>> (1) Implicit nodes: you don't have to name blank nodes but rather blank
>> node labels can be generated automatically. This allows for shortcuts
>> like lists ":abc :has ( a b c ) ." in Turtle.
>>
>> (2) Existential variables: for example ":Bob :murderedBy _:b ."; we know
>> Bob was murdered but we don't know by whom he was murdered.
>>
>>
>> How blank nodes are defined in RDF has two main characteristics:
>>
>> (A) Locality: _:b in document D1 is not the same as _:b in document D2.
>>
>> (B) Existentiality: in the graph ":Bob :murderedBy _:b1 , _:b2 .", this
>> states that "Bob was murdered by someone", "Bob was murdered by
>> someone", which is equivalent to ":Bob :murderedBy _:b1 ."
>>
>>
>> Obviously for (2) we need to define blank nodes existentially, but I
>> will try to argue that this is the best solution even just for (1). So
>> in the context of (1) we'll look at changing (A) locality and (B)
>> existentiality, and see what happens.
>>
>>
>> (A) We could think about removing locality, and making blank nodes
>> global, but now in the context of (1), a parser has to take care of
>> generating a term that is globally unique, which will require something
>> like a "base IRI" and some non-trivial conventions regarding what to do
>> when parsing something from the same base IRI multiple times (also
>> considering that the document might have changed). This would greatly
>> complicate simply parsing an RDF document.
>>
>> (B) We could think about keeping locality and removing existentiality,
>> but let's say we take two RDF graphs G1 and G2 parsed from the same
>> Turtle document:
>>
>> ":abc :has ( a b c ) ."
>>
>> by two different parsers that generate different blank nodes to
>> represent the list. If blank nodes are not existential, which is to say
>> that if blank nodes denote a resource in a similar manner to IRIs, then
>> we lose the formal relation between G1 and G2, even though they
>> represent the same data; more specifically, we would consider that G1
>> associates :abc with one list, and that G2 associates :abc with a
>> *potentially* different list. To me, this behaviour is undesirable.
>>
>> Under existential semantics, I can say that G1 and G2 are formally
>> equivalent. If I union/merge G1 and G2, the existential semantics tells
>> me that in the resulting graph, there exists one list, not that :abc
>> potentially has two lists, allowing me to "lean" the graph and keep just
>> one list (if I want).
>>
>> Put more simply perhaps, without an existential semantics of blank
>> nodes, every time I parse a document and generate different blank nodes,
>> I would be "creating" resources that are potentially different each time
>> to serve as the referents of the blank nodes.
>>
>> If I'm not okay with blank nodes being existentials, there is still the
>> option of simply (<- no pun intended) not leaning the data, and/or of
>> using skolemisation to generate IRIs to replace blank nodes.
>>
>> Best,
>> Aidan
>>
>> On 2020-06-29 18:33, thomas lörtsch wrote:
>>>
>>>
>>>> On 29. Jun 2020, at 21:57, Antoine Zimmermann
>>>> <antoine.zimmermann@emse.fr> wrote:
>>>>
>>>> On 29/06/2020 at 20:33, thomas lörtsch wrote:
>>>>>> On 23. Jun 2020, at 14:10, Eric Prud'hommeaux <eric@w3.org> wrote:
>>>>>>
>>>>>> On Tue, Jun 23, 2020 at 01:11:32PM +0200, Antoine Zimmermann wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 21/06/2020 at 15:35, angin scribe wrote:
>>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> Is the standard semantics of blank nodes in RDF still the same as
>>>>>>>> existentially quantified variables?
>>>>>>>
>>>>>>> Yes.
>>>>>>>
>>>>>>>
>>>>>>>> Let "_:b1" and "_:b2" be blank nodes. In the current standard
>>>>>>>> semantics of RDF, is it still true that the graph below does not
>>>>>>>> necessarily mean that Bob has two different things?
>>>>>>>>
>>>>>>>> Bob has _:b1
>>>>>>>> Bob has _:b2
>>>>>>>
>>>>>>> Indeed. This graph would be equivalent to saying:
>>>>>>>
>>>>>>> "Bob has something. Bob has something."
>>>>>>>
>>>>>>> We can't conclude that Bob has 2 things.
>>>>> I’m sorry but this is so frustratingly counter intuitive that I’d
>>>>> like to ask for an explanation: what constraints in the semantics of
>>>>> RDF make it impossible to provide a tighter definition?
>>>>
>>>> The semantics of RDF is designed to make it possible to express the
>>>> mere notion of existence as in First Order Logic or other logics, by
>>>> the use of blank nodes. Considering that they behave exactly like the
>>>> hundred-year-old existential variables of said logics, they could be
>>>> deemed rather intuitive. Logicians, mathematicians, philosophers,
>>>> engineers, computer scientists, have relied on FOL as a basis of tons
>>>> of fundamental concepts.
>>>
>>> I definitely don’t feel like smashing any tables in the temple of FOL.
>>> By any account FOL is very useful and expressive, but it requires
>>> skills and careful crafting - not everybody finds that intuitive. It’s
>>> well known that people struggle with Modus Tollens just as they
>>> struggle with understanding exponential growth rates, no matter how
>>> useful, old etc those concepts are.
>>> However my main point is maybe a bit different. I think it has to do
>>> with the situation of the author, maybe the social dimension of
>>> writing down some triples.
>>>
>>>> What may seem unintuitive is the peculiar representation that RDF
>>>> uses. While in a formula like:
>>>>
>>>> ∃x∃y (has(Bob,x) ∧ has(Bob,y))
>>>>
>>>> it is clear and explicit that x and y are existentially quantified,
>>>> and what the scope of the quantification is,
>>>
>>>> in RDF, on the contrary, quantification is implicit because bnodes
>>>> can only be used existentially (not universally) and their scope is
>>>> always that of the RDF graph under consideration.
>>>
>>>
>>>> If b1 and b2 are blank nodes, then the RDF graph:
>>>>
>>>> { (<Bob>, <has>, b1), (<Bob>, <has>, b2) } is exactly equivalent to a
>>>> FOL formula:
>>>>
>>>> ∃x∃y (Triple(<Bob>,<has>,x) ∧ Triple(<Bob>,<has>,y))
>>>
>>> I had this line in mind and it gives two existentials, x and y. To me
>>> two existentials are intuitively not the same as one existential.
>>>
>>>> which is itself exactly FOL-equivalent to:
>>>>
>>>> ∃z (Triple(<Bob>,<has>,z))
>>>
>>> I didn’t see that coming (and it’s the core of my problem).
>>>
>>>> which is also FOL-equivalent to:
>>>>
>>>> ∃a∃b∃c (Triple(<Bob>,<has>,a) ∧ Triple(<Bob>,<has>,b) ∧
>>>> Triple(<Bob>,<has>,c))
>>>
>>> Or even that. I have only a vague understanding about the reasons for
>>> such simplifications and the consequences they have. I’m sure they are
>>> very useful in logic just like it can be useful in mathematics to get
>>> rid of variables. However...
>>>
>>> ...if I envision a situation where the initial statements "Bob has
>>> something (x)." and "Bob has something (y)." materialize out of the
>>> blue, without further attributions, leaning x and y to z sure makes
>>> sense and is indeed intuitive. It’s like imagining somebody saying
>>> "Bob has something" over and over again, maybe absent-minded or even
>>> disturbed. All "Bob has something" utterances amount to the same one
>>> statement.
>>> However a more realistic (or "social", if I may) scenario is that
>>> somebody authored those two statements and added them both to the
>>> graph to express that there are two things that Bob has and that are
>>> worth mentioning. Maybe other statements are to be added later, like
>>> the small and big attributes in your example below. Maybe the
>>> statements are just a beginning, a stub for more to come. That’s the
>>> scenario I had in mind and in that scenario it is not intuitive at all
>>> that the two blank nodes get leaned. It might make me yell at my
>>> computer "What do you think I’m doing here? Do you think I typed those
>>> TWO statements just for fun? etc etc".
>>>
>>> The way that FOL uses existentials is not necessarily the only way
>>> they can be understood and this is where intuition can very well
>>> break. I don’t want to develop too many theories just yet but I
>>> suspect that one could argue that FOL presupposes some conditions that
>>> are not necessarily a given in normal human communications, or even
>>> run counter to a normal authoring process like e.g. an unfolding text
>>> with place holders and vague but distinct references. And then they
>>> become counter intuitive, no matter how logical and sound they are
>>> within the closed system of FOL. This is not to say that one way is
>>> more right than the other. Logic has certain powers just as composing,
>>> but they have different rules - and they may clash when RDF is authored.
>>>
>>>> An existential variable (or a blank node) does not identify anything.
>>>> It only mentions the existence of a thing. If I say that there exists
>>>> a person that lived more than 10 years, I'm not referring to anyone
>>>> in particular. I'm just stating the existence of such a thing.
>>>
>>>> Now, as in FOL, it is necessary to have infinitely many variables,
>>>> because I can qualify more precisely the things of which I'm stating
>>>> the existence. I may say:
>>>>
>>>> "Bob has something big. Bob has something small."
>>>>
>>>> which is not the same as saying:
>>>>
>>>> "Bob has something that is big and small."
>>>>
>>>>
>>>> In RDF, compare:
>>>>
>>>> <Bob> <has> [ <is> <Small> ] .
>>>> <Bob> <has> [ <is> <Big> ] .
>>>>
>>>> and:
>>>>
>>>> <Bob> <has> [ <is> <Small>, <Big> ] .
>>>>
>>>> In the first case, I need two blank nodes, because, although the
>>>> second graph entails the first, they are not equivalent. According to
>>>> the first graph, it is still possible that it describes a world where
>>>> a small thing is never big and vice versa.
>>>
>>> Now imagine a scenario where the small/big attributions are not made
>>> _yet_. FOL will lean away what the author might have meant to merely
>>> hint at or explain in more detail later.
>>> I only now realized that in my initial mail I had made the implicit
>>> assumption the example
>>>>>>>> Bob has _:b1
>>>>>>>> Bob has _:b2
>>> is just a starting point to which later statements like
>>> _:b1 is Small
>>> _:b2 is Big
>>> may be added.
>>> At that point RDF wouldn’t lean those two bnodes into one anymore, right?
>>>
>>>> Note that in these latter examples, I do not even need a bnode
>>>> identifier, because I merely state the existence of a thing, I do not
>>>> identify anything. But due to the limitation of digital
>>>> representations, we have to serialise every graph as a string of
>>>> characters, such that it becomes necessary, in some cases, to
>>>> introduce back references in the form of bnode identifiers. bnode ids
>>>> are not names for things. They are just tools that allow a linear
>>>> representation of arbitrary graphs.
>>>
>>> I (think I) did and do understand what you say about bnodes.
>>>
>>>> If you can draw your graphs on surfaces, you can reuse the same
>>>> symbol all the time for every blank node, such as an empty ellipse of
>>>> constant size.
>>>
>>> The different x/y coordinates on that surface disambiguate the
>>> circles. Circles in different positions stand for different
>>> somethings. The position itself carries no meaning.
>>>
>>>
>>> Thank you for the very thorough explanation! I hope my
>>> counter-argument makes some sense now.
>>> Thomas
>>>
>>>
>>>> --AZ
>>>>
>>>>> My intuition is that two different identifiers point to two
>>>>> different things. I would rather translate the above to:
>>>>> "Bob has some x-thing. Bob has some y-thing."
>>>>> Sure, we don’t know for certain that x and y are distinct until some
>>>>> statement to that effect is made, but per default different
>>>>> existentials should refer to different things. Otherwise what’s the
>>>>> point in having different existentials? Why not just one
>>>>> "something" symbol instead of indefinitely many blank nodes?
>>>>> Thomas
>>>>>> Reducing this to "Bob has _:b3" is called "graph leaning". This is a
>>>>>> behavior of RDF semantics, upon which is built RDFS and OWL, but
>>>>>> interestingly not the SPARQL 1.1 RDFS entailment regime.
>>>>>>
>>>>>> RDF 1.1 Semantics says "Blank nodes are treated as simply indicating
>>>>>> the existence of a thing, without using an IRI to identify any
>>>>>> particular thing." It follows from there that two statements:
>>>>>> [] a :Barn .
>>>>>> [] :color :red .
>>>>>> might be talking about the same thing. You can't know without some
>>>>>> inverse functional properties or other application logic. AFAICT,
>>>>>> while architects may keep this in mind when designing data models,
>>>>>> no tool uses RDF semantics on its own. There have been tool chains
>>>>>> that count on graph leaning, but the ones I saw were stand-alone
>>>>>> processing steps, not features intrinsic to generic RDF processors.
>>>>>>
>>>>>> SPARQL is really a graph query language; it doesn't do any sort of
>>>>>> graph leaning. `SELECT * { <s> <p> ?o }` will give you two bindings
>>>>>> ┌──────┐    ┌────────┐
>>>>>> │ ?o   │ or │ ?o     │
>>>>>> │ _:b1 │    │ _:abcd │
>>>>>> │ _:b2 │    │ _:efgh │
>>>>>> └──────┘    └────────┘
>>>>>> (Those bindings may have any distinct labels; there's no assurance
>>>>>> that blank node labels are preserved.)
>>>>>>
>>>>>> You could argue that a carefully constructed SPARQL query could allow
>>>>>> you to deduce that the response you got could be leaned, but everyone
>>>>>> I know of who wants counting semantics treats them as distinct
>>>>>> individuals. I think this accounts for 95+% of the work done with RDF.
>>>>>>
>>>>>> RDFS only allows you to infer new stuff so it can't do any sort of
>>>>>> leaning. OWL would allow you to specifically infer that they were the
>>>>>> same individual but it can do that with IRIs as well so there doesn't
>>>>>> seem to be much of an observable difference between them other than
>>>>>> that some parts of OWL axioms require BNodes instead of IRIs to
>>>>>> eliminate the effects of coreferences.
>>>>>>
>>>>>> I guess you could characterize it this way:
>>>>>>
>>>>>> 1. Graph semantics treat BNodes as individuals.
>>>>>>    test: insert { <s> <p> _:a , _:b } and find two triples.
>>>>>>
>>>>>> 2. SPARQL (unextended) semantics likewise treat BNodes as individuals.
>>>>>>    test: SELECT * { <s> <p> ?o }
>>>>>>
>>>>>> 3. SPARQL RDF semantics still treat BNodes as individuals.
>>>>>>
>>>>>> 4. RDF Entailment implies lean-able graphs.
>>>>>>
>>>>>> 5. OWL can unify BNodes and IRIs.
>>>>>>
>>>>>>
>>>>>>>> I.e., two syntactically different blank nodes do not necessarily
>>>>>>>> mean that they are two different entities.
>>>>>>>>
>>>>>>>> I know that there has been a lot of discussion on blank nodes in the
>>>>>>>> past, cf. [1, 2, 3]. I just want to make sure that there are no
>>>>>>>> recent changes on the semantics of blank nodes that I missed. Please
>>>>>>>> let me know if I miss some recent updates in this area. Many thanks!
>>>>>>>
>>>>>>> In standardising Web technologies, the W3C is extremely cautious
>>>>>>> about backward compatibility. If something was defined in some way
>>>>>>> in a version of a W3C standard, it is likely to work the same in
>>>>>>> later versions. Sometimes, features get deprecated, but they still
>>>>>>> work the same, if used. Other times, features get added, but they
>>>>>>> do not change the way prior features work.
>>>>>>> Obviously, there are exceptions, even in RDF. For instance, the way
>>>>>>> literals and datatypes work in RDF 1.1 is different from RDF 1.0,
>>>>>>> but the practical consequences are almost insignificant.
>>>>>>>
>>>>>>>
>>>>>>> --AZ
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> A
>>>>>>>>
>>>>>>>> [1] M. Arenas, M. Consens, A. Mallea. Revisiting Blank Nodes in
>>>>>>>> RDF to Avoid the Semantic Mismatch with SPARQL.
>>>>>>>> https://www.w3.org/2009/12/rdf-ws/papers/ws23
>>>>>>>>
>>>>>>>> [2] A. Hogan, M. Arenas, A. Mallea, A. Polleres. Everything You
>>>>>>>> Always Wanted to Know About Blank Nodes. Journal of Web Semantics.
>>>>>>>> 2014.
>>>>>>>>
>>>>>>>> [3] A. Mallea, M. Arenas, A. Hogan, A. Polleres. On Blank Nodes.
>>>>>>>> ISWC 2011.
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Antoine Zimmermann
>>>>>>> Institut Henri Fayol
>>>>>>> École des Mines de Saint-Étienne
>>>>>>> 158 cours Fauriel
>>>>>>> CS 62362
>>>>>>> 42023 Saint-Étienne Cedex 2
>>>>>>> France
>>>>>>> Tél:+33(0)4 77 42 66 03
>>>>>>> Fax:+33(0)4 77 42 66 66
>>>>>>> http://www.emse.fr/~zimmermann/
>>>>>>> Member of team Connected Intelligence, Laboratoire Hubert Curien
>>>>>>>
>>>>>>
>>>>
>>>>
>>>> --
>>>> Antoine Zimmermann
>>>> ISCOD / LSTI - Institut Henri Fayol
>>>> École Nationale Supérieure des Mines de Saint-Étienne
>>>> 158 cours Fauriel
>>>> 42023 Saint-Étienne Cedex 2
>>>> France
>>>> Tél:+33(0)4 77 42 66 03
>>>> Fax:+33(0)4 77 42 66 66
>>>> http://zimmer.aprilfoolsreview.com/
>>>
>>>
>>
>
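
The equivalences discussed in this thread can be checked mechanically.
What follows is only a minimal sketch, not taken from any RDF library:
it assumes a graph is a Python set of (subject, predicate, object)
string triples, that any term starting with "_:" is a blank node, and
it uses the hypothetical helper names simply_entails and equivalent. It
relies on the standard characterisation of simple entailment: G entails
H if the blank nodes of H can be mapped to terms of G so that every
mapped triple of H occurs in G. The search is brute force, so it is
only suitable for toy graphs like the ones above.

```python
from itertools import product


def is_bnode(term):
    # Assumption of this sketch: blank nodes are written with a "_:" prefix.
    return term.startswith("_:")


def simply_entails(g, h):
    """Return True if some mapping of h's blank nodes to terms of g
    turns every triple of h into a triple of g."""
    bnodes = sorted({t for triple in h for t in triple if is_bnode(t)})
    terms = sorted({t for triple in g for t in triple})
    # Try every assignment of h's blank nodes to terms occurring in g.
    for image in product(terms, repeat=len(bnodes)):
        mapping = dict(zip(bnodes, image))
        if all(tuple(mapping.get(t, t) for t in triple) in g for triple in h):
            return True
    return False


def equivalent(g, h):
    # Two graphs are equivalent when each one entails the other.
    return simply_entails(g, h) and simply_entails(h, g)


# ":Bob :murderedBy _:b1 , _:b2 ." versus ":Bob :murderedBy _:b1 ."
two = {(":Bob", ":murderedBy", "_:b1"), (":Bob", ":murderedBy", "_:b2")}
one = {(":Bob", ":murderedBy", "_:b1")}

# The same two graphs after the names are added.
names = {("_:b1", ":name", '"Alice"'), ("_:b2", ":name", '"Carol"')}
named_two = two | names
named_one = one | names

# "Bob has something small. Bob has something big." versus
# "Bob has something that is small and big."
separate = {
    ("<Bob>", "<has>", "_:x"), ("_:x", "<is>", "<Small>"),
    ("<Bob>", "<has>", "_:y"), ("_:y", "<is>", "<Big>"),
}
both = {
    ("<Bob>", "<has>", "_:z"),
    ("_:z", "<is>", "<Small>"), ("_:z", "<is>", "<Big>"),
}

print(equivalent(two, one))                  # True: the graph can be leaned
print(equivalent(named_two, named_one))      # False: the names tell them apart
print(simply_entails(both, separate))        # True
print(simply_entails(separate, both))        # False
```

Because the blank nodes of the second argument are treated as variables,
the labels used in the two graphs never need to agree, which mirrors the
locality point made above.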
Received on Tuesday, 30 June 2020 23:21:05 UTC