Re: N-ary Relations - Toward easier RDF: a proposal

Can someone point out a few good reference books that deal with the mathematics at the core of the layers of the Semantic Web stack?
Any additional references that offer comparative reviews of knowledge representation frameworks for creating semantic information, or that compare frameworks for ontology generation, are also very welcome.

Milton Ponson
GSM: +297 747 8280
PO Box 1154, Oranjestad
Aruba, Dutch Caribbean
Project Paradigm: Bringing the ICT tools for sustainable development to all stakeholders worldwide through collaborative research on applied mathematics, advanced modeling, software and standards development 

    On Friday, November 23, 2018 7:56 PM, Hans Teijgeler <hans.teijgeler@quicknet.nl> wrote:
 

  David,
Our software folks were delighted when I forwarded your email below.
We, in the ISO 15926 community, would like the concept of N-ary relations to be standardized in RDF, as well as workable Lists.
ISO 15926-7 templates are based on "Defining N-ary Relations on the Semantic Web", a W3C Working Group Note dated 12 April 2006, by Natasha Noy and Alan Rector. It would be helpful if this became part of the RDF spec, although the link between RDF and OWL could be an impediment to that.
The upper ontology defined in ISO 15926-2 defines Relationship and ClassOfRelationship, each having two relations. An instance of Relationship can be classified (typed) with an instance of ClassOfRelationship. No clumsy reification required.
We also know the MultidimensionalObject with N relations, where those relations are defined with a ClassOfMultidimensionalObject.
Regards, Hans
15926.org
_______________________________________________________________________________________

---------- Original message ----------

From: David Booth <david@dbooth.org>
To: semantic-web <semantic-web@w3.org>
Cc: Dan Brickley <danbri@google.com>, "Sean B. Palmer" <sean@miscoranda.com>, Olaf Hartig <olaf.hartig@liu.se>, Axel Polleres <axel@polleres.net>
Date: 21 November 2018 at 23:40
Subject: Toward easier RDF: a proposal

On 10/18/2018 05:09 PM, Dan Brickley wrote:
There are serious frustrations that come with trying to use
RDF (and RDFS/OWL/SPARQL, JSON-LD, RDFa, Turtle, N-Triples
et al.), . . . [ . . . ] If there is to be value in having
continued SW/RDF groups around here, it's much more likely to
be around practical collaboration to make RDF less annoying
to work with, . . . .
Perfect lead-in! For many months I've been working up the
gumption to raise this topic on this list. I guess now is
the time. :)

The value of RDF has been well proven, in many applications,
over the 20+ years since it was first created. At the
same time, a painful reality has emerged: RDF is too hard for
*average* developers. By "average developers" I mean those
in the middle 33 percent of ability. And by "RDF", I mean the
whole RDF ecosystem -- including SPARQL, OWL, tools, standards,
etc. -- everything that a developer touches when using RDF.

For anyone who might be tempted to argue "But RDF is easy!",
please bear in mind that *you*, dear reader, are *not* average.
You are a member of an elite who grok RDF and can work around
its frustrations and bizarre subtleties. And for anyone who is
tempted to argue that we just need to better educate the world
about RDF: Sorry, but no. I and many others have been trying to
do exactly that for over 15 years, and it has not been enough.

Using RDF is like programming in assembly language.
It is tedious, frustrating and error prone. Somehow, we
need to move up to a higher, easier, more productive level.
One bright light in our favor is that RDF already provides a
very solid foundation to build upon, based on formal logic.
Another is that graph databases -- though not specifically
RDF -- are now getting substantial commercial attention.

Difficulty of use has caused RDF to be categorized as a niche
technology. This is unfortunate because it limits uptake and
prevents RDF from being a viable choice for many use cases that
would otherwise be an excellent fit. Use cases that depend
on broad uptake can *only* be achieved when RDF is usable by
*average* development teams.

I've been puzzling over this problem for several years. I spoke
about it at the US Semantic Technology Symposium (US2TS) early
this year[1], and Evan Wallace and I will lead a session at
the 2019 US2TS[2] in March to address it further. See also
excellent observations by Sean Palmer[3], Dan Brickley[4]
and Axel Polleres et al.[5] I have collected a few ideas,
but I do not have complete answers. I think it will take a
community effort -- and more new ideas -- to fix this problem.

PROPOSAL:
To address RDF ease-of-use head-on, as a community effort.

Guiding principles:
   - The goal is to make RDF -- or some RDF-based successor --   
easy enough for *average* developers (middle 33%), who are   
new to RDF, to be consistently successful.
   - Solutions may involve anything in the RDF ecosystem:   
standards, tools, guidance, etc. All options are on the table.
   - Backward compatibility is highly desirable, but *less*   
important than ease of use.

SPECIFIC PROBLEMS

The rest of this message catalogs some of the biggest
difficulties that I have noticed in using RDF. YMMV. They
are not necessarily in priority order, and there may be
others that I missed. One goal should be to prioritize them.
Some have obvious potential fixes; others don't. I've also
included some potential solution ideas. I am interested
to hear your feedback, as well as any other problems
or solution ideas that you think should be considered.

Please MAKE A NEW SUBJECT LINE if you reply about one of the
specific problems below, to help organize the discussion.   
   - Tools are scattered. How to find them? Which to use?   
Every team wastes time going through a similar research and   
selection process.
One idea: create a bundled release of RDF tools, analogous
to a standard LAMP stack, or Red Hat or Ubuntu; so that if
someone wants to use RDF all they have to do is install that
bundle and they're ready to go.   
   - IRI allocation. IRIs must be allocated for almost everything   
in RDF: things, concepts, properties, etc. -- both TBox   
(ontology/schema) and ABox (instance data). IRI allocation   
is easy in theory but hard in practice! "Cool IRIs" are   
dereferenceable http(s) IRIs, but domain registration costs   
money and is not permanent. Dereferenceable IRIs require a   
commitment that many RDF producers are not ready/able/willing   
to make. And even when the RDF producer is willing to use   
dereferenceable http(s) IRIs, how exactly should those IRIs   
be formed? There are many possible solutions, but no standard   
best practice. Again every team has to figure out its own path.
   - Blank nodes. They are an important convenience for RDF   
authors, but they cause insidious downstream complications.   
They have subtle, confusing semantics. (As Nathan Rixham   
once aptly put it, a blank node is "a name that is not   
a name".) Blank nodes are special second-class citizens   
in RDF. They cannot be used as predicates, and they are not   
stable identifiers. A blank node label cannot be used in   
a follow-up SPARQL query to refer to the same node, which   
is justifiably viewed as completely broken by RDF newbies.   
Blank nodes also cause duplicate triples (non-lean) when the   
same data is loaded more than once, which can easily happen   
when data is merged from different sources. And they cause   
difficulties with canonicalization, described next.
   - Lack of standard RDF canonicalization. Canonicalization   
is the ability to represent RDF in a consistent, predictable   
serialization. It is essential for diff and digital signatures.   
Developers expect to be able to diff two files, and source   
control systems rely on being able to do so. It is easy with   
most other data representations. Why not RDF? Answer: Blank   
nodes. Unrestricted blank nodes cause RDF canonicalization   
to be a "hard problem", equivalent in complexity to the graph   
isomorphism problem.[6]
Some recent good progress on canonicalization: JSON-LD
https://json-ld.github.io/normalization/spec/ . However, the
current JSON-LD canonicalization draft (called "normalization")
is focused only on the digital signatures use case, and
needs improvement to better address the diff use case, in
which small, localized graph changes should result in small,
localized differences in the canonicalized graph.   
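To make the blank node difficulty concrete, here is a minimal Python sketch. It assumes triples are plain tuples of strings, treats any term starting with "_:" as a blank node, and uses a hypothetical helper naive_canonical that merely sorts serialized lines. It is not the JSON-LD normalization algorithm, only an illustration of why sorting suffices for ground triples and fails once blank node labels are involved.

```python
# Sketch only: triples are (subject, predicate, object) tuples of strings.
# Terms starting with "_:" stand for blank nodes (an assumption of this sketch).

def naive_canonical(triples):
    """Serialize triples as sorted, de-duplicated lines (NOT a real canonicalization)."""
    return "\n".join(sorted({" ".join(t) + " ." for t in triples}))

# Two orderings of the same ground (blank-node-free) graph.
ground_a = [
    ("<http://example.org/alice>", "<http://example.org/knows>", "<http://example.org/bob>"),
    ("<http://example.org/bob>", "<http://example.org/age>", '"42"'),
]
ground_b = list(reversed(ground_a))

# The same graph twice, differing only in the arbitrary blank node label.
bnode_a = [
    ("<http://example.org/bob>", "<http://example.org/address>", "_:b0"),
    ("_:b0", "<http://example.org/city>", '"Paris"'),
]
bnode_b = [
    ("<http://example.org/bob>", "<http://example.org/address>", "_:x7"),
    ("_:x7", "<http://example.org/city>", '"Paris"'),
]

# Sorting is enough when there are no blank nodes ...
same_ground = naive_canonical(ground_a) == naive_canonical(ground_b)
# ... but equivalent graphs with different blank node labels serialize differently.
same_bnode = naive_canonical(bnode_a) == naive_canonical(bnode_b)
```

Here same_ground is True and same_bnode is False, even though bnode_a and bnode_b describe the same graph. A real canonicalization has to relabel blank nodes deterministically, which is where the graph isomorphism difficulty enters.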
   - SPARQL-friendly lists. It is very hard[7] to query RDF   
lists, using standard SPARQL, while returning item ordering.   
This inability to conveniently handle such a basic data   
construct seems brain-dead to developers who have grown to   
take lists for granted.
Apache Jena offers one potential (though non-standard)
way to ease this pain, by defining a list:index property:
https://jena.apache.org/documentation/query/rdf_lists.html
Another possibility would be to add lists as a fundamental
concept in RDF, as proposed by David Wood and James Leigh
prior to the RDF 1.1 work.[8]   
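For readers unfamiliar with the encoding, the following Python sketch shows how an RDF list is stored as a chain of rdf:first/rdf:rest triples, and how ordering has to be recovered by walking that chain. The tuple representation, the blank node labels and the helper name list_items are assumptions made for illustration; this is not how Jena or any SPARQL engine implements it.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"

def list_items(triples, head):
    """Walk an rdf:first/rdf:rest chain from `head` and return the items in order."""
    first = {s: o for s, p, o in triples if p == FIRST}
    rest = {s: o for s, p, o in triples if p == REST}
    items, node, seen = [], head, set()
    while node != NIL:
        # Guard against cycles and dangling cells in malformed lists.
        if node in seen or node not in first or node not in rest:
            raise ValueError("malformed RDF list at %r" % (node,))
        seen.add(node)
        items.append(first[node])
        node = rest[node]
    return items

# The triples are deliberately out of order: a graph has no statement ordering,
# so the item order lives only in the rdf:rest links.
triples = [
    ("_:c3", REST, NIL),
    ("_:c2", FIRST, "banana"),
    ("_:c1", REST, "_:c2"),
    ("_:c3", FIRST, "cherry"),
    ("_:c1", FIRST, "apple"),
    ("_:c2", REST, "_:c3"),
]

items = list_items(triples, "_:c1")
indexed = list(enumerate(items))  # position of each item, as a list:index-style view
```

The point of the sketch is that an item's position is implicit in the chain length, which is why a query language without convenient recursion or an index property struggles to return it.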
   - Standardized n-ary relations (and property graphs). Since   
RDF natively supports only binary relations, relations between   
more than two entities must be encoded using groups of triples.   
A W3C Working Group Note[9] describes some common patterns,   
but no standard has been defined for them. As a result,   
tools cannot reliably recognize and act on these groups of   
triples as the atomic units that they are intended to represent.
This deficiency has greater significance than it may appear,
because it is subtly related to the blank node problem:
a major use of blank nodes is to encode n-ary relations.
In other words, n-ary relations are a major contributor to
the blank node problem.

Furthermore, standardized n-ary relations could also enable
direct support for property graphs[10], which have emerged as
a popular and convenient way to represent graph data, led by
Neo4J.[11] Property graphs add the ability to attach attributes
to relationships, which can be viewed as a special case of
n-ary relations. Olaf Hartig and Bryan Thompson have proposed
conventions for adding property graph support to RDF.[12]   
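As an illustration of "groups of triples", the Python sketch below encodes a three-way relation (a purchase with a buyer, an item and a price) through an intermediate blank node, roughly in the spirit of the patterns in the Working Group Note. The example.org namespace, the role names and the helpers encode_nary and decode_nary are all hypothetical; no standard defines them.

```python
import itertools

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"      # hypothetical namespace for this sketch
_counter = itertools.count()    # source of fresh blank node labels

def encode_nary(relation_class, roles):
    """Encode one n-ary relation as a group of triples hanging off a fresh blank node."""
    node = "_:r%d" % next(_counter)
    triples = [(node, RDF_TYPE, relation_class)]
    triples += [(node, role, value) for role, value in roles.items()]
    return node, triples

def decode_nary(triples, node):
    """Regroup the triples about `node` into a role -> value mapping."""
    return {p: o for s, p, o in triples if s == node and p != RDF_TYPE}

roles = {
    EX + "buyer": EX + "john",
    EX + "item": EX + "book42",
    EX + "price": '"15"',
}
node, purchase = encode_nary(EX + "Purchase", roles)
recovered = decode_nary(purchase, node)
```

A tool can only perform the decode_nary step if it already knows that the four triples form one unit. Without a standard convention, that knowledge stays in application code.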
   - Literals as subjects. RDF should allow "anyone to say   
anything about anything", but RDF does not currently allow   
literals as subjects! (One work-around is to use -- you guessed   
it -- a blank node, which in turn is asserted to be owl:sameAs   
the literal.) This deficiency may seem unimportant relative   
to other RDF difficulties, but it is a peculiar anomaly that   
may have greater impact than we realize. Imagine an *average*   
developer, new to RDF, who unknowingly violates this rule and   
is puzzled when it doesn't work. Negative experiences like   
that drive people away. Even more insidiously, imagine this   
developer tries to CONSTRUCT triples using a SPARQL query,   
and some of those triples happen to have literals in the   
subject position. Per the SPARQL standard, those triples will   
be silently eliminated from the results,[13] which could lead   
to silently producing wrong answers from the application --   
the worst of all possible bugs.
   - Lack of a standard rules language. This is a big one.   
Inference is fundamental to the value proposition of RDF,   
and almost every application needs to perform some kind   
of application-specific inference. ("Inference" is used   
broadly herein to mean any rule or procedure that produces new   
assertions from existing assertions -- not just conventional   
inference engines or rules languages.) But paradoxically,   
we still do not have a *standard* RDF rules language.   
(See also Sean Palmer's apt observations about N3 rules.[14])   
Furthermore, applications often need to perform custom   
"inferences" (or data transformations) that are not convenient   
to express in available (non-standard) rules languages, such   
as RDF data transformations that are needed when merging data   
from independently developed sources having different data   
models and vocabularies. And merging independently developed   
data is the *most* fundamental use case of the Semantic Web.
One possibility for addressing this need might be to embed
RDF in a full-fledged programming language, so that complex
inference rules can be expressed using the full power and
convenience of that programming language. Another possibility
might be to provide a convenient, standard way to bind custom
inference rules to functions defined in a programming language.
A third possibility might be to standardize a sufficiently
powerful rules language.

However, see also some excellent cautionary comments from Jesus
Barras (Neo4J) and MarkLogic on inference: "No one likes rules
engines --> horrible to debug / performance . . . Reasoning
with ontology languages quickly gets intractable/undecidable"
and "Inference is expensive. When considering it, you should:
1) run it over as small a dataset as possible 2) use only the
rules you need 3) consider alternatives."[15]   
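To illustrate the second possibility (binding inference rules to ordinary functions), here is a minimal Python sketch of naive forward chaining, in which each rule is simply a function from the known triples to derived triples. The names forward_chain and ancestor_rule, and the ex: terms, are assumptions for illustration. Real engines are far more sophisticated, and the cautions about cost quoted above apply to this brute-force loop as well.

```python
def forward_chain(triples, rules):
    """Apply rule functions repeatedly until no new triples appear (naive fixpoint)."""
    known = set(triples)
    while True:
        new = set()
        for rule in rules:
            # Each rule sees the current facts and proposes derived triples.
            new |= set(rule(known)) - known
        if not new:
            return known
        known |= new

def ancestor_rule(known):
    """ex:hasParent implies ex:hasAncestor, and ex:hasAncestor is transitive."""
    for s, p, o in known:
        if p == "ex:hasParent":
            yield (s, "ex:hasAncestor", o)
    for s1, p1, o1 in known:
        if p1 != "ex:hasAncestor":
            continue
        for s2, p2, o2 in known:
            if p2 == "ex:hasAncestor" and s2 == o1:
                yield (s1, "ex:hasAncestor", o2)

facts = [
    ("ex:cat", "ex:hasParent", "ex:bob"),
    ("ex:bob", "ex:hasParent", "ex:ann"),
]
closure = forward_chain(facts, [ancestor_rule])
```

After the fixpoint, closure contains the two original facts plus three derived ex:hasAncestor triples, including the indirect one from ex:cat to ex:ann.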
   - Namespace proliferation. It's hard to manage all the   
namespaces involved in using RDF: FOAF, SKOS, DC and all the   
hundreds of specialized namespaces that are encountered when   
using external RDF. Namespaces can help organize IRIs into   
categories (typically based on the IRI's origin), but this   
fact is nowhere recognized in official RDF specs. Indeed,   
the official mantra is that IRIs are opaque, and there are   
very important design reasons for opacity.[16] But there is   
a cost: RDF is stuck in a flat, global naming space analogous   
to global variables of 1960s programming languages. Somehow,   
modern programming languages deal with namespaces much more   
conveniently than RDF does. Perhaps we can learn from them,   
without undermining the Web's design principles.
Related issue: the RDF model does not retain namespace info.
As such, namespaces are often lost when tools process RDF.
One partial solution might be to standardize RDF triples that
capture serialization-related information, such as namespaces,
and have tools retain them in a separate graph.   
   - IRI reuse and synonyms. In theory, RDF authors should reuse   
existing IRIs, rather than minting their own. But this makes   
for messy RDF and increases the up-front burden on developers.   
Consider a typical RDF project that integrates data from   
multiple sources, and needs to connect that data into its own   
vocabulary. The resulting data involves both the normalized   
vocabulary and the non-normalized source vocabularies,   
intermixed. The developers might be happy to adopt existing   
concepts like foaf:name (for a person's name) and dc:title (for   
a document title) into the project's normalized vocabulary.   
But by using those existing IRIs instead of minting their   
own IRIs in their own namespace (such as myapp:name and   
myapp:title), it becomes hard to distinguish IRIs of the normalized   
vocabulary from IRIs of the non-normalized source vocabularies.
Ideally a project should be able to use its own preferred names
(and namespaces), like myapp:name and myapp:title, while still
tying those names to existing external IRIs, such as foaf:name
and dc:title.

owl:sameAs is not great for this. It is too heavyweight
for simple synonyms, and it is only for OWL individuals --
not classes. Furthermore, it provides no way to indicate
which IRI is locally preferred. It would be good to have a
simple standard way to rename IRIs or define IRI synonyms.   
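One way to picture such a renaming mechanism is the Python sketch below: a synonym table that maps external IRIs to locally preferred ones and is applied when data is loaded. The table PREFERRED, the myapp namespace and the function normalize are hypothetical, and the sketch assumes literals are kept in quoted form so they never collide with IRIs.

```python
# Hypothetical synonym table: external IRI -> locally preferred IRI.
PREFERRED = {
    "http://xmlns.com/foaf/0.1/name": "http://example.org/myapp/name",
    "http://purl.org/dc/elements/1.1/title": "http://example.org/myapp/title",
}

def normalize(triples, preferred=PREFERRED):
    """Rewrite every term that has a locally preferred synonym; leave the rest untouched."""
    return [tuple(preferred.get(term, term) for term in t) for t in triples]

source = [
    ("http://example.org/people/1", "http://xmlns.com/foaf/0.1/name", '"Alice"'),
    ("http://example.org/docs/7", "http://purl.org/dc/elements/1.1/title", '"Notes"'),
    ("http://example.org/docs/7", "http://example.org/other/pages", '"12"'),
]
normalized = normalize(source)
```

Keeping the table itself as data, rather than scattering owl:sameAs assertions through the graph, also records which name is the locally preferred one, the piece that owl:sameAs cannot express.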


Please USE A DIFFERENT SUBJECT LINE if you reply about a
specific problem/idea listed above, as opposed to replying
about the overall proposal of addressing RDF ease-of-use as
a community effort. As always, comments/suggestions/ideas
are welcome.

Thanks!
David Booth

References:
   [1] "Toward Easier RDF", David Booth, slides from 2018 US
Semantic Technology Symposium:
https://goo.gl/H2vBYi

   [2] US Semantic Technology Symposium (US2TS):
http://www.us2ts.org/

   [3] "What happened to the Semantic Web?" (general comments),
Sean Palmer:
https://lists.w3.org/Archives/Public/semantic-web/2017Oct/0024.html

   [4] "Semantic Web Interest Group now closed",
"RDF(-DEV), back to the future", Dan Brickley:
https://lists.w3.org/Archives/Public/semantic-web/2018Oct/0086.html
https://lists.w3.org/Archives/Public/semantic-web/2018Oct/0052.html

   [5] "A More Decentralized Vision for Linked Data", Axel Polleres,
Maulik R. Kamdar, Javier D. Fernandez, Tania Tudorache, and
Mark A. Musen:
https://openreview.net/pdf?id=H1lS_g81gX

   [6] "Signing RDF Graphs", Jeremy Carroll:
http://www.hpl.hp.com/techreports/2003/HPL-2003-142.pdf

   [7] "Is it possible to get the position of an element
in an RDF Collection in SPARQL?", see Joshua
Taylor's answer, "A Pure SPARQL 1.1 Solution":
https://stackoverflow.com/questions/17523804/is-it-possible-to-get-the-position-of-an-element-in-an-rdf-collection-in-sparql

   [8] "An Ordered RDF List", David Wood and James Leigh:
https://www.w3.org/2009/12/rdf-ws/papers/ws14

   [9] "Defining N-ary Relations on the Semantic Web", W3C Working
Group Note:
https://www.w3.org/TR/swbp-n-aryRelations/

   [10] Property Graph, Wikipedia:
https://en.wikipedia.org/wiki/Graph_database#Labeled-Property_Graph

   [11] DB-Engines Ranking of Graph DBMS:
https://db-engines.com/en/ranking/graph+dbms

   [12] "Standards for storing RDF/OWL in a property graph?", Olaf Hartig:
https://lists.w3.org/Archives/Public/semantic-web/2018Apr/0030.html

   [13] "SPARQL 1.1 Query Language: CONSTRUCT":
https://www.w3.org/TR/sparql11-query/#construct

   [14] "What happened to the Semantic Web?" (SPARQL comments),
Sean Palmer:
https://lists.w3.org/Archives/Public/semantic-web/2017Oct/0045.html
https://lists.w3.org/Archives/Public/semantic-web/2017Oct/0059.html

   [15] "Debunking some 'RDF vs. Property Graph' Alternative Facts",
Jesus Barras, slides 34 and 35:
https://www.slideshare.net/neo4j/graphconnect-europe-2017-debunking-some-rdf-vs-property-graph-alternative-facts-neo4j

   [16] "Universal Resource Identifiers: The Opacity Axiom", Tim
Berners-Lee:
https://www.w3.org/DesignIssues/Axioms.html#opaque

   [17] "Notation3 (N3): A readable RDF syntax", W3C Team Submission,
Tim Berners-Lee and Dan Connolly:
https://www.w3.org/TeamSubmission/n3/

Received on Saturday, 24 November 2018 16:34:19 UTC