Re: Well-formedness for option 3 from Andy Seaborne on 2024-02-28 (public-rdf-star-wg@w3.org from February 2024)

From: Andy Seaborne <andy@apache.org>
Date: Wed, 28 Feb 2024 16:09:08 +0000
To: public-rdf-star-wg@w3.org
Message-ID: <4c9b053d-0ffb-4107-97c6-1d5c4b7699bb@apache.org>
On 28/02/2024 09:03, Olaf Hartig wrote:
> Dear all,
> 
> Do we have an email or a document with a definition of well-formedness
> in the context of option 3? I couldn't find any, but perhaps I
> overlooked something.

I don't understand why the well-formedness / macros are in the semantics.

We have the occurrence and annotation syntax in Turtle.

We don't have occurrence and annotation syntax in N-Triples, only the 
direct triple forms.

A graph written in Turtle only using occurrence and annotation syntax 
will be "reification well-formed" but the definition isn't tied to the 
macros. The condition is that triple terms appear only in the object
position of rdf:nameOf, and that can be applied to N-Triples as well.
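
As a minimal sketch of that condition (not tied to any parser or to the
macros), assume a graph is a set of (subject, predicate, object) tuples
and a triple term is an instance of a hypothetical TripleTerm class. The
class, the function name and the prefixed-name strings are illustrative
assumptions, not part of any spec or library:

```python
from dataclasses import dataclass
from typing import Union

# Predicate name as used in this thread; it is still under discussion.
RDF_NAME_OF = "rdf:nameOf"


@dataclass(frozen=True)
class TripleTerm:
    """Hypothetical stand-in for a triple term (a triple used as a term)."""
    s: str
    p: str
    o: "Term"


Term = Union[str, TripleTerm]


def is_reification_well_formed(graph):
    """True if triple terms occur only as the object of rdf:nameOf triples.

    This sketch looks at asserted triples only; it does not inspect
    triple terms nested inside other triple terms.
    """
    for s, p, o in graph:
        if isinstance(s, TripleTerm):
            return False  # triple term in the subject position
        if isinstance(o, TripleTerm) and p != RDF_NAME_OF:
            return False  # triple term as object of some other predicate
    return True


# A graph that is only an occurrence (an "edge"), with nothing using it.
edge_only = {("_:e", RDF_NAME_OF, TripleTerm(":s", ":p", ":o"))}

# A triple term used directly as the object of an ordinary predicate.
direct_use = {(":a", ":says", TripleTerm(":s", ":p", ":o"))}
```

Here is_reification_well_formed(edge_only) holds and
is_reification_well_formed(direct_use) does not; the check works on the
triples themselves, so it applies equally to data read from N-Triples.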

The macros do not allow a graph of just an occurrence because macros 
require that the occurrence be used somewhere. The "reification 
well-formed" condition does not. That is, a graph that is all 
"occurrences" (edges) does not fit the macros.

> The words “well-formed” and “well-formedness” were mentioned in recent
> calls that took place after the call in which we came to the consensus
> to focus on option 3. So, I assume that group members have an
> understanding what the notion of well-formedness for option 3 means.
> Yet, I couldn’t find any form of definition for it. The only definition
> that I found is the one of a “reification well-formed RDF graph” by
> Peter [1], but that one is focused on options 1 and 2, and not directly
> applicable to option 3.
> 
> So, what is your understanding of a well-formed RDF graph in the
> context of option 3?
> 
> Mine is as follows: An RDF graph is well formed iff it has all of the
> following properties.
> 
> - Property 0: None of the triples in the graph has a triple term [2] as
> its subject.
> (In my reading of option 3, triple terms in the subject are already
> ruled out by the abstract syntax itself, which makes mentioning this
> property here obsolete. Yet, I still mention it for the moment because
> some group members seem to argue for an abstract syntax in which triple
> terms may be used in the subject position.)

+1

In defining "well-formedness", property 0 makes sense.

It isn't something I feel strongly one way or the other about enforcing 
in the RDF data model.

> - Property 1: For every triple in the graph that has a triple term as
> its object, the predicate of this triple must be rdf:nameOf.
> (I understand that the name of this predicate IRI is still under
> discussion.)

+1 for well-formedness.

> - Property 2: For every pair of triples in the graph, if both triples
> have a triple term as their object (and, thus, have rdf:nameOf as their
> predicate, as per the previous point above) and these two triple terms
> are different from one another, then the two triples must not have the
> same subject.

> I assume that Property 2 might be controversial. It has the
> disadvantage that merging two well-formed graphs may result in a graph
> that is not well formed according to the notion of well-formedness with
> Property 2 included. However, well-formedness without Property 2 makes
> implementations that focus on efficient support for well-formed graphs
> significantly harder;

If the two graphs have reused a name, the problem exists whether or not
it is described as a well-formedness issue. Checking for well-formedness
requires the whole graph to be available (unless further information is
available, such as the data being sorted by subject URI).
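
A rough sketch of the Property 2 check, to illustrate the point. It
assumes triples are Python tuples and that a triple term is represented
as a nested tuple in the object position; the function name and the
example data are made up for illustration:

```python
# Predicate name as used in this thread; it is still under discussion.
RDF_NAME_OF = "rdf:nameOf"


def name_clashes(graph):
    """Return subjects that name more than one distinct triple term.

    Assumption: a triple term is modelled as a nested tuple in the
    object position. The whole graph has to be scanned, because a
    clashing triple may appear anywhere in the data.
    """
    named = {}       # subject -> first triple term seen for it
    clashes = set()
    for s, p, o in graph:
        if not isinstance(o, tuple):
            continue  # object is not a triple term
        if s in named and named[s] != o:
            clashes.add(s)
        named.setdefault(s, o)
    return clashes


# Two graphs that each satisfy Property 2 but reuse the name :e1.
g1 = {(":e1", RDF_NAME_OF, (":s", ":p", ":o1"))}
g2 = {(":e1", RDF_NAME_OF, (":s", ":p", ":o2"))}
merged = g1 | g2
```

name_clashes(g1) and name_clashes(g2) are both empty, while
name_clashes(merged) reports ":e1": the reused name is a problem in the
merged data whether or not the term "well-formed" is applied to it.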

> I mean, without Property 2, such implementations
> cannot employ data structures (e.g., indexes) that assume that the
> subjects of rdf:nameOf triples functionally determine the triple terms.
> Notice also that Property 2 is essentially the option-3 variant of
> Peter’s aforementioned notion of a “reification well-formed RDF graph”
> for options 1 and 2.

In describing the mapping from RDF 1.2 to the reification of RDF 1.1, it 
should map well-formed(1.2) to well-formed(1.1).

The impact on implementations needs to be balanced - implementations have
every right to reject RDF they don't like, e.g. by requiring well-formed
RDF lists or well-formed datatypes, or by rejecting data that does not
pass SHACL validation.

Well-formedness, and the related good practice, is something to explain
in the explanatory material on occurrences, e.g. in the primer or
"what's new".

> An idea to eliminate the aforementioned disadvantage of including
> Property 2 is to allow only blank nodes in the subject of rdf:nameOf
> triples, but that’s probably not very desirable either because it would
> mean that “occurrences” cannot be named by an IRI. Still, I thought I
> should mention this idea as a possible option to address the
> undesirable effect on graph merging that Property 2 would imply.

I prefer to "advise" use of blank nodes when a URI isn't needed, and
indeed we have special syntax in Turtle for that: << :s :p :o >>. Saying
"must not" is too strong. I'm not sure making this stronger uniquely for
RDF-star is in the style of RDF.

This situation isn't unique to RDF-star. Any assumed (inverse) 
functional property condition can be broken on RDF merge.
It's a data validation matter.

     Andy

> 
> Best,
> Olaf
> 
> [1]
> https://github.com/w3c/rdf-star-wg/blob/main/docs/sugar-proposal.md#criticisms-and-responses
> 
> [2]
> https://pr-preview.s3.amazonaws.com/w3c/rdf-concepts/pull/78.html#dfn-triple-term
>
Received on Wednesday, 28 February 2024 16:09:14 UTC