Trying to articulate our various "profiles"

Hi everyone,

I mentioned last week that I'm trying to get a clearer view of the general profile picture, i.e. what kinds of profiles and conformance we have across our documents. Here is my take. I started it mostly for myself, but maybe it can be useful for the more general discussion...

Let's start with what looks like the most general concept. The DXWG CONNEG document (not the IETF one for now!) uses what I'll call "data profiles" (for the sake of using a term that will be different from another one to come). These can be seen as specifications of what data is expected to look like. Note that "specification" is used here in the broader sense discussed in recent mails [1], which I believe is more or less what most of us had in mind when working on our official definitions [2].

I believe there can be a notion of conformance here: a data profile is a specification, and that specification may say how to test whether the data served complies with the spec. At this point we can jump briefly to the Profile Guidance, which, in order to meet our requirements [3], is set to state that it is good practice for the specification to say how to test whether the data served complies, either by using a machine-readable implementation in a validation process, or just by turning to manual inspection (yes why not, after all!).

Let's jump to DCAT now. I understand that DCAT defines two forms of "conformance" with respect to "profiles" (quotes everywhere, on purpose...):

1. "conformance of datasets described with DCAT wrt. standards", as per the use of dct:conformsTo on instances of dcat:Dataset [4]. These "standards" are specifications ("any resource that specifies one or more aspects of the cataloged resource content"). But I guess it won't be controversial to say that this level of conformance is the same as the general notion of "data profiles" in CONNEG. The dataset is expected to conform to some specification, which hopefully will say how to test conformance ("The meaning of conformance is determined by provisions in the target standard").
I believe this because any specification in this part of DCAT seems worth being CONNEGed in the scenarios of CONNEG. And reciprocally, I believe that any spec that is likely to be involved in CONNEG scenarios can be used to specify a dataset.

2. "conformance of data catalogue descriptions wrt. DCAT profiles" (DCAT-AP-whatever). This is the dct:conformsTo that applies to instances of dcat:CatalogRecord [5] and maybe dcat:Catalog itself.
"DCAT profiles" are specs that can be said to be "data profiles" as above, but there is a bit of novelty: each of them is a "named set of constraints based on DCAT" [6].
DCAT goes on to describe what "conformance" means for "DCAT profiles" [7] ("A DCAT profile is a specification [...] that adds additional constraints to DCAT. A data catalog that conforms to the profile also conforms to DCAT") and it specifies what kinds of constraints can be found in a DCAT profile.
This is where DCAT refers to DCMI and hints that "DCAT Profiles" are DCMI Application Profiles (APs). Honestly for me this flows quite naturally. A DCAT Profile is a DCMI AP, where DCAT is the base specification for the AP.
(NB: at this stage I agree that DCAT itself can be seen as an application profile of other vocabularies, as said in some messages on the DXWG list, but that's not really core to the discussion: it's not about what DCAT is, but the more general profiling framework around DCAT.)
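
To make the two cases above a bit more concrete, here is a minimal sketch in Python that writes both uses of dct:conformsTo as plain (subject, predicate, object) tuples and groups the conformance claims by the type of the thing making them. Everything under example.org is a hypothetical placeholder, and the helper function is just for illustration; the class names use the spelling of the DCAT namespace (dcat:CatalogRecord).

```python
# Plain (subject, predicate, object) tuples; no RDF library needed.
# All example.org IRIs are hypothetical placeholders.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT_CONFORMS_TO = "http://purl.org/dc/terms/conformsTo"
DCAT = "http://www.w3.org/ns/dcat#"

triples = [
    # DCAT#1: a dataset conforms to some standard (a "data profile").
    ("http://example.org/dataset/1", RDF_TYPE, DCAT + "Dataset"),
    ("http://example.org/dataset/1", DCT_CONFORMS_TO,
     "http://example.org/spec/some-data-standard"),
    # DCAT#2: a catalog record conforms to a DCAT profile.
    ("http://example.org/record/1", RDF_TYPE, DCAT + "CatalogRecord"),
    ("http://example.org/record/1", DCT_CONFORMS_TO,
     "http://example.org/profile/dcat-ap-whatever"),
]


def conformance_claims(triples):
    """Group dct:conformsTo targets by the rdf:type of their subject."""
    types = {s: o for s, p, o in triples if p == RDF_TYPE}
    claims = {}
    for s, p, o in triples:
        if p == DCT_CONFORMS_TO:
            claims.setdefault(types.get(s), []).append(o)
    return claims
```

The same predicate is used in both cases; only the type of the subject tells DCAT#1 from DCAT#2, which is part of why bridging the two notions looks feasible.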

Now I'm looking at our definition of "profile" [2] and the one of DCAT Profiles [6], and I see many common points. In fact I don't see a problem at all bridging the two...

Further, generalizing DCAT#2, I believe that we can still conceive a notion of an "application profile" as a "data profile" (in the sense of DCAT#1 and CONNEG) that is based on another data profile. Would there be any contradiction with the DCMI notion of APs?

In both cases (DCAT#2/APs and DCAT#1/CONNEG/"data profiles"), there is value in using PROF to serve metadata about the "profiles" and to connect to the specs that can be used for testing conformance.

Now, our definition of "profiles" seems quite ok wrt DCAT#2/APs. But it is too specific for DCAT#1/CONNEG, because it assumes that there is always another 'base specification'. It works rather well with "AP of specification X" but not with "data profiles" that are not based on other specs.

In the past we have floated tricks like saying that a profile can be a profile of itself, which removes the need for it to be a profile of something else. But honestly I think we should find a better way. So I suggest a small but essential change to our definition, making the "based on" predicate optional. I.e. let the definition be "A named set of constraints, which can be based on one or more other identified specifications, [etc]"
This definition could be used directly by CONNEG and for DCAT#1, and APs in the sense of DCAT#2 (and hopefully DCMI) would be subsumed by it.
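
As a rough illustration of the suggested change, here is a minimal Python sketch in which the "based on" part is optional. The class, its field names and the sample values are all hypothetical; this is not PROF or any agreed model, just a way to show that one definition can cover both cases.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Profile:
    """A named set of constraints, optionally based on other specifications."""
    name: str
    constraints: Tuple[str, ...] = ()
    # Optional: empty for a stand-alone "data profile" (DCAT#1/CONNEG),
    # non-empty for an application profile (DCAT#2/APs).
    based_on: Tuple[str, ...] = ()

    def is_application_profile(self) -> bool:
        # An AP is simply a profile that names at least one base specification.
        return len(self.based_on) > 0


# Hypothetical examples of the two cases.
standalone = Profile(name="some-data-standard",
                     constraints=("every record has a title",))
dcat_ap = Profile(name="dcat-ap-whatever",
                  constraints=("dct:title is mandatory",),
                  based_on=("DCAT",))
```

With this shape there is no need for the "profile of itself" trick: a profile without a base simply leaves `based_on` empty.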

And now the story becomes more complex, because I'm going to dive into conformance and inheritance.
But at this stage maybe we can have a check: is all the stuff above agreeable?


Received on Tuesday, 25 June 2019 16:43:56 UTC