Re: [dxwg] Reference Data on the Web Best Practices

_This is a very rough draft of a reading of DWBP in relation to profiles. I don't know if it fits into the document directly but using DWBP would fit this document into a W3C context. Obviously if we use this it would include links to those BPs._

Using the assumption that a profile on the web has some characteristics in common with data on the web, we can look to the Data on the Web Best Practices [link] as a source of some preferred practices for profiles.

The first obvious practice is BP9 [], to use persistent URIs as identifiers of profiles. This seems to be a non-controversial requirement because it applies to any web resource. There is a related best practice (BP10) which is stated as "Use persistent URIs as identifiers within datasets." This can be well-suited to the aspect of a profile that consists of the reuse of vocabulary terms that have been prevously defined, but also to the definition of new terms within the scope of the profile itself. Each element of a profile's vocabulary should have a URI (IRI?) that identifies the term and information about the term (such as labels, definitions, etc.)

The next relevant group of best practices have to do with describing the profile itself (BP1-3). In the BPoW this is defined as metadata about the dataset; with profiles this can be descriptive metadata within the profile or a separate metadata statement about the profile. (Since profiles themselves are generally forms of metadata they may be able to incorporate description and other administrative information about the profile within itself, if desired.) In addition there is a common set of administrative data that is recommended for many information resources such as dates and version designators for each version, and provenance (who or what agency created the digital file).

The BPDoW limits its recommendation of metadata covering the topic of the dataset to general keywords and themes and categories (BP2). It may be desirable provide more specific topical information to satisfy the DXWG requirement that profiles should be discoverable by search engines. The quality of discoverability will vary based on the depth of description of the topic and/or community area that it serves. (We may wish to recommend some particulars.)

BP15 is one of the cornerstones of the profile practice, which is to reuse vocabularies, in particular standardized ones. One would need to define "standardized" in this context, but perhaps a better solution is to define the qualities of preferred vocabularies: have a stable URI; are supported by an organization or community; (more??).

The BPs also recommend that data be provided in multiple formats, and this is a good recommendation for profiles as well. If we take the point of view that profiles, like DCAT, have an abstract essence that can be made manifest in more than one way, we already have a good basis for satisfaction of BP14. (This will bring up the question that has already come up in the context of DCAT and conneg - are all of the forms equivalent? We may not be able to resolve that question.)

The recommendation in BP16 is to choose the right formalization level. This is a useful recommendation for all data and metadata, and would naturally apply to profiles. Profiles should be suited to the tasks they are designed to support; not less nor more. In particular we should caution against overly strict use of constraints, which then make it harder for others to either make direct use of the profile or to create a profile of the profile where their needs vary only a small amount.

Naturally, profiles should be published in standard data formats (BP12). It should also be stated that profiles should be published in and make use of technologies that are appropriate to the community which is expected to use them. A profile using RDF and OWL will not well serve a community that has only an XML/XSD-based skill set. This also relates to the above recommendation relating to providing data in multiple formats. In many communities the skills and data history can vary, so providing profiles with as many as possible of the commonly used technologies will increase the utility of the profile (as well as the profiled instance data).

Because profiles are intended to convey information both within a community and at times between communities, wide use would be facilitated by BP13, which is that the profile should use locale-neutral data representations where possible. Some data communities have deep and historical practices that use terminology that is specific to the community. The creation of a profile is an opportunity to transform that practice to widely known standards.

Ideally, the profile would have a management cycle for maintenance and updates. This should involve the community of users, as noted in BP29 (Gather feedback from data consumers) and BP30 (make feedback available). The strength and value of a profile will depend on the involvement of the community of users.

The aspect of the management cycle that is often ignored is that of the de-commissioning of datasets or of superceeded versions. For users it is key that previously used identifiers always point to a useful document or message. (BP27 preserve identifiers).

-- 
GitHub Notification of comment by kcoyle
Please view or discuss this issue at https://github.com/w3c/dxwg/issues/450#issuecomment-432644299 using your GitHub account

Received on Wednesday, 24 October 2018 12:58:07 UTC