Comments on the Requirements Spec

Now that I have completed my analysis of stakeholder needs, I feel better positioned to review the proposed requirements. Here are my current thoughts:

> Specification of use cases to provide context and motivate the need for elements of a standard.
Yes, this equates to the Use Case Viewpoint that I am proposing in the management document. I still need to formalize the stakeholder concerns and then document our approach.
> Specify (parts of) standards - specifically data models - at the level of classes and properties.
Yes, but I think we also need to identify the abstract syntax (e.g., an integer with a defined range and units) so that we can develop a logical way to transform among different standards through a common logical definition. In fact, that and unambiguous traceability to the standards are the only two concerns that I see for this information (which I propose to document in a Physical Data Model of the Information View).
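As a rough illustration of what capturing an abstract syntax could involve, here is a minimal Python sketch. The `AbstractSyntax` class, its field names, and the 0 to 150 m/s range for vehicle speed are hypothetical choices made for the example, not part of any agreed model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractSyntax:
    """Hypothetical abstract-syntax descriptor for a logical property:
    a base type, an inclusive value range, and a unit of measure."""
    base_type: type
    minimum: float
    maximum: float
    units: str

    def accepts(self, value) -> bool:
        # A value conforms if it has the declared base type and lies in range.
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, self.base_type):
            return False
        return self.minimum <= value <= self.maximum


# Illustrative only: vehicle speed as an integer number of m/s (range assumed).
VEHICLE_SPEED = AbstractSyntax(base_type=int, minimum=0, maximum=150, units="m/s")

print(VEHICLE_SPEED.accepts(27))    # True
print(VEHICLE_SPEED.accepts(200))   # False (out of range)
print(VEHICLE_SPEED.accepts(27.5))  # False (not an integer)
```

Because each standard's representation would be checked against the same logical descriptor, a transformation between two standards can be defined as "standard A to logical, then logical to standard B" rather than as a separate pairwise mapping.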
> This should include the specification of provenance information regarding the origins of the models, including any copyright/licensing information.
I am not sure what you mean by the “origins” of the model. If you simply mean “source”, then I agree that we need to show that each Physical Data Model represents exactly one standardized way to exchange or use the data, and we should cite that standard. If you mean something more extensive (e.g., what gave rise to the development of that standard), I am not convinced that level of detail is needed, but I am willing to listen (as it does arguably relate to the stakeholder concerns about the evolution of data).
> Question: the definition of a class/property includes a collection of material (e.g. diagrams, axioms, use cases). Should the provenance and licensing information be at the class/property level or optionally for each type of input? Perhaps provenance should be at the class/property level and then licensing can be defined for each element as required?
First, I disagree with your premise. The definition of a class/property does not include use cases. Use cases give rise to the need for data as represented in classes and properties. There are relationships between them, but classes and properties are not a collection of use cases (if anything, a use case could be considered to aggregate the data that is required to realize it, but as it is not the sole owner of this data, we cannot place them in a hierarchy).

As such, by their very nature, use cases should fall under a different part of the website (i.e., the Use Case Viewpoint) and have a many-to-many relationship (a “correspondence” in architecture terms) to data; they will therefore need a copyright statement separate from that of the data. Further, I do not see any reason to map between use cases and Physical Data Models; we only need to map use cases to the Logical Data Model.
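To make the "many-to-many, no owner" point concrete, here is a minimal sketch of a correspondence between use cases and Logical Data Model elements, kept as a bidirectional index rather than a hierarchy. The class name and the example element names (`Vehicle.location`, `Vehicle.speed`) are hypothetical.

```python
from collections import defaultdict


class Correspondence:
    """Hypothetical many-to-many correspondence between use cases and
    Logical Data Model elements. Neither side owns the other."""

    def __init__(self):
        self._elements_by_use_case = defaultdict(set)
        self._use_cases_by_element = defaultdict(set)

    def link(self, use_case: str, element: str) -> None:
        # Record the pair in both directions so lookups work either way.
        self._elements_by_use_case[use_case].add(element)
        self._use_cases_by_element[element].add(use_case)

    def elements_for(self, use_case: str) -> set:
        return set(self._elements_by_use_case.get(use_case, ()))

    def use_cases_for(self, element: str) -> set:
        return set(self._use_cases_by_element.get(element, ()))


c = Correspondence()
c.link("Collision avoidance", "Vehicle.location")
c.link("Collision avoidance", "Vehicle.speed")
c.link("Transport planning", "Vehicle.location")

print(sorted(c.use_cases_for("Vehicle.location")))
# ['Collision avoidance', 'Transport planning']
```

`Vehicle.location` is shared by two use cases, which is exactly why it cannot sit underneath either one in a hierarchy.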

As a general rule, I’m inclined to think that two things will hold true for the content of any single “Model” (i.e., instance of one of the model kinds defined in my management paper):
- For any Model there will be a single “source”; that source will identify the copyright that applies to the content it provides
- We may wish to supplement the content provided directly by that source with additional content that we create (e.g., a revised diagram that conforms to our conventions); the new content will fall under a W3C copyright

Even when one of our sources cites a 3rd source, we should only have to worry about the source we get information from. So for any one page on our site, I think we can limit content to two copyright statements - the W3C statement and, when needed, the statement for the identified source.
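A minimal sketch of the "at most two statements per page" rule, assuming each page records at most one external source. `Source`, `Page`, `copyright_statements`, and the placeholder statement text are all hypothetical; the real W3C notice wording is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Placeholder wording; the real W3C notice text would be used on the site.
W3C_STATEMENT = "W3C copyright statement (placeholder)"


@dataclass
class Source:
    """The single external source that content on a page was taken from."""
    name: str
    copyright_statement: str
    # Third-party works the source itself cites; deliberately ignored below.
    cited_sources: List[str] = field(default_factory=list)


@dataclass
class Page:
    title: str
    source: Optional[Source] = None  # None when all content is our own


def copyright_statements(page: Page) -> List[str]:
    """Return the copyright statements a page needs: always the W3C statement
    and, when the page reproduces external content, the source's statement."""
    statements = [W3C_STATEMENT]
    if page.source is not None:
        statements.append(page.source.copyright_statement)
    # Sources cited by the source are not our concern, so at most two entries.
    return statements


src = Source("Example SDO", "(c) Example SDO", cited_sources=["Another standard"])
print(len(copyright_statements(Page("Vehicle speed", src))))  # 2
print(len(copyright_statements(Page("Our own glossary"))))    # 1
```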

As we develop our website structure, we should consider how this can be best accomplished with minimal confusion. For example, what happens if we want to use a use case specification from a source, but their template only addresses 75% of the fields contained in our template? Do we need a source statement on each field? Do we have two templates (one with one copyright and the other with another)? Or what?
> Enable definition of classes and properties with DL and UML.
> These definitions will be reproduced directly from the standards when available (see 2).
No! First, we need to realize that we have shifted from the Physical Data Model to the Logical Data Model. There is no need for us to include definitions of the Physical Data Model elements on our site; if users are interested in seeing those definitions, they can acquire the source standards. Further, I think it would be unwise to try to capture definitions even for those few SDOs that are willing to provide the content. Adding this to our website only creates a maintenance nightmare: how would we become aware of every update and change to their standards? The only exception I can think of is if an SDO wanted to use our site as part of its standard, but I am inclined to say that in that case it would need to conform to our copyright statement (even if it uses a different organizational structure to manage the content).

Re: the Logical Data Model, copying the definitions presents legal issues and technical issues.
1. In a massive number of cases, SDOs will not release copyright in this manner; this includes ISO (and almost any SDO that charges for standards or reserves the right to do so in the future). Thus, legally, we cannot copy their definitions.
2. The definitions must represent a consensus-based approach. Many of these terms are already defined umpteen times, and one of the primary goals of the model should be to create ways to translate among definitions. That means our model is necessarily different from any existing standard. Most importantly, it means that the City Data Model reserves the right to hold discussions about any definition and to revise the definitions as our group determines appropriate. As a basic example, most definitions are not written in DL, and therefore your statement is self-contradictory: we have to modify what is already defined.

IF an external source wishes to release its copyright to us (W3C), e.g., for the definition clause of data elements, then we can use its text as a starting point; otherwise we start from scratch. But we have to control the text. If we allow multiple independent groups to control the model, we will have chaos and will not have achieved anything.

> If such definitions are not specified in the standard, then the definitions will be designed based on the user's interpretation of the standard.
We can only use an existing definition if copyright is released. And even then, I expect our analysis will result in major revisions to the text so I would argue that this is a moot point. 
> Question: do we need to require both DL and UML definitions or is an OWL/DL definition sufficient assuming our objective is to identify a core (ontology) model)?
This needs discussion; I feel like we are using terms to mean different things. The way I see it (and the way I have structured my list of concerns), there is a need to define an ontology (a.k.a. Conceptual Data Model) that defines the major business terms. This is also essentially the same as a well-defined Vocabulary per ISO rules (or perhaps even one step beyond). This level is purely semantic and unitless.

The ontology then serves as the basis for developing a Logical Data Model. The data model defines the data elements that are used in information transfers among components of a system. It does this by defining classes, properties, the abstract syntax of properties, and relationships among classes (i.e., a UML class diagram). Most of the classes defined in the Logical Data Model will be entities defined in the ontology. Some classes will be added (e.g., more detailed items that are not deemed to be major business terms but are modeling-level artifacts useful for grouping properties). That allows the ontology to be more concise. Likewise, I am willing to accept that some properties might be represented as key business terms (e.g., speed), but within the ontology we are defining what speed is generically, not its details. This level is primarily semantic but includes the concept of units (e.g., vehicle speed might be in m/s while a snail’s speed might be in mm/hr).

The third major level is the Physical Data Model, which is the subject of the various standards. This is of significance for two reasons:
- We consider the existing standards in the development of our Logical Data Model and in how we define our data
- We need to be able to trace concepts between the Physical and Logical Data Models and document any transformations that need to occur with the data. For example, a Physical Data Model (e.g., an existing interface standard) might define vehicle speed in miles per hour; we need to define the formula that translates mph to m/s as part of our traceability (a.k.a. correspondence).
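The mph example can be written down as a correspondence rule that carries the conversion in both directions. This is only a sketch: `CorrespondenceRule` and the element names are hypothetical, while the factor itself follows from 1 international mile = 1609.344 m and 1 h = 3600 s, so 1 mph = 0.44704 m/s.

```python
from dataclasses import dataclass
from typing import Callable

# 1 international mile = 1609.344 m and 1 h = 3600 s, so 1 mph = 0.44704 m/s.
MPH_TO_MPS = 1609.344 / 3600


@dataclass(frozen=True)
class CorrespondenceRule:
    """Hypothetical record tracing a Physical Data Model element to a
    Logical Data Model element, with the transformation between them."""
    physical_element: str
    logical_element: str
    to_logical: Callable[[float], float]
    to_physical: Callable[[float], float]


speed_rule = CorrespondenceRule(
    physical_element="ExampleStandard.vehicleSpeed (mph)",  # hypothetical name
    logical_element="Vehicle.speed (m/s)",                  # hypothetical name
    to_logical=lambda mph: mph * MPH_TO_MPS,
    to_physical=lambda mps: mps / MPH_TO_MPS,
)

print(round(speed_rule.to_logical(60), 3))  # 26.822
```

Storing both directions in the rule means a value can be carried from one standard to another via the logical model without a direct standard-to-standard mapping.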

If we agree on the above, I think we have three diagrams (which is how I have structured my Information Viewpoint proposal):
- Conceptual Data Model (Ontology): I think a graphical representation is useful, but I have not been impressed with what the industry has developed in this respect. I am inclined to favor some variant of UML where each defined entity is a class, but am happy to consider alternatives.

- Logical Data Model: Should be UML class diagram supplemented with state machine diagrams

- Physical Data Model: A UML class diagram would be useful and is perhaps all that is really needed (i.e., I don’t think we need any text, just refer to the source standard)
> Question: can we frame these definitions as ontology design patterns (ODPs) or mini-modules?
I don’t understand this question
> Enable specification of the relationship between model elements and use cases.
Yes, this equates to the correspondence rules in my table
> Enable specification of relationships between classes and properties
Well, yes, at the Logical Data Model level; that is what a UML class diagram does.
> - within and between standards.
No, not really. Source standards are at the Physical Data Model level; they define their own relationships, as they are independent bodies. We’ll capture these in UML diagrams, but the external groups define them. Our Logical Data Model will define relationships among our classes and properties, and we will separately have correspondence rules between our data elements and the data elements contained in various standards.
> Relationships within a standard will typically be part of the (domain) model.
I am guessing that by a domain model you mean the data that is only used inside one “domain” of Smart Cities (e.g., part 3 of the WG11 work item related to transport planning). If that is accurate, then there is a serious disconnect. Standards (i.e., source standards such as those that define how vehicles talk to other vehicles to avoid collisions) define their own Physical Data Model. By comparison, a “domain model” is just a subset of the Logical Data Model, i.e., the subset that is limited to a defined domain. The practical application of most interfaces requires data from beyond a single domain model (e.g., vehicles preventing collisions are likely to need to exchange data about location, which is cross-domain data). Likewise, the logical domain model will define data generically, while different standards in Europe and the US will present this information differently. Thus, there is no direct relationship between a domain model and a source standard.

> Relationships across standards will typically describe how the concepts are related.
I do not understand this sentence. Standards (i.e., source (interface) standards) define how to implement interfaces.
> Support for discussion on submitted content: use cases, definitions, and proposed relationships.
> Question: One discussion thread per use case / class / property / definition / relationship or do we want multiple threads to separate discussion on separate topics?
I’d be inclined to say one thread per website page, and then we figure out how to ensure that the pages do not become too complex. Roughly speaking, each of the items above could be a page. Users should be able to subscribe to updates on a discussion thread or a topic (e.g., the ability to subscribe to a use case and all of its linked data elements).
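A minimal sketch of the subscription idea, assuming one thread per page and a known set of data-element pages linked to each use case. `SubscriptionRegistry`, its methods, and the page names are hypothetical, and `notify` returns messages instead of actually sending them.

```python
from collections import defaultdict


class SubscriptionRegistry:
    """Hypothetical sketch: users subscribe to a page (thread); subscribing to
    a use case can also cover every data element linked to it."""

    def __init__(self, links):
        # links: mapping of use-case page -> set of linked data-element pages
        self._links = links
        self._subscribers = defaultdict(set)  # page -> set of users

    def subscribe(self, user, page, include_linked=False):
        pages = {page}
        if include_linked:
            pages |= self._links.get(page, set())
        for p in pages:
            self._subscribers[p].add(user)

    def notify(self, page, message):
        # Return sorted (user, message) pairs rather than sending e-mail.
        return sorted(
            (user, f"[{page}] {message}")
            for user in self._subscribers.get(page, ())
        )


links = {"UC: Collision avoidance": {"Vehicle.location", "Vehicle.speed"}}
reg = SubscriptionRegistry(links)
reg.subscribe("alice", "UC: Collision avoidance", include_linked=True)
reg.subscribe("bob", "Vehicle.speed")
print(reg.notify("Vehicle.speed", "definition revised"))
```

Here a change to the `Vehicle.speed` page reaches both Alice (via the use case) and Bob (directly).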
> Question: Do we need a capability to discuss topics that are not easily focussed on one element or do we then just pick the best guess?
If we do one per page and have some summary discussion pages, we are probably in good shape.
> Support navigation through and review of submitted content.
And manage levels of review/approval
> Support for some level of sanity checking for input.
By this I assume you mean an automated validation routine; yes, that would be good. The manual consensus can be handled via the discussions.
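For a sense of what an automated sanity check might cover, here is a minimal sketch. The two rules it applies (a numeric range needs units and a valid min/max; every correspondence must point at a known logical element) are assumptions chosen for illustration, as are `LogicalProperty` and `sanity_check`. Consensus on meaning stays with the discussions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LogicalProperty:
    name: str
    units: Optional[str]
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def sanity_check(properties: List[LogicalProperty], traced_elements: List[str]) -> List[str]:
    """Hypothetical automated checks on submitted content; returns a list of
    human-readable problems rather than rejecting the submission outright."""
    problems = []
    names = {p.name for p in properties}
    for p in properties:
        has_range = p.minimum is not None or p.maximum is not None
        if has_range and not p.units:
            problems.append(f"{p.name}: numeric range given but no units")
        if p.minimum is not None and p.maximum is not None and p.minimum > p.maximum:
            problems.append(f"{p.name}: minimum exceeds maximum")
    # Correspondence rules must refer to elements that exist in the logical model.
    for element in traced_elements:
        if element not in names:
            problems.append(f"correspondence refers to unknown element {element}")
    return problems


props = [
    LogicalProperty("Vehicle.speed", "m/s", 0, 150),
    LogicalProperty("Vehicle.heading", None, 0, 360),
]
print(sanity_check(props, ["Vehicle.speed", "Vehicle.mass"]))
```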
> Support for some level of user approval / membership control.
Yes
> Support to enforce agreed-upon workflow (governance process).
Yes
> Support for displaying where the different topics are in the process flow
Yes
> (e.g., using a kanban/project board or similar).
Sure
> Publication of the core model of shared concepts (defined classes and properties).
Huh? Why just the core? What does “publication” mean?
> Question: Does this include support for continued development of the core model?
Huh? I don’t think any of this is ever “done”. We simply reach approved versions, but the model remains open for comment and discussion.


Regards,
Ken Vaughn

Trevilon LLC
6606 FM 1488 RD #148-503
Magnolia, TX 77354
+1-936-647-1910
+1-571-331-5670 cell
www.trevilon.com

Received on Friday, 17 July 2020 17:09:45 UTC