Re: UDEF Representation in RDF from Chris Harding on 2012-05-09 (public-esw-thes@w3.org from May 2012)

From: Chris Harding <c.harding@opengroup.org>
Date: Wed, 9 May 2012 15:09:40 +0100
To: Simon Spero <sesuncedu@gmail.com>
Cc: public-esw-thes@w3.org, "richard.parent" <richard.parent@servicesquebec.gouv.qc.ca>
Message-Id: <761B41B7-730E-4EB9-B461-5493858FD497@opengroup.org>
Hello, Simon -

Thanks for your response - and for taking the trouble to look more than superficially at the UDEF in order to formulate your response. I have responded to the points that you raise below.

Regards,
Chris
++++

Chris Harding
c.harding@opengroup.org



On 8 May 2012, at 21:33, Simon Spero wrote:

> On Tue, May 8, 2012 at 5:04 AM, Chris Harding <c.harding@opengroup.org> wrote:
> 
> We are looking at an approach that would define UDEF object classes and UDEF properties as SKOS concepts, and use SKOS narrower/broader to capture the relation between a parent object class and its children, and between a parent property and its children.
> 
> The initial questions that we have are:
>    - Does this look like a sensible approach?
>    - Should we make the whole of the UDEF a single SKOS concept scheme, or would it be better
>      to have separate concept schemes for object classes and properties?
> 
> I am not entirely sure that using SKOS would necessarily be the most appropriate way of increasing the semantic richness for what seem to be UDEF's target applications. Here are some considerations  that might help you decide whether a purely SKOS based approach is ideal for your needs, or whether RDF(S) + OWL might be a better approach. 
> 
> 	• SKOS was originally developed for representing knowledge organizing systems, not knowledge representation systems; that is, it was designed to represent the relationship between ideas, not between the things that those ideas are about.  A good test to see if SKOS is right for your application is to consider whether in your application there is ever any difference between something being a kind of something else, and something being a part of  something else. 
The UDEF is an index of fields in forms, columns in databases, etc. Each field (or column) corresponds to a concept, and what is entered into a field (or a cell in a column) provides information about a thing that realises the concept. So it is perhaps not exactly about ideas, but it is more about ideas than it is about things.

> 	• SKOS does not have a standardized way of expressing sequences of concepts, for generating concatenated notations, or for expressing restrictions on the types of concepts that can be used to restrict what concepts could follow what other concepts.  This essentially forces UDEF to be a fully enumerated system, which may not be ideal.  
Yes. A UDEF identifier is a concatenation of an object class identifier and a property identifier. In theory, it is this object class/property combination that corresponds best to a SKOS concept. In practice, it is not feasible to enumerate all of the valid  object class/property combinations.

> 	• The example used on the UDEF CONOP page involves a sample interaction with the DLA (Defense Logistics Agency).  That (and the fact that the overview page is titled "Concept of Operation" :) suggests that interworking with DoD and other government agencies is an important consideration.  DoD semantic interoperability and federation work is using OWL/RDF as a basis - see these slides by Dennis Wisnosky (DoD BMA CTO & Chief Architect) from last week's DoD Enterprise Architecture conference. 
The UDEF is applicable to all areas. It was originally conceived within the defence procurement community, which is why there are so many defence-related terms in the current version. The proportion of defence-related terms has decreased as other areas have come into consideration, and will continue to decrease.

But, IMHO, the ability to express UDEF definitions in RDF and OWL is crucial, regardless of the area of application. In my understanding, SKOS is not an alternative to this, but a way of defining the mapping to RDF that could have additional benefits (a) in enabling us to use tools designed for SKOS to work with the UDEF definitions, and (b) potentially in the longer term enabling us to link the UDEF with other SKOS vocabularies. 

> 	• Earlier work at USAF in the EVT under SAF-US(M) revealed that even RDF(S) + OWL was insufficient to capture all useful information for most Communities of Interest;  the weaker semantics of SKOS would presumably be able to capture even less information.  (Interestingly, the acronym EVT stood for "Enterprise Vocabulary Team"; the V was obsolete  even  before the team was stood up).   
The UDEF does not set out to capture all information. Its limited (but useful) objective is to provide an index for data fields, as described above. 

> 	• Where a natural language term has different meanings in different CoIs, it is a very bad idea to try and force the subject matter experts in one or both areas  to use a different term.  Performance level of subject matter experts is degraded to a level close to novices.      
Agreed.
> Taking a look at some of the UDEF definitions seems to suggest that the current definitions do not include much information that could be considered essential for interoperability.
> 
>   For example, in the "identifier" sub tree, we find the following terminal node. 
>   4.35.8 Air-Force.Assigned.Identifier
> 
>          1.4.35.8 United-States.Air-Force.Assigned.Identifier
> 
> Given the number of different USAF identifiers, this is somewhat problematic,  and would seem to be based on an unnamed specific use case.   
> 
See below for more on this example.
> In terms of mapping to RDF(S)/OWL/SKOS,  these properties would seem to imply a hierarchy  - SubPropertyOf in RDFS, SubDataPropertyOf in OWL, broader in SKOS. 
> 
Yes, I think this is how it should be interpreted.
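
As a rough illustration of the SKOS reading, the broader chain for a property identifier can be sketched as below. This is only a sketch under assumptions: the base URI is hypothetical (the UDEF does not define it), and the rule that a parent identifier is obtained by dropping the leading component is inferred from the 1.4.35.8 / 4.35.8 pair in this thread.

```python
from typing import List, Optional

# Hypothetical namespace, used for illustration only.
BASE = "http://example.org/udef/property/"
SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"


def broader_id(udef_id: str) -> Optional[str]:
    """Return the assumed parent ID (leading component dropped), or None at the root."""
    _, sep, rest = udef_id.partition(".")
    return rest if sep else None


def broader_triples(udef_id: str) -> List[str]:
    """Build N-Triples style skos:broader statements from udef_id up to its root."""
    triples = []
    current = udef_id
    parent = broader_id(current)
    while parent is not None:
        triples.append(f"<{BASE}{current}> <{SKOS_BROADER}> <{BASE}{parent}> .")
        current = parent
        parent = broader_id(current)
    return triples


# One statement per step up the tree: 1.4.35.8 -> 4.35.8 -> 35.8 -> 8
for line in broader_triples("1.4.35.8"):
    print(line)
```

The same statements could be written with rdfs:subPropertyOf in place of skos:broader if the RDFS reading were preferred.
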
>  In terms of  creating interoperable systems, it is hard to understand what the semantics of these properties would be.   
> 
> What would it mean to have a data field tagged "4.35.8"? 
This is only half of a UDEF tag. A field would never be tagged 4.35.8. A field could for example be tagged  
a.o.1_4.35.8 (Military.Aircraft.Asset_Air-Force.Assigned.Identifier) in the case of an identification number of a military aircraft, or c.j.5_4.35.8 (Military.Officer.Person_Air-Force.Assigned.Identifier) in the case of an identification number for its pilot.
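
To make the two-part structure concrete, here is a minimal sketch that splits a full tag on the underscore, as in the examples above. The names UdefTag and parse_tag are mine, chosen for illustration, and are not part of the UDEF.

```python
from typing import NamedTuple


class UdefTag(NamedTuple):
    object_class: str  # e.g. "a.o.1"
    prop: str          # e.g. "4.35.8"


def parse_tag(tag: str) -> UdefTag:
    """Split a full UDEF tag into its object-class and property halves."""
    object_class, sep, prop = tag.partition("_")
    if not sep or not object_class or not prop:
        # A bare property ID such as "4.35.8" is only half a tag.
        raise ValueError(f"not a full UDEF tag: {tag!r}")
    return UdefTag(object_class, prop)


aircraft_id = parse_tag("a.o.1_4.35.8")
pilot_id = parse_tag("c.j.5_4.35.8")

# The property half is shared, but the full tags differ, so the fields are distinct.
same_property = aircraft_id.prop == pilot_id.prop
same_field = aircraft_id == pilot_id
```
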

> Could it be meaningfully compared to another data field tagged "4.35.8"? 
It is meaningful to compare the full tags of fields, either to distinguish different fields, or to detect that fields are the same. In the example above, it is easy to envisage records that have "Id" fields that might refer to the aircraft or to the pilot. The UDEF makes it easier to put the right data in the right field.

> Could you join two records from different sources using this field?  
With extreme caution. The UDEF is in concept infinitely extensible, and currently very incomplete. There is not currently, for example, a tag for "Canadian.Air-Force.Assigned.Identifier" or for "Australian.Air-Force.Assigned.Identifier", so an id for a Canadian aircraft would likely be tagged in the same way as an id for an Australian aircraft. Equating similarly-tagged fields in a join of Australian and Canadian records would probably not give you a meaningful result, unless the Australian and Canadian air forces happen to have agreed on a uniform identification scheme for aircraft. Even United-States.Air-Force.Assigned.Identifier might not work in a join - I wouldn't be at all surprised if the US Air Force has more than one identification scheme for aircraft.

> Could you join two records, one identified by a  "1.4.35.8", the other by "4.35.8"? 
Only if you were sure that the two sources were using non-overlapping identification schemes - and the UDEF would not give you this assurance.

> 
> Is the relationship between the fields strictly one of about-ness; is everything that is in some way about a United-States.Air-Force.Assigned.Identifier also in some way about an Air-Force.Assigned.Identifier?  
> 
A Military.Aircraft.Asset_United-States.Air-Force.Assigned.Identifier is a Military.Aircraft.Asset_Air-Force.Assigned.Identifier. It is also an Aircraft.Asset_United-States.Air-Force.Assigned.Identifier, an Asset_Identifier, and all the combinations in between.
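
"All the combinations in between" can be sketched as the product of the generalisations of each half. This assumes that dropping leading qualifiers from a name always yields a broader name; the actual UDEF trees may not define every intermediate node, and the function names are illustrative only.

```python
from itertools import product
from typing import List


def suffixes(name: str) -> List[str]:
    """Return the name followed by each broader name (leading qualifiers dropped)."""
    parts = name.split(".")
    return [".".join(parts[i:]) for i in range(len(parts))]


def generalizations(full_name: str) -> List[str]:
    """Pair every generalisation of the object class with every one of the property."""
    object_class, prop = full_name.split("_")
    return [f"{oc}_{p}" for oc, p in product(suffixes(object_class), suffixes(prop))]


names = generalizations(
    "Military.Aircraft.Asset_United-States.Air-Force.Assigned.Identifier"
)
# 3 object-class levels x 4 property levels, starting with the name itself
# and ending with Asset_Identifier.
for n in names:
    print(n)
```
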

I don't believe we should think in terms of joins in the classical sense, but rather about deductions to be made from the information contained in different records, that might produce new records containing derived information. (This is a bit like the difference between SPARQL and SQL). UDEF tagging can make a meaningful contribution to such deductions, even if the tags cannot be used to identify fields on which to join records.

> It may well be that many of the unique labels that UDEF has created can be usefully refactored into their semantic components, and that by doing so it becomes easier to create federated systems of systems that work at the enterprise scale and beyond.
A large part of the value of the UDEF is that its hierarchical structure imposes a discipline on the arbitrary combination of components. This makes it easier to use as a practical tool, though it reduces its power of expression. We do however very much appreciate the need to federate vocabularies developed by different communities of experts - and this is a good reason why we should look at SKOS.


> 
>  Simon
>
Received on Wednesday, 9 May 2012 14:10:45 UTC