RE: Is abstraction to simple logical units a naive assumption?

From: Dennis E. Hamilton <dennis.hamilton@acm.org>
Date: Sun, 21 Sep 2014 10:28:25 -0700
To: <public-change@w3.org>
Message-ID: <006901cfd5c1$744ecf00$5cec6d00$@acm.org>
Further below my abridged original message is my liaison report to the OASIS ODF Interoperability and Conformance (OIC) TC.  This is a public document, archived at 
<https://lists.oasis-open.org/archives/oic/201308/msg00003.html>.

That report is from one year ago.  I have not participated since (and the OIC TC has closed) and do not know if there has been a change of direction.

 - Dennis

-----Original Message-----
From: Dennis E. Hamilton [mailto:dennis.hamilton@acm.org] 
Sent: Saturday, September 20, 2014 20:45
To: public-change@w3.org
Subject: Is abstraction to simple logical units a naive assumption?

[ ... ]

EFFORTS SO FAR

There are three efforts I know of that have attempted to abstract ODF and OOXML sufficiently to provide high-fidelity conversion between the two document-file formats.  While that is not exactly what CTMarkup is about, I think the challenge of common abstraction is relevant.

[ ... ]

Another effort comes from Beijing.  That investigation is attempting to devise a model in which mapping up into abstractions and then down again is the approach.  I have seen only intermediate work and I can't tell what is available to the public.  I trust that this work will lead to another ISO/IEC Technical Report.

[ ... ]

-----Original Message-----
From: oic@lists.oasis-open.org [mailto:oic@lists.oasis-open.org] On Behalf Of Dennis E. Hamilton
Sent: Wednesday, August 7, 2013 14:08
To: 'OIC TC List '
Subject: [oic] FW: [office] JTC1/SC34/WG5 Liaison Report: Measuring Interoperability

FYI.  I failed to mention that another aspect that can be assessed is the degree to which detailed features are supported in a single standard.  That becomes another metric, but probably only useful in conjunction with the measured feature dependency of documents in a particular application profile.

 - Dennis

-----Original Message-----
From: office@lists.oasis-open.org [mailto:office@lists.oasis-open.org] On Behalf Of Dennis E. Hamilton
Sent: Wednesday, August 7, 2013 12:26 PM
To: 'ODF TC List '
Subject: [office] JTC1/SC34/WG5 Liaison Report: Measuring Interoperability

On 2013-06-17 I attended a meeting of ISO/IEC JTC1/SC34 WG5.  

The single topic of conversation was a proposal from China on a Measurement Model for Document Interoperability.

The contribution is by HOU Xia of Beijing Information Science and Technology University (BISTU).

There is the basis for developing a JTC1 Technical Report.  The proposal is to create a New Work Item for the development of the model and its application.  This may be in the category of Exploratory Work rather than a standards-track effort, at least for now.  There will be further discussion at the September SC34 Plenary, although the NWI proposal might not come in until later, possibly early 2014.

The measurement model is being used experimentally at a proof-of-concept level at this time.

CHARACTERISTICS OF THE MEASUREMENT MODEL

The central feature of the measurement model is a hierarchical identification of document (format) features.  The idea is to capture essential features that are carried by document formats.  This is meant to be an abstraction to features that individuals perceive and control in applications that consume, present, and produce the documents.  

The identification of features is meant to be as independent of format specifics as possible, apart from the necessary dependence on the nature of electronic documents and ways that users interact with them.  

That creates the CONCEPTUAL MODEL.  For rich documents such as those supported by OOXML and ODF applications, the detailing of features is an extensive undertaking. 

CONCRETE APPLICATION TO STANDARDS

A key step is the application of the model to standard formats.  That is, a particular format specification can be analyzed to determine how it supports features in the conceptual model, down to the finest details of the model.  Iterating on the application of the model to standards will lead to refinement of the conceptual model and possible adjustment of how features are identified from one document standard to another.  This will be a substantial effort, and it clearly must start at some small level and be refined over time.

A standard might not reflect a feature or might express a feature in quite different ways than is accomplished using other formats.  This narrows the scope to features that are supported in one or more formats under consideration.   The number of document features is still extensive.

My presumption is that, as feature details in individual standards are rolled up into being reflected in the conceptual model, one will end up with a downward look at the extent to which the detailed conceptual features are supported in a given format.
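
A minimal sketch of that downward look, under stated assumptions: conceptual features are named by hypothetical dotted paths such as "text.bold", and a format's supported features are given as a plain set.  Neither the naming scheme nor the function comes from the proposal; they are for illustration only.

```python
from collections import defaultdict


def coverage_by_branch(conceptual, supported):
    """Fraction of conceptual features supported, per top-level branch.

    conceptual: set of hypothetical dotted feature paths (the model).
    supported:  set of those paths that one format specification covers.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for feature in conceptual:
        # The top-level branch is the text before the first dot.
        branch = feature.split(".", 1)[0]
        totals[branch] += 1
        if feature in supported:
            hits[branch] += 1
    return {branch: hits[branch] / totals[branch] for branch in totals}


# Illustrative data only, not taken from any specification.
conceptual = {"text.bold", "text.italic", "table.merge", "table.border"}
supported = {"text.bold", "text.italic", "table.border"}
coverage = coverage_by_branch(conceptual, supported)
```

With this made-up data, the "text" branch is fully covered and the "table" branch is half covered.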

APPLICATION TO DOCUMENT ANALYSIS AND PROFILING

Analysis of a corpus of documents is proposed to occur mechanically.  By examining documents in a particular format and identifying the features that are expressed in such documents, it is possible to estimate the prevalence of conceptual-feature usage.  This can be used to create weights with respect to feature significance in some domain of document creation and use.
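
As a rough illustration of deriving weights from a corpus, the sketch below assumes each document has already been reduced to the set of conceptual features it uses (the extraction step is not shown) and takes a feature's weight to be its share of all feature occurrences.  The proposal may weight features differently; the feature names are hypothetical.

```python
from collections import Counter


def feature_weights(corpus):
    """Estimate feature weights from prevalence in a corpus.

    corpus: iterable of feature-identifier collections, one per document.
    Returns a dict mapping each feature to its share of all occurrences,
    counting a feature at most once per document.
    """
    docs = [set(doc) for doc in corpus]
    counts = Counter(feature for doc in docs for feature in doc)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {feature: n / total for feature, n in counts.items()}


# Illustrative corpus of three documents.
corpus = [
    {"text.bold", "table.merge"},
    {"text.bold"},
    {"text.bold", "list.numbered"},
]
weights = feature_weights(corpus)
```

Here "text.bold" appears in all three documents and so receives the largest weight; the weights sum to 1 by construction.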

APPLICATION TO CROSS-STANDARD INTEROPERABILITY

The identification of features with respect to a standard allows for comparison of the degree to which the overall conceptual feature set is supported in a given specification.  Although one can identify degrees of commonality, and possibly assess some sort of difficulty of feature-preserving translations between formats, there is no particular way to give any weight to such determinations.  

By considering cross-standard interoperability in the context of statistical document profiling, however, it is possible to ascertain the degree to which fidelity can be preserved by translation of features of such documents from one standard format to another.  

It may also be possible and desirable to create document templates that guide users to employment of features that satisfy any requirements there are for cross-format interoperability.

METRICS

The metrics proposed include 

 - the conversion difficulty of preserving an individual feature from format A to format B

 - the relative importance of a feature in format A (as determined statistically)

The crude single measure is a normalized value of the interoperability from format A to format B given the importance of the features in the corpus of format A documents of concern.

The formulas for the normalized measurements have been derived.  
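
The derived formulas are not reproduced in this report.  Purely as an assumed example of what a normalized single measure could look like, the sketch below treats conversion difficulty as a number in [0, 1] per feature (0 = preserved without loss, 1 = cannot be preserved) and computes an importance-weighted average of (1 - difficulty).  This form is an assumption, not the proposal's formula.

```python
def interoperability(weights, difficulty):
    """Assumed normalized A-to-B interoperability score in [0, 1].

    weights:    feature -> relative importance in format A documents.
    difficulty: feature -> conversion difficulty from A to B, in [0, 1].
                Features missing from this map are treated as
                unconvertible (difficulty 1.0), an assumption made here.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    preserved = sum(
        w * (1.0 - difficulty.get(feature, 1.0))
        for feature, w in weights.items()
    )
    return preserved / total


# Illustrative values only.
weights = {"text.bold": 0.6, "table.merge": 0.2, "list.numbered": 0.2}
difficulty = {"text.bold": 0.0, "table.merge": 0.5}
score = interoperability(weights, difficulty)
```

Dividing by the total weight keeps the score between 0 and 1 even when the weights are raw counts rather than shares.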

Presumably, one can also factor in an implementation's feature coverage as well, and implementers might be interested in making such determinations with regard to their intended community of software adopters.

PERSONAL OBSERVATIONS

I think it is extremely difficult to analyze specifications to the level that the representation of conceptual features is identified.  There is a tremendous number of details.  This is something that has to be handled by progressive deepening and refinement.  There will also be disagreements, making it necessary to be willing to iterate and also to gain expertise in different ways of accomplishing the same thing in the same and different formats.

Agreement on weightings (difficulty of conversion, importance of features, etc.) will be troublesome as well.  The question will be how this methodology is to be applied in a constructive manner that does not create barriers to contribution of expertise by competitors.

 - Dennis
Received on Sunday, 21 September 2014 17:28:52 UTC
