The MusicXML challenge

Please excuse this very lengthy posting.  I have put a great deal of
thought into its content in the years since my first contact with MusicXML
in 2006, and this group is finally the forum in which I trust it will be
received thoughtfully as well.

To give some background for this posting, I've attached a summary of my
relevant experience and qualifications at the end.

Agenda
------

What I would like to see this group accomplish is to develop at least one
standard for digital score representation that will have the quality,
success, and longevity of the specifications / standards for other digital
document representations such as PostScript, PDF, HTML, and SVG.  (I will
use "specification" and "standard" interchangeably, because high quality
of either one demands high quality of the other, regardless of whether the
standard is de jure, as for SVG and HTML, or de facto, as for PostScript;
and of course the best de facto standards become de jure over time.)

In my opinion, a high-quality standard must meet the following criteria,
which apply to the other high quality digital format specifications that
I've examined:

    * It must provide a mechanically executable, unambiguous test for
syntactic conformance that consumers can and should implement.

    * It must provide a mechanically checkable, unambiguous test for
semantic conformance that consumers can and should implement.  This
includes not only type and range specifications for all individual data
items, but clear validity criteria for all relationships between items and
structures.

    * It must define the *meaning* of every construct that passes the above
two tests.  If a construct does not have visual or semantic meaning, such
as a line with only one end, it must not be allowed.

    * It must not be bent to cater to implementation bugs.  However,
non-conformances in leading implementations should be documented in an
annex, to help embarrass implementors into fixing them.

In my opinion, these goals have been achieved well for PostScript, PDF,
HTML, and SVG, as well as for many simpler formats such as Standard MIDI
files, and also for mup (www.arkkra.com), the score language that I use
for my own work as a composer.

If a standard or specification leads to an endless stream of questions
from consumer implementors about what some construct means, or whether
some construct is valid, that is, in my opinion, a sign of poor quality.
Likewise, widespread interoperability difficulties are an indication of
quality problems in the specification.  Some degree of vigilance, and an
independent effort to monitor producer implementation quality, are needed
to keep increasingly sloppy producer implementations from gaining ground
(since, as for all data formats, consumers should always make reasonable
efforts to compensate for sloppy producers), but I believe these costs
have a tremendous payoff in long-lived usability of both software and
(more importantly) human-produced data.

I cannot emphasize too strongly that achieving these goals is not simply a
matter of "writing a specification."  Whether they can be achieved at all,
or with a reasonable amount of effort, depends greatly on the structure of
what is being specified.  I believe strongly that the degree to which a
design can be specified clearly, succinctly, and fully is an important
measure of the quality of the design itself.

MusicXML is an obvious starting point for such a standard, and because it
has made such tremendous practical contributions to digital music and has
been adopted so widely, I believe it is worth the very considerable effort
that will be required to address its problems.

The remainder of this posting is intended to lay out what I think are the
major mindset and technical issues that need to be addressed in MusicXML,
and to suggest how they might be addressed.

Mindset issues
--------------

Several mindset issues stand in the way of evolving MusicXML to a high
quality standard.

The first, and most serious, concern is the misapplication of the concept
of "selective encoding."  Of course MusicXML producers are free to encode
only those aspects of the score that they wish, and of course they can
only encode those that MusicXML can represent.  However, this concept has
also been used to legitimize the idea that MusicXML consumers are free to
ignore data ad lib even if their own semantic model can express what that
data encodes.  That is not "selective" anything: it is a bug, or at best,
a missing feature.  I encountered this personally with respect to Finale
and the handling of margins: Finale's own model of margins is rich enough
to handle most (perhaps all) of what MusicXML can encode, but it simply
ignored margins in imported MusicXML data, and when I reported this, it
was shrugged off with an implicit appeal to "selective encoding."

Of equal concern is the special pleading that "music is so complex that a
score format cannot be specified completely or rigorously."  I think this
reflects a deep misunderstanding of the nature of both score formats and
specifications.  MusicXML is first of all (1) a format for representing
printed scores, and (2) fundamentally oriented towards semantics, like
HTML, rather than towards page description, like PostScript and PDF (more
on this issue below).  While no semantic specification should attempt to
describe in detail *how an engraver must render* a score, I have seen no
evidence that it cannot have clarity and completeness about the *semantic
and general visual relationships* of the elements it names.  The
specificational problems I see in MusicXML, of which the most serious are
discussed below, arise not from the nature of score notation but from
inadequately considering specifiability and meaningfulness in the design.

A further concern is the tolerance of undocumented and untagged
additions.  In an XML-based standard, I can think of four opportunities
for producers to pollute the standard by addition (and have probably
missed a few more): new contexts for existing element types, new element
types, new attributes for existing element types, and new values for
attributes or CDATA that have a specified range (including, in particular,
those that only take on an enumerated list of possible values).  A
standard that is intended to be interoperable should, in general, state
clearly that none of these are allowable: while consumers may tolerate
them (as consumers should tolerate non-conformances in general), they
should be flagged as non-conforming and reported to the user with an
encouragement to report them to the producer.  To what extent a young
specification such as a cleaned-up MusicXML should allow for unsanctioned
additions in limited, designated contexts is a legitimate subject for
discussion, but the starting point should be not to allow them anywhere.
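
As a rough illustration of flagging the four kinds of addition, here is a
minimal Python sketch.  The SCHEMA table is a hypothetical, deliberately
tiny vocabulary (a real checker would derive it from the DTD or schema),
and the function name find_additions is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately tiny vocabulary for illustration only.
# "attrs" maps an attribute name to its enumerated values (None = any value);
# "values" lists the enumerated text content an element may have.
SCHEMA = {
    "measure": {"children": {"note"},
                "attrs": {"number": None, "implicit": {"yes", "no"}}},
    "note": {"children": {"duration", "stem"}, "attrs": {}},
    "duration": {"children": set(), "attrs": {}},
    "stem": {"children": set(), "attrs": {},
             "values": {"down", "up", "double", "none"}},
}


def find_additions(el, parent=None, out=None):
    """Collect messages for constructs that the vocabulary does not sanction."""
    out = [] if out is None else out
    rule = SCHEMA.get(el.tag)
    if rule is None:                                   # new element type
        out.append(f"unknown element <{el.tag}>")
        return out
    if parent is not None and el.tag not in SCHEMA[parent.tag]["children"]:
        out.append(f"<{el.tag}> not allowed inside <{parent.tag}>")  # new context
    for name, value in el.attrib.items():
        allowed = rule["attrs"]
        if name not in allowed:                        # new attribute
            out.append(f"unknown attribute {name!r} on <{el.tag}>")
        elif allowed[name] is not None and value not in allowed[name]:
            out.append(f"bad value {value!r} for {name!r} on <{el.tag}>")
    values = rule.get("values")
    if values is not None and (el.text or "").strip() not in values:
        out.append(f"bad content {el.text!r} in <{el.tag}>")  # new enumerated value
    for child in el:
        find_additions(child, el, out)
    return out
```

A consumer could still import such a file, but would show these messages
to the user rather than silently accepting the additions.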

The final concern is the embodiment in the standard of tweaks to
compensate for bugs in the two leading implementations.  The ones I know
about are tolerance of Sibelius's failure to produce both "alter" and
"accidental" elements and to terminate beams properly.  These are bugs,
and they should not distort a standard.

I hope that any W3C work on MusicXML will address these issues thoroughly.

Technical issues
----------------

There are likewise a few technical issues in the MusicXML design so
serious that they must be considered for fixing before proceeding.

The most serious issue is the relationship between flowed-location and
fixed-location constructs, which manifests the tension between a semantic
format like HTML and a concrete format like PDF.  The first release of
MusicXML was essentially the Humdrum format re-cast into XML syntax, and
was completely semantic -- it had no constructs that referred to putting
marks at specific positions on a page.  The page-level elements added
later, in contrast, are completely concrete: unlike all other score
formats known to me, MusicXML has no way to express the basic semantic
concept of "this happens on all / odd / even pages."  This, in turn, means
that in order to have any non-flowed page-level marks at all (such as page
numbers), a MusicXML file must indicate all page breaks explicitly --
destroying the ability of a reasonable consumer to reflow pages for
smaller screens, for example.  Even worse, MusicXML can express fixed
placement of individual notes and auxiliary marks.  As a result,
producers have to choose between what amount to two completely different
dialects of MusicXML: one that is HTML-like, in which these elements are
placed only relative to other elements or to their "default"
engraver-chosen positions, and one that is PDF-like, in which *every*
element is given a fixed location and reflowing is impossible.  If the two
are mixed to the slightest degree, the result will render correctly only
with engravers whose placement algorithms are identical to those of the
original producer, making a mockery of interoperability.

As a practical instance of this problem, an earlier posting in this
discussion observed that users (at least of Finale) inadvertently use
fixed positioning for elements that should be coupled to flowed material,
leading to interoperability problems.  I have seen this myself in Finale
files where lyrics have been entered with fixed positions on the page
rather than linked to the music.  To the extent that Finale makes this
easy or natural, I would consider it poor design in Finale's user
interface.

Fixing this problem in MusicXML will require a concerted redesign to
clearly separate flowed and fixed material.  One part of this must be to
introduce all / odd / even page formatting, including a way to indicate
insertion of page numbers and of sequential page numbering in a variety of
forms.  However, fixing the user interface of Finale (I don't know whether
Sibelius or other widely used score creators have the same issue) is
beyond the scope of this discussion.

Nearly as serious as the mismatch of flowed and fixed elements are the two
constructs that create friction between file text order and time sequence:
<chord/> and <backup>/<forward>.

<chord/> raises two questions: what elements can intervene between the
notes of a chord?  What attributes and child elements must be the same
between the notes of a chord, and what may be different?  The answers
create interactions between <chord/> and many other elements of the
specification -- a red flag for a design.

The obvious fix for <chord/> is to replace this element with a <chord>
element at the measure level whose children are the notes of the chord,
and to consider carefully which of the current attributes and elements of
notes should be associated only with chords, only with notes, or (in as
few cases as possible) with either.
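
To make the container idea concrete, here is a minimal Python sketch of a
conversion from <chord/> markers to a container element.  The shape of the
<chord> container, the function name group_chords, and the decision to
reject a <chord/> note that does not directly follow another note are all
assumptions made for illustration; none of them comes from any MusicXML
version.

```python
import copy
import xml.etree.ElementTree as ET


def group_chords(measure: ET.Element) -> ET.Element:
    """Return a copy of `measure` in which each run of notes joined by
    <chord/> markers is wrapped in a (hypothetical) <chord> container."""
    out = ET.Element(measure.tag, dict(measure.attrib))
    container = None   # currently open <chord> container, if any
    last_note = None   # plain note copied immediately before, if any
    for child in measure:
        is_chord_note = child.tag == "note" and child.find("chord") is not None
        if not is_chord_note:
            # Any other element closes the current chord.
            container = None
            dup = copy.deepcopy(child)
            out.append(dup)
            last_note = dup if child.tag == "note" else None
            continue
        if container is None:
            if last_note is None:
                raise ValueError("<chord/> note does not follow a note")
            # Promote the preceding note into a new container.
            out.remove(last_note)
            container = ET.SubElement(out, "chord")
            container.append(last_note)
            last_note = None
        dup = copy.deepcopy(child)
        dup.remove(dup.find("chord"))   # the marker is no longer needed
        container.append(dup)
    return out
```

With a container, the question of what may intervene between the notes of
a chord reduces to the content model of one element.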

<backup> and <forward> are much worse: they interact with *every* element
that refers to the conceptual flow of time in the score *or* settings such
as margins that may change in the course of the file.  For every such
element, the specification must state whether "before" and "after" refer
to the time sequence as modified by backup/forward, or to the sequential
position of the element in the file.  Again, a very large red design flag.
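
The disconnect is easy to see by computing onsets while walking a measure
in file order.  A rough Python sketch, assuming integer <duration> values
and no <chord/> notes; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def onsets_in_file_order(measure):
    """Return (onset, voice) for each note, in the order the notes
    appear in the file, applying <backup> and <forward> as they occur."""
    now = 0
    result = []
    for el in measure:
        duration = int(el.findtext("duration", default="0"))
        if el.tag == "note":
            result.append((now, el.findtext("voice", default="")))
            now += duration
        elif el.tag == "forward":
            now += duration
        elif el.tag == "backup":
            now -= duration
    return result
```

For two notes of duration 4 in voice 1, a <backup> of 8, and a note of
duration 8 in voice 2, this returns [(0, '1'), (4, '1'), (0, '2')]: the
third note in the file sounds before the second.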

Removing backup/forward requires finding another way to deal with multiple
time flows (not just voices) within a part.  A good simple approach would
be to require every note, chord, rest, slur start/end, cresc/dim
start/end, etc. to specify an explicit starting time within the measure,
subject to a constraint that time cannot flow backward (the start time of
each element must not be less than the start time of the previous one);
but there may be other equally good simple approaches that meet the
essential criterion of no disconnect between time order and file order.
Even with the explicit approach, the following simple abbreviation rule
would allow eliding nearly all starting times: the default starting time
of each element is the default starting time of the previous element (in
the same part and measure) tagged with the same voice number, plus the
duration of that element, or 0 if it is the first element in that part and
voice in that measure; if not tagged with a voice number, it is the
starting time plus duration of the previous element, or 0 at the
start of the measure.  However, the underlying model would still be one of
explicit tagging.
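
The abbreviation rule and the no-backward-flow constraint can be sketched
in a few lines of Python.  The Event class and resolve_starts function are
hypothetical names, durations are assumed to be integers, and "starting
time of the previous element" is read here as its resolved starting time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    duration: int
    voice: Optional[str] = None   # voice tag, if any
    start: Optional[int] = None   # explicit starting time, if given


def resolve_starts(events: List[Event]) -> List[int]:
    """Resolve starting times for one part in one measure, in file order."""
    end_by_voice = {}   # voice -> end time of the latest event in that voice
    prev_end = 0        # end time of the previous event in the file
    prev_start = 0      # starting time of the previous event in the file
    resolved = []
    for index, ev in enumerate(events):
        if ev.start is not None:
            start = ev.start                        # explicit tag wins
        elif ev.voice is not None:
            start = end_by_voice.get(ev.voice, 0)   # continue that voice
        else:
            start = prev_end                        # continue previous event
        if start < prev_start:
            raise ValueError(
                f"event {index}: start {start} is before previous start {prev_start}"
            )
        resolved.append(start)
        if ev.voice is not None:
            end_by_voice[ev.voice] = start + ev.duration
        prev_end = start + ev.duration
        prev_start = start
    return resolved
```

Under this reading, voices must be interleaved in time order: the
sequence voice 1 (4), voice 2 (8), voice 1 (4) resolves to [0, 0, 4] with
no explicit times, while writing both voice 1 events before the voice 2
event is rejected unless the file is reordered.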

On another topic, perhaps it is obvious from the introduction, but the
specification must define, for every construct that uses start...stop
tags, exactly what the constraints are on what elements can participate in
the construct (limited to same measure? same part? same voice? adjacent
notes/chords? etc.), and of course must require semantic well-formedness
(every "start" must have a "stop" and vice versa, "continue" may only
occur between "start" and "stop", etc.).
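
A well-formedness check of this kind is mechanically simple once file
order and time order agree.  A minimal Python sketch, assuming each marker
has already been reduced to a (kind, number, type) triple such as
("slur", 1, "start"); the triple layout and the function name check_spans
are assumptions, and scope constraints (same part, same voice, and so on)
are left out.

```python
def check_spans(markers):
    """Return a list of error messages for start/continue/stop markers
    given in file order as (kind, number, type) triples."""
    open_spans = set()
    errors = []
    for index, (kind, number, typ) in enumerate(markers):
        key = (kind, number)
        if typ == "start":
            if key in open_spans:
                errors.append(f"marker {index}: {kind} {number} started twice")
            open_spans.add(key)
        elif typ in ("continue", "stop"):
            if key not in open_spans:
                errors.append(f"marker {index}: {kind} {number} {typ} without start")
            elif typ == "stop":
                open_spans.discard(key)
        else:
            errors.append(f"marker {index}: unknown type {typ!r}")
    for kind, number in sorted(open_spans):
        errors.append(f"{kind} {number} started but never stopped")
    return errors
```

An empty result means every "start" has a matching "stop" and every
"continue" lies between them.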

Finally and least seriously, there are a large number of different
contexts in which text and/or symbol strings can appear in MusicXML files,
and each one of them has different restrictions on what attributes can
appear with the strings.  Here is the table I derived from the MusicXML
3.0 DTD (please view this in a fixed-pitch font):

        %position = default-x default-y relative-x relative-y
        %font = font-family font-style font-size font-weight
        %text-decoration = underline overline line-through

                         direction        dynamics    mn.beats    lyr.text
                               reh.  words      m'nome      lyrics        lyr.el/ex
        placement        X                X                 X
        %position              X     X    X     X           X
        %font                  X     X    X     X     X           X       X
        color                  X     X    X     X           X     X       X
        %text-decoration       X     X                            X
        justify                      X                      X
        h/valign                     X
        rot,dir                X     X                            X
        letter-spacing         *     X                            X
        line-height                  X
        enclosure              X     X

        * = in MusicXML 3.0 only

While the above covers all of the larger issues, I ran into a number of
other concrete issues with the existing MusicXML definition when writing
my own MusicXML producer and consumer software.  I posted many of them on
the then-existing MusicXML discussion list and received answers to almost
all of those, but there are a few others documented only in comments in my
code.  I will be happy to provide any W3C committee with the complete list.

Recommendations
---------------

As should be clear from the above observations, I believe all existing
versions of MusicXML have issues, both with respect to mindset and with
respect to specific technical issues, so significant that no effort should
be devoted to trying to write specifications for them, even in the short
term.

I advocate strongly that our efforts be directed first to updating
MusicXML into a design that does have the potential of meeting the
criteria for a high quality standard / specification.  As a "straw man," I
suggest that this be the responsibility of a committee consisting of the
following people:

    * Someone who has at least as much experience as I do in reading and
writing high-quality specifications and standards, preferably someone with
a strong musical background.  (I personally know one person other than
myself who would fit this profile well, and I'm sure some other members of
this discussion would qualify even better.)

    * Michael Good, as originator of MusicXML and also to represent Finale's
MusicXML import and export implementation.

    * Three software developers who have implemented MusicXML import and
export functions for score applications and who are not connected with
MakeMusic or Finale, one of whom should be from the Sibelius team.
Ideally, they should represent those applications that implement the
largest fraction of MusicXML constructs.

    * A person tasked with writing a fully automatic converter from all
existing MusicXML formats to the new one.  Doing this in parallel with the
design discussion should greatly increase the chances of identifying
omissions and unclarities in the existing designs.

Avoiding "second system syndrome" in this effort is of great importance,
as is producing a result within a reasonable amount of time.  The effort
should take as its goals:

    * Producing a design that is documented by a high-quality specification
(as defined above).  The design itself, the specification document, and a
mechanical converter should be developed together.

    * Fixing all of the identified *significant* problems with the current
MusicXML design, including but not limited to those listed above.

    * Not adding any other functionality unless it "falls out" of what
should be a simpler and more orderly design.  (See above re text
attributes.)

    * Changing as little else as possible, to reduce the effort of creating
updated importers and exporters.

Again, this is a "straw man" in its details, but its motivation -- to
clean up MusicXML before putting any effort into documenting or specifying
it further -- is the main thesis of this long posting.  I hope any W3C
work on MusicXML will consider this thesis thoughtfully.

Conclusion
----------

MusicXML has been tremendously successful as a "first good enough" digital
score representation for widespread use.  We are at a turning point where
we have the opportunity to take it to a new level of quality that would
give it a much better chance of taking a place among the best multi-decade
and eventually de jure successful standards.  Please, let's not lose the
opportunity.

================================================================

*Annex: Qualifications and experience*

I believe I have an unusually broad combination of experience and
qualifications in this group: as a reader and writer of careful data
specifications, as a system designer and implementor, as a student of
digital score representations, and as a composer.

    * I was the primary author of the RFCs (reference specifications) for
DEFLATE compression (used in zip and gzip) and the gzip file format.  I
was one of the very few reviewers of the PostScript and PDF reference
documentation outside Adobe.  I was also a reviewer for the Java Language
and Java Virtual Machine Specifications.

    * I was the primary author of Ghostscript; the co-author of a seminal
paper on just-in-time compilation, as well as the architect and primary
implementor of the original just-in-time compiler for Smalltalk-80; and a
co-recipient of the ACM Software System Award for my work on Interlisp.

    * I have studied the syntax and semantics of MusicXML, Finale, Sibelius,
and mup in careful detail.  I have written software that to a substantial
degree converts between all of these formats, limited mostly by my
available time, by the issues I have found in MusicXML's specification and
semantics, and by the deliberate efforts of Sibelius (and, since 2014,
Finale) to lock up their data formats.  In 2010, I wrote a graduate-school
paper roughly comparing the four formats.

    * I have a Music M.A. (composition) from Cal State Hayward, studying
with Frank La Rocca.  I have been a reasonably serious composer since
2003, including three small commissions for choral works and half a dozen
performances of instrumental chamber music on San Francisco Bay Area
NACUSA concerts.

================================================================

Received on Tuesday, 20 October 2015 07:26:52 UTC