Re: Graphical Scores and SVG

Hi Joe, Chris, Jeremy, James, and all others with interest in this topic,

I agree with the analysis that there are 3 potential approaches. I'll 
reference them by the letters in the email that outlined them.

A. This approach is probably simplest if a single computing 
element/program/algorithm produces both the graphical display and the 
sound: some sort of synchronized motion of the score, a visible cursor, 
or both, together with sounds that correspond to that motion, for a 
particular instance of a synthesized performance, rendered according to 
a particular interpretation of the score.
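
To make this concrete, an approach A file might attach the performance 
data directly to the graphical elements, along these lines (the 
attribute names and MIDI encoding here are purely illustrative, not 
taken from James's actual proposal):

    <g class="note" data-midi="noteOn ch=1 key=60 vel=80 dur=0.5s">
      <path d="..."/>
    </g>

Everything the playback program needs is then inside the SVG itself.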

B. The example in this approach tightly couples a particular graphical 
score with a particular audio performance. If both were produced from a 
common MNX using particular performance parameters, it could work well. 
If the references were not to specific files and times, as they are in 
the example, but rather to a generic title and to an abstract sequence 
progression (I'm being careful not to use the word "time" there, 
because the sequence of seconds is tied to that cesium standard), then 
a single score could be synchronized with any performance that also 
includes references to the same abstract sequence progression, together 
with some particular correlation between that abstract sequence 
progression and time.
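
For instance, taking the measure example from Joe's email but replacing 
the file and clock-time references with hypothetical attributes that 
point at an abstract sequence progression (none of these names come 
from any existing proposal), an approach B element might look something 
like:

    <measure sound-ref="ThisPiece" seq-start="80" seq-end="84">...

where "ThisPiece" is a generic title rather than a specific file, and 
the seq-* values are positions in the abstract sequence progression 
rather than seconds.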

C. This approach allows any performance of a score to be synchronized 
with the same score, even if different performance parameters were 
used. Again, though, the example is limited to a specific score file 
and a specific performance file, since it names both and uses time 
references.

The movie and broadcast industries have been grappling with this 
problem for years, not by using abstract sequence progressions, but by 
attempting to capture exact times using SMPTE time codes, synchronized 
clock signals, and other devices. The known difficulties in performing 
such synchronization are summarized at this link: 
https://www.bhphotovideo.com/explora/video/tips-and-solutions/timecode-versus-sync-how-they-differ-and-why-it-matters

For the movie and broadcast industries, there is only a single, specific 
performance of concern: for the movies, that created by the director; 
and for broadcast, that created by real-time recording (often by 
multiple recording devices). Hence something like SMPTE, based on real 
time, not on an abstract sequence progression, is appropriate... but 
they still grapple with the issues described in the link.

For music, there can be many performances: a particular generated MIDI 
file is just one. Each human performance may vary, whether due to 
errors or to preferences in the interpretation of ambiguous elements (a 
fermata is a somewhat ambiguous element, for example, as are tempo 
directives: rit., etc.).

For synthesized performances, the ambiguous elements are interpreted 
according to some particular, encoded set of rules.

For live performances, the ambiguous elements are interpreted according 
to some particular director, or rules agreed upon by the performer(s).

To correlate recorded live performances with a score, the recording 
would have to be annotated with the abstract sequence progression used 
by the score, either manually or by some sophisticated, possibly 
AI-based, audio interpretation algorithm.

By using an abstract sequence progression, a single score and MIDI file 
could be used, together with various algorithms that might peek at the 
score and interpret its ambiguous elements according to various rules, 
to produce a variety of performances: a different playback for each set 
of rules.
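
As a purely hypothetical sketch of what such a set of rules might look 
like (none of these element or attribute names exist in any current 
proposal):

    <interpretation-rules>
      <rule element="fermata" duration-multiplier="1.8"/>
      <rule element="rit" tempo-curve="linear" final-tempo="75%"/>
    </interpretation-rules>

A different playback would result simply from swapping in a different 
rules fragment.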

By coding recorded performances to the abstract sequence progression 
(the rules here would translate between time and the abstract sequence 
progression, and would potentially differ somewhat from the rules used 
by a synthesized performance, which peeks at the score), the same score 
could be used with a variety of either synthesized or recorded 
performances.
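
Such an annotation might, for example, consist of anchor points that 
pair positions in the abstract sequence progression with times in the 
recording (hypothetical markup again):

    <performance-anchors sound-ref="LiveConcert.wav">
      <anchor seq="80" sound-time="1:22.35"/>
      <anchor seq="84" sound-time="1:24.10"/>
    </performance-anchors>

Positions falling between anchors could then be located by 
interpolation.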

Where should the rules be encoded? Only approach C provides a separate 
mapping file in which the rules can be encoded. While approach B can 
also support abstract sequence progressions, it does not provide a 
place for rules that map time to the abstract sequence progression. An 
approach B that uses score peeking and a fixed set of rules for 
interpreting ambiguous elements could eliminate the need for a mapping 
file containing the rules, but it would be far less flexible than 
approach C, where the rules can be modified.
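
In other words, an approach C mapping file could carry both the 
correspondence and the rules, along these lines (hypothetical names 
once more):

    <mapping graphics-ref="ThisPiece.svg" sound-ref="ThisPiece.midi">
      <measure-mapping graphics-ref="#m21" seq-start="80" seq-end="84"/>
      <interpretation-rules>...</interpretation-rules>
    </mapping>

Swapping out the sound-ref, or the rules, would then yield a different 
synchronized performance of the same score.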

Approach A seems to be a subset... it has no rules, except those 
encoded into the playback program, and while it easily synchronizes the 
score with a synthesized performance, it doesn't allow the score to be 
synchronized with recorded performances.

I should note that the rules "encoded into the playback program" 
mentioned for approaches A & B can potentially be altered by users, 
even though I described them as a "fixed set of rules". But as soon as 
you allow the users who alter those settings to save them... where? In 
a "configuration" file, no doubt... which amounts to a completely 
separate mapping file, as in approach C, except that a configuration 
file would likely be less standardized than a mapping file promulgated 
by a standards body.

I should also note that the "score peeking" I refer to is probably more 
easily implemented by peeking into the MNX rather than into the SVG, 
unless there are rules for producing "music SVG" files that expose the 
ambiguous elements in a way that lets the rules be applied easily.
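
The reason is that a semantic encoding names the ambiguous element 
explicitly, while a typical SVG rendering contains only its drawn 
shape. Compare, purely for illustration (neither fragment is taken from 
an actual specification):

    MNX-like:  <note pitch="C4"><fermata/></note>
    SVG:       <path d="M 312.4 88.1 C ..."/>

A rules engine can act on the first directly; the second would first 
have to be recognized as a fermata.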

For all the above reasons, I see C as a clear winner among the 
approaches listed.



On 3/29/2017 9:47 AM, Jeremy Sawruk wrote:
> Hi Joe,
> After reading your email, I personally like approach B, because it
> offers flexibility to synchronize the score to multiple media formats
> (even video), not just MIDI.
>
> That being said, I also strongly agree that this area is not a priority
> at the moment. Getting the initial CWMN semantics down should be our
> focus; it's complex enough!
>
> Thank you for all your work. I am really extremely pleased with how this
> process is going. Keep up the good work!
>
> J. Sawruk
>
> On Wed, Mar 29, 2017 at 11:58 AM, Joe Berkovitz <joe@noteflight.com
> <mailto:joe@noteflight.com>> wrote:
>
>     Hi all,
>
>     I think this is a good moment to talk in more detail about the
>     potential uses of SVG in the work of this group since we've had a
>     fair bit of activity on that topic.
>
>     To that end, I've just had a very useful exchange with W3C's Chris
>     Lilley, who is copied on this email. Chris is a computer scientist
>     who originally chaired the W3C SVG Working Group when it began in
>     1998, and saw SVG through all the way from an abstract idea to its
>     modern realization. So it's fair to say he's been observing and
>     thinking about its uses for a very long time, with an expert's
>     perspective. Chris is also a member of the W3C Web Audio Working
>     Group which is responsible for both audio and MIDI standards on the Web.
>
>     What I'll present here is my current thinking, informed by some
>     thoughtful points made by Chris -- who I'm encouraging to jump in
>     with his own words.
>
>     Let me say first that I see at least two potential uses for SVG in
>     our community group, and they seem to harmonize perfectly:
>
>     1. SVG (with some relationship to sound) can represent musical
>     scores with arbitrary visual and sonic content. Thanks to James
>     Ingram for highlighting this particular case to the group.
>
>     2. SVG (with some relationship to sound) can serve as an
>     intermediate format that represents *a specific rendering* of
>     semantic music notation markup such as MNX, MusicXML, MEI, or
>     anything else.
>
>     So far a lot of discussion has revolved around #1, but #2 is at
>     least as significant. Consider that it permits semantic markup to be
>     converted into a format that can be easily displayed and played back
>     by an application that is much simpler than one that processes
>     semantic markup. Of course, that format is necessarily quite limited
>     in other ways (try transposing it to another key, or reformatting it
>     for a different display size!)  But, as a final-delivery mechanism,
>     #2 seems to have a lot of merit. It could provide a standardized way
>     to package the output of notation renderers, regardless of what
>     markup language they use. In fact, MathML (a semantic markup
>     language for math formulas) is routinely converted to SVG by various
>     standard libraries for exactly this purpose.
>
>     Now: I believe we don't need to get into a big debate about which
>     use is more important. They both are. Also, in neither case do they
>     eliminate our need for a semantic markup language within the
>     confines of some known musical idiom, so there's no need to stop
>     that train from leaving the station. MNX explicitly makes room for
>     graphical encodings to be placed within it.
>
>     Relative to SVG, then, the key question is: What's the best way to
>     augment an SVG document with information on sonic performance? There
>     are multiple ways to do it. Chris and I discussed several very
>     high-level approaches:
>
>     A. Intersperse performance data (e.g. individual MIDI or note
>     events) throughout an SVG file. James's proposed approach follows
>     this pattern: MIDI codes are sprinkled directly into the file and
>     attached to graphical elements. One could also use a different means
>     to specify events, say like the way that MNX proposes.
>
>     B. Intersperse *references* to a separate performance file (e.g. a
>     Standard MIDI file, MP3 audio file) throughout an SVG file. In this
>     approach, SVG elements are tagged with simpler data that "points to"
>     some time or time range within a separate document. MEI uses this
>     technique in places. Example:
>         <measure sound-ref="ThisPiece.midi" sound-start="1:22"
>     sound-end="1:24">...
>
>     C. Create a completely separate mapping file that identifies a
>     correspondence between SVG and a performance file. Such a file might
>     contain entries like this:
>         <measure-mapping graphics-ref="ThisPiece.svg#m21"
>     sound-ref="ThisPiece.midi" sound-start="1:22" sound-end="1:24"/>
>
>     I do not think there is a clear winner among these, and I don't
>     think we should immediately get into the weeds. The next step in
>     this discussion -- when we have it -- is to look at the pros and
>     cons of these various approaches for uniting graphical and sonic
>     information. Each has advantages and disadvantages, and they need to
>     be brought to the surface in what will be a lengthy discussion of
>     its own. All of the above techniques have been tried in different
>     contexts and there are definite lessons to be learned.
>
>     As a corollary: let's stop debating the importance of pure graphical
>     music encoding. There is no need for a debate: we agree that it *is*
>     important. However, its role and its structure need not be
>     settled in advance of our work to move ahead on CWMN semantic music
>     encoding. We will need time to tease out the answers to the
>     questions raised above.
>
>     Finally a word on Frankfurt: the co-chairs plan to devote a limited
>     period of time to discussing this topic, but it will certainly be
>     smaller than many would like (myself included). We are limited by
>     the other big things on the agenda. But, in truth, most of the good
>     stuff in standards groups happens on email and Github over time,
>     not in large in-person meetings!
>
>     Best,
>     .            .       .    .  . ...Joe
>
>     Joe Berkovitz
>     Founder
>     Noteflight LLC
>
>     49R Day Street
>     Somerville MA 02144
>     USA
>
>     "Bring music to life"
>     www.noteflight.com <http://www.noteflight.com>
>
>

Received on Wednesday, 29 March 2017 19:03:14 UTC