- From: Glenn Linderman <v+smufl@g.nevcal.com>
- Date: Wed, 29 Mar 2017 12:02:26 -0700
- To: public-music-notation-contrib@w3.org
Hi Joe, Chris, Jeremy, James, and all others with interest in this topic,

I agree with the analysis that there are 3 potential approaches. I'll reference them by the letters in the email that outlined them.

A. This approach is probably simplest if a single computing element/program/algorithm is to produce both a graphical display and sound, with some sort of synchronized motion of the score, a visible cursor, or both, as well as producing sounds that correspond to the motion, for a particular instance of a synthesized performance, according to a particular interpretation of the score.

B. The example in this approach tightly couples a particular graphical score with a particular audio performance. If both were produced from a common MNX using particular performance parameters, it could work well. If, unlike in the example, the references were not to specific files and times but rather to a generic title and to an abstract sequence progression (I'm being careful not to use the word time there, because the sequence of seconds is tied to the cesium atomic clock), then a single score could be synchronized with any performance that also includes references to the same abstract sequence progression, along with some particular correlation between the abstract sequence progression and time.

C. This approach allows any performance of a score to be synchronized with the same score, even if different performance parameters were used. Again, the example is somewhat limited to a specific score file and a specific performance file, by naming them and using time references.

The movie and broadcast industries have been grappling with this problem for years, not using abstract sequence progressions, but attempting to capture exact times using SMPTE time codes, synchronized clock signals, and other devices. The known difficulties in performing such synchronizations are summarized at this link:
https://www.bhphotovideo.com/explora/video/tips-and-solutions/timecode-versus-sync-how-they-differ-and-why-it-matters

For the movie and broadcast industries, there is only a single, specific performance of concern: for the movies, that created by the director; and for broadcast, that created by real-time recording (often by multiple recording devices). Hence something like SMPTE, based on real time rather than on an abstract sequence progression, is appropriate... but they still grapple with the issues described in the link.

For music, there can be many performances: a particular generated MIDI file is just one. Each human performance may vary, either due to errors or to preferences in the interpretation of ambiguous elements (a fermata is a somewhat ambiguous element, for example, as are tempo directives such as rit.). For synthesized performances, the ambiguous elements are interpreted according to some particular, encoded set of rules. For live performances, the ambiguous elements are interpreted according to some particular director, or rules agreed upon by the performer(s). To correlate recorded live performances to a score, the recording would have to be annotated with the abstract sequence progression used by the score, either manually or by some sophisticated, possibly AI, audio interpretation algorithms.

By using an abstract sequence progression, a single score and MIDI file could be used, together with various algorithms that might peek at the score and interpret its ambiguous elements according to various rules, to produce a variety of performances: a different playback for each set of rules.
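
To make that concrete, here is a purely illustrative sketch of what a mapping file in the spirit of approach C might look like if it were keyed to an abstract sequence progression rather than to clock time. The element and attribute names (performance-mapping, map, seq, sound-time) and the file name ThisPiece-live.mp3 are hypothetical, not drawn from MNX, MEI, or any existing proposal; "seq" stands for a position in the abstract sequence progression (say, measure and beat), and each performance supplies its own translation from that position to a time:

  <performance-mapping graphics-ref="ThisPiece.svg"
                       sound-ref="ThisPiece-live.mp3">
    <!-- Each entry pairs a position in the abstract sequence
         progression ("seq", here measure.beat) with a time in this
         particular recording; a different recording, or a different
         synthesized rendition, would get its own mapping file. -->
    <map seq="21.1" graphics-id="m21" sound-time="1:22.0"/>
    <map seq="21.3" graphics-id="m21" sound-time="1:23.1"/>
    <map seq="22.1" graphics-id="m22" sound-time="1:24.3"/>
  </performance-mapping>

A single score, keyed once to seq values, could then be paired with any number of such mapping files: one per synthesized or recorded performance.
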
By coding recorded performances to the abstract sequence progression (the rules here would translate between time and the abstract sequence progression, and would potentially differ somewhat from the rules used by a synthesized performance, which peeks at the score), the same score could be used with a variety of either synthesized or recorded performances.

Where should the rules be encoded? Only approach C provides a separate mapping file in which the rules can be encoded. While approach B can also support abstract sequence progressions, it does not provide rules mapping time to the abstract sequence progression. An approach B that uses score peeking and a fixed set of rules for interpreting ambiguous elements could eliminate the need for a mapping file containing the rules, but it would be far less flexible than approach C, where the rules can be modified.

Approach A seems to be a subset... it has no rules, except those encoded into the playback program, and while it easily synchronizes the score with a synthesized performance, it doesn't allow the score to be synchronized with recorded performances.

I should note that I realize the rules "encoded into the playback program" mentioned for approaches A & B can potentially be altered by users, even though I speak of a "fixed set of rules". But as soon as you allow the users who alter those settings to save them... where? In a "configuration" file, no doubt... which amounts to a completely separate mapping file, as in approach C... except that a configuration file would likely be less standardized than a mapping file promulgated by a standards body.

I should also note that the "score peeking" I refer to is probably more easily implemented by peeking into the MNX rather than into the SVG, unless there are rules for making "music SVG" files that appropriately expose the ambiguous elements so the rules can be easily applied.

For all the above reasons, I see C as a clear winner among the approaches listed.

On 3/29/2017 9:47 AM, Jeremy Sawruk wrote:
> Hi Joe,
> After reading your email, I personally like approach B, because it offers flexibility to synchronize the score to multiple media formats (even video), not just MIDI.
>
> That being said, I also strongly agree that this area is not a priority at the moment. Getting the initial CWMN semantics down should be our focus; it's complex enough!
>
> Thank you for all your work. I am really extremely pleased with how this process is going. Keep up the good work!
>
> J. Sawruk
>
> On Wed, Mar 29, 2017 at 11:58 AM, Joe Berkovitz <joe@noteflight.com <mailto:joe@noteflight.com>> wrote:
>
> Hi all,
>
> I think this is a good moment to talk in more detail about the potential uses of SVG in the work of this group since we've had a fair bit of activity on that topic.
>
> To that end, I've just had a very useful exchange with W3C's Chris Lilley, who is copied on this email. Chris is a computer scientist who originally chaired the W3C SVG Working Group when it began in 1998, and saw SVG through all the way from an abstract idea to its modern realization. So it's fair to say he's been observing and thinking about its uses for a very long time, with an expert's perspective. Chris is also a member of the W3C Web Audio Working Group, which is responsible for both audio and MIDI standards on the Web.
>
> What I'll present here is my current thinking, informed by some thoughtful points made by Chris -- who I'm encouraging to jump in with his own words.
>
> Let me say first that I see at least two potential uses for SVG in our community group, and they seem to harmonize perfectly:
>
> 1. SVG (with some relationship to sound) can represent musical scores with arbitrary visual and sonic content. Thanks to James Ingram for highlighting this particular case to the group.
>
> 2. SVG (with some relationship to sound) can serve as an intermediate format that represents *a specific rendering* of semantic music notation markup such as MNX, MusicXML, MEI, or anything else.
>
> So far a lot of discussion has revolved around #1, but #2 is at least as significant. Consider that it permits semantic markup to be converted into a format that can be easily displayed and played back by an application that is much simpler than one that processes semantic markup. Of course, that format is necessarily quite limited in other ways (try transposing it to another key, or reformatting it for a different display size!) But, as a final-delivery mechanism, #2 seems to have a lot of merit. It could provide a standardized way to package the output of notation renderers, regardless of what markup language they use. In fact, MathML (a semantic markup language for math formulas) is routinely converted to SVG by various standard libraries for exactly this purpose.
>
> Now: I believe we don't need to get into a big debate about which use is more important. They both are. Also, in neither case do they eliminate our need for a semantic markup language within the confines of some known musical idiom, so there's no need to stop that train from leaving the station. MNX explicitly makes room for graphical encodings to be placed within it.
>
> Relative to SVG, then, the key question is: What's the best way to augment an SVG document with information on sonic performance? There are multiple ways to do it. Chris and I discussed several very high-level approaches:
>
> A. Intersperse performance data (e.g. individual MIDI or note events) throughout an SVG file. James's proposed approach follows this pattern: MIDI codes are sprinkled directly into the file and attached to graphical elements. One could also use a different means to specify events, say like the way that MNX proposes.
>
> B. Intersperse *references* to a separate performance file (e.g. a Standard MIDI file, MP3 audio file) throughout an SVG file. In this approach, SVG elements are tagged with simpler data that "points to" some time or time range within a separate document. MEI uses this technique in places. Example:
> <measure sound-ref="ThisPiece.midi" sound-start="1:22" sound-end="1:24">...
>
> C. Create a completely separate mapping file that identifies a correspondence between SVG and a performance file. Such a file might contain entries like this:
> <measure-mapping graphics-ref="ThisPiece.svg#m21" sound-ref="ThisPiece.midi" sound-start="1:22" sound-end="1:24"/>
>
> I do not think there is a clear winner among these, and I don't think we should immediately get into the weeds. The next step in this discussion -- when we have it -- is to look at the pros and cons of these various approaches for uniting graphical and sonic information. Each has advantages and disadvantages, and they need to be brought to the surface in what will be a lengthy discussion of its own.
> All of the above techniques have been tried in different contexts and there are definite lessons to be learned.
>
> As a corollary: let's stop debating the importance of pure graphical music encoding. There is no need for a debate: we agree that it *is* important. However, its role and its structure do not need to be settled in advance of our work to move ahead on CWMN semantic music encoding. We will need time to tease out the answers to the questions raised above.
>
> Finally a word on Frankfurt: the co-chairs plan to devote a limited period of time to discussing this topic, but it will certainly be smaller than many would like (myself included). We are limited by the other big things on the agenda. But, in truth, most of the good stuff in standards groups happens on email and Github over time, not in large in-person meetings!
>
> Best,
> . . . . . ...Joe
>
> Joe Berkovitz
> Founder
> Noteflight LLC
>
> 49R Day Street
> Somerville MA 02144
> USA
>
> "Bring music to life"
> www.noteflight.com <http://www.noteflight.com>
>
Received on Wednesday, 29 March 2017 19:03:14 UTC