Re: Semantically enhanced SVG (proposal)

Hi all,
Thanks to you all for your replies. Sorry for the delay.
I know you are all very busy, and that this mail is inordinately long.
Please take your time before replying (if at all).

My answers are all in-line below, in reverse order of your replies:
Wil Macaulay
Joe Berkovitz
Jeremy Sawruk
Laurent Pugin and Andrew Hankinson
_______________________________
Wil Macaulay said:
> Just to note that abc->SVG already exists, as well.
See:
http://abcnotation.com/software#abc2svg
Yes. That application could also export semantically enhanced SVG! :-)
_______________________________
Joe Berkovitz:

Joe: I think we'll just have to agree to disagree about most of this. 
I've tried to explain why in the following. Rather than arguing 
fruitlessly, we could also just wait and see whose intuitions are correct.

You said:
> This proposal is not new, and the chair's previous invitation to any 
> interested subgroup stands. It's fine if a subgroup of people want to 
> develop a clearer notion of how SVG might be annotated with musical 
> semantic and performance data.
The main introduction to this CG at
https://www.w3.org/community/music-notation/
begins:
> The Music Notation Community Group develops and maintains format and 
> language specifications for notated music used by web, desktop, and 
> mobile applications.
SVG specializations fall into this category, so I think they should be 
added to the list of specifications being developed by members of this 
CG in the paragraph that follows. MusicXML and SMuFL are described there 
as part of the CG's "initial task". For some further insights about why 
I think semantically enhanced SVG is so important, see my replies to 
Jeremy Sawruk, Laurent Pugin and Andrew Hankinson below.

Joe again:
> [...] I do not believe this approach will address most of the use 
> cases that we've already catalogued. It only "future-proofs" the 
> graphical angle of music representation, and will yield documents that 
> are visually rigid and merely reflect a specific, original graphical 
> rendering. This defect is a big problem for CMN use cases [...].
I'm afraid I disagree. I think that having a notation-agnostic format 
for score instantiations would solve a lot of currently insoluble 
problems, including many of those in our use cases and some more 
(linking music notation editors to DAWs, for example).
> [...] the fact that Verovio, or Noteflight, or other notation 
> rendering systems actually produce such annotated SVG does not, in 
> itself, argue in favor of adopting SVG as a central construct for 
> music representation.
The fact that all notation editors past and future (and applications 
like Verovio) could export semantically enhanced SVG does indeed mean 
that enhanced SVG becomes an ideal candidate for a common export format. 
Note, too, that specific notations are defined by specific applications. 
It is possible both to include and go beyond CWMN. That's extremely 
important for the future development of music notation. This CG can't 
just sit back and assume that music notation suddenly stopped developing 
in 1970.
> Nor does it argue specifically for MEI, MusicXML 3.0, or any of the 
> existing semantic formats that produce such annotations today.
Of course not. Being a specific instantiation of a score, created by 
some application, semantically enhanced SVG is completely independent of 
whether the app uses MEI or MusicXML 3.0 input, or no file input at all.
> Given the fact that notational semantics are inherently 
> non-tree-structured, I doubt that it is easy to make the rendering 
> include all the original information, though.
I don't think notational semantics are inherently non-tree-structured. 
It's no accident that Verovio and I agree so closely about the container 
structure, even though we are coming from completely different directions.
A specific rendering (file instantiation) created by some application 
may not include all the information in some original input file (there 
may not even be an original input file), but I do think that 
semantically enhanced SVG can include all the information that is needed 
to enable a large number of interesting client applications.

All the best,
James

_______________________________
Jeremy Sawruk:

Mr. Sawruk: Again, I think we'll just have to agree to disagree about 
most of what you said. Some of your concerns are covered in my response 
to Joe.
Here, however, are a couple of interesting points:
> [...] I also wonder if it would be possible to preserve enough 
> semantics to convert a semantically enhanced SVG back to MusicXML or 
> MEI. I would be very much in favor of a bi-directional approach to 
> enhance interop.
I think that it would indeed be possible to include all of the semantic 
info from a MusicXML or MEI file in an SVG score instantiation, but that 
it would then include lots of duplicated information. For example, we 
don't need to be explicitly told that a pitch is an E-flat if the 
notation (and possible performance data) are telling us that already. 
Such a format would have a special @data-svgFileTypeID. Note that 
applications could only include the MusicXML or MEI information that 
they have actually created or imported, which they could therefore 
quite easily export separately.

A more likely scenario would be for a semantically enhanced SVG file 
(containing no redundancies) to be treated like the intermediary result 
of an optical character recognition application. If the SVG has a known 
structure (@data-svgFileTypeID), then it would be quite straightforward 
to write an OCR-like application for converting the file to MusicXML or 
MEI. Even if the score didn't start out as a MusicXML or MEI file in the 
first place. :-)

> While I think that some form of semantically enhanced SVG or an SVG 
> with a microformat would be beneficial to some in the CG, I do not 
> feel that now is the appropriate time work on this. Perhaps this 
> should be the work of a subcommittee rather than the work of the 
> entire group? 
If not now, when? :-) I quite understand that many members of this CG 
would prefer not to be bothered by continuous updates about work being 
done on SVG. Probably those who _are_ interested will gather around one 
or two public GitHub repositories. We could think of this forum  
(public-music-notation@w3.org) as a central address to which such groups 
report when they have something to say that may be of general interest. 
As I said to Joe, I think semantically enhanced SVG should be on the 
official agenda of this CG, parallel to MusicXML and SMuFL.

I agree with you that "SeSVG" is not a perfect name for the 
specification. You suggested "Musically Enhanced SVG". I think that's 
too long. How about MusicSVG?

Best wishes,
James Ingram

_______________________________
Laurent Pugin and Andrew Hankinson:

Thanks Andrew and Laurent for your replies.
This response is a bit long, but I think it's the only way to let the 
sceptics here know what this is all about. We can continue the 
discussion elsewhere, if you think that would be a good idea.
Constructive comments and corrections are welcome from anyone, of course.

I've read
http://www.terasoft.com.tw/conf/ismir2014/proceedings/T020_221_Paper.pdf
and have been investigating the Verovio website.
The rism-ch midi-player being used at http://www.verovio.org/midi.xhtml 
is especially interesting! :-)

By way of getting us all on the same page, here's a summary of where I 
think we are now.

....................
Stage 1: pure, semantically enhanced SVG, containing no performance 
information.

Laurent said that Verovio uses the following container hierarchy:
<g class="system">
     <g class="staff">
         <g class="layer">
             <g class="chord">
                 <g class="note">
                 </g>
             </g>
         </g>
     </g>
</g>

I'm assuming that there are no redundancies in the files Verovio 
creates: Only graphical (SVG) elements and containers exist and, where 
possible, each object has a @class attribute (inherited somehow from the 
MEI) saying what the object is.

It's not clear to me whether the above container hierarchy is the same, 
regardless of the MEI specialisation being instantiated. If different 
MEI specialisations result in different container hierarchies, would it 
be possible, in addition, to create a special format that could be 
created from _any_ MEI specialization (and by any other application that 
exports music scores as SVG)?

All SVG files created by Verovio should have a well-defined type, 
defined (and documented) as an attribute of their main <svg> element. 
(I'd like to call the type attribute @data-scoreType, not 
@data-svgFileTypeID as in my original posting.)

The reason for having such file types is, of course, that it enables 
interesting client applications to be written. All file types should be 
documented at the URL given in the @data-scoreType attribute, and that 
documentation should be as simple as possible. It should not be 
necessary for someone programming a client application for the universal 
format to have to know all the ins and outs of MEI specializations.
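By way of illustration: a client application for such a universal format 
would check the file type before doing anything else. Here's a minimal 
sketch in Javascript (the documentation URL and the regex-based parsing 
are my own assumptions, not part of any existing spec):

```javascript
// Extract the @data-scoreType attribute from the root <svg> element of
// an SVG document held as a string. A real client would use a proper
// XML parser; the regex is just for illustration.
function getScoreType(svgText) {
    const match = svgText.match(/<svg[^>]*\bdata-scoreType="([^"]*)"/);
    return match ? match[1] : null;
}

// Hypothetical documentation URL -- not a real Verovio address.
const svg = '<svg xmlns="http://www.w3.org/2000/svg" ' +
            'data-scoreType="http://example.com/standardScoreType.html">' +
            '</svg>';

console.log(getScoreType(svg)); // the documentation URL, or null
```

A client that doesn't recognise the returned URL could simply refuse the 
file, without ever needing to understand MEI itself.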

....................
Stage 2: adding synchronized performance information

I think it would be quite easy for Verovio to add temporal information 
to its (purely static graphic) SVG output. Maybe it does already (see 
below).

MIDI and milliseconds are the obvious choice for temporal (performance) 
information. These are not in the SVG standard, so they need a namespace 
of their own. The one I use is called "score":
<svg
     xmlns="http://www.w3.org/2000/svg"
     xmlns:score="http://www.../.../verovio/.../performanceInfo.html"
     ...
     data-scoreType="http://www.../verovio/.../standardScoreType.html"
     ...
>
(My own xmlns:score documentation can be found quite easily on my web 
site, but it defines some experimental constructs that would confuse the 
issues here.)
Note that the @data-scoreType could _require_ @xmlns:score to be 
defined. In that case, the Verovio team's score verification software 
could also verify that the information in @xmlns:score was being written 
correctly.

Verovio can calculate the (millisecond) durations and MIDI info for a 
"default performance" from the absolute meanings of the symbols in the 
score.
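The arithmetic for such a default performance is simple: a symbol's 
duration in milliseconds follows from its fractional note value and the 
current tempo. A minimal sketch (the function name and the 
quarter-notes-per-minute tempo convention are my own assumptions):

```javascript
// Milliseconds for a note value (expressed as a fraction of a whole
// note) at a tempo given in quarter-notes per minute.
function defaultDurationMs(noteValue, quarterNotesPerMinute) {
    const quarterNotes = noteValue * 4;           // e.g. 1/4 -> 1 quarter
    const msPerQuarter = 60000 / quarterNotesPerMinute;
    return quarterNotes * msPerQuarter;
}

console.log(defaultDurationMs(1 / 4, 120)); // quarter at 120bpm -> 500
console.log(defaultDurationMs(3 / 8, 60));  // dotted quarter at 60 -> 1500
```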

The synchronization needs to be done at the <g class="chord"> level [1].
It's not necessary to synchronize individual noteheads or other chord 
components, since they have the same temporal position as their 
containing chord.
The chord's logical x-coordinate, together with its duration (ms) and/or 
its position (ms) from the beginning of the score, can be written like this:
      <g class="chord" score:alignmentX="1234.5678" score:duration="567">
or
      <g class="chord" score:alignmentX="1234.5678" score:position="3859">
In my own approach, all the timing and MIDI information is stored in a 
special element inside the chord group [2]:
     <g class="chord" score:alignmentX="1234.5678">
         <score:midiChord>
             ...
         </score:midiChord>
         <g class="graphics">
             ...
         </g>
     </g>
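This is also why I'd like to know each chord's logical x-position: to 
draw a vertical cursor during playback, the performing software only 
needs to find the chord whose onset most recently passed, and read its 
alignmentX. A sketch, assuming the chords have been collected in score 
order with their position values in milliseconds (the attribute names 
follow the examples above; the function itself is hypothetical):

```javascript
// Given chords sorted by position (ms from the start of the score),
// return the alignmentX of the chord currently sounding at time t,
// or null if t is before the first chord.
function cursorX(chords, t) {
    let x = null;
    for (const chord of chords) {
        if (chord.position <= t) {
            x = chord.alignmentX;   // last chord whose onset has passed
        } else {
            break;                  // chords are sorted, so we can stop
        }
    }
    return x;
}

const chords = [
    { position: 0,    alignmentX: 100.0 },
    { position: 500,  alignmentX: 234.5 },
    { position: 1000, alignmentX: 360.0 }
];

console.log(cursorX(chords, 750)); // between 2nd and 3rd chords -> 234.5
console.log(cursorX(chords, -10)); // before the first chord -> null
```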

In either case, the file's graphics can be edited independently of the 
performance information. Using appropriate software, annotations can be 
added, music examples extracted etc.

Note that (if this is CWMN) graphics for objects such as tuplet brackets 
or augmentation dots will be included in the file, but that they are not 
given any temporal attributes since the durations have already been 
calculated and are stored in the chord symbols. Notations that don't use 
tuplet brackets or augmentation dots for calculating durations can still 
use the same code for performing the score. (The performance software 
only needs to access information inside the score namespace.)

As I understand it, the rism-ch midi-player uses standard MIDI files, 
and supplies callback functions that link chords in the SVG to specific 
times in the MIDI file. The chords must somehow have temporal attributes 
that allow this. The demo application simply changes the colour of the 
noteheads to indicate performance synchronization, so the midi-player 
probably doesn't know where those noteheads are. I prefer to see a 
vertical cursor, so I'd like to know each chord's logical x-position.
The rism-ch midi-player can be found at the following locations:
Demo: http://www.verovio.org/midi.xhtml
GitHub: https://github.com/rism-ch/midi-player.

Note also that it would be feasible to allow for more than one real 
performance to be synchronized with the same graphics. The "default 
performance", calculated from the absolute meanings of the symbols in 
the score, is only one of many possible real interpretations.
We no longer have to think of real performances as having to differ in 
some undefined way from the default performance. It becomes possible to 
imagine teaching comparative performance practice for particular scores, 
ornament symbols etc. (using _any_ notation) using such files.

Conversely, it also becomes possible to create parallel files in which 
the same real performance(s) is (are) notated using different graphics. 
Verovio can probably create transposed CWMN scores, but other software 
could extract the MIDI info, and synchronize it with completely 
different notations.
My own programs, for example, use a notation that maps durations 
directly to simple duration class symbols (each duration class 
representing an octave of duration), so I could re-notate a Ferneyhough 
score using the MIDI file that Verovio created... I think performance 
practice can only be learned by listening. Scores should only be an 
aide-memoire. Ferneyhough's scores are screaming at us that something 
went wrong with music notation in the 20th century. He's an important 
Artist.

As I said: corrections and constructive comments would be very welcome!

All the best,
James

[1]  Here's a digression about event symbols:
All forms of writing have a graphic container structure that just sits 
on the page, outside time. Strictly speaking, we read chunks of 
information (symbols). We do not read in one direction parallel to 
absolute time, like an analog tape-recorder. The lowest level symbols in 
such a structure are those that we can read "at a glance". In ordinary 
text, these are words (not characters). In CWMN, these are chord symbols 
(not notes or noteheads). In music notation in general, we can call such 
symbols "event-symbols".
Event-symbols are (graphical) _objects_ that represent (temporal) 
_events_. A sequence of event symbol objects on the page defines a 
corresponding sequence of temporal events. In CWMN, the relative 
left->right position corresponds to before->after.
Both the graphics and temporal info for these event-symbols can be 
complex. For example, CWMN chord symbols are complex combinations of 
characters and lines (noteheads, stem, accidentals, augmentation dots, 
articulations, ornament signs etc.) that can represent very complex 
temporal events (ornaments, pitch inflexions, dynamic envelopes etc.)
Event symbols can also be neumes, tablatures or Asian characters. 
Sometimes (e.g. when writing for transposing instruments), it's not even 
clear whether the symbols represent pitch or fingering. Keeping the 
graphics independent of the meaning solves all these problems.

[2] I'm using the Web MIDI API (see http://caniuse.com/#feat=midi), 
which allows MIDI input and output devices to be selected and individual 
MIDI messages to be sent. So it is useful to have the raw MIDI data 
available in Javascript (the data can be manipulated more easily than 
when it is in a binary MIDI file). But synchronizing with multiple 
performances would probably be a lot easier if they were stored in 
Standard MIDI files (Verovio's approach). Could the rism-ch midi-player 
be adapted to use MIDI input and output devices? Maybe this is a case 
for different file types. Maybe the target audience for apps that teach 
performance practice is not the same as the audience that needs to use 
MIDI input and output devices. This needs thinking about...
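For what it's worth, the raw data that the Web MIDI API sends is just 
arrays of bytes, so individual messages are easy to construct and 
manipulate in Javascript. A minimal sketch (the helper names are mine):

```javascript
// Build raw MIDI channel-voice messages as byte arrays, in the form
// accepted by MIDIOutput.send() in the Web MIDI API.
function noteOn(channel, pitch, velocity) {
    return [0x90 | (channel & 0x0F), pitch & 0x7F, velocity & 0x7F];
}

function noteOff(channel, pitch) {
    return [0x80 | (channel & 0x0F), pitch & 0x7F, 0];
}

// Middle C, full velocity, on channel 0:
console.log(noteOn(0, 60, 127));  // [ 144, 60, 127 ]
console.log(noteOff(0, 60));      // [ 128, 60, 0 ]
```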

j

-- 
http://james-ingram-act-two.de
https://github.com/notator

Received on Wednesday, 30 November 2016 17:03:51 UTC