Fwd: Unique ids in MusicXML 3.1 and beyond from James Sutton on 2017-09-11 (public-music-notation-contrib@w3.org from September 2017)

From: James Sutton <jsutton@dolphin-com.co.uk>
Date: Mon, 11 Sep 2017 17:55:20 +0100
To: public-music-notation-contrib@w3.org
Message-Id: <CD116BBB-DA0E-498D-A441-99C3AC099B99@dolphin-com.co.uk>
> Begin forwarded message:
> 
> From: James Sutton <jsutton@dolphin-com.co.uk>
> Subject: Re: Unique ids in MusicXML 3.1 and beyond
> Date: 9 September 2017 at 09:04:36 BST
> To: Jeremy Sawruk <jeremy.sawruk@gmail.com>
> 
> Sorry Jeremy,
> 
> I think you misunderstand. The associative array does not meet any of the requirements!
> 
> There are three requirements. Each of them does have a solution for random string ids, but the solution is simpler for numeric ids:
> 
> a) Find an object by id, e.g. if I have an instruction in a separate file which says 'add an up-bow marking to note "blah"' (where "blah" is the id).
> The answer for random string ids is to create a map from the hash of the id string to its location in the document. It's a bit heavyweight but it will work.
> Note that the hashing algorithm needs to be reliable and reasonably fast. There is some interesting info here: https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed
> 
> b) Create a new unique id on inserting a new item in the document.
> This is a thornier problem: given a set of random strings, create a new string that is not in that set. It can be done, e.g. by taking the longest string and adding a character at the end, but I don't know of any acceptably simple way.
> 
> c) Validate the ids in the file for uniqueness.
> For random string ids we can create a hash table and, for each insertion, check that the hash is not already in the table.
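
A minimal Python sketch of requirements (a) to (c) for arbitrary string ids, assuming the ids have already been pulled out of the document as (id, location) pairs. The function names and the "stem-n" naming scheme are illustrative assumptions and were not proposed in the thread; Python's dict does the hashing internally.

```python
def build_index(pairs):
    """(a) Map each id string to its location; (c) reject duplicate ids."""
    index = {}
    for uid, location in pairs:
        if uid in index:
            raise ValueError(f"duplicate id: {uid!r}")
        index[uid] = location
    return index


def fresh_string_id(index, stem="id"):
    """(b) Return a string that is not yet a key of index.

    Hypothetical scheme: try stem-1, stem-2, ... until one is unused.
    """
    n = 1
    while f"{stem}-{n}" in index:
        n += 1
    return f"{stem}-{n}"


index = build_index([("blah", 0), ("part1-bar1", 1), ("id-1", 2)])
print(index["blah"])           # location of the note with id "blah"
print(fresh_string_id(index))  # an id that is not yet in the document
```

Note that generating a fresh id this way needs the full index in memory and may probe several candidates before finding a free one.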
> 
> All these requirements are more easily and cheaply met by using a uid that is explicitly a number encoded in a known format. Why don't we agree on an interoperable format at the outset, to save writing lots of unnecessary extra code to handle strings?
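
For comparison, a sketch of the same three operations when every id is known to be a decimal integer. This is only one possible "known format"; the thread does not fix one, and the helper names here are assumptions.

```python
def parse_ids(raw_ids):
    """Parse decimal id strings into ints; int() raises ValueError otherwise."""
    return [int(s) for s in raw_ids]


def ids_are_unique(ids):
    """(c) True if no id occurs twice."""
    return len(set(ids)) == len(ids)


def next_id(ids):
    """(b) One more than the largest id in use (1 for an empty document)."""
    return max(ids, default=0) + 1


ids = parse_ids(["1", "2", "7"])
location = {uid: pos for pos, uid in enumerate(ids)}  # (a) id -> position
print(location[7])          # 2
print(ids_are_unique(ids))  # True
print(next_id(ids))         # 8
```

With numbers, a new id needs only the current maximum, which an editor could track as a single counter.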
> 
> Your second suggestion is exactly what I will have to do if I meet uids that are not in the expected format. However, Johannes Kepper says:
> 
> "If an application would change my IDs without being instructed to do so, that would be the last time for me to use that application… "
> 
> which is why I am suggesting an interoperability option, to avoid the need to change them.
> 
> best regards
> James Sutton
> Dolphin Computing
> http://www.dolphin-com.co.uk
> http://www.seescore.co.uk
> http://www.playscore.co
> 
> 
> 
> 
>> On 8 Sep 2017, at 23:14, Jeremy Sawruk <jeremy.sawruk@gmail.com> wrote:
>> 
>> James asked "However if one program uses 1,2,3.. for ids and another uses "a", "b", "c".. another uses decimal numbers, another uses hex, and another uses "part1-bar1" how can they interoperate in their use of the id?"
>> 
>> My response is that it doesn't matter what the ID is, you can still use them sequentially if you absolutely must do that. You would do this by using an associative array of ID -> Int. Every time you find an ID that isn't in your associative array, you store that value and increment your ID counter. Now if you need the document's ID given your sequential number, you just do a lookup: ID[1] might return "abc". Because this is an associative array, the lookup should be O(1), though there is a memory overhead. Fortunately, most individual MusicXML files are relatively small (< 1MB), so the memory overhead isn't too much.
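
The associative-array idea described above could be sketched in Python as follows. The class name, its method names, and the choice to start numbering at 1 are assumptions made for illustration.

```python
class IdTable:
    """Maps arbitrary document ids to sequential ints and back."""

    def __init__(self):
        self._to_int = {}  # document id -> sequential number
        self._to_doc = []  # sequential number - 1 -> document id

    def intern(self, doc_id):
        """Return the number for doc_id, assigning the next one if unseen."""
        n = self._to_int.get(doc_id)
        if n is None:
            n = len(self._to_doc) + 1
            self._to_int[doc_id] = n
            self._to_doc.append(doc_id)
        return n

    def doc_id(self, n):
        """Return the document id that was assigned number n."""
        return self._to_doc[n - 1]


table = IdTable()
for uid in ["abc", "part1-bar1", "abc", "0x1F"]:
    table.intern(uid)
print(table.doc_id(1))       # abc
print(table.intern("0x1F"))  # 3
```

Both directions are constant-time lookups on average, at the cost of holding every id in memory twice.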
>> 
>> If that isn't a viable solution, then you could just replace the IDs in the document with the IDs that you need in a preprocessing step. I don't know enough about James' software to know why this is useful, but there is nothing stopping him from doing this (nor should there be). The IDs will always have xml:id semantics when transmitted in documents, but there is nothing to stop a developer from reinterpreting them once they are inside of a MusicXML client. The MusicXML/MNX specifications cannot dictate HOW software is written, they merely specify the interchange format between different pieces of software.
>> 
>> On Fri, Sep 8, 2017 at 4:31 PM, James Sutton <jsutton@dolphin-com.co.uk> wrote:
>> Hi Michael and all,
>> 
>> comments inline..
>> 
>> James Sutton
>> Dolphin Computing
>> http://www.dolphin-com.co.uk
>> http://www.seescore.co.uk
>> http://www.playscore.co
>> 
>> 
>>> On 8 Sep 2017, at 19:54, Michael Good <mgood@makemusic.com> wrote:
>>> 
>>> Hi James and all,
>>> 
>>> 
>> ...
>> 
>>> I don’t see where that causes a problem though. What difference does it make how a unique ID is formatted as long as it is unique within the document, which any XML validator will check?
>> 
>> Not true. CodeSynthesis XSD/e (which SeeScore uses) does not check this. This is a sensible approach, as the check can be expensive (in time and space) in the general case, especially for applications which don't care.
>> 
>>> 
>>> With many MusicXML applications, these id attributes will not be preserved on a round trip. MusicXML applications tend to read in a MusicXML file and convert to the application’s underlying data structures. Data that doesn’t fit into the application’s data structures is ignored. When the file is exported it is coming from the application’s internal data, not the original MusicXML file. So many things may change between import and export. Mogens has mentioned this many times, but it seems inherent in how MusicXML is currently used for document interchange.
>> 
>> yes
>> 
>>> 
>>> With MNX we are working on a format that is better suited as a native representation for applications, for use cases that go well beyond document interchange. So the preservation of data across cooperating applications becomes a more interesting issue for exploration there.
>> 
>> OK, but I am sure we are all apprehensive about the huge pile of work needed to adopt MNX! Even more so for those of us for whom MusicXML is the document format.
>> 
>>> 
>>> Of course there are applications already using MusicXML as a native format, or the basis for a native format. Those applications can define the format of the unique IDs however they wish. But I still don’t understand how a standardized id format would enhance interoperability.
>> 
>> Numbers are cheap for the sort of processing needed for uids. They are a natural choice. However if one program uses 1,2,3.. for ids and another uses "a", "b", "c".. another uses decimal numbers, another uses hex, and another uses "part1-bar1" how can they interoperate in their use of the id?
>> In particular, if you edit the file, say to add an annotation, how can the editor generate a new uid in a file which uses a different standard? The only possibility is to regenerate all the ids in the file using the standard that the editor uses, notwithstanding strictures from Johannes ;-).
>> If simple numbers are not seen as a good choice then we could agree on some other format, but anything involving the index of the item in the file/part/measure does not work, as such indices would not be invariant when the file changes.
>> 
>>> 
>>> For MusicXML 3.1, the main issue is if there is anything we need to change with these new id attributes before release. I don’t think there is, but I am not confident that I am really understanding what is behind James’s request.
>>> 
>> 
>> Nothing needs to change as far as I am concerned.
>> If you need more info we could take this off the thread.
>> 
>> 
>
Received on Monday, 11 September 2017 16:55:33 UTC