- From: Cox, Simon (L&W, Clayton) <Simon.Cox@csiro.au>
- Date: Thu, 21 Jan 2021 08:23:36 +0000
- To: Antoine Zimmermann <antoine.zimmermann@emse.fr>, "Abhyankar, Swapna" <sabhyank@regenstrief.org>, “semantic-web@w3.org” <semantic-web@w3.org>
- CC: David Booth <david@dbooth.org>, Maxime Lefrançois <maxime.lefrancois@emse.fr>
Some quick comments on the other issues:

- namespace - would be good to get this into the RDF or even XSD namespace, alongside the other standard datatypes ...
- dimensionless is always tricky; note that %, ‰, ppm and ppb are typically used to scale dimensionless numbers, so that has to be accounted for
- QUDT has a treatment for dimension vectors - see http://www.qudt.org/doc/DOC_VOCAB-DIMENSION-VECTORS.html
- I generally prefer `xsd:decimal` in the transfer layer. Of course that doesn't help much in the internal implementation - Jan Martin Keil wrote a short paper on the issue, available here: https://arxiv.org/pdf/2011.08077.pdf

-----Original Message-----
From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Sent: Thursday, 21 January, 2021 19:01
To: Cox, Simon (L&W, Clayton) <Simon.Cox@csiro.au>; Abhyankar, Swapna <sabhyank@regenstrief.org>; “semantic-web@w3.org” <semantic-web@w3.org>
Cc: David Booth <david@dbooth.org>; Maxime Lefrançois <maxime.lefrancois@emse.fr>
Subject: Re: [External] Re: UCUM licensing [was Re: Blank nodes must DIE! ]

Thanks for the reply, Simon.

One point that deserves mentioning: cdt:ucum and its implementation are at the proof-of-concept stage. We are aware of limitations and flaws that should be fixed if this is going to move towards wider adoption. In particular:

- the URI for cdt:ucum is http://w3id.org/lindt/custom_datatypes#ucum, which is so because we initially worked on the idea of "custom datatypes" and cdt:ucum was merely a part of a larger idea. In an effort to standardise, or at least crystallise, the specification, a more neutral namespace should be used.
- the current specification has an issue with values that are dimensionless. One would expect to be able to write dimensionless quantities as plain numbers like "1"^^cdt:ucum, but with the current specification, one has to have a number followed by a space, followed by a unit, where the unit can be the empty string.
So you have to write: "1 "^^cdt:ucum instead, which is ugly.
- the current specification mentions a bunch of subtypes that limit the value space and lexical space to one dimension (such as length, mass, power). The set of dimensions is quite arbitrary, as it is based on what the Java library we use supports. I believe that datatypes for specific dimensions should be put in an optional spec to keep the datatype clean. Also, these datatypes are currently specified against an older version of the cdt:ucum spec (due to lack of time and laziness on my part), so the text of their description is irrelevant right now (as of version v3 of the cdt:ucum spec).
- the implementation of UCUM that we use has significant flaws: numeric values are approximated as floats. So, for instance, any use of the constant pi and fractions of pi, or any value that cannot be represented as a binary floating point number with fixed precision, is inexact in our implementation.
- there is a little issue, pointed out to me by Peter Patel-Schneider, in the UCUM spec itself. The constant for the astronomical unit is wrong. It uses an old value for AU that was normatively updated in 2012. The whole idea of using UCUM in literals can only succeed if UCUM is kept up to date wrt the evolution of standards of measurement.

All these issues are not awfully difficult to solve, but they should be fixed collectively if this is going to be added to SPARQL, for instance.

For this purpose, I've got a task in my pile of TODOs to create a community group revolving around the idea of representing measurements in RDF. The main task of the group would be to fix cdt:ucum, but it does not have to be limited to this. In particular, the problem of representing measurements is much larger than the problem of representing exact physical quantities. A measurement has a precision or error margin, a method of measurement, a date and time of measurement, etc.
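To make the dimensionless and float issues above concrete, here is a minimal Python sketch, not the cdt:ucum specification's grammar and not the Java implementation mentioned in this thread. It assumes only what the message states: a lexical form is a number, one space, then a unit that may be empty. The regular expression and the name `parse_quantity` are illustrative assumptions, and the unit string is not validated against UCUM.

```python
import re
from fractions import Fraction

# Hypothetical pattern for the lexical form described in the message:
# a decimal number, exactly one space, then a unit string that may be empty.
LEXICAL_FORM = re.compile(r"([+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?) (\S*)")


def parse_quantity(lexical):
    """Split a lexical form into an exact number and a raw unit string."""
    match = LEXICAL_FORM.fullmatch(lexical)
    if match is None:
        raise ValueError(
            f"expected a number, a space and a (possibly empty) unit: {lexical!r}"
        )
    number, unit = match.groups()
    # Fraction parses a decimal string exactly, so no binary float rounding occurs.
    return Fraction(number), unit


# The dimensionless case needs the trailing space: "1 " parses, "1" does not.
value, unit = parse_quantity("1 ")
assert (value, unit) == (Fraction(1), "")

# Going through a binary float loses exactness; parsing the string does not.
assert parse_quantity("0.1 m")[0] == Fraction(1, 10)
assert Fraction(0.1) != Fraction(1, 10)
```

Under these assumptions, accepting a bare "1" would only require making the space optional when the unit is empty, and keeping values as rationals avoids the rounding described for decimal fractions (though not for irrational constants such as pi).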
The community group could also discuss the idea of possible IRIs for individual units and the potential benefits (or drawbacks) of having such.

Best,
--AZ

On 18/01/2021 at 02:21, Cox, Simon (L&W, Clayton) wrote:
> Thanks Antoine -
>
> TBQH I could swing either way on the encoding/typing issue. Your arguments are good. I feel that the use of a ^^unit:XX datatype is a little more elegant, definitely more extensible, and provides the opportunity to include an explicit link to a definition of the unit, and in general explicit semantics is a Good Thing. But I also grant that it risks burdening 90% of simple cases with a tax in order to support 10% of more complex cases.
>
> In another place I have strongly advocated the use of a micro-format with a literal for a complex 'quantity' - i.e. for geometries in GeoSPARQL, where I think the correct decision was made to 'partition' complex values into a formatted literal, rather than push maths into the reasoning layer. So I have definitely had a foot in your camp.
>
> Furthermore, the energy that you and Maxime have brought to actually implementing the solution may be enough to carry the day. Do you have a sense of which/how many RDF processing platforms/libraries have implemented or are close to implementing/adopting your proposal? Would you be proposing some additions to the clauses in SPARQL Query Language to accommodate cdt:ucum - e.g.
>
> 1. add `cdt:ucum` to the list of Operand Data Types in https://www.w3.org/TR/sparql11-query/#operandDataTypes
> 2. add `cdt:ucum` in the tabulation of binary operators https://www.w3.org/TR/sparql11-query/#OperatorMapping
>
> Simon
>
> -----Original Message-----
> From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
> Sent: Friday, 15 January, 2021 22:05
> To: Cox, Simon (L&W, Clayton) <Simon.Cox@csiro.au>; Abhyankar, Swapna <sabhyank@regenstrief.org>; “semantic-web@w3.org” <semantic-web@w3.org>
> Cc: David Booth <david@dbooth.org>; Maxime Lefrançois <maxime.lefrancois@emse.fr>
> Subject: Re: [External] Re: UCUM licensing [was Re: Blank nodes must DIE! ]
>
> Simon,
>
> I think you missed part of a sentence:
>
> "Dereferencing these should get a , and that these should"
>
> By the way, I am reiterating that our proposed solution (at https://ci.mines-stetienne.fr/lindt/v3/custom_datatypes#ucum) does not use "web links for individual units-of-measure" and we do so on purpose.
>
> One reason for this is precisely the sentence you quote. If someone devises a set of IRIs to identify units of measure, then they are proposing a way to identify units that competes with UCUM.
>
> There could be a working group that tries to propose a standard set of IRIs for units of measure, but it would have to be independent of UCUM.
>
> Instead, our proposal simply reuses the UCUM codes strictly, which is perfectly compatible with the UCUM licence.
>
> There are other reasons why we do not use distinct IRIs for units, some very technical, some less. With Maxime, we are trying to gather all these reasons and some extra rationale for the design of cdt:ucum, but it's a busy period, so we are not done yet.
> But here are a few things I can already say, off the top of my head:
>
> - Having to parse the datatype IRI in order to decide the meaning of a literal goes against the principle of IRI opacity
> - Infinite sets of IRIs in standard vocabularies are awful (think of rdf:_1, rdf:_2, etc.; they are always annoying when you want to be fully conformant)
> - A number with a datatype IRI is an awkward way of writing a physical quantity. Of course, in RDF, you have to have a datatype IRI anyway, but it's much easier if you just have a single IRI to remember (and an interface or syntactic sugar could even hide it).
> - With one IRI per unit, you need to understand the URL-encoding of compound units. How do you identify "kilometer per hour"? "volts per meter squared"? the number pi? With UCUM, most units are written the way scientists have always written their quantities.
> - When the lexical form looks like a number, it's tempting to hack the support for physical quantities into a system that does not support a dedicated datatype for physical quantities. For instance, it's tempting to treat "10"^^ucum:m as the integer (or decimal, or float?) 10 and go on with the processing. But physical quantities are not numbers. This kind of hack breaks as soon as you change the unit. The number 10 is not a length. If you want to support physical quantities in the general sense, you need to be able to deal with the various units, so you need to parse the IRI, which is at least as complicated as parsing the lexical form of a literal.
> - If lexical forms that are not numbers are problematic, then why are people using xsd:dateTime and not xsd:unixepoch? We have to think about physical quantities as something that is as important as dates: something that requires support embedded in standardised implementations
> - With multiple IRIs, physical quantity arithmetic is very cumbersome.
> If I have a duration in seconds and a distance in meters and I want to compute a speed in kilometers per hour, then I'd have to do IRI gymnastics from hell to compute the products. All this is seamlessly supported in SPARQL with cdt:ucum (see our online playground https://ci.mines-stetienne.fr/lindt/playground.html)
>
> --AZ
>
> On 14/01/2021 at 20:10, Cox, Simon (L&W, Clayton) wrote:
>> Just found this draft that I should have sent 4 months ago.
>>
>> -----------
>>
>> Thanks Swapna –
>>
>> Looking at https://ucum.org/trac/wiki/TermsOfUse I see the following issues:
>>
>> Clause 1)
>>
>> “… users shall not use any of the Licensed Materials for the purpose of developing or promulgating a different standard for identifying units of measure …”
>>
>> It has been proposed to use UCUM codes in the context of RDF data to indicate the scale for quantity data. The exact arrangement has not been decided. But it is expected that it will be necessary to use web-links for individual units-of-measure, which will include the UCUM code as an element or an argument. Dereferencing these should get a , and that these should
>>
>> QUDT (http://qudt.org/) is a different standard, designed with an RDF/OWL/SHACL representation. Every unit-of-measure in the QUDT vocabulary is ‘identified’ using a URI. There is an (optional) field in QUDT to add the UCUM code as an ‘annotation’ on an individual description, to support cross-matching or to enable discovery using the UCUM code. But the use of UCUM codes in the context of QUDT appears to violate this provision.
>>
>> Simon
>>
>> *From:* Abhyankar, Swapna <sabhyank@regenstrief.org>
>> *Sent:* Friday, 4 September, 2020 01:15
>> *To:* “semantic-web@w3.org” <semantic-web@w3.org>
>> *Cc:* David Booth <david@dbooth.org>; Maxime Lefrançois <maxime.lefrancois@emse.fr>
>> *Subject:* Re: [External] Re: UCUM licensing [was Re: Blank nodes must DIE! ]
>>
>> Hi everyone,
>>
>> Can someone please summarize the issues related to the UCUM terms of use? We are in the process of updating the terms, so your timing is perfect, but the bottom line is that when the codes are used in conjunction with clinical data, no copyright notice will be required.
>>
>> Thank you!
>>
>> -Swapna
>>
>> ----------------------------------------------------
>>
>> *Swapna Abhyankar, MD*
>>
>> Interim Director
>>
>> LOINC and Health Data Standards
>>
>> 1101 West Tenth Street
>>
>> Indianapolis, IN 46202
>>
>> Confidentiality Notice: The contents of this message and any files transmitted with it may contain confidential and/or privileged information and are intended solely for the use of the named addressee(s). Additionally, the information contained herein may have been disclosed to you from medical records with confidentiality protected by federal and state laws. Federal regulations and State laws prohibit you from making further disclosure of such information without the specific written consent of the person to whom the information pertains or as otherwise permitted by such regulations. A general authorization for the release of medical or other information is not sufficient for this purpose.
>>
>> If you have received this message in error, please notify the sender by return e-mail and delete the original message. Any retention, disclosure, copying, distribution or use of this information by anyone other than the intended recipient is strictly prohibited.
>>
>> *From:* Maxime Lefrançois <maxime.lefrancois@emse.fr>
>> *Date:* Thursday, September 3, 2020 at 10:22 AM
>> *To:* David Booth <david@dbooth.org>
>> *Cc:* “semantic-web@w3.org” <semantic-web@w3.org>, "Abhyankar, Swapna" <sabhyank@regenstrief.org>
>> *Subject:* [External] Re: UCUM licensing [was Re: Blank nodes must DIE! ]
>>
>> This message was sent from a non-IU address. Please exercise caution when clicking links or opening attachments from external sources.
>>
>> Dear all,
>>
>> I am very happy to see that things are moving in the right direction with UCUM! Thanks a lot!
>>
>> I am looking forward to resuming the work on cdt:ucum, and would be excited to see it usable in triplestores.
>>
>> Best regards,
>>
>> Maxime Lefrançois
>>
>> MINES Saint-Étienne
>>
>> http://maxime-lefrancois.info/
>>
>> On Thu, 3 Sept 2020 at 16:15, David Booth <david@dbooth.org> wrote:
>>
>> FYI, Swapna Abhyankar from Regenstrief (copied) is working on updating the UCUM license, and reached out to me to understand the concerns that have been raised on this list. I suggested that he join this discussion, to directly understand all concerns.
>>
>> David Booth
>>
>> On 9/3/20 5:43 AM, Antoine Zimmermann wrote:
>> > Indeed, Dave. The datatype discussed in this thread is the one colloquially identified as cdt:ucum, which stands for:
>> >
>> > http://w3id.org/lindt/custom_datatypes#ucum
>> >
>> > This URI dereferences to documentation which is currently in disagreement with the Copyright Notice and License of UCUM since it does not include the said notice.
>> >
>> > The documentation is a draft, subject to change, and is not currently officially endorsed by any organisation, although we know people other than us who are using it in their projects.
>> >
>> > The URI contains the term "custom_datatype" because it is one of several custom datatypes that we are defining for various purposes. It was not initially planned to separate cdt:ucum from our other custom datatypes, but if there is a community willing to push this work towards standardisation, we should give a second thought to the namespace of the URI.
>> >
>> > We should also, obviously, update the documentation to make the Copyright Notice appear explicitly.
>> >
>> > However, I doubt that the copyright notice can legally compel anyone to include the notice if they are merely using the codes in data about measurements or physical quantities. So, as far as I'm concerned, I will continue to use these codes and the cdt:ucum datatype whenever relevant in my projects or publications, as well as encourage others to do so.
>> >
>> > --AZ
>> >
>> > On 03/09/2020 at 10:14, Dave Reynolds wrote:
>> >> On 03/09/2020 09:04, Cox, Simon (L&W, Clayton) wrote:
>> >>>
>> >>> * That just allows exchange of any /measurements/
>> >>>
>> >>> This is the RDF application that we were discussing in this thread, I think – where the UCUM code only appears in the context of a measurement instance (i.e. a quantity) either embedded in the literal or else appearing in a data-type.
>> >>>
>> >>
>> >> If appearing as a data-type that would be a URI surely? And, if a URI, given this is on the semantic-web list, wouldn't that URI resolve to something? That something would be explicitly or implicitly communicating partial information from UCUM. It's whoever puts up those data type URIs that needs to find a way through the "prickly" license.
>> >>
>> >> Dave
>> >>
>> >>> I can see your point that QUDT may be violating the strict interpretation, so will attempt to clear that up separately. But I still contend that the use-case canvassed in this thread is OK.
>> >>>
>> >>> *From:* Dave Reynolds <dave.reynolds@epimorphics.com>
>> >>> *Sent:* Thursday, 3 September, 2020 17:49
>> >>> *To:* semantic-web@w3.org
>> >>> *Subject:* Re: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?]
>> >>>
>> >>> On 03/09/2020 03:43, Cox, Simon (L&W, Clayton) wrote:
>> >>>
>> >>> Dan Brickley wrote (a while back):
>> >>>
>> >>> > On Thu, 23 Jul 2020 at 19:50, Patrick J Hayes <phayes@ihmc.us> wrote:
>> >>> >
>> >>> > > Excellent. I have thought for some time that this way of using datatyping would be the right way to go. Congratulations on having actually done it :-)
>> >>> >
>> >>> > This is really interesting. Every couple of years I stumble across UCUM (http://unitsofmeasure.org/trac -> http://unitsofmeasure.org/trac/wiki/TermsOfUse) before being scared away by the prickly terms of use document. It is not a document that seems to welcome re-use.
>> >>> >
>> >>> > Dan
>> >>>
>> >>> I’ve attempted to clarify this with Gunther Schadow, but can’t get a response.
>> >>>
>> >>> Meanwhile, I was pointed to this service which does quantity conversions based on UCUM codes:
>> >>>
>> >>> * Form UI - https://ucum.nlm.nih.gov/ucum-lhc/demo.html
>> >>> * API - https://ucum.nlm.nih.gov/ucum-service.html
>> >>>
>> >>> FWIW QUDT now has basic UCUM support as well -
>> >>>
>> >>> https://github.com/qudt/qudt-public-repo/blob/master/schema/SCHEMA_QUDT-v2.1.ttl#L2924
>> >>>
>> >>> I peered into the UCUM Terms of Use document and I believe this is the relevant clause:
>> >>>
>> >>> * 5) UCUM codes and other information from the UCUM table may be used in electronic messages communicating measurements without the need to include this Copyright Notice and License or a reference thereto in the message (and without the need to include all fields required by Section 7 hereof).
>> >>>
>> >>> So I think we are in the clear to use UCUM codes in the manner that has been discussed in this conversation.
>> >>>
>> >>> I disagree.
>> >>>
>> >>> That just allows exchange of any /measurements/; it doesn't allow use of UCUM codes within metadata. Any service which, for example, provided metadata on units of measure and included UCUM codes as part of that metadata would be in violation. Assuming it included non-UCUM metadata, it would then violate the "not add any new contents" element of clause 2.
>> >>> If you kept the UCUM codes separate and included /all/ the fields required then you might be able to claim that as the "master term dictionary" use allowed under clause 7 but then would have to show how you were satisfying the notice requirement which has no such corresponding allowance for "electronic messages".
>> >>>
>> >>> I am not a lawyer and so what I say here carries no value. Perhaps the QUDT folks, if they are now using UCUM, have a documented legal opinion that suggests more flexible reuse is possible.
>> >>>
>> >>> Dave
>> >>>
>> >>> *Simon J D Cox*
>> >>>
>> >>> Research Scientist - Environmental Informatics <https://research.csiro.au/ei>
>> >>>
>> >>> Team Leader – Environmental Information Infrastructure
>> >>>
>> >>> CSIRO Land and Water <http://www.csiro.au/Research/LWF>
>> >>>
>> >>> *E* simon.cox@csiro.au *T* +61 3 9545 2365 *M* +61 403 302 672
>> >>>
>> >>> /Mail:/ Private Bag 10, Clayton South, Vic 3169
>> >>>
>> >>> /Visit:/ Central Reception, Research Way, Clayton, Vic 3168 ///honey.zebra.chip <https://w3w.co/honey.zebra.chip>
>> >>>
>> >>> /Workstation:/ Building 209 ///couple.page.roses <https://w3w.co/couple.page.roses>
>> >>>
>> >>> /Deliver:/ Gate 3, Normanby Road, Clayton, Vic 3168
>> >>>
>> >>> people.csiro.au/Simon-Cox <http://people.csiro.au/Simon-Cox>
>> >>>
>> >>> orcid.org/0000-0002-3884-3420 <http://orcid.org/0000-0002-3884-3420>
>> >>>
>> >>> github.com/dr-shorthair <https://github.com/dr-shorthair>
>> >>>
>> >>> Twitter @dr_shorthair <https://twitter.com/dr_shorthair>
>> >>>
>> >>> https://xkcd.com/1810/
>> >>>
>> >>> CSIRO acknowledges the Traditional Owners of the land, sea and waters, of the area that we live and work on across Australia. We acknowledge their continuing connection to their culture and we pay our respects to their Elders past and present.
>> >>>
>> >>> The information contained in this email may be confidential or privileged. Any unauthorised use or disclosure is prohibited. If you have received this email in error, please delete it immediately and notify the sender by return email. Thank you. To the extent permitted by law, CSIRO does not represent, warrant and/or guarantee that the integrity of this communication has been maintained or that the communication is free of errors, virus, interception or interference.
>> >>>
>> >>> CSIRO Australia’s National Science Agency | csiro.au <https://www.csiro.au/>
>> >>>
>> >
>> >

--
Antoine Zimmermann
Institut Henri Fayol
École des Mines de Saint-Étienne
158 cours Fauriel
CS 62362
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 66 03
Fax:+33(0)4 77 42 66 66
http://www.emse.fr/~zimmermann/
Received on Thursday, 21 January 2021 08:24:36 UTC