- From: Cox, Simon (L&W, Clayton) <Simon.Cox@csiro.au>
- Date: Fri, 24 Jul 2020 00:12:34 +0000
- To: Hugh Glaser <hugh@glasers.org>, Eric Prud'hommeaux <eric@w3.org>
- CC: Antoine Zimmermann <antoine.zimmermann@emse.fr>, Semantic Web <semantic-web@w3.org>, Maxime Lefrançois <maxime.lefrancois@emse.fr>
Yes, you would need a UCUM parser. Note, however, that UCUM is not a "large vocabulary". There is a relatively small set of terminal symbols, listed at

http://unitsofmeasure.org/ucum-essence.xml

and a rule for combining these into a countably infinite set of units. The rule is described at

http://unitsofmeasure.org/ucum.html#section-Syntax-Rules

A number of implementations are listed at https://unitsofmeasure.org/trac under 'Implementation Support'. That documentation has not been updated for about three years, so some of the links may be stale, and there may be other implementations that are not listed.

A units-of-measure library with UCUM support, packaged so that it can be integrated into RDF applications, would be a significant contribution to the community.

Simon

> -----Original Message-----
> From: Hugh Glaser <hugh@glasers.org>
> Sent: Friday, 24 July, 2020 08:58
> To: Eric Prud'hommeaux <eric@w3.org>
> Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>; Semantic Web <semantic-web@w3.org>; Maxime Lefrançois <maxime.lefrancois@emse.fr>
> Subject: Re: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?]
>
> If I understand correctly.
> I will need to add a UCUM parser to my system to be able to process these
> datatypes, if people send them to me in their RDF?
> In fact, I will need a UCUM to RDF converter to be able to "understand"
> properly what they "mean"?
> Does such an animal exist?
>
> It looks to me that UCUM is quite a large vocabulary of units, for a start -
> what would the URI for the "liter" unit of measurement be, for example?
>
> I'm very happy to have widely adopted standards like this - I just want to
> keep my Semantic Web processing in the Semantic Web (RDF), and as simple
> as possible.
> Or at least be helped to do that.
>
> Cheers
>
> > On 23 Jul 2020, at 23:06, Eric Prud'hommeaux <eric@w3.org> wrote:
> >
> > On Tue, Jul 21, 2020 at 02:35:02PM +0200, Antoine Zimmermann wrote:
> >> Regarding physical quantities, such as "5 inches", etc., my colleague
> >> Maxime Lefrançois and myself coauthored a specification for a
> >> datatype for physical quantities [1]. It is quite simple: we reuse
> >> the Unified Code for Units of Measurement (UCUM), a standard that is
> >> used in many scientific applications, and combine it with a number:
> >>
> >> <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
> >> <NUMBER>   ::= xsd:decimal(('e'|'E')xsd:integer)?
> >>
> >> Since UCUM has a well defined semantics, so does our datatype.
> >> Better, since UCUM is implemented in many programming languages, my
> >> colleague Maxime could easily integrate it into Jena and its SPARQL engine [2].
> >>
> >> So, with our Jena fork, one can write:
> >>
> >> SELECT ?planet WHERE {
> >>   ?planet a ex:Planet;
> >>           ex:diameter ?s .
> >>   FILTER(?s > "2e11 mm"^^cdt:ucum)
> >> }
> >
> > I applaud the work to extend XSD's numeric types so that RDF can have standard measurement types. But why not leverage your work by adding SPARQL support for UCUM types? e.g.
> >
> > SELECT ?planet WHERE {
> >   ?planet a ex:Planet;
> >           ex:diameter ?s .
> >   FILTER(?s > "2e11"^^ucum:mm)
> > }
> >
> > It feels cleaner to me to embed the entire type of the data in the literal's datatype rather than spreading it across an aggregator type (cdt:ucum) and the lexical value (" mm").
> >
> > In either case we probably have a union type in the lexical value so we'd have to micro-parse doubles, decimals and integers, but the parsing is easier if the measurement unit is broken out into the end of the datatype URL.
> >
> > There are a few UCUM units that aren't viable localnames (e.g. "m/s.s"), but I think we can encode around that (e.g. "m_s.s") in a way that still makes ucum: a practical namespace for datatypes.
> >
> >
> >> This works if the size of the planet is encoded as a cdt:ucum, no
> >> matter what unit one is using. One can even use "link for Gunter's
> >> chain" (unit "[lk_us]"), or "cubic meters per acre" (unit
> >> "m3/[acr_us]") [3], which are both units of length.
> >>
> >> With some of our industrial partners, we are using this for energy
> >> data, and they seem to be very pleased with this approach, compared
> >> to an ontology-based approach.
> >>
> >>
> >> [1] https://w3id.org/lindt/custom_datatypes#ucum
> >> [2] You can try it at
> >> https://ci.mines-stetienne.fr/lindt/playground.html
> >> [3] Try this query in the playground:
> >>
> >> """
> >> PREFIX iter: <http://w3id.org/sparql-generate/iter/>
> >> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> >> PREFIX cdt: <http://w3id.org/lindt/custom_datatypes#>
> >> PREFIX ex: <http://example.org/>
> >>
> >> SELECT ?position ?normalized
> >>
> >> WHERE {
> >>
> >>   VALUES ?position { "2.7e3 m3/[acr_us]"^^cdt:ucum } # convert to meters
> >>   BIND("0 m"^^cdt:ucum + ?position AS ?normalized)
> >>
> >> }
> >> """
> >>
> >> --AZ
> >>
> >> On 17/07/2020 at 01:57, Cox, Simon (L&W, Clayton) wrote:
> >>> Yeah, the atomicity of the chunk is the point. This even applies to
> >>> quantities. 25.4mm is *identical* to 1” – they are the same thing.
> >>> Any engine that operates with quantities needs to understand that. ‘25.4’
> >>> and ‘mm’ cannot be separated. Coordinates are slightly more complex
> >>> but it comes down to the same thing. A single element within a set
> >>> of coordinates that describes a position in space is not independent
> >>> of the other numbers in the tuple, or of the coordinate reference
> >>> system within which they are expressed. One value should *never* be
> >>> used independently of the others. Exactly the same position on the
> >>> earth will be denoted by three different numbers if embedded in a
> >>> different coordinate reference system.
> >>> You can only ‘reason’ over them as a group, not individually.
> >>>
> >>> *From:* Dan Brickley <danbri@danbri.org>
> >>> *Sent:* Thursday, 16 July, 2020 23:58
> >>> *To:* Jeen Broekstra <jeen@fastmail.com>
> >>> *Cc:* Semantic Web <semantic-web@w3.org>
> >>> *Subject:* Re: Blank nodes must DIE! [ was Re: Blank nodes semantics
> >>> - existential variables?]
> >>>
> >>> …
> >>>
> >>> I believe the big appeal of putting it all into the zone we call
> >>> "literals" is that you get a kind of atomicity; that chunk of data
> >>> is either there, or not there; it is asserted, or not asserted. With
> >>> a triples-based (description of a) data structure you have to be
> >>> constantly on your guard that every subset of the full graph pattern
> >>> is at least sensible and harmless, even when subsetting these chunks
> >>> is often confusing or misleading for data consumers. I can't help
> >>> wondering whether notions of graph shapes from SHACL, ShEx (and
> >>> SPARQL) could be exploited to create an RDF-based data format which
> >>> had atomicity at the level of entire shapes.
> >>>
> >>> Dan
> >>>
> >>> Jeen
> >>>
> >>
> >> --
> >> Antoine Zimmermann
> >> Institut Henri Fayol
> >> École des Mines de Saint-Étienne
> >> 158 cours Fauriel
> >> CS 62362
> >> 42023 Saint-Étienne Cedex 2
> >> France
> >> Tel: +33(0)4 77 42 66 03
> >> Fax: +33(0)4 77 42 66 66
> >> http://www.emse.fr/~zimmermann/
> >> Member of team Connected Intelligence, Laboratoire Hubert Curien
>
> --
> Hugh
> 023 8061 5652
>
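Simon's point that UCUM is a small set of atoms plus a combination rule, and Antoine's `<QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>` grammar, can be illustrated with a minimal Python sketch. It is not one of the implementations listed on the UCUM site and not the cdt:ucum code in the Jena fork. It assumes a hypothetical, hard-coded table of a few atoms (values taken from the UCUM definitions as I understand them), single-letter metric prefixes, integer exponents, the `.` and `/` operators applied left to right, and at least one space between number and unit. Parentheses, `{...}` annotations and factors such as `10*3` are not handled.

```python
import re
from fractions import Fraction

# Hypothetical toy table: a few UCUM atoms with their factor to SI base units
# and a dimension vector (length exponent, time exponent). A real parser
# would load ucum-essence.xml instead of hard-coding anything.
_RD_US = Fraction(33, 2) * Fraction(1200, 3937)   # US survey rod, in metres
ATOMS = {
    "m": (Fraction(1), (1, 0)),                    # metre (base unit)
    "s": (Fraction(1), (0, 1)),                    # second (base unit)
    "[in_i]": (Fraction(254, 10000), (1, 0)),      # international inch
    "[ft_us]": (Fraction(1200, 3937), (1, 0)),     # US survey foot
    "[rd_us]": (_RD_US, (1, 0)),                   # 16.5 [ft_us]
    "[ch_us]": (4 * _RD_US, (1, 0)),               # Gunter's chain
    "[lk_us]": (4 * _RD_US / 100, (1, 0)),         # link for Gunter's chain
    "[acr_us]": (160 * _RD_US ** 2, (2, 0)),       # US survey acre (an area)
}
METRIC = {"m", "s"}                                # atoms that accept prefixes
PREFIXES = {"k": Fraction(1000), "c": Fraction(1, 100), "m": Fraction(1, 1000)}

# <NUMBER>: an xsd:decimal optionally followed by an exponent.
NUMBER = r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"
QUANTITY = re.compile(rf"({NUMBER}) +(\S+)")
# One component: an atom (bracketed or alphabetic) plus an optional exponent.
COMPONENT = re.compile(r"(\[[^\]]+\]|[A-Za-z]+)([+-]?\d+)?")


def component(text):
    """Factor and dimension of one component such as 'mm', 'm3' or '[acr_us]'."""
    match = COMPONENT.fullmatch(text)
    if not match:
        raise ValueError(f"unsupported unit component: {text!r}")
    symbol, exponent = match.group(1), int(match.group(2) or 1)
    if symbol in ATOMS:
        factor, dim = ATOMS[symbol]
    elif symbol[:1] in PREFIXES and symbol[1:] in METRIC:
        base_factor, dim = ATOMS[symbol[1:]]
        factor = PREFIXES[symbol[0]] * base_factor
    else:
        raise ValueError(f"unknown atom: {symbol!r}")
    return factor ** exponent, tuple(d * exponent for d in dim)


def unit(expr):
    """Evaluate a unit term, applying '.' and '/' left to right."""
    tokens = re.split(r"([./])", expr)   # e.g. ['m3', '/', '[acr_us]']
    factor, dim = component(tokens[0])
    for op, text in zip(tokens[1::2], tokens[2::2]):
        f, d = component(text)
        if op == ".":
            factor, dim = factor * f, tuple(a + b for a, b in zip(dim, d))
        else:
            factor, dim = factor / f, tuple(a - b for a, b in zip(dim, d))
    return factor, dim


def parse_quantity(lexical):
    """Return (value in SI base units, dimension) for a '<number> <unit>' string."""
    match = QUANTITY.fullmatch(lexical)
    if not match:
        raise ValueError(f"not a <NUMBER> <SPACES> <UCUMCODE> string: {lexical!r}")
    factor, dim = unit(match.group(2))
    return Fraction(match.group(1)) * factor, dim
```

Normalizing to SI base units is what lets `parse_quantity("25.4 mm")` and `parse_quantity("1 [in_i]")` compare equal, and it gives a length of roughly 0.667 m for Antoine's `"2.7e3 m3/[acr_us]"` example. Exact rational arithmetic (`Fraction`) is used so that such equality tests are not disturbed by floating-point rounding.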
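Eric's suggestion of moving the unit into the datatype IRI (`"2e11"^^ucum:mm`) needs a mapping from UCUM codes to local names. His example replaces `/` with `_`, but UCUM atoms such as `[lk_us]` already contain `_`, so a naive reverse substitution would turn `[lk_us]` into `[lk/us]`. The minimal Python sketch below swaps in percent-encoding instead (my substitution, not Eric's proposal), which keeps the mapping reversible. The namespace IRI is a made-up placeholder.

```python
from urllib.parse import quote, unquote

# Hypothetical namespace IRI for the ucum: prefix; the thread does not fix one.
UCUM_NS = "http://example.org/ucum#"


def unit_to_datatype(code: str) -> str:
    """Map a UCUM code to a datatype IRI by percent-encoding reserved characters."""
    # safe="" makes quote() escape '/' too; letters, digits and '_.-~' are kept.
    return UCUM_NS + quote(code, safe="")


def datatype_to_unit(iri: str) -> str:
    """Recover the UCUM code from a datatype IRI built by unit_to_datatype."""
    if not iri.startswith(UCUM_NS):
        raise ValueError(f"not in the ucum: namespace: {iri!r}")
    return unquote(iri[len(UCUM_NS):])
```

With this mapping, "m/s.s" becomes `ucum:m%2Fs.s` and "[lk_us]" becomes `ucum:%5Blk_us%5D`. Turtle and SPARQL 1.1 both allow `%`-escapes in the local part of a prefixed name and do not decode them, so the prefixed name and the IRI string agree.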
Received on Friday, 24 July 2020 00:13:18 UTC