- From: Hugh Glaser <hugh@glasers.org>
- Date: Thu, 23 Jul 2020 23:58:07 +0100
- To: Eric Prud'hommeaux <eric@w3.org>
- Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>, Semantic Web <semantic-web@w3.org>, Maxime Lefrançois <maxime.lefrancois@emse.fr>
If I understand correctly, I will need to add a UCUM parser to my system to be able to process these datatypes, if people send them to me in their RDF?
In fact, I will need a UCUM to RDF converter to be able to "understand" properly what they "mean"?
Does such an animal exist?
It looks to me that UCUM is quite a large vocabulary of units, for a start - what would the URI for the "liter" unit of measurement be, for example?

I'm very happy to have widely adopted standards like this - I just want to keep my Semantic Web processing in the Semantic Web (RDF), and as simple as possible.
Or at least be helped to do that.

Cheers

> On 23 Jul 2020, at 23:06, Eric Prud'hommeaux <eric@w3.org> wrote:
>
> On Tue, Jul 21, 2020 at 02:35:02PM +0200, Antoine Zimmermann wrote:
>> Regarding physical quantities, such as "5 inches", etc., my colleague Maxime
>> Lefrançois and myself coauthored a specification for a datatype for physical
>> quantities [1]. It is quite simple: we reuse the Unified Code for Units of
>> Measurement (UCUM), a standard that is used in many scientific applications,
>> and combine it with a number:
>>
>> <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
>> <NUMBER> ::= xsd:decimal(('e'|'E')xsd:integer)?
>>
>> Since UCUM has a well defined semantics, so does our datatype. Better, since
>> UCUM is implemented in many programming languages, my colleague Maxime could
>> easily integrate it into Jena and its SPARQL engine [2].
>>
>> So, with our Jena fork, one can write:
>>
>> SELECT ?planet WHERE {
>>   ?planet a ex:Planet;
>>     ex:diameter ?s .
>>   FILTER(?s > "2e11 mm"^^cdt:ucum)
>> }
>
> I applaud the work to extend XSD's numeric types so that RDF can have standard measurement types. But why not leverage your work by adding SPARQL support for UCUM types? e.g.
>
> SELECT ?planet WHERE {
>   ?planet a ex:Planet;
>     ex:diameter ?s .
>   FILTER(?s > "2e11"^^ucum:mm)
> }
>
> It feels cleaner to me to embed the entire type of the data in the literal's datatype rather than spreading it across an aggregator type (cdt:ucum) and the lexical value (" mm").
>
> In either case we probably have a union type in the lexical value so we'd have to micro-parse doubles, decimals and integers, but the parsing is easier if the measurement unit is broken out into the end of the datatype URL.
>
> There are a few UCUM units that aren't viable localnames (e.g. "m/s.s"), but I think we can encode around that (e.g. "m_s.s") in a way that still makes ucum: a practical namespace for datatypes.
>
>
>> This works if the size of the planet is encoded as a cdt:ucum, no matter
>> what unit one is using. One can even use "link for Gunter's chain" (unit
>> "[lk_us]"), or "cubic meters per acre" (unit "m3/[acr_us]") [3], which are
>> both units of length.
>>
>> With some of our industrial partners, we are using this for energy data, and
>> they seem to be very pleased with this approach, compared to an
>> ontology-based approach.
>>
>>
>> [1] https://w3id.org/lindt/custom_datatypes#ucum
>> [2] You can try it at https://ci.mines-stetienne.fr/lindt/playground.html
>> [3] Try this query in the playground:
>>
>> """
>> PREFIX iter: <http://w3id.org/sparql-generate/iter/>
>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
>> PREFIX cdt: <http://w3id.org/lindt/custom_datatypes#>
>> PREFIX ex: <http://example.org/>
>>
>> SELECT ?length ?normalized
>>
>> WHERE{
>>
>> VALUES ?position { "2.7e3 m3/[acr_us]"^^cdt:ucum }
>> # convert to meters
>> BIND("0 m"^^cdt:ucum + ?position AS ?normalized )
>>
>> }
>> """
>>
>> --AZ
>>
>> On 17/07/2020 at 01:57, Cox, Simon (L&W, Clayton) wrote:
>>> Yeah, the atomicity of the chunk is the point. This even applies to
>>> quantities. 25.4mm is *identical* to 1” – they are the same thing. Any
>>> engine that operates with quantities needs to understand that. ’25.4’
>>> and ‘mm’ cannot be separated.
>>> Coordinates are slightly more complex but
>>> it comes down to the same thing. A single element within a set of
>>> coordinates that describes a position in space is not independent of the
>>> other numbers in the tuple, or of the coordinate reference system within
>>> which they are expressed. One value should *never* be used independent
>>> of the others. Exactly the same position on the earth will be denoted by
>>> three different numbers if embedded in a different coordinate reference
>>> system. You can only ‘reason’ over them as a group, not individually.
>>>
>>> *From:* Dan Brickley <danbri@danbri.org>
>>> *Sent:* Thursday, 16 July, 2020 23:58
>>> *To:* Jeen Broekstra <jeen@fastmail.com>
>>> *Cc:* Semantic Web <semantic-web@w3.org>
>>> *Subject:* Re: Blank nodes must DIE! [ was Re: Blank nodes semantics -
>>> existential variables?]
>>>
>>> …
>>>
>>> I believe the big appeal of putting it all into the zone we call
>>> "literals" is that you get a kind of atomicity; that chunk of data is
>>> either there, or not there; it is asserted, or not asserted. With a
>>> triples-based (description of a ) data structure you have to be
>>> constantly on your guard that every subset of the full graph pattern is
>>> at least sensible and harmless, even when subsetting these chunks is
>>> often confusing or misleading for data consumers. I can't help wondering
>>> whether notions of graph shapes from shacl, shex (and sparql) could be
>>> exploited to create an RDF-based data format which had atomicity at the
>>> level of entire shapes.
>>>
>>> Dan
>>>
>>> Jeen
>>>
>>
>> --
>> Antoine Zimmermann
>> Institut Henri Fayol
>> École des Mines de Saint-Étienne
>> 158 cours Fauriel
>> CS 62362
>> 42023 Saint-Étienne Cedex 2
>> France
>> Tél:+33(0)4 77 42 66 03
>> Fax:+33(0)4 77 42 66 66
>> http://www.emse.fr/~zimmermann/
>> Member of team Connected Intelligence, Laboratoire Hubert Curien

--
Hugh
023 8061 5652
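
To make the "micro-parsing" discussed in this thread concrete, here is a minimal Python sketch that splits a cdt:ucum lexical form into its number and its unit code, following the <QUANTITY> grammar quoted above. It is only an illustration under stated assumptions: the function name split_quantity is hypothetical, <SPACES> is taken to mean one or more space characters, and the unit part is accepted as any run of non-whitespace characters. It does not validate the unit against UCUM or convert between units, which is the part that would still need a real UCUM library.

```python
import re
from decimal import Decimal

# Lexical form quoted in the thread:
#   <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
#   <NUMBER>   ::= xsd:decimal(('e'|'E')xsd:integer)?
# Assumptions: <SPACES> is one or more ' ' characters, and the unit code is
# any run of non-whitespace characters (no UCUM validation is attempted).
_QUANTITY = re.compile(
    r"^(?P<number>[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?)"
    r" +"
    r"(?P<unit>\S+)$"
)


def split_quantity(lexical):
    """Split a lexical form such as '2e11 mm' into (Decimal, unit string)."""
    match = _QUANTITY.match(lexical)
    if match is None:
        raise ValueError("not a <NUMBER> <SPACES> <UCUMCODE> string: %r" % lexical)
    return Decimal(match.group("number")), match.group("unit")


if __name__ == "__main__":
    # The two literals that appear in the quoted queries.
    print(split_quantity("2e11 mm"))
    print(split_quantity("2.7e3 m3/[acr_us]"))
```

Splitting the literal is the easy half. Comparing "2e11 mm" with a value given in another unit still requires UCUM's conversion semantics, so a sketch like this does not remove the need for a UCUM implementation on the consuming side.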
Received on Thursday, 23 July 2020 22:58:29 UTC