- From: Cox, Simon (L&W, Clayton) <Simon.Cox@csiro.au>
- Date: Sun, 26 Jul 2020 10:52:14 +0000
- To: Eric Prud'hommeaux <eric@w3.org>
- CC: Hugh Glaser <hugh@glasers.org>, Antoine Zimmermann <antoine.zimmermann@emse.fr>, Semantic Web <semantic-web@w3.org>, Maxime Lefrançois <maxime.lefrancois@emse.fr>
> wipe:mercury ex:diameter "4879.4"^^ucum:km , "3031.9"^^ucum:mi .
> wipe:saturn ex:diameter "1.1e5"^^ucum:km , "72367"^^ucum:mi .
> wipe:jupiter ex:diameter "139822"^^ucum:km , "86881"^^ucum:mi .

This style is a bit like language tags on string-literals - but here it is "unit tags" on number-literals.

It looks like a nice parallel: "air"@en denotes a very different thing to "air"@ms , and "2.5"^^ucum:mi is very different to "2.5"^^ucum:km .

But do we need to remember and avoid the processing issues that came from language tags?

Simon

> -----Original Message-----
> From: Eric Prud'hommeaux <eric@w3.org>
> Sent: Saturday, 25 July, 2020 01:38
> To: Cox, Simon (L&W, Clayton) <Simon.Cox@csiro.au>
> Cc: Hugh Glaser <hugh@glasers.org>; Antoine Zimmermann
> <antoine.zimmermann@emse.fr>; Semantic Web <semantic-web@w3.org>;
> Maxime Lefrançois <maxime.lefrancois@emse.fr>
> Subject: Re: Blank nodes must DIE! [ was Re: Blank nodes semantics -
> existential variables?]
>
> You'll need microparsing regardless, and, as Simon points out, it's not that
> onerous. My point was just that having to microparse union types out of the
> same literal as a UCUM type is more complicated than parsing the UCUM
> type out of the literal's datatype. Having numeric types separated from their
> units would allow SPARQL 1.1 queries to avoid cracking the literal form, e.g.
>
> Data:
> [[
> PREFIX ex: <http://a.example/astro#>
> PREFIX ucum: <http://ucum.nlm.nih.gov/#>
> PREFIX wipe: <https://en.wikipedia.org/wiki/>
>
> wipe:mercury ex:diameter "4879.4"^^ucum:km , "3031.9"^^ucum:mi .
> wipe:saturn ex:diameter "1.1e5"^^ucum:km , "72367"^^ucum:mi .
> wipe:jupiter ex:diameter "139822"^^ucum:km , "86881"^^ucum:mi .
> ]]
>
> Query:
> [[
> PREFIX ex: <http://a.example/astro#>
> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> PREFIX ucum: <http://ucum.nlm.nih.gov/#>
>
> SELECT ?planet WHERE {
>   ?planet ex:diameter ?d .
>   FILTER(datatype(?d) = ucum:km
>          && xsd:float(str(?d)) > 1E5)
> }
> ]]
>
> Results:
> ┌─────────────────────────────────────────┐
> │ ?planet                                 │
> │ <https://en.wikipedia.org/wiki/saturn>  │
> │ <https://en.wikipedia.org/wiki/jupiter> │
> └─────────────────────────────────────────┘
>
> Here, SPARQL takes care of parsing value types for us ("1.1e5" or "139822")
> so the query is safe (datatype(?d) = ucum:km) and pretty easy to compose.
> The same query is substantially more tedious with data like:
>   wipe:saturn ex:diameter "1.1e5 km"^^cdt:ucum , "72367 mi"^^cdt:ucum .
> which is likely to lead to unsafe shortcuts.
>
> So Maxime and Antoine, how hard would it be to transplant your semantics
> to apply directly to ucum (presuming their cooperation)?
>
>
> On Fri, Jul 24, 2020 at 12:12:34AM +0000, Cox, Simon (L&W, Clayton) wrote:
> > Yes you would need a UCUM parser.
> >
> > Note however that UCUM is not a "large vocabulary".
> > There is a relatively small set of terminals here
> > http://unitsofmeasure.org/ucum-essence.xml , and a rule to combine these
> > into a countably infinite set.
> > The rule is described here:
> > http://unitsofmeasure.org/ucum.html#section-Syntax-Rules
> >
> > There are a number of implementations listed here
> > https://unitsofmeasure.org/trac at 'Implementation Support'.
> > This documentation has not been updated for about 3 years, so some of
> > the links might be stale, and there may be others.
> >
> > A units-of-measure library, with UCUM support, that was available to be
> > integrated into RDF applications would be a significant contribution to the
> > community.
> >
> > Simon
> >
> > > -----Original Message-----
> > > From: Hugh Glaser <hugh@glasers.org>
> > > Sent: Friday, 24 July, 2020 08:58
> > > To: Eric Prud'hommeaux <eric@w3.org>
> > > Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>; Semantic Web
> > > <semantic-web@w3.org>; Maxime Lefrançois <maxime.lefrancois@emse.fr>
> > > Subject: Re: Blank nodes must DIE!
> > > [ was Re: Blank nodes semantics - existential variables?]
> > >
> > > If I understand correctly,
> > > I will need to add a UCUM parser to my system to be able to process
> > > these datatypes, if people send them to me in their RDF?
> > > In fact, I will need a UCUM to RDF converter to be able to "understand"
> > > properly what they "mean"?
> > > Does such an animal exist?
> > >
> > > It looks to me that UCUM is quite a large vocabulary of units, for a
> > > start - what would the URI for the "liter" unit of measurement be, for
> > > example?
> > >
> > > I'm very happy to have widely adopted standards like this - I just
> > > want to keep my Semantic Web processing in the Semantic Web (RDF),
> > > and as simple as possible.
> > > Or at least be helped to do that.
> > >
> > > Cheers
> > >
> > > > On 23 Jul 2020, at 23:06, Eric Prud'hommeaux <eric@w3.org> wrote:
> > > >
> > > > On Tue, Jul 21, 2020 at 02:35:02PM +0200, Antoine Zimmermann wrote:
> > > >> Regarding physical quantities, such as "5 inches", etc., my
> > > >> colleague Maxime Lefrançois and myself coauthored a specification
> > > >> for a datatype for physical quantities [1]. It is quite simple:
> > > >> we reuse the Unified Code for Units of Measurement (UCUM), a
> > > >> standard that is used in many scientific applications, and combine it
> > > >> with a number:
> > > >>
> > > >> <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
> > > >> <NUMBER> ::= xsd:decimal(('e'|'E')xsd:integer)?
> > > >>
> > > >> Since UCUM has a well defined semantics, so does our datatype.
> > > >> Better, since UCUM is implemented in many programming languages,
> > > >> my colleague Maxime could easily integrate it into Jena and its
> > > >> SPARQL engine [2].
> > > >>
> > > >> So, with our Jena fork, one can write:
> > > >>
> > > >> SELECT ?planet WHERE {
> > > >>   ?planet a ex:Planet;
> > > >>     ex:diameter ?s .
> > > >>   FILTER(?s > "2e11 mm"^^cdt:ucum) }
> > > >
> > > > I applaud the work to extend XSD's numeric types so that RDF can
> > > > have standard measurement types. But why not leverage your work by
> > > > adding SPARQL support for UCUM types? e.g.
> > > >
> > > > SELECT ?planet WHERE {
> > > >   ?planet a ex:Planet;
> > > >     ex:diameter ?s .
> > > >   FILTER(?s > "2e11"^^ucum:mm)
> > > > }
> > > >
> > > > It feels cleaner to me to embed the entire type of the data in the
> > > > literal's datatype rather than spreading it across an aggregator type
> > > > (cdt:ucum) and the lexical value (" mm").
> > > >
> > > > In either case we probably have a union type in the lexical value
> > > > so we'd have to micro-parse doubles, decimals and integers, but the
> > > > parsing is easier if the measurement unit is broken out into the end
> > > > of the datatype URL.
> > > >
> > > > There are a few UCUM units that aren't viable localnames (e.g.
> > > > "m/s.s"), but I think we can encode around that (e.g. "m_s.s") in a
> > > > way that still makes ucum: a practical namespace for datatypes.
> > > >
> > > >
> > > >> This works if the size of the planet is encoded as a cdt:ucum, no
> > > >> matter what unit one is using. One can even use "link for
> > > >> Gunter's chain" (unit "[lk_us]"), or "cubic meters per acre"
> > > >> (unit "m3/[acr_us]") [3], which are both units of length.
> > > >>
> > > >> With some of our industrial partners, we are using this for
> > > >> energy data, and they seem to be very pleased with this approach,
> > > >> compared to an ontology-based approach.
> > > >>
> > > >>
> > > >> [1] https://w3id.org/lindt/custom_datatypes#ucum
> > > >> [2] You can try it at
> > > >> https://ci.mines-stetienne.fr/lindt/playground.html
> > > >> [3] Try this query in the playground:
> > > >>
> > > >> """
> > > >> PREFIX iter: <http://w3id.org/sparql-generate/iter/>
> > > >> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> > > >> PREFIX cdt: <http://w3id.org/lindt/custom_datatypes#>
> > > >> PREFIX ex: <http://example.org/>
> > > >>
> > > >> SELECT ?length ?normalized
> > > >>
> > > >> WHERE{
> > > >>
> > > >>   VALUES ?position { "2.7e3 m3/[acr_us]"^^cdt:ucum }
> > > >>   # convert to meters
> > > >>   BIND("0 m"^^cdt:ucum + ?position AS ?normalized )
> > > >>
> > > >> }
> > > >> """
> > > >>
> > > >> --AZ
> > > >>
> > > >> On 17/07/2020 at 01:57, Cox, Simon (L&W, Clayton) wrote:
> > > >>> Yeah, the atomicity of the chunk is the point. This even applies
> > > >>> to quantities. 25.4mm is *identical* to 1” – they are the same thing.
> > > >>> Any engine that operates with quantities needs to understand that.
> > > >>> ’25.4’ and ‘mm’ cannot be separated. Coordinates are slightly more
> > > >>> complex but it comes down to the same thing. A single element
> > > >>> within a set of coordinates that describes a position in space
> > > >>> is not independent of the other numbers in the tuple, or of the
> > > >>> coordinate reference system within which they are expressed. One
> > > >>> value should *never* be used independent of the others. Exactly
> > > >>> the same position on the earth will be denoted by three
> > > >>> different numbers if embedded in a different coordinate
> > > >>> reference system. You can only ‘reason’ over them
> > > >>> as a group, not individually.
> > > >>>
> > > >>> *From:* Dan Brickley <danbri@danbri.org>
> > > >>> *Sent:* Thursday, 16 July, 2020 23:58
> > > >>> *To:* Jeen Broekstra <jeen@fastmail.com>
> > > >>> *Cc:* Semantic Web <semantic-web@w3.org>
> > > >>> *Subject:* Re: Blank nodes must DIE!
> > > >>> [ was Re: Blank nodes semantics - existential variables?]
> > > >>>
> > > >>> …
> > > >>>
> > > >>> I believe the big appeal of putting it all into the zone we call
> > > >>> "literals" is that you get a kind of atomicity; that chunk of
> > > >>> data is either there, or not there; it is asserted, or not
> > > >>> asserted. With a triples-based (description of a) data
> > > >>> structure you have to be constantly on your guard that every
> > > >>> subset of the full graph pattern is at least sensible and
> > > >>> harmless, even when subsetting these chunks is often confusing
> > > >>> or misleading for data consumers. I can't help wondering whether
> > > >>> notions of graph shapes from shacl, shex (and
> > > >>> sparql) could be exploited to create an RDF-based data format
> > > >>> which had atomicity at the level of entire shapes.
> > > >>>
> > > >>> Dan
> > > >>>
> > > >>> Jeen
> > > >>>
> > > >>
> > > >> --
> > > >> Antoine Zimmermann
> > > >> Institut Henri Fayol
> > > >> École des Mines de Saint-Étienne
> > > >> 158 cours Fauriel
> > > >> CS 62362
> > > >> 42023 Saint-Étienne Cedex 2
> > > >> France
> > > >> Tél:+33(0)4 77 42 66 03
> > > >> Fax:+33(0)4 77 42 66 66
> > > >> http://www.emse.fr/~zimmermann/
> > > >> Member of team Connected Intelligence, Laboratoire Hubert Curien
> > >
> > > --
> > > Hugh
> > > 023 8061 5652
> > >
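
The micro-parsing trade-off discussed in the thread can be sketched in Python. This is only an illustration under stated assumptions: the regular expression approximates the <QUANTITY> grammar quoted in the thread, unit codes are treated as opaque strings (no real UCUM parsing or unit conversion is attempted), and the helper names parse_quantity, wide_planets_combined and wide_planets_separated are hypothetical, not part of Jena, the cdt:ucum specification, or any UCUM library.

```python
import re
from decimal import Decimal

# Approximation of: <NUMBER> ::= xsd:decimal(('e'|'E')xsd:integer)?
NUMBER = r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"
# <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>; the unit code is kept opaque here.
QUANTITY = re.compile(rf"({NUMBER}) +(\S+)")


def parse_quantity(lexical):
    """Split a cdt:ucum-style lexical form into (number, unit code)."""
    match = QUANTITY.fullmatch(lexical)
    if match is None:
        raise ValueError(f"not a quantity literal: {lexical!r}")
    return Decimal(match.group(1)), match.group(2)


def wide_planets_combined(rows, unit, threshold):
    """Rows are (planet, lexical); the unit has to be cracked out of the literal."""
    result = []
    for planet, lexical in rows:
        value, code = parse_quantity(lexical)
        if code == unit and value > threshold:
            result.append(planet)
    return result


def wide_planets_separated(rows, datatype, threshold):
    """Rows are (planet, lexical, datatype); only the number needs parsing."""
    return [planet for planet, lexical, dt in rows
            if dt == datatype and Decimal(lexical) > threshold]


# The planet data from the thread, in the "1.1e5 km"^^cdt:ucum style.
combined = [
    ("mercury", "4879.4 km"), ("mercury", "3031.9 mi"),
    ("saturn", "1.1e5 km"), ("saturn", "72367 mi"),
    ("jupiter", "139822 km"), ("jupiter", "86881 mi"),
]
# The same data in the "1.1e5"^^ucum:km style (prefixed names kept as plain strings).
separated = [(planet, lexical.split()[0], "ucum:" + lexical.split()[1])
             for planet, lexical in combined]

limit = Decimal("1E5")
print(wide_planets_combined(combined, "km", limit))
print(wide_planets_separated(separated, "ucum:km", limit))
```

Both functions select the same planets. The difference is where the unit test happens: in the combined style every literal must be parsed before its unit can be checked, whereas in the separated style the unit check is a plain comparison on the datatype, much like datatype(?d) = ucum:km in the SPARQL query above. Neither sketch compares across units; that would need a real UCUM implementation.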
Received on Sunday, 26 July 2020 10:52:52 UTC