RE: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?] from Cox, Simon (L&W, Clayton) on 2020-07-24 (semantic-web@w3.org from July 2020)

From: Cox, Simon (L&W, Clayton) <Simon.Cox@csiro.au>
Date: Fri, 24 Jul 2020 00:12:34 +0000
To: Hugh Glaser <hugh@glasers.org>, Eric Prud'hommeaux <eric@w3.org>
CC: Antoine Zimmermann <antoine.zimmermann@emse.fr>, Semantic Web <semantic-web@w3.org>, Maxime Lefrançois <maxime.lefrancois@emse.fr>
Message-ID: <ME2PR01MB2882E5204A97349720DE08A788770@ME2PR01MB2882.ausprd01.prod.outlook.com>
Yes, you would need a UCUM parser.
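
For the literal form quoted further down (<NUMBER> <SPACES> <UCUMCODE>), splitting the number from the unit code is the easy part. A rough Python sketch follows, not taken from any existing implementation. The function name split_quantity and the simplified number pattern are my own assumptions, and the unit code is left as an opaque string that a real UCUM parser would still have to interpret.

```python
import re
from decimal import Decimal

# Lexical form as quoted in this thread:
#   <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
#   <NUMBER>   ::= xsd:decimal(('e'|'E')xsd:integer)?
# Assumption: the number pattern below is a simplification of xsd:decimal,
# and the unit code is not validated here at all.
QUANTITY_RE = re.compile(
    r"^(?P<number>[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?)"
    r"\s+"
    r"(?P<unit>\S+)$"
)


def split_quantity(lexical):
    """Split a literal such as '2e11 mm' into (Decimal, unit code string)."""
    match = QUANTITY_RE.match(lexical)
    if match is None:
        raise ValueError(f"not a <NUMBER> <SPACES> <UCUMCODE> literal: {lexical!r}")
    return Decimal(match.group("number")), match.group("unit")


if __name__ == "__main__":
    print(split_quantity("2e11 mm"))
    print(split_quantity("2.7e3 m3/[acr_us]"))
```

Everything after the split (checking that the code is valid UCUM, and converting between units) is where the parser is really needed.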

Note however that UCUM is not a "large vocabulary". 
The set of terminals is relatively small (see http://unitsofmeasure.org/ucum-essence.xml ), and a rule combines them into a countably infinite set of unit expressions.
The rule is described here: http://unitsofmeasure.org/ucum.html#section-Syntax-Rules 
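
To make the "small set of terminals plus a combination rule" point concrete, here is a deliberately tiny Python sketch. The tables are a hypothetical excerpt that I picked for illustration (the authoritative list is ucum-essence.xml), the names parse_component and parse_unit are mine, and the grammar is heavily simplified: no parentheses, annotations, unary division or special units.

```python
import re
from fractions import Fraction

# Hypothetical excerpt of the terminals, chosen for illustration only.
PREFIXES = {"k": Fraction(1000), "c": Fraction(1, 100), "m": Fraction(1, 1000)}
# atom -> (factor relative to the base units m, g, s; dimension exponents)
ATOMS = {
    "m": (Fraction(1), {"L": 1}),
    "g": (Fraction(1), {"M": 1}),
    "s": (Fraction(1), {"T": 1}),
    "[in_i]": (Fraction(254, 10000), {"L": 1}),  # international inch = 2.54 cm
}

# One component: a bracketed or alphabetic symbol plus an optional integer exponent.
COMPONENT_RE = re.compile(r"^(?P<symbol>\[[^\]]+\]|[A-Za-z]+)(?P<exp>[+-]?\d+)?$")


def parse_component(text):
    """Parse e.g. 'm3', 'km' or '[in_i]' into (factor, dimensions)."""
    match = COMPONENT_RE.match(text)
    if match is None:
        raise ValueError(f"cannot parse component {text!r}")
    symbol = match.group("symbol")
    exponent = int(match.group("exp") or 1)
    if symbol in ATOMS:  # try the whole symbol as an atom first ('m' is metre)
        factor, dims = ATOMS[symbol]
    elif symbol[0] in PREFIXES and symbol[1:] in ATOMS:  # then prefix + atom ('mm')
        atom_factor, dims = ATOMS[symbol[1:]]
        factor = PREFIXES[symbol[0]] * atom_factor
    else:
        raise ValueError(f"unknown unit symbol {symbol!r}")
    return factor ** exponent, {d: e * exponent for d, e in dims.items()}


def parse_unit(code):
    """Combine components joined by '.' (multiply) and '/' (divide).

    Assumption: both operators have equal precedence and apply left to right.
    """
    tokens = re.split(r"([./])", code)
    factor, dims = parse_component(tokens[0])
    for operator, component in zip(tokens[1::2], tokens[2::2]):
        comp_factor, comp_dims = parse_component(component)
        sign = 1 if operator == "." else -1
        factor *= comp_factor ** sign
        for dim, exp in comp_dims.items():
            dims[dim] = dims.get(dim, 0) + sign * exp
    return factor, {d: e for d, e in dims.items() if e != 0}


if __name__ == "__main__":
    print(parse_unit("mm"))
    print(parse_unit("km/s2"))
    print(parse_unit("[in_i]"))
```

Three prefixes and four atoms already yield an unbounded number of valid expressions (cm2, km/s2, g.m/s2, ...), which is why a parser is a better fit than an enumerated vocabulary.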

A number of implementations are listed under 'Implementation Support' at https://unitsofmeasure.org/trac .
That page has not been updated for about 3 years, so some of the links might be stale, and there may be other implementations that are not listed.

A units-of-measure library with UCUM support that could be integrated into RDF applications would be a significant contribution to the community.

Simon 

> -----Original Message-----
> From: Hugh Glaser <hugh@glasers.org>
> Sent: Friday, 24 July, 2020 08:58
> To: Eric Prud'hommeaux <eric@w3.org>
> Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>; Semantic Web
> <semantic-web@w3.org>; Maxime Lefrançois
> <maxime.lefrancois@emse.fr>
> Subject: Re: Blank nodes must DIE! [ was Re: Blank nodes semantics -
> existential variables?]
> 
> If I understand correctly.
> I will need to add a UCUM parser to my system to be able to process these
> datatypes, if people send them to me in their RDF?
> In fact, I will need a UCUM to RDF converter to be able to "understand"
> properly what they "mean"?
> Does such an animal exist?
> 
> It looks to me that UCUM is quite a large vocabulary of units, for a start -
> what would the URI for the "liter" unit of measurement be, for example?
> 
> I'm very happy to have widely adopted standards like this - I just want to
> keep my Semantic Web processing in the Semantic Web (RDF), and as simple
> as possible.
> Or at least be helped to do that.
> 
> Cheers
> 
> > On 23 Jul 2020, at 23:06, Eric Prud'hommeaux <eric@w3.org> wrote:
> >
> > On Tue, Jul 21, 2020 at 02:35:02PM +0200, Antoine Zimmermann wrote:
> >> Regarding physical quantities, such as "5 inches", etc., my colleague
> >> Maxime Lefrançois and myself coauthored a specification for a
> >> datatype for physical quantities [1]. It is quite simple: we reuse
> >> the Unified Code for Units of Measurement (UCUM), a standard that is
> >> used in many scientific applications, and combine it with a number:
> >>
> >> <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
> >> <NUMBER> ::= xsd:decimal(('e'|'E')xsd:integer)?
> >>
> >> Since UCUM has a well defined semantics, so does our datatype.
> >> Better, since UCUM is implemented in many programming languages, my
> >> colleague Maxime could easily integrate it into Jena and its SPARQL
> >> engine [2].
> >>
> >> So, with our Jena fork, one can write:
> >>
> >> SELECT ?planet WHERE {
> >>  ?planet a ex:Planet;
> >>    ex:diameter ?s .
> >>  FILTER(?s > "2e11 mm"^^cdt:ucum)
> >> }
> >
> > I applaud the work to extend XSD's numeric types so that RDF can have
> > standard measurement types. But why not leverage your work by adding
> > SPARQL support for UCUM types? e.g.
> >
> > SELECT ?planet WHERE {
> >  ?planet a ex:Planet;
> >    ex:diameter ?s .
> >  FILTER(?s > "2e11"^^ucum:mm)
> > }
> >
> > It feels cleaner to me to embed the entire type of the data in the literal's
> > datatype rather than spreading it across an aggregator type (cdt:ucum) and
> > the lexical value (" mm").
> >
> > In either case we probably have a union type in the lexical value so we'd
> > have to micro-parse doubles, decimals and integers, but the parsing is easier
> > if the measurement unit is broken out into the end of the datatype URL.
> >
> > There are a few UCUM units that aren't viable localnames (e.g. "m/s.s"),
> > but I think we can encode around that (e.g. "m_s.s") in a way that still makes
> > ucum: a practical namespace for datatypes.
> >
> >
> >> This works if the size of the planet is encoded as a cdt:ucum, no
> >> matter what unit one is using. One can even use "link for Gunter's
> >> chain" (unit "[lk_us]"), or "cubic meters per acre" (unit
> >> "m3/[acr_us]") [3], which are both units of length.
> >>
> >> With some of our industrial partners, we are using this for energy
> >> data, and they seem to be very pleased with this approach, compared
> >> to an ontology-based approach.
> >>
> >>
> >> [1] https://w3id.org/lindt/custom_datatypes#ucum
> >> [2] You can try it at
> >> https://ci.mines-stetienne.fr/lindt/playground.html
> >> [3] Try this query in the playground:
> >>
> >> """
> >> PREFIX iter: <http://w3id.org/sparql-generate/iter/>
> >> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> >> PREFIX cdt: <http://w3id.org/lindt/custom_datatypes#>
> >> PREFIX ex: <http://example.org/>
> >>
> >> SELECT ?position ?normalized
> >>
> >> WHERE{
> >>
> >>  VALUES ?position { "2.7e3 m3/[acr_us]"^^cdt:ucum }
> >>  # convert to meters
> >>  BIND("0 m"^^cdt:ucum + ?position AS ?normalized )
> >>
> >> }
> >> """
> >>
> >> --AZ
> >>
> >> Le 17/07/2020 à 01:57, Cox, Simon (L&W, Clayton) a écrit :
> >>> Yeah, the atomicity of the chunk is the point. This even applies to
> >>> quantities. 25.4mm is *identical* to 1” – they are the same thing.
> >>> Any engine that operates with quantities needs to understand that.
> >>> ‘25.4’ and ‘mm’ cannot be separated. Coordinates are slightly more complex
> >>> but it comes down to the same thing. A single element within a set
> >>> of coordinates that describes a position in space is not independent
> >>> of the other numbers in the tuple, or of the coordinate reference
> >>> system within which they are expressed. One value should *never* be
> >>> used independent of the others. Exactly the same position on the
> >>> earth will be denoted by three different numbers if embedded in a
> >>> different coordinate reference system. You can only ‘reason’ over them
> >>> as a group, not individually.
> >>>
> >>> *From:*Dan Brickley <danbri@danbri.org>
> >>> *Sent:* Thursday, 16 July, 2020 23:58
> >>> *To:* Jeen Broekstra <jeen@fastmail.com>
> >>> *Cc:* Semantic Web <semantic-web@w3.org>
> >>> *Subject:* Re: Blank nodes must DIE! [ was Re: Blank nodes semantics
> >>> - existential variables?]
> >>>
> >>> …
> >>>
> >>> I believe the big appeal of putting it all into the zone we call
> >>> "literals" is that you get a kind of atomicity; that chunk of data
> >>> is either there, or not there; it is asserted, or not asserted. With
> >>> a triples-based (description of a ) data structure you have to be
> >>> constantly on your guard that every subset of the full graph pattern
> >>> is at least sensible and harmless, even when subsetting these chunks
> >>> is often confusing or misleading for data consumers. I can't help
> >>> wondering whether notions of graph shapes from shacl, shex (and
> >>> sparql) could be exploited to create an RDF-based data format which
> >>> had atomicity at the level of entire shapes.
> >>>
> >>> Dan
> >>>
> >>>    Jeen
> >>>
> >>
> >> --
> >> Antoine Zimmermann
> >> Institut Henri Fayol
> >> École des Mines de Saint-Étienne
> >> 158 cours Fauriel
> >> CS 62362
> >> 42023 Saint-Étienne Cedex 2
> >> France
> >> Tél:+33(0)4 77 42 66 03
> >> Fax:+33(0)4 77 42 66 66
> >> http://www.emse.fr/~zimmermann/
> >> Member of team Connected Intelligence, Laboratoire Hubert Curien
> 
> --
> Hugh
> 023 8061 5652
>
Received on Friday, 24 July 2020 00:13:18 UTC