RE: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?]

Whoops - of course I meant "2.5"^^ucum:mi is very different to "2.5"^^ucum:km
But there we are

> -----Original Message-----
> From: Cox, Simon (L&W, Clayton) <Simon.Cox@csiro.au>
> Sent: Sunday, 26 July, 2020 20:52
> To: Eric Prud'hommeaux <eric@w3.org>
> Cc: Hugh Glaser <hugh@glasers.org>; Antoine Zimmermann
> <antoine.zimmermann@emse.fr>; Semantic Web <semantic-web@w3.org>;
> Maxime Lefrançois <maxime.lefrancois@emse.fr>
> Subject: [ExternalEmail] RE: Blank nodes must DIE! [ was Re: Blank nodes
> semantics - existential variables?]
> 
> > wipe:mercury ex:diameter "4879.4"^^ucum:km , "3031.9"^^ucum:mi .
> > wipe:saturn  ex:diameter "1.1e5"^^ucum:km , "72367"^^ucum:mi .
> > wipe:jupiter ex:diameter "139822"^^ucum:km , "86881"^^ucum:mi .
> 
> This style is a bit like language tags on string-literals - but here it is "unit tags"
> on number-literals.
> It looks like a nice parallel: "air"@en denotes a very different thing to
> "air"@ms  , and "2.5^^ucum:mi is very different to "2.5"@ucum:km .
> 
> But do we need to remember and avoid the processing issues that came
> from language tags?
> 
> Simon
> 
> > -----Original Message-----
> > From: Eric Prud'hommeaux <eric@w3.org>
> > Sent: Saturday, 25 July, 2020 01:38
> > To: Cox, Simon (L&W, Clayton) <Simon.Cox@csiro.au>
> > Cc: Hugh Glaser <hugh@glasers.org>; Antoine Zimmermann
> > <antoine.zimmermann@emse.fr>; Semantic Web <semantic-web@w3.org>;
> > Maxime Lefrançois <maxime.lefrancois@emse.fr>
> > Subject: Re: Blank nodes must DIE! [ was Re: Blank nodes semantics -
> > existential variables?]
> >
> > You'll need a microparsing regardless, and, as Simon points out, it's
> > not that onerous. My point was just that having to microparse union
> > types out of the same literal as a UCUM type is more complicated than
> > parsing the UCUM type out of the literal's datatype. Having numeric
> > types separated from their units would allow SPARQL 1.1 queries to avoid
> > cracking the literal form, e.g.
> >
> > Data:
> > [[
> > PREFIX ex: <http://a.example/astro#>
> > PREFIX ucum: <http://ucum.nlm.nih.gov/#>
> > PREFIX wipe: <https://en.wikipedia.org/wiki/>
> >
> > wipe:mercury ex:diameter "4879.4"^^ucum:km , "3031.9"^^ucum:mi .
> > wipe:saturn  ex:diameter "1.1e5"^^ucum:km , "72367"^^ucum:mi .
> > wipe:jupiter ex:diameter "139822"^^ucum:km , "86881"^^ucum:mi .
> > ]]
> >
> > Query:
> > [[
> > PREFIX ex: <http://a.example/astro#>
> > PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> > PREFIX ucum: <http://ucum.nlm.nih.gov/#>
> >
> > SELECT ?planet WHERE {
> >  ?planet ex:diameter ?d .
> >  FILTER(datatype(?d) = ucum:km
> >      && xsd:float(str(?d)) > 1E5)
> > }
> > ]]
> >
> > Results:
> > ┌─────────────────────────────────────────┐
> > │ ?planet                                 │
> > │ <https://en.wikipedia.org/wiki/saturn>  │
> > │ <https://en.wikipedia.org/wiki/jupiter> │
> > └─────────────────────────────────────────┘
> >
> > Here, SPARQL takes care of parsing value types for us ("1.1e5" or
> > "139822") so the query is safe (datatype(?d) = ucum:km) and pretty easy to
> > compose.
> > The same query is substantially more tedious with data like:
> >   wipe:saturn  ex:diameter "1.1e5 km"^^cdt:ucum , "72367 mi"^^cdt:ucum .
> > which is likely to lead to unsafe shortcuts.
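
A rough Python sketch of what the query above does, assuming literals are modelled as plain (lexical form, datatype IRI) pairs. The function name and the data layout are made up for illustration; only the IRIs and values come from the example.

```python
# Literals modelled as (lexical form, datatype IRI) pairs, mirroring the data
# in the example. The namespace IRIs are the ones used in the thread.
UCUM = "http://ucum.nlm.nih.gov/#"
WIPE = "https://en.wikipedia.org/wiki/"

DIAMETERS = {
    WIPE + "mercury": [("4879.4", UCUM + "km"), ("3031.9", UCUM + "mi")],
    WIPE + "saturn": [("1.1e5", UCUM + "km"), ("72367", UCUM + "mi")],
    WIPE + "jupiter": [("139822", UCUM + "km"), ("86881", UCUM + "mi")],
}


def planets_wider_than(data, unit_iri, threshold):
    """Mimic FILTER(datatype(?d) = unit_iri && xsd:float(str(?d)) > threshold)."""
    hits = []
    for planet, literals in data.items():
        for lexical, datatype in literals:
            # The unit check is a plain IRI comparison; only the number needs
            # parsing, and float() accepts "1.1e5" and "139822" alike.
            if datatype == unit_iri and float(lexical) > threshold:
                hits.append(planet)
    return hits


print(planets_wider_than(DIAMETERS, UCUM + "km", 1e5))
```

With the unit in the datatype, the lexical form never has to be split before the numeric comparison.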
> >
> > So Maxime and Antoine, how hard would it be to transplant your
> > semantics to apply directly to ucum (presuming their cooperation)?
> >
> >
> > On Fri, Jul 24, 2020 at 12:12:34AM +0000, Cox, Simon (L&W, Clayton) wrote:
> > > Yes you would need a UCUM parser.
> > >
> > > Note however that UCUM is not a "large vocabulary".
> > > There is a relatively small set of terminals here
> > > http://unitsofmeasure.org/ucum-essence.xml , and a rule to combine
> > > these into a countably infinite set.
> > > The rule is described here:
> > > http://unitsofmeasure.org/ucum.html#section-Syntax-Rules
> > >
> > > There are a number of implementations listed here
> > > https://unitsofmeasure.org/trac at 'Implementation Support'.
> > > This documentation has not been updated for about 3 years, so some
> > > of the links might be stale, and there may be others.
> > >
> > > A units-of-measure library, with UCUM support, that was available to
> > > be integrated into RDF applications would be a significant contribution
> > > to the community.
> > >
> > > Simon
> > >
> > > > -----Original Message-----
> > > > From: Hugh Glaser <hugh@glasers.org>
> > > > Sent: Friday, 24 July, 2020 08:58
> > > > To: Eric Prud'hommeaux <eric@w3.org>
> > > > Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>; Semantic Web
> > > > <semantic-web@w3.org>; Maxime Lefrançois <maxime.lefrancois@emse.fr>
> > > > Subject: Re: Blank nodes must DIE! [ was Re: Blank nodes semantics
> > > > - existential variables?]
> > > >
> > > > If I understand correctly.
> > > > I will need to add a UCUM parser to my system to be able to
> > > > process these datatypes, if people send them to me in their RDF?
> > > > In fact, I will need a UCUM to RDF converter to be able to "understand"
> > > > properly what they "mean"?
> > > > Does such an animal exist?
> > > >
> > > > It looks to me that UCUM is quite a large vocabulary of units, for
> > > > a start - what would the URI for the "liter" unit of measurement
> > > > be, for example?
> > > >
> > > > I'm very happy to have widely adopted standards like this - I just
> > > > want to keep my Semantic Web processing in the Semantic Web (RDF),
> > > > and as simple as possible.
> > > > Or at least be helped to do that.
> > > >
> > > > Cheers
> > > >
> > > > > On 23 Jul 2020, at 23:06, Eric Prud'hommeaux <eric@w3.org> wrote:
> > > > >
> > > > > On Tue, Jul 21, 2020 at 02:35:02PM +0200, Antoine Zimmermann wrote:
> > > > >> Regarding physical quantities, such as "5 inches", etc., my
> > > > >> colleague Maxime Lefrançois and myself coauthored a
> > > > >> specification for a datatype for physical quantities [1]. It is
> > > > >> quite simple: we reuse the Unified Code for Units of Measurement
> > > > >> (UCUM), a standard that is used in many scientific applications,
> > > > >> and combine it with a number:
> > > > >>
> > > > >> <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
> > > > >> <NUMBER> ::= xsd:decimal(('e'|'E')xsd:integer)?
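
The grammar above can be sketched as a regular expression in Python. This is only an approximation: it assumes <SPACES> means one or more spaces or tabs, it treats the UCUM code as any run of non-space characters without validating it, and `parse_quantity` is a hypothetical helper name, not part of the specification or of the Jena fork.

```python
import re
from decimal import Decimal

# <NUMBER> ::= xsd:decimal (('e'|'E') xsd:integer)?
NUMBER = r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"

# <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
# Assumption: <SPACES> is one or more spaces or tabs, and the UCUM code is any
# run of non-space characters (no validation against the UCUM grammar).
QUANTITY = re.compile(r"(?P<number>" + NUMBER + r")[ \t]+(?P<unit>\S+)")


def parse_quantity(lexical):
    """Split a cdt:ucum-style lexical form into (Decimal value, UCUM code)."""
    match = QUANTITY.fullmatch(lexical)
    if match is None:
        raise ValueError(f"not a <NUMBER> <SPACES> <UCUMCODE> form: {lexical!r}")
    return Decimal(match.group("number")), match.group("unit")


print(parse_quantity("2e11 mm"))
print(parse_quantity("2.7e3 m3/[acr_us]"))
```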
> > > > >>
> > > > >> Since UCUM has well-defined semantics, so does our datatype.
> > > > >> Better, since UCUM is implemented in many programming
> > > > >> languages, my colleague Maxime could easily integrate it into
> > > > >> Jena and its SPARQL engine [2].
> > > > >>
> > > > >> So, with our Jena fork, one can write:
> > > > >>
> > > > >> SELECT ?planet WHERE {
> > > > >>  ?planet a ex:Planet;
> > > > >>    ex:diameter ?s .
> > > > >>  FILTER(?s > "2e11 mm"^^cdt:ucum) }
> > > > >
> > > > > I applaud the work to extend XSD's numeric types so that RDF can
> > > > > have standard measurement types. But why not leverage your work by
> > > > > adding SPARQL support for UCUM types? e.g.
> > > > >
> > > > > SELECT ?planet WHERE {
> > > > >  ?planet a ex:Planet;
> > > > >    ex:diameter ?s .
> > > > >  FILTER(?s > "2e11"^^ucum:mm)
> > > > > }
> > > > >
> > > > > It feels cleaner to me to embed the entire type of the data in
> > > > > the literal's datatype rather than spreading it across an
> > > > > aggregator type (cdt:ucum) and the lexical value (" mm").
> > > > >
> > > > > In either case we probably have a union type in the lexical
> > > > > value so we'd have to micro-parse doubles, decimals and integers,
> > > > > but the parsing is easier if the measurement unit is broken out
> > > > > into the end of the datatype URL.
> > > > >
> > > > > There are a few UCUM units that aren't viable localnames (e.g.
> > > > > "m/s.s"), but I think we can encode around that (e.g. "m_s.s") in
> > > > > a way that still makes ucum: a practical namespace for datatypes.
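
A minimal Python sketch of that encoding idea. The rule of replacing "/" with "_" is taken only from the "m/s.s" to "m_s.s" example; `unit_to_datatype_iri` is a hypothetical helper, and nothing here is an agreed mapping.

```python
# Namespace IRI as used for the ucum: prefix in the thread.
UCUM_NS = "http://ucum.nlm.nih.gov/#"


def unit_to_datatype_iri(ucum_code):
    """Build a datatype IRI from a UCUM code by replacing '/' with '_'."""
    # Hypothetical rule from the "m/s.s" -> "m_s.s" example. The mapping is not
    # reversible for codes that already contain '_' (such as "[lk_us]"), so a
    # real scheme would need a richer escape.
    return UCUM_NS + ucum_code.replace("/", "_")


print(unit_to_datatype_iri("m/s.s"))
```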
> > > > >
> > > > >
> > > > >> This works if the size of the planet is encoded as a cdt:ucum,
> > > > >> no matter what unit one is using. One can even use "link for
> > > > >> Gunter's chain" (unit "[lk_us]"), or "cubic meters per acre"
> > > > >> (unit "m3/[acr_us]") [3], which are both units of length.
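
Why "cubic meters per acre" counts as a length can be sketched with a toy dimension check in Python. The table and the tiny parser below are assumptions for illustration: they cover only three atoms and codes shaped like "atom" or "atom/atom" with an optional integer exponent, not the full UCUM grammar.

```python
import re

# Toy table: each unit atom mapped to its power of the length dimension.
# Only the atoms needed here are listed; a real UCUM library covers far more.
LENGTH_EXPONENT = {"m": 1, "[acr_us]": 2, "[lk_us]": 1}

# A term is an atom (bracketed or plain letters) plus an optional exponent.
TERM = re.compile(r"(\[[^\]]+\]|[A-Za-z]+)(\d+)?")


def term_exponent(term):
    """Length power of a single term such as 'm3' or '[acr_us]'."""
    match = TERM.fullmatch(term)
    if match is None or match.group(1) not in LENGTH_EXPONENT:
        raise ValueError(f"unsupported term: {term!r}")
    power = int(match.group(2) or 1)
    return LENGTH_EXPONENT[match.group(1)] * power


def length_exponent(code):
    """Length power of 'term' or 'term/term'; anything richer is rejected."""
    numerator, _, denominator = code.partition("/")
    total = term_exponent(numerator)
    if denominator:
        total -= term_exponent(denominator)
    return total


# Volume (L^3) divided by area (L^2) leaves a plain length (L^1).
print(length_exponent("m3/[acr_us]"), length_exponent("[lk_us]"))
```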
> > > > >>
> > > > >> With some of our industrial partners, we are using this for
> > > > >> energy data, and they seem to be very pleased with this
> > > > >> approach, compared to an ontology-based approach.
> > > > >>
> > > > >>
> > > > >> [1] https://w3id.org/lindt/custom_datatypes#ucum
> > > > >> [2] You can try it at
> > > > >> https://ci.mines-stetienne.fr/lindt/playground.html
> > > > >> [3] Try this query in the playground:
> > > > >>
> > > > >> """
> > > > >> PREFIX iter: <http://w3id.org/sparql-generate/iter/>
> > > > >> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> > > > >> PREFIX cdt: <http://w3id.org/lindt/custom_datatypes#>
> > > > >> PREFIX ex: <http://example.org/>
> > > > >>
> > > > >> SELECT ?position ?normalized
> > > > >>
> > > > >> WHERE{
> > > > >>
> > > > >>  VALUES ?position { "2.7e3 m3/[acr_us]"^^cdt:ucum }
> > > > >>  # convert to meters
> > > > >>  BIND("0 m"^^cdt:ucum + ?position AS ?normalized )
> > > > >>
> > > > >> }
> > > > >> """
> > > > >>
> > > > >> --AZ
> > > > >>
> > > > >> Le 17/07/2020 à 01:57, Cox, Simon (L&W, Clayton) a écrit :
> > > > >>> Yeah, the atomicity of the chunk is the point. This even
> > > > >>> applies to quantities. 25.4mm is *identical* to 1” – they are
> > > > >>> the same thing. Any engine that operates with quantities needs
> > > > >>> to understand that. ‘25.4’ and ‘mm’ cannot be separated.
> > > > >>> Coordinates are slightly more complex but it comes down to the
> > > > >>> same thing. A single element within a set of coordinates that
> > > > >>> describes a position in space is not independent of the other
> > > > >>> numbers in the tuple, or of the coordinate reference system
> > > > >>> within which they are expressed. One value should *never* be
> > > > >>> used independent of the others. Exactly the same position on
> > > > >>> the earth will be denoted by three different numbers if
> > > > >>> embedded in a different coordinate reference system. You can
> > > > >>> only ‘reason’ over them as a group, not individually.
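
The point that 25.4mm and 1” are the same quantity can be sketched in Python by normalising to metres with exact fractions. The conversion table and the helper names are assumptions for illustration; "[in_i]" is the UCUM code for the international inch.

```python
from fractions import Fraction

# Exact conversion factors to metres. The inch is defined as exactly 25.4 mm.
TO_METRES = {
    "mm": Fraction(1, 1000),
    "m": Fraction(1),
    "km": Fraction(1000),
    "[in_i]": Fraction(254, 10000),  # UCUM code for the international inch
}


def to_metres(number, unit):
    """Normalise a (decimal string, unit code) pair to an exact length in metres."""
    # Fraction parses decimal strings such as "25.4" without rounding error.
    return Fraction(number) * TO_METRES[unit]


def same_quantity(a, b):
    """Compare two (number, unit) pairs as quantities, not as separate parts."""
    return to_metres(*a) == to_metres(*b)


print(same_quantity(("25.4", "mm"), ("1", "[in_i]")))
print(same_quantity(("2.5", "km"), ("2.5", "mm")))
```

Comparing the number or the unit alone gives the wrong answer in both cases; only the pair carries the meaning.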
> > > > >>>
> > > > >>> *From:*Dan Brickley <danbri@danbri.org>
> > > > >>> *Sent:* Thursday, 16 July, 2020 23:58
> > > > >>> *To:* Jeen Broekstra <jeen@fastmail.com>
> > > > >>> *Cc:* Semantic Web <semantic-web@w3.org>
> > > > >>> *Subject:* Re: Blank nodes must DIE! [ was Re: Blank nodes
> > > > >>> semantics - existential variables?]
> > > > >>>
> > > > >>> …
> > > > >>>
> > > > >>> I believe the big appeal of putting it all into the zone we
> > > > >>> call "literals" is that you get a kind of atomicity; that
> > > > >>> chunk of data is either there, or not there; it is asserted,
> > > > >>> or not asserted. With a triples-based (description of a ) data
> > > > >>> structure you have to be constantly on your guard that every
> > > > >>> subset of the full graph pattern is at least sensible and
> > > > >>> harmless, even when subsetting these chunks is often confusing
> > > > >>> or misleading for data consumers. I can't help wondering
> > > > >>> whether notions of graph shapes from shacl, shex (and
> > > > >>> sparql) could be exploited to create an RDF-based data format
> > > > >>> which had atomicity at the level of entire shapes.
> > > > >>>
> > > > >>> Dan
> > > > >>>
> > > > >>>    Jeen
> > > > >>>
> > > > >>
> > > > >> --
> > > > >> Antoine Zimmermann
> > > > >> Institut Henri Fayol
> > > > >> École des Mines de Saint-Étienne
> > > > >> 158 cours Fauriel
> > > > >> CS 62362
> > > > >> 42023 Saint-Étienne Cedex 2
> > > > >> France
> > > > >> Tél:+33(0)4 77 42 66 03
> > > > >> Fax:+33(0)4 77 42 66 66
> > > > >> http://www.emse.fr/~zimmermann/
> > > > >> Member of team Connected Intelligence, Laboratoire Hubert Curien
> > > >
> > > > --
> > > > Hugh
> > > > 023 8061 5652
> > > >
> > >

Received on Sunday, 26 July 2020 11:06:38 UTC