Re: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?]

Hi Antoine,

you have a vocabulary that you'd like users to adopt. But not only
that, they would also have to adopt a software library specific to
that vocabulary (the Jena fork), because your syntax does not use
standard SPARQL datatype URIs. Am I getting it right?

I'm all for new vocabularies, but having to depend on your Jena fork
in our projects would be a no-go for me. If you want broad adoption,
standard SPARQL is the only way IMO.

A side question (haven't followed the thread very much): how does UCUM
compare to QUDT? https://qudt.org/

Martynas

On Fri, Jul 24, 2020 at 9:49 PM Antoine Zimmermann
<antoine.zimmermann@emse.fr> wrote:
>
> Hugh,
>
> I will answer as if you are asking about our own proposal (cdt:ucum),
> and not Eric's proposal (ucum:m, ucum:W, ucum:s, etc).
>
> Le 24/07/2020 à 00:58, Hugh Glaser a écrit :
> > If I understand correctly, I will need to add a UCUM parser to my system to be able to process these datatypes, if people send them to me in their RDF?
>
> Yes, exactly.
>
> > In fact, I will need a UCUM to RDF converter to be able to "understand" properly what they "mean"?
>
> No, you need RDF with support for cdt:ucum. Same thing as when you want
> to use geo:wktLiteral, from the GeoSPARQL standard. Other RDF parsers will
> still work but they won't be able to understand what the value is.
>
> > Does such an animal exist?
>
> Yes. My colleague Maxime Lefrançois implemented it as a fork of Apache
> Jena. You can try it at
> https://ci.mines-stetienne.fr/lindt/playground.html. There are example
> queries to play with.
>
> > It looks to me that UCUM is quite a large vocabulary of units, for a start - what would the URI for the "liter" unit of measurement be, for example?
>
> There are infinitely many units. E.g., meters (m), square meters (m2),
> cubic meters (m3), meters to the fourth (m4), etc. There is an infinite
> algebra of units of measures. Nothing to be afraid of: there are
> infinitely many integers, and yet xsd:integer is well supported.
>
> > I'm very happy to have widely adopted standards like this - I just want to keep my Semantic Web processing in the Semantic Web (RDF), and as simple as possible.
> > Or at least be helped to do that.
>
> The alternative approaches always require specific code. If you exclude
> new datatypes (stick to XSDs and RDF datatypes), you need code that
> interprets some vocabulary, e.g.:
>
> <quantity> :numericValue 10;
>    :unit <inches> .
>
> You have to retrieve the relevant triples to reconstruct the quantity.
> You have to deal with the cases where there are missing triples:
>
> <quantity> :numericValue 10 .
>
> or multiple values:
>
> <quantity> :numericValue 10;
>    :unit <inches> .
> <quantity> :numericValue 25.4;
>    :unit <centimeters> .
>
> It's quite complicated to use and implement, and cumbersome to query.
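
To make the bookkeeping described above concrete, here is a minimal Python sketch. It is not tied to any RDF library: triples are plain tuples, and the function name `reconstruct_quantity` and the predicate strings `:numericValue` and `:unit` are illustrative assumptions that only mirror the example vocabulary. It shows the consumer having to detect the missing-triple and multiple-value cases itself.

```python
from typing import Iterable, List, Tuple

# A triple is (subject, predicate, object); plain tuples stand in for an RDF graph.
Triple = Tuple[str, str, object]


def reconstruct_quantity(triples: Iterable[Triple], subject: str) -> Tuple[object, object]:
    """Return (value, unit) for `subject`, or raise ValueError if incomplete or ambiguous."""
    triples = list(triples)  # allow any iterable, including generators
    values: List[object] = [o for s, p, o in triples if s == subject and p == ":numericValue"]
    units: List[object] = [o for s, p, o in triples if s == subject and p == ":unit"]
    if not values or not units:
        # e.g. only ":numericValue 10" was asserted
        raise ValueError("incomplete quantity: missing value or unit")
    if len(values) > 1 or len(units) > 1:
        # with two values and two units, nothing says which value goes with which unit
        raise ValueError("ambiguous quantity: cannot pair values with units")
    return values[0], units[0]
```

With a single literal such as "10 [in_i]"^^cdt:ucum, neither failure mode can arise, because the value and unit are asserted together or not at all.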
>
>
> --AZ
>
> >
> > Cheers
> >
> >> On 23 Jul 2020, at 23:06, Eric Prud'hommeaux <eric@w3.org> wrote:
> >>
> >> On Tue, Jul 21, 2020 at 02:35:02PM +0200, Antoine Zimmermann wrote:
> >>> Regarding physical quantities, such as "5 inches", etc., my colleague Maxime
> >>> Lefrançois and I coauthored a specification for a datatype for physical
> >>> quantities [1]. It is quite simple: we reuse the Unified Code for Units of
> >>> Measurement (UCUM), a standard that is used in many scientific applications,
> >>> and combine it with a number:
> >>>
> >>> <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
> >>> <NUMBER> ::= xsd:decimal(('e'|'E')xsd:integer)?
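
As a rough illustration of the grammar above, the lexical form can be split with a regular expression. This is a sketch under assumptions, not the implementation from the Jena fork: the name `parse_quantity` is hypothetical, `<SPACES>` is assumed to mean one or more space characters, and the UCUM code is captured as an opaque token without being validated against UCUM.

```python
import re
from decimal import Decimal
from typing import Tuple

# <NUMBER> ::= xsd:decimal (('e'|'E') xsd:integer)?
_NUMBER = r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"
# <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>; the unit is any run of non-space characters here.
_QUANTITY = re.compile(rf"(?P<number>{_NUMBER}) +(?P<unit>\S+)")


def parse_quantity(lexical: str) -> Tuple[Decimal, str]:
    """Split a lexical form into (number, unit code); raise ValueError if it does not match."""
    match = _QUANTITY.fullmatch(lexical)
    if match is None:
        raise ValueError(f"not a valid quantity lexical form: {lexical!r}")
    return Decimal(match.group("number")), match.group("unit")
```

For example, `parse_quantity("2e11 mm")` yields the number 2e11 paired with the code `mm`; interpreting that code is then the job of a UCUM library.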
> >>>
> >>> Since UCUM has a well defined semantics, so does our datatype. Better, since
> >>> UCUM is implemented in many programming languages, my colleague Maxime could
> >>> easily integrate it into Jena and its SPARQL engine [2].
> >>>
> >>> So, with our Jena fork, one can write:
> >>>
> >>> SELECT ?planet WHERE {
> >>>   ?planet a ex:Planet;
> >>>     ex:diameter ?s .
> >>>   FILTER(?s > "2e11 mm"^^cdt:ucum)
> >>> }
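
The FILTER above is only meaningful if both operands are brought to a common unit before they are compared. The Python sketch below illustrates that idea under stated assumptions: the table `TO_METERS` is a small hand-picked subset of UCUM length codes, the function names are hypothetical, and a real implementation would delegate unit handling to a full UCUM library rather than a lookup table.

```python
from fractions import Fraction

# Assumed subset of UCUM length codes, each mapped to its exact factor in meters.
TO_METERS = {
    "mm": Fraction(1, 1000),
    "cm": Fraction(1, 100),
    "m": Fraction(1),
    "km": Fraction(1000),
    "[in_i]": Fraction(254, 10000),  # international inch, defined as 2.54 cm
}


def to_meters(lexical: str) -> Fraction:
    """Normalize a '<number> <unit>' lexical form to an exact length in meters."""
    number, unit = lexical.split()
    return Fraction(number) * TO_METERS[unit]


def greater_than(left: str, right: str) -> bool:
    """Compare two lengths regardless of the units they were written in."""
    return to_meters(left) > to_meters(right)
```

Exact rational arithmetic is used so that quantities such as "25.4 mm" and "1 [in_i]" compare as equal rather than differing by a floating-point rounding error.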
> >>
> >> I applaud the work to extend XSD's numeric types so that RDF can have standard measurement types. But why not leverage your work by adding SPARQL support for UCUM types? e.g.
> >>
> >> SELECT ?planet WHERE {
> >>   ?planet a ex:Planet;
> >>     ex:diameter ?s .
> >>   FILTER(?s > "2e11"^^ucum:mm)
> >> }
> >>
> >> It feels cleaner to me to embed the entire type of the data in the literal's datatype rather than spreading it across an aggregator type (cdt:ucum) and the lexical value (" mm").
> >>
> >> In either case we probably have a union type in the lexical value so we'd have to micro-parse doubles, decimals and integers, but the parsing is easier if the measurement unit is broken out into the end of the datatype URL.
> >>
> >> There are a few UCUM units that aren't viable localnames (e.g. "m/s.s"), but I think we can encode around that (e.g. "m_s.s") in a way that still makes ucum: a practical namespace for datatypes.
> >>
> >>
> >>> This works if the size of the planet is encoded as a cdt:ucum, no matter
> >>> what unit one is using. One can even use "link for Gunter's chain" (unit
> >>> "[lk_us]"), or "cubic meters per acre" (unit "m3/[acr_us]") [3], which are
> >>> both units of length.
> >>>
> >>> With some of our industrial partners, we are using this for energy data, and
> >>> they seem to be very pleased with this approach, compared to an
> >>> ontology-based approach.
> >>>
> >>>
> >>> [1] https://w3id.org/lindt/custom_datatypes#ucum
> >>> [2] You can try it at https://ci.mines-stetienne.fr/lindt/playground.html
> >>> [3] Try this query in the playground:
> >>>
> >>> """
> >>> PREFIX iter: <http://w3id.org/sparql-generate/iter/>
> >>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> >>> PREFIX cdt: <http://w3id.org/lindt/custom_datatypes#>
> >>> PREFIX ex: <http://example.org/>
> >>>
> >>> SELECT ?length ?normalized
> >>>
> >>> WHERE{
> >>>
> >>>   VALUES ?length { "2.7e3 m3/[acr_us]"^^cdt:ucum }
> >>>   # convert to meters
> >>>   BIND("0 m"^^cdt:ucum + ?length AS ?normalized )
> >>>
> >>> }
> >>> """
> >>>
> >>> --AZ
> >>>
> >>> Le 17/07/2020 à 01:57, Cox, Simon (L&W, Clayton) a écrit :
> >>>> Yeah, the atomicity of the chunk is the point. This even applies to
> >>>> quantities. 25.4mm is *identical* to 1” – they are the same thing. Any
> >>>> engine that operates with quantities needs to understand that. ’25.4’
> >>>> and ‘mm’ cannot be separated. Coordinates are slightly more complex but
> >>>> it comes down to the same thing. A single element within a set of
> >>>> coordinates that describes a position in space is not independent of the
> >>>> other numbers in the tuple, or of the coordinate reference system within
> >>>> which they are expressed. One value should *never* be used independent
> >>>> of the others. Exactly the same position on the earth will be denoted by
> >>>> three different numbers if embedded in a different coordinate reference
> >>>> system. You can only ‘reason’ over them as a group, not individually.
> >>>>
> >>>> *From:*Dan Brickley <danbri@danbri.org>
> >>>> *Sent:* Thursday, 16 July, 2020 23:58
> >>>> *To:* Jeen Broekstra <jeen@fastmail.com>
> >>>> *Cc:* Semantic Web <semantic-web@w3.org>
> >>>> *Subject:* Re: Blank nodes must DIE! [ was Re: Blank nodes semantics -
> >>>> existential variables?]
> >>>>
> >>>> …
> >>>>
> >>>> I believe the big appeal of putting it all into the zone we call
> >>>> "literals" is that you get a kind of atomicity; that chunk of data is
> >>>> either there, or not there; it is asserted, or not asserted. With a
> >>>> triples-based (description of a) data structure you have to be
> >>>> constantly on your guard that every subset of the full graph pattern is
> >>>> at least sensible and harmless, even when subsetting these chunks is
> >>>> often confusing or misleading for data consumers. I can't help wondering
> >>>> whether notions of graph shapes from shacl, shex (and sparql) could be
> >>>> exploited to create an RDF-based data format which had atomicity at the
> >>>> level of entire shapes.
> >>>>
> >>>> Dan
> >>>>
> >>>>     Jeen
> >>>>
> >>>
> >>> --
> >>> Antoine Zimmermann
> >>> Institut Henri Fayol
> >>> École des Mines de Saint-Étienne
> >>> 158 cours Fauriel
> >>> CS 62362
> >>> 42023 Saint-Étienne Cedex 2
> >>> France
> >>> Tél:+33(0)4 77 42 66 03
> >>> Fax:+33(0)4 77 42 66 66
> >>> http://www.emse.fr/~zimmermann/
> >>> Member of team Connected Intelligence, Laboratoire Hubert Curien
> >
>
>

Received on Monday, 27 July 2020 20:02:35 UTC