Re: Lists of tagged strings in RDF

From: Rumph, Frens Jan <mail@frensjan.nl>
Date: Thu, 10 Jun 2021 22:43:48 +0200
Message-ID: <CAH3f1B95NAt1+M=0Csf_LPTUODVELyo5Wdg2h6oLkiJbXtGY4Q@mail.gmail.com>
To: Hugh Glaser <hugh@glasers.org>
Cc: SW-forum <semantic-web@w3.org>
Hello Hugh,

Thank you for your thoughts!

> When people move from an existing application in a programming language to
> using RDF, it can often seem that things don't move over easily and
> naturally; and indeed that can be the case.
>

I'm active in the area of gathering, processing and organising "intel".
Think databases, searching for and matching of entities, etc. Most of the
data modelling in the applications I'm working on maps very well to RDF;
they are already expressed in either large triple tables or in virtually
partitioned predicate tables with mostly primitive data (text, numbers,
dates, etc.). There's the matter of provenance (all our statements are
annotated with source information), but I won't dwell on e.g. RDF-star here.


> Others have commented many times on this list, that RDF is neither a
> programming language nor a data structure description, so perhaps that is
> not surprising.


The reason for my interest in RDF is that our data model is already pretty
closely aligned, and I'd like to tap into a richer ecosystem. I don't want
'it' to be like the Java I already have, but it should give an idea of
where I'm coming from. Let's say that we have some 'literals' that have
structure as well as (some) semantics.

The main blockers, however, are in the area of person names, and also
addresses. For the latter we use a format similar to the one for names
('annotated strings' / tagged lists of strings), much like how Google Maps
models them:
https://developers.google.com/maps/documentation/geocoding/overview#GeocodingResponses
.

The applications at hand are tasked with a lot of searching and entity
resolution / matching. Some of the sources used actually have fields like
first given name, second given name, first family name and second family
name (in this case in a Spanish context). Another example is sources
discerning between given (formal) names and a so-called roepnaam (a
fairly Dutch concept). Obviously some sources don't have such high
fidelity. My goal for the initial design was a) to capture a fair amount,
but not all, of the nuances of person name cultures and b) to not get
stuck in an ever-growing but always incomplete set of name formats.

In any case, we want to support the notion of people going by various
names; hence going beyond associating given and family names directly with
a person. So Herman Iván would ideally be described as

[ a :Person ;
  :name [ :familyName "Herman" ; :givenName "Iván" ] ;
  :name [ :givenName "Ivan" ; :familyName "Herman" ]
]

Sacha Baron Cohen would I guess ideally be described as:

[ a :Person ;
  :name [ :givenName "Sacha" ; :familyName "Cohen" ] ;
  :name [ :givenName "Sacha" ; :familyName "Baron Cohen" ] ;
  :name [ :givenName "Sacha" ; :givenName "Noam" ;
          :familyName "Baron Cohen" ] ;
  :name [ :nickName "Ali G" ] ;
  ...
]

José Plácido Domingo Embil could be described in our systems as:

[ a :Person ;
  :name [ :givenName "Plácido" ; :familyName "Domingo" ] ;
  :name [ :givenName "José" ; :givenName "Plácido" ; :familyName "Domingo" ] ;
  :name [ :givenName "José" ; :givenName "Plácido" ;
          :familyName "Domingo" ; :familyName "Embil" ]
]

Xi Jinping could be described in our systems as:

[ a :Person ;
  :name [ :familyName "Xi" ; :givenName "Jinping" ] ;
  :name [ :familyName "習" ; :givenName "近平" ] ;
]
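
To make the shape of these descriptions concrete, here is a minimal Java
sketch. The type and method names (`Person`, `AnnotatedName.of`, `render`,
`valuesOf`) are hypothetical and only for illustration; they are not our
actual code. It assumes each name keeps its elements in written order, and
that a person can go by several such names.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration types; the names are assumptions, not an existing API.
enum Role { GIVEN_NAME, FAMILY_NAME, NICK_NAME }

record NameElement(Role role, String value) {}

// Elements are kept in the order in which the name is written.
record AnnotatedName(List<NameElement> elements) {
    static AnnotatedName of(NameElement... elements) {
        return new AnnotatedName(List.of(elements));
    }

    // Rebuilds a display string by joining the element values in order.
    String render() {
        return elements.stream()
                .map(NameElement::value)
                .collect(Collectors.joining(" "));
    }

    // All values carrying the given role, in their original order.
    List<String> valuesOf(Role role) {
        return elements.stream()
                .filter(e -> e.role() == role)
                .map(NameElement::value)
                .toList();
    }
}

// A person goes by any number of names.
record Person(List<AnnotatedName> names) {}
```

Joining with a single space is a simplification for the sketch; it would
not suit a name such as 習近平, which is written without a separator.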

Note that we're not all that interested in capturing what someone's actual
name is, we're mostly interested in what someone goes by; i.e. what could
be considered identifying. (I am painfully aware that the namespace in the
Netherlands is a lot less crowded than in e.g. China). And we're solely
dependent on what data and how much structure is available. Most of the
time, the format in which 'seed' data is available, the possible search
formats and the structure of the actual records that can be matched all
differ.

A final point of interest in the intended context of use is that
application users are able to change name input provided earlier. So
ideally earlier input is not 'expanded' into, e.g., a set of unordered but
annotated name elements plus a separate full name string.
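
For illustration only: one way to avoid storing such a redundant full name
string is to write the ordered elements out directly, as in the 'list of
blank nodes' construct from my original post. The sketch below is an
assumption-laden example rather than a recommendation; `TurtleNames`,
`literal` and `toTurtle` are hypothetical names, and the `:givenName` /
`:familyName` properties are the placeholder terms used in this thread.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration types; the property names are placeholders.
enum Role {
    GIVEN_NAME("givenName"), FAMILY_NAME("familyName");

    final String property;

    Role(String property) { this.property = property; }
}

record NameElement(Role role, String value) {}

final class TurtleNames {
    // Escapes backslash, double quote, newline and carriage return
    // so the value can be written as a short Turtle string literal.
    static String literal(String value) {
        return "\"" + value
                .replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r") + "\"";
    }

    // Writes one name as an RDF collection of blank nodes, one per element,
    // so the order is kept and no separate full name string is needed.
    static String toTurtle(String subject, List<NameElement> elements) {
        String items = elements.stream()
                .map(e -> "[ :" + e.role().property + " " + literal(e.value()) + " ]")
                .collect(Collectors.joining(" "));
        return subject + " :name ( " + items + " ) .";
    }
}
```

Because the full name is derived from the ordered elements, a user edit
only has to replace elements; there is no second representation to keep in
sync.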

Thanks again, for thinking along with me. There are probably concessions to
be made. Feedback like yours stress-tests my thinking; much appreciated!

Best regards,
Frens Jan


On Thu, Jun 10, 2021 at 9:07 PM Hugh Glaser <hugh@glasers.org> wrote:

> Hi Frens Jan,
>
> Sorry to perhaps be a bit difficult here, rather than answer the question
> as put.
>
> I read your posting with some unease.
> In general:
> When people move from an existing application in a programming language to
> using RDF, it can often seem that things don't move over easily and
> naturally; and indeed that can be the case.
> Others have commented many times on this list, that RDF is neither a
> programming language nor a data structure description, so perhaps that is
> not surprising.
>
> Without the specific set of ways in which you will be using the knowledge
> (rather than an abstract "well I want it to be like the Java I already
> have"), it is hard to suggest alternatives.
> > This allows reconstruction of the name into a string while at the same
> time expressing the components of the name. So it captures the roles of the
> elements of a name (e.g. given names, family names) *as well as* their
> order (given names aren't first everywhere). Also, it allows expressing
> multiple names. E.g. in multiple languages / scripts. Or even aliases used
> in different areas of the world.
> Since you talk about "given names", it seems to me that you could use
>         :givenNames "Frens Jan"
>
> More specifically, you seem to want to tread an almost impossible line:
> capturing a small amount of the knowledge of a person's name, without
> having anything extra.
> If you really want to be able to embrace the multi-cultural stuff of even
> just UK, HUN & ESP, for example, you need to think what you will do with
> people like
> Bartók Béla and our own Ivan Herman, who might also be known as Herman
> Ivan;
> José Plácido Domingo Embil;
> Pablo Ruiz Picasso;
> Sacha Noam Baron Cohen;
>
> I actually have a feeling you can get away with
> :givenNames
> :familyNames
> for quite a while, if you are lucky, but as I said, it will depend on the
> context of your application.
>
> Good luck
> Hugh
>
>
>
> > On 10 Jun 2021, at 18:37, Rumph, Frens Jan <mail@frensjan.nl> wrote:
> >
> > Dear Christophe,
> >
> > Thank you for the pointer. I wasn't aware of this ontology! There are
> some elements missing from the vocabulary, but it goes a long way, and
> knowing that others went down this route is somewhat reassuring.
> >
> > As for the use of blank nodes: agreed, this is not necessary. Given the
> inability to delete them (with SPARQL) I am steering away from them anyway.
> >
> > Best regards,
> > Frens Jan
> >
> > On Thu, Jun 10, 2021 at 1:08 PM Christophe Debruyne <
> christophe.debruyne@gmail.com> wrote:
> > MADS (https://www.loc.gov/standards/mads/rdf/) provides you a way to
> represent parts of a name using a collection. A madsrdf:PersonalName has a
> madsrdf:elementList that refers to a list (thus keeping order). In that
> list, you can have various typed resources with a madsrdf:elementValue
> containing the literals.
> > The nodes do not necessarily have to be blank. So this looks like your
> second approach but using a vocabulary published by the Library of Congress.
> > With my best regards,
> > Christophe
> >
> > On Thu, Jun 10, 2021 at 12:39 PM Martynas Jusevičius <
> martynas@atomgraph.com> wrote:
> > Why is the list syntax ( ) not satisfactory?
> >
> > On Thu, 10 Jun 2021 at 12.07, Rumph, Frens Jan <mail@frensjan.nl> wrote:
> > Dear readers,
> >
> > I am investigating transitioning an application to use RDF. One
> roadblock is how this application models names of persons. It supports
> straight-forward full names as a single string, but also supports
> decomposed names, e.g. person X has given name *Frens* followed by a second
> given name *Jan* followed by the family name *Rumph*.
> >
> > Note that this is a crosspost of
> https://stackoverflow.com/questions/65982459/rdf-modelling-of-list-of-name-elements.
> I hope to get some more
> >
> > The data structure is something like:
> >
> > ```java
> > enum Role {
> >    ...
> >    GIVEN_NAME,
> >    FAMILY_NAME,
> >    ...
> > }
> >
> > record NameElement(Role role, String value) {}
> >
> > record AnnotatedName(NameElement... elements) {}
> > ```
> >
> > in order to be instantiated like:
> >
> > ```java
> > var name = new AnnotatedName(
> >     new NameElement(GIVEN_NAME, "Frens"),
> >     new NameElement(GIVEN_NAME, "Jan"),
> >     new NameElement(FAMILY_NAME, "de Vries")
> > );
> > ```
> >
> > This allows reconstruction of the name into a string while at the same
> time expressing the components of the name. So it captures the roles of the
> elements of a name (e.g. given names, family names) *as well as* their
> order (given names aren't first everywhere). Also, it allows expressing
> multiple names. E.g. in multiple languages / scripts. Or even aliases used
> in different areas of the world.
> >
> > I have toyed around with some RDF constructs, but none are really
> satisfactory:
> >
> > ```turtle
> > # list of strings misusing data types as tags
> > :frens :name ( "Frens"^^:givenName "Jan"^^:givenName "de Vries"^^:familyName ) .
> >
> > # list of blank nodes
> > :frens :name ( [ :givenName "Frens" ]
> >                [ :givenName "Jan" ]
> >                [ :familyName "de Vries" ] ) .
> >
> > # single blank node with unordered 'elements'
> > :frens :name [ a           :AnnotatedPersonName ;
> >                :fullName   "Frens Jan de Vries" ;
> >                :givenName  "Frens" ;
> >                :givenName  "Jan" ;
> >                :familyName "de Vries" ] .
> > ```
> >
> > ---
> >
> > **Existing ontologies for HD names?**
> > Is there an existing ontology that covers such 'high fidelity'? FOAF and
> vcard have some relevant properties, but aren't able to capture this level
> of semantics.
> >
> > **Lists?** One major 'blocker' in migrating this approach to RDF is the
> notion of order that is used. If at all possible, I'd like to stay away
> from the List / Container swamp in RDF land ...
> >
> > I'd be grateful for any thoughts on the matter!
> >
> > Best regards,
> > Frens Jan
>
> --
> Hugh
> 023 8061 5652
>
>
Received on Thursday, 10 June 2021 20:44:54 UTC
