
Re: Semantics and Embedding Vectors

From: Mike Prorock <mprorock@mesur.io>
Date: Mon, 10 Oct 2022 05:01:48 -0600
Message-ID: <CAGJKSNT4M8QL=iYPr4JKQbDEzcOcbQY5w7JDLO64NErqo9wfLw@mail.gmail.com>
To: Carlos Bobed <cbobed@unizar.es>
Cc: semantic-web@w3.org
+1 Carlos - definitely a tricky thing

One tactic we are taking is similar to the approach used in space/time
transformers: use semantic tags as an additional piece of information,
like a positional embedding. It is too early to talk results on this
versus other approaches that preprocess RDF or JSON-LD into more
language-like constructs in order to train off of knowledge bases in RDF.
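
Roughly, the idea looks like the sketch below (PyTorch; the class name,
sizes, and tagging scheme are illustrative, not our actual code):

    import torch
    import torch.nn as nn

    class SemanticallyTaggedEmbedding(nn.Module):
        """Token + positional + semantic-tag embeddings, summed.

        Each token carries a tag id (e.g., an index for the ontology
        class it was annotated with, 0 for "no tag"), and the tag
        embedding is added in exactly the way a learned positional
        embedding is.
        """
        def __init__(self, vocab_size, num_tags, max_len, dim):
            super().__init__()
            self.tok = nn.Embedding(vocab_size, dim)
            self.pos = nn.Embedding(max_len, dim)
            self.tag = nn.Embedding(num_tags, dim)

        def forward(self, token_ids, tag_ids):
            # token_ids, tag_ids: (batch, seq_len) integer tensors
            positions = torch.arange(token_ids.size(1), device=token_ids.device)
            return self.tok(token_ids) + self.pos(positions) + self.tag(tag_ids)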

Mike Prorock
mesur.io

On Mon, Oct 10, 2022, 03:20 Carlos Bobed <cbobed@unizar.es> wrote:

> Hi Adam,
> On 09/10/2022 at 9:07, Adam Sobieski wrote:
>
>> Semantic Web Interest Group,
>>
>> Embedding vectors can represent many things: words [1], sentences [2],
>> paragraphs, documents, percepts, concepts, multimedia data, users, and so
>> forth.
>>
>> A few months ago, I started a discussion on GitHub about formal ontologies
>> for describing these vectors and their models [3]. There, I also indicated
>> that MIME types for these vectors could be created, e.g., “embedding/gpt-3”
>> or “vector/gpt-3”.
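>>
>> Hypothetically, a server handing out a vector under such a media type
>> could look like the following sketch (the media type is unregistered
>> and the vector is a toy; everything here is made up for illustration):
>>
>>     from http.server import BaseHTTPRequestHandler, HTTPServer
>>     import json
>>
>>     class VectorHandler(BaseHTTPRequestHandler):
>>         def do_GET(self):
>>             body = json.dumps([0.12, -0.03, 0.88]).encode()  # toy vector
>>             self.send_response(200)
>>             # A hypothetical, unregistered media type:
>>             self.send_header("Content-Type", "embedding/gpt-3")
>>             self.send_header("Content-Length", str(len(body)))
>>             self.end_headers()
>>             self.wfile.write(body)
>>
>>     HTTPServer(("localhost", 8000), VectorHandler).serve_forever()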
>>
>> For discussion and brainstorming, I would like to share some ideas with
>> the group.
>>
>> Firstly, we can envision machine-utilizable lexicons which, for each sense
>> of each lexeme, include, refer to, or hyperlink to embedding vectors.
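>>
>> For instance, a minimal entry might look like this (purely illustrative;
>> the sense id, model name, and URL are hypothetical):
>>
>>     lexicon = {
>>         "bank": {
>>             "bank-n-1": {  # the "financial institution" sense
>>                 "model": "glove.6B.300d",  # model the vector comes from
>>                 "vector_url": "https://example.org/vectors/bank-n-1",
>>                 "vector": [0.12, -0.03, 0.88],  # truncated for the example
>>             },
>>         },
>>     }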
>
> My two cents: it can be extremely tricky. Embedding vectors by themselves
> are meaningless if you don't provide all the information about their
> source: the model, the training dataset, and the task trained for (and
> possibly the tasks fine-tuned for). First of all, I think we should define
> more precisely what information is to be shared.
>
> I can see the point if you are aiming at providing entry points to fixed
> embedding spaces (for example, "this sense S is represented in Nasari as
> this vector x") so as to align external elements using a well-known,
> shared model as an anchor. If the goal is to share the embeddings
> themselves, however, even fixing the model and the dataset can yield
> different resulting spaces, since training is not deterministic.
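>
> A minimal sketch of that anchoring idea (my own illustration, nothing
> standard): take the vectors both spaces provide for a set of shared
> senses, and solve an orthogonal Procrustes problem to map one space
> onto the other.
>
>     import numpy as np
>
>     def align(X, Y):
>         """X, Y: (n_anchors, dim) matrices of matched anchor vectors.
>         Returns the orthogonal W minimizing ||X @ W - Y||_F."""
>         U, _, Vt = np.linalg.svd(X.T @ Y)
>         return U @ Vt
>
>     # Any vector x from X's space can then be carried over: x @ align(X, Y)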
>
> By the way, the above holds for static vectors; if you have
> dynamic/contextual ones (e.g., BERT-* ones), then such an entry point
> would be somewhat meaningless, as the vector changes depending on the
> accompanying elements.
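>
> To make that concrete (standard Hugging Face transformers API; the
> model choice is arbitrary), the same surface token gets a different
> vector in each context:
>
>     import torch
>     from transformers import AutoModel, AutoTokenizer
>
>     tok = AutoTokenizer.from_pretrained("bert-base-uncased")
>     model = AutoModel.from_pretrained("bert-base-uncased")
>
>     def vector_for(sentence, word):
>         enc = tok(sentence, return_tensors="pt")
>         with torch.no_grad():
>             hidden = model(**enc).last_hidden_state[0]  # (seq_len, 768)
>         idx = enc.input_ids[0].tolist().index(tok.convert_tokens_to_ids(word))
>         return hidden[idx]
>
>     v1 = vector_for("I sat on the river bank.", "bank")
>     v2 = vector_for("The bank raised its rates.", "bank")
>     # The cosine similarity of v1 and v2 is noticeably below 1.0.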
>
> Best,
>
> Carlos
>
Received on Monday, 10 October 2022 11:02:13 UTC
