- From: Carlos Bobed <cbobed@unizar.es>
- Date: Mon, 10 Oct 2022 11:19:33 +0200
- To: semantic-web@w3.org
- Message-ID: <5dfc41ff-e132-3b0d-08ea-c9528be85b9e@unizar.es>
Hi Adam,

On 09/10/2022 at 9:07, Adam Sobieski wrote:
>
> Semantic Web Interest Group,
>
> Embedding vectors can represent many things: words [1], sentences [2],
> paragraphs, documents, percepts, concepts, multimedia data, users, and
> so forth.
>
> A few months ago, I started a discussion on GitHub about formal
> ontologies for describing these vectors and their models [3]. There, I
> also indicated that MIME types for these vectors could be created,
> e.g., “embedding/gpt-3” or “vector/gpt-3”.
>
> For discussion and brainstorming, I would like to share some ideas
> with the group.
>
> Firstly, we can envision machine-utilizable lexicons which, for each
> sense of each lexeme, include, refer to, or hyperlink to embedding
> vectors.

My two cents: this can be extremely tricky. Embedding vectors by themselves are meaningless unless you provide all the information about their source: the model, the training dataset, and the task they were trained for (and possibly the tasks they were fine-tuned for). So, first of all, I think we should define more precisely what information is to be shared.

I can see the point if the aim is to provide entry points into fixed embedding spaces (for example, "this sense S is represented in Nasari as this vector x"), so as to try to align external elements using a well-known and shared model as an anchor. If the goal is to share embeddings by themselves, however, even fixing the model and the dataset, different training runs can produce different resulting spaces.

By the way, the above holds for static vectors; if you have dynamic/contextual ones (e.g., BERT-* ones), then such an entry point would be somewhat meaningless, as the vector will change depending on the accompanying elements.

Best,

Carlos
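P.S. A minimal sketch of the contextual point, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (the sentences and the helper name are just for illustration): the same lexeme gets noticeably different vectors in different sentences, so a single stored vector per lexeme would not reproduce any of them.

from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def contextual_vector(sentence, word):
    # Contextual vector of `word` in `sentence` (assumes the tokenizer
    # keeps `word` as a single WordPiece token).
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index(word)]

v_river = contextual_vector("she sat on the bank of the river", "bank")
v_money = contextual_vector("he deposited the money at the bank", "bank")

# Same lexeme, two different vectors: cosine similarity is well below 1.0.
print(torch.nn.functional.cosine_similarity(v_river, v_money, dim=0).item())

Any shared entry point for such vectors would therefore also have to carry the provenance I mentioned above (model, training data, task), or it cannot be interpreted.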
Received on Monday, 10 October 2022 09:19:48 UTC