- From: Carlos Bobed <cbobed@unizar.es>
- Date: Mon, 10 Oct 2022 11:19:33 +0200
- To: semantic-web@w3.org
- Message-ID: <5dfc41ff-e132-3b0d-08ea-c9528be85b9e@unizar.es>
Hi Adam,

On 09/10/2022 at 9:07, Adam Sobieski wrote:
>
> Semantic Web Interest Group,
>
> Embedding vectors can represent many things: words [1], sentences [2],
> paragraphs, documents, percepts, concepts, multimedia data, users, and
> so forth.
>
> A few months ago, I started a discussion on GitHub about formal
> ontologies for describing these vectors and their models [3]. There, I
> also indicated that MIME types for these vectors could be created,
> e.g., “embedding/gpt-3” or “vector/gpt-3”.
>
> For discussion and brainstorming, I would like to share some ideas
> with the group.
>
> Firstly, we can envision machine-utilizable lexicons which, for each
> sense of each lexeme, include, refer to, or hyperlink to embedding
> vectors.

My two cents: this can be extremely tricky. Embedding vectors by themselves are meaningless unless you provide all the information about their source: the model, the training dataset, and the task they were trained for (and possibly the tasks they were fine-tuned for). So, first of all, I think we should define more precisely what information is to be shared.

I can see the point if the aim is to provide entry points into fixed embedding spaces (for example, "this sense S is represented in Nasari as this vector x"), so as to try to align external elements using a well-known and shared model as an anchor. If the goal is to share embeddings by themselves, however, even fixing the model and the dataset, different training runs can produce different resulting spaces.

By the way, the above holds for static vectors; if you have dynamic/contextual ones (e.g., BERT-* ones), then such an entry point would be somewhat meaningless, as the vector will change depending on the accompanying elements.

Best,

Carlos
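P.S. A minimal sketch of the contextual point, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (the sentences and the helper name are just for illustration): the same lexeme gets noticeably different vectors in different sentences, so a single stored vector per lexeme would not reproduce any of them.

from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def contextual_vector(sentence, word):
    # Contextual vector of `word` in `sentence` (assumes the tokenizer
    # keeps `word` as a single WordPiece token).
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index(word)]

v_river = contextual_vector("she sat on the bank of the river", "bank")
v_money = contextual_vector("he deposited the money at the bank", "bank")

# Same lexeme, two different vectors: cosine similarity is well below 1.0.
print(torch.nn.functional.cosine_similarity(v_river, v_money, dim=0).item())

Any shared entry point for such vectors would therefore also have to carry the provenance I mentioned above (model, training data, task), or it cannot be interpreted.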
Received on Monday, 10 October 2022 09:19:48 UTC