- From: Mike Prorock <mprorock@mesur.io>
- Date: Mon, 10 Oct 2022 05:01:48 -0600
- To: Carlos Bobed <cbobed@unizar.es>
- Cc: semantic-web@w3.org
- Message-ID: <CAGJKSNT4M8QL=iYPr4JKQbDEzcOcbQY5w7JDLO64NErqo9wfLw@mail.gmail.com>
+1 Carlos - definitely a tricky thing.

One tactic we are taking is a similar approach to space/time transformers: use semantic tags as an additional piece of information, much like a positional embedding (a toy sketch of the idea follows at the end of this message). It is too early to talk results on this versus other approaches that preprocess RDF or JSON-LD into more language-like constructs in order to train off of knowledge bases in RDF.

Mike Prorock
mesur.io

On Mon, Oct 10, 2022, 03:20 Carlos Bobed <cbobed@unizar.es> wrote:

> Hi Adam,
>
> On 09/10/2022 at 9:07, Adam Sobieski wrote:
>
> > Semantic Web Interest Group,
> >
> > Embedding vectors can represent many things: words [1], sentences [2], paragraphs, documents, percepts, concepts, multimedia data, users, and so forth.
> >
> > A few months ago, I started a discussion on GitHub about formal ontologies for describing these vectors and their models [3]. There, I also indicated that MIME types for these vectors could be created, e.g., “embedding/gpt-3” or “vector/gpt-3”.
> >
> > For discussion and brainstorming, I would like to share some ideas with the group.
> >
> > Firstly, we can envision machine-utilizable lexicons which, for each sense of each lexeme, include, refer to, or hyperlink to embedding vectors.
>
> My two cents: it can be extremely tricky. Embedding vectors by themselves are meaningless if you don't provide all the information about their source: the model, the training dataset, and the task they were trained for (and perhaps the tasks they were fine-tuned for). First of all, I think we should define more precisely what information is to be shared.
>
> I can see the point if you are aiming at providing entry points into fixed embedding spaces (for example, "this sense S is represented in NASARI as this vector x") so as to align external elements using a well-known, shared model as an anchor. If the goal is to share embeddings by themselves, though, even fixing the model and the dataset might yield different resulting spaces.
>
> BTW, the above holds for static vectors; if you have dynamic/contextual ones (e.g., BERT-* ones), then the entry point would be rather meaningless, as the vector will change depending on the accompanying elements.
>
> Best,
>
> Carlos
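To make the semantic-tag tactic mentioned above concrete: below is a minimal sketch of feeding a learned semantic-tag embedding into a transformer the same way a positional embedding is fed, by summing it with the token and position embeddings. This is only an illustration under assumed names and sizes, not mesur.io's actual implementation (results there are, as noted, too early to report).

```python
# Illustrative sketch: a semantic tag (e.g., an ontology class id per
# token) injected as an extra learned embedding, analogous to how
# positional information is injected. All names/sizes are assumptions.
import torch
import torch.nn as nn

class TaggedEmbedding(nn.Module):
    def __init__(self, vocab_size, num_tags, max_len, d_model):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        self.tag = nn.Embedding(num_tags, d_model)  # one id per semantic tag

    def forward(self, token_ids, tag_ids):
        # token_ids, tag_ids: (batch, seq_len); tag_ids[b, j] is the
        # semantic tag assigned to token j (an id of 0 can mean "no tag").
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        return self.tok(token_ids) + self.pos(positions) + self.tag(tag_ids)

# The summed embeddings then feed any standard transformer encoder.
emb = TaggedEmbedding(vocab_size=30000, num_tags=512, max_len=512, d_model=256)
x = emb(torch.randint(0, 30000, (2, 16)), torch.randint(0, 512, (2, 16)))
print(x.shape)  # torch.Size([2, 16, 256])
```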
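On Carlos's point that a shared vector is meaningless without its source information, and Adam's GitHub discussion of formal ontologies [3], one can imagine a descriptor that carries that provenance alongside the vector. A sketch only: the property names and @context URL below are hypothetical placeholders, not an existing ontology or MIME type.

```python
# Hypothetical provenance record for a shared embedding vector: the
# fields mirror the information Carlos lists (model, training dataset,
# training/fine-tuning task). None of these names are standardized.
import json

descriptor = {
    "@context": "https://example.org/embedding-ontology",  # hypothetical
    "@type": "EmbeddingVector",
    "model": "fasttext-cc-300d",            # which model produced it
    "trainingDataset": "CommonCrawl-2017",  # what it was trained on
    "task": "skipgram",                     # objective / fine-tuning task
    "dimensions": 300,
    "value": [0.12, -0.03, 0.88],  # truncated for the example
}
print(json.dumps(descriptor, indent=2))
```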
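And Carlos's last point about dynamic/contextual vectors can be seen directly, assuming the Hugging Face transformers package and the public bert-base-uncased checkpoint: the same surface form gets a different vector in each sentence, so a single stored vector per lexeme has no stable referent.

```python
# Same word, different BERT vectors depending on context: a fixed
# "entry point" vector for a contextual model carries little meaning.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def vector_for(sentence, word):
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    # Locate the first occurrence of the word's token id in the input.
    idx = enc.input_ids[0].tolist().index(tok.convert_tokens_to_ids(word))
    return hidden[idx]

v1 = vector_for("She sat on the river bank.", "bank")
v2 = vector_for("He deposited cash at the bank.", "bank")
print(torch.cosine_similarity(v1, v2, dim=0))  # < 1.0: context-dependent
```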
Received on Monday, 10 October 2022 11:02:13 UTC