Computer Vision, Scene Understanding and Knowledgebases

Semantic Web Interest Group,

There have been some exciting advancements in computer vision. I have recently been reading about automatic image captioning, video description, visual question answering, visual concept learning and visual reasoning (e.g. [1]), video question answering, story and narrative question answering, and visual dialogue systems (e.g. [2][3]).

It would be interesting to have a knowledgebase interface to the contents of scenes as understood from the views of one or more cameras or sensors. To support the dynamic nature of scene contents, such a knowledgebase might raise events on its own interface or on the interfaces of its query results.

For example, as a yellow ball enters a camera’s field of view, an event could be raised indicating that a new object has entered the knowledgebase. If one had an open query on the knowledgebase for recognized objects, the query results would update, another event would be raised, and an on-screen view of the data would update.
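To make the open-query scenario concrete, here is a minimal Python sketch, assuming a hypothetical in-memory knowledgebase that an upstream vision pipeline feeds with "object entered" and "object left" notifications. The names `SceneObject`, `SceneKnowledgebase`, `OpenQuery` and the event strings are illustrative only, not an existing API, and queries are plain predicates rather than SPARQL or another semantic query language.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class SceneObject:
    """A recognized object in the scene (hypothetical, minimal schema)."""
    object_id: str
    label: str
    color: str

Predicate = Callable[[SceneObject], bool]
Listener = Callable[[str, SceneObject], None]

class OpenQuery:
    """A standing query whose result set is kept up to date as the scene changes."""

    def __init__(self, predicate: Predicate) -> None:
        self.predicate = predicate
        self.results: Dict[str, SceneObject] = {}
        self.listeners: List[Listener] = []

    def on_change(self, listener: Listener) -> None:
        self.listeners.append(listener)

    def notify(self, kind: str, obj: SceneObject) -> None:
        for listener in self.listeners:
            listener(kind, obj)

class SceneKnowledgebase:
    """Holds the objects currently in view and pushes changes to open queries."""

    def __init__(self) -> None:
        self.objects: Dict[str, SceneObject] = {}
        self.queries: List[OpenQuery] = []
        self.listeners: List[Listener] = []

    def on_change(self, listener: Listener) -> None:
        self.listeners.append(listener)

    def open_query(self, predicate: Predicate) -> OpenQuery:
        # Seed the query with objects already in the scene, then keep it open.
        query = OpenQuery(predicate)
        for obj in self.objects.values():
            if predicate(obj):
                query.results[obj.object_id] = obj
        self.queries.append(query)
        return query

    def object_entered(self, obj: SceneObject) -> None:
        # Called by the (assumed) vision pipeline when a new object is recognized.
        # Attribute updates to an existing object are not handled in this sketch.
        self.objects[obj.object_id] = obj
        for listener in self.listeners:
            listener("entered", obj)
        for query in self.queries:
            if query.predicate(obj):
                query.results[obj.object_id] = obj
                query.notify("added", obj)

    def object_left(self, object_id: str) -> None:
        # Called when an object leaves every camera's field of view.
        obj = self.objects.pop(object_id, None)
        if obj is None:
            return
        for listener in self.listeners:
            listener("left", obj)
        for query in self.queries:
            if query.results.pop(object_id, None) is not None:
                query.notify("removed", obj)

if __name__ == "__main__":
    kb = SceneKnowledgebase()
    balls = kb.open_query(lambda o: o.label == "ball")
    # An on-screen view could subscribe here instead of printing.
    balls.on_change(lambda kind, o: print(f"query {kind}: {o.color} {o.label}"))
    kb.object_entered(SceneObject("obj-1", "ball", "yellow"))  # prints "query added: yellow ball"
```

The design keeps two levels of events, one on the knowledgebase itself and one per open query, mirroring the two interfaces described above. A real system would more likely express queries in a semantic query language and evaluate them incrementally.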

Interestingly, the visual attention of an intelligent system could be guided, in part, by the specific open queries on such a knowledgebase interface.

Has anyone else considered such scenarios combining computer vision and scene understanding with knowledgebase and semantic technology? Might there be any publications on these topics to recommend?


Best regards,
Adam Sobieski
http://www.phoster.com/contents/

[1] Mao, Jiayuan, Chuang Gan, Pushmeet Kohli, Joshua B. Tenenbaum, and Jiajun Wu. "The neuro-symbolic concept learner: Interpreting scenes, words, and sentences from natural supervision." arXiv preprint arXiv:1904.12584 (2019).
[2] Das, Abhishek, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José MF Moura, Devi Parikh, and Dhruv Batra. "Visual dialog." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 326-335. 2017.
[3] Strub, Florian, Harm De Vries, Jeremie Mary, Bilal Piot, Aaron Courville, and Olivier Pietquin. "End-to-end optimization of goal-driven and visually grounded dialogue systems." arXiv preprint arXiv:1703.05423 (2017).

Received on Friday, 31 January 2020 01:55:03 UTC