viseme vs phoneme KR

The article pointed out by Dave [1] describes how to detect deepfakes using
visemes and phonemes, which are KR constructs, as explained in [2].
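The mismatch idea in [1] can be illustrated with a deliberately simplified sketch. It assumes that phoneme timings are already available (for example from a forced aligner) and that a per-frame mouth-openness value has been measured, with 0.0 meaning fully closed. The `PhonemeSegment` type, the `find_mismatches` function and the 0.1 closure threshold are hypothetical choices for illustration; this only shows the M/B/P lip-closure check and is not the authors' implementation.

```python
from dataclasses import dataclass

# Bilabial phonemes: the lips must close completely to produce them.
BILABIALS = {"M", "B", "P"}


@dataclass
class PhonemeSegment:
    phoneme: str   # phoneme label, e.g. "M" (hypothetical labelling scheme)
    start: float   # segment start, in seconds
    end: float     # segment end, in seconds


def find_mismatches(segments, mouth_openness, fps, closed_threshold=0.1):
    """Return bilabial segments during which the mouth never closes.

    mouth_openness: one value per video frame, 0.0 = lips fully closed
    (a hypothetical measurement; how it is extracted is out of scope).
    closed_threshold: assumed cut-off below which a frame counts as closed.
    """
    mismatches = []
    for seg in segments:
        if seg.phoneme.upper() not in BILABIALS:
            continue  # only M/B/P impose a hard lip-closure constraint
        # Convert the time span to frame indices (truncating), clipped to the video.
        first = int(seg.start * fps)
        last = min(int(seg.end * fps), len(mouth_openness) - 1)
        window = mouth_openness[first:last + 1]
        # A genuine M/B/P should contain at least one (near-)closed frame.
        if window and min(window) > closed_threshold:
            mismatches.append(seg)
    return mismatches
```

For example, with `fps=10`, a "P" segment from 1.0 s to 1.3 s whose frames all have an openness of 0.6 would be flagged, while an "M" segment containing a frame at 0.05 would not.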

[1] S. Agarwal et al., "Detecting deep-fake videos from phoneme-viseme
mismatches," *Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition Workshops*, 2020.
*We describe a technique to detect such manipulated videos by exploiting the
fact that the dynamics of the mouth shape – visemes – are occasionally
inconsistent with a spoken phoneme. We focus on the visemes associated with
words having the sound M (mama), B (baba), or P (papa) in which the mouth must
completely close in order to pronounce these phonemes. We observe that this is
not the case in many deep-fake videos. Such phoneme-viseme mismatches can,
therefore, be used to detect even spatially small and temporally localized
manipulations.*

[2] A. Metallinou, C. Busso, S. Lee and S. Narayanan, "Visual emotion
recognition using compact facial representations and viseme information," *2010
IEEE International Conference on Acoustics, Speech and Signal Processing*,
2010, pp. 2474-2477, doi: 10.1109/ICASSP.2010.5494893.
https://ieeexplore.ieee.org/abstract/document/5494893
*We derive compact facial representations using methods motivated by
Principal Component Analysis and speaker face normalization. Moreover, we
model emotional facial movements by conditioning on knowledge of
speech-related movements (articulation)*.
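The two ingredients named in the abstract of [2], speaker face normalization and PCA-motivated compact representations, can be sketched generically as below. This is an assumption-laden illustration rather than Metallinou et al.'s method: speaker normalization is taken here to mean subtracting each speaker's mean feature vector, the features are arbitrary numeric vectors (for instance facial marker coordinates), and only the leading principal component is computed, by power iteration. All function names are hypothetical.

```python
import math


def normalize_by_speaker(features, speakers):
    """Subtract each speaker's mean feature vector from that speaker's vectors."""
    sums, counts = {}, {}
    for vec, spk in zip(features, speakers):
        acc = sums.setdefault(spk, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[spk] = counts.get(spk, 0) + 1
    means = {spk: [s / counts[spk] for s in acc] for spk, acc in sums.items()}
    return [[x - m for x, m in zip(vec, means[spk])]
            for vec, spk in zip(features, speakers)]


def top_principal_component(data, iterations=200):
    """Leading eigenvector of the sample covariance, found by power iteration.

    Assumes `data` is already mean-centred (true after normalize_by_speaker)
    and has at least two rows.
    """
    n, dim = len(data), len(data[0])
    cov = [[sum(row[i] * row[j] for row in data) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    # Fixed all-ones start vector; if it were exactly orthogonal to the
    # leading eigenvector, the iteration would converge to the wrong direction.
    vec = [1.0] * dim
    for _ in range(iterations):
        new = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:  # zero-variance data: no meaningful direction
            break
        vec = [x / norm for x in new]
    return vec


def project(data, component):
    """Compact 1-D representation: coordinate of each vector along `component`."""
    return [sum(x * c for x, c in zip(row, component)) for row in data]
```

Because every speaker's vectors are centred on that speaker's own mean, the pooled data is centred as well, which is what the covariance computation assumes; the projection then keeps the direction of largest variation left after speaker-specific offsets are removed.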

Received on Saturday, 5 November 2022 02:08:56 UTC