Multimodal Emotion Recognition System

A real-time pipeline that fuses CNN-based facial features with NLP sentiment embeddings to predict emotion.

This applied computer-vision and NLP system combines two signals: a person's facial expression and the sentiment of what they say.

What it does

  • Builds a multimodal pipeline fusing CNN-based facial feature extraction with NLP text-sentiment embeddings for real-time emotion prediction.
  • Applies preprocessing optimisations and hyperparameter tuning to improve cross-modal fusion performance.
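
The fusion step described above could look roughly like the sketch below. It is a minimal illustration, not the project's actual code. It assumes feature-level fusion, where the two vectors are concatenated and passed to a linear classifier, although the project does not state which fusion strategy it uses. The label set, the function names (l2_normalise, softmax, fuse_and_predict), the vector sizes, and the toy weights are all hypothetical. In a real system the facial features would come from a trained CNN and the text embedding from an NLP model.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical label set; the project does not list its emotion classes.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def l2_normalise(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so neither modality dominates by magnitude."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_predict(
    face_features: Sequence[float],
    text_embedding: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> Tuple[str, List[float]]:
    """Concatenate both modalities and classify with one linear layer."""
    # Feature-level fusion (an assumption): normalise each modality, then concatenate.
    fused = l2_normalise(face_features) + l2_normalise(text_embedding)
    if len(weights) != len(EMOTIONS) or len(bias) != len(EMOTIONS):
        raise ValueError("need one weight row and one bias per emotion class")
    if any(len(row) != len(fused) for row in weights):
        raise ValueError("each weight row must match the fused vector length")
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs


# Toy inputs standing in for a CNN output (2 values) and a text embedding (3 values).
face_features = [0.2, 0.9]
text_embedding = [0.7, 0.1, 0.3]
# Toy weights: one row per emotion, one column per fused feature.
weights = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [-1.0, -1.0, -1.0, -1.0, -1.0],
]
bias = [0.0, 0.0, 0.0, 0.0]

label, probs = fuse_and_predict(face_features, text_embedding, weights, bias)
print(label, [round(p, 3) for p in probs])
```

Normalising each modality before concatenation is one simple preprocessing choice. The relative scaling of the two vectors is the kind of setting that hyperparameter tuning would then adjust.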

Tech stack: Python · CNNs · NLP · computer vision