Multimodal Emotion


Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Emotion Recognition

Multimodal emotion recognition is an active research topic in artificial intelligence. Its main goal is to integrate multiple modalities to identify human emotional states. Current works generally assume accurate emotion labels for benchmark datasets and focus on developing more effective architectures. However, existing technologies struggle to meet the demands of practical applications. To this end, last year we launched MER23@ACM Multimedia and MRAC23@ACM Multimedia. This year, we continue to hold related workshops and challenges that bring together researchers from around the world to further discuss recent research and future directions for robust multimodal emotion recognition.

In this workshop and challenge, we aim to bring together researchers from the fields of multimodal modeling of human affect, modality robustness of affect recognition, low-resource affect recognition, human affect synthesis in multimedia, privacy in affective computing, and applications in health, education, entertainment, etc., to further discuss recent research and future directions for affective computing in multimedia. At the same time, we intend to provide a communication platform for all participants of MER24@IJCAI, to systematically evaluate the robustness of emotion recognition systems and promote applications of this technology in practice.


April 30, 2024: We establish an initial website for MER24@IJCAI and MRAC24@ACM MM.

May 7, 2024: For MER-OV, participants may not use closed-source models (such as GPT or Claude).

May 10, 2024: We update the paper length requirement. The final paper will be accepted into MRAC24@MM.

May 26, 2024: For MER-OV, we update the evaluation code, baseline paper, and EMER.

June 18, 2024: For all tracks, the final submitted labels should be in English.

June 18, 2024: For MER-OV, final-openset-*.csv in the provided dataset contains the labels extracted from EMER descriptions. check-openset.csv on GitHub contains the final checked ground truth.

June 21, 2024: We provide CodaLab links for the three tracks. To reduce the difficulty, we limit the evaluation scope to 20,000 samples and provide pre-extracted audio and subtitles for unlabeled data in [link]. We move the start of the competition slightly earlier, to June 26, and limit the number of submissions to 20 per day.

MER24 Challenge@IJCAI

Compared with MER23, this year we enlarge the dataset by including more labeled and unlabeled samples. In addition to MER-SEMI and MER-NOISE, we introduce a new track called MER-OV.

Track 1. MER-SEMI. It is difficult to collect large numbers of samples with emotion labels. To address this data sparsity, researchers turn to unsupervised or semi-supervised learning and use unlabeled data during training. Furthermore, MERBench [1] points out the necessity of using unlabeled data from the same domain as the labeled data. Therefore, we provide a large number of human-centric unlabeled videos in MER2024 and encourage participants to explore more effective unsupervised or semi-supervised learning strategies for better performance.
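As an illustration of one common semi-supervised strategy, the sketch below selects pseudo-labels for unlabeled samples whose predicted class probability exceeds a confidence threshold. It is not the baseline method of this challenge; the function name `select_pseudo_labels`, the threshold value, and the toy probabilities are assumptions made for this example only.

```python
import numpy as np

# The six candidate labels used by MER-SEMI and MER-NOISE.
EMOTIONS = ["worried", "happy", "neutral", "angry", "surprise", "sad"]


def select_pseudo_labels(probs, threshold=0.9):
    """Keep unlabeled samples whose top class probability is >= threshold.

    probs: array of shape (num_samples, num_classes) holding the class
    probabilities predicted by a model trained on the labeled data.
    Returns the indices of the kept samples and their predicted class ids.
    """
    probs = np.asarray(probs, dtype=float)
    confidence = probs.max(axis=1)      # top probability per sample
    predicted = probs.argmax(axis=1)    # class id of that top probability
    keep = np.flatnonzero(confidence >= threshold)
    return keep, predicted[keep]


# Toy predictions for three unlabeled videos (hypothetical numbers).
probs = np.array([
    [0.05, 0.92, 0.01, 0.01, 0.005, 0.005],  # confident
    [0.30, 0.25, 0.20, 0.10, 0.10, 0.05],    # ambiguous, dropped
    [0.01, 0.01, 0.01, 0.95, 0.01, 0.01],    # confident
])
indices, labels = select_pseudo_labels(probs, threshold=0.9)
pseudo = [(int(i), EMOTIONS[int(k)]) for i, k in zip(indices, labels)]
print(pseudo)
```

The selected samples would then be added to the labeled set for another round of training. A lower threshold keeps more samples at the cost of noisier pseudo-labels.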

Track 2. MER-NOISE. Noise is common in video. It is hard to guarantee that every video is free of audio noise and that every frame is high-resolution. To improve noise robustness, researchers have carried out a series of works on emotion recognition under noisy conditions. However, there is no benchmark dataset for fairly comparing different strategies. Therefore, we organize a track around noise robustness. Although there are many types of noise, we consider the two most common ones: additive audio noise and image blur. We encourage participants to exploit data augmentation [2] or other techniques [3] to improve the noise robustness of emotion recognition systems.
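The two corruption types named above can be simulated for data augmentation roughly as follows. This is a minimal sketch and not the pipeline used to build the MER-NOISE test set: the function names, the use of a target signal-to-noise ratio (SNR), and the mean filter as the blur are all assumptions of this example.

```python
import numpy as np


def add_noise_at_snr(clean, noise, snr_db):
    """Mix a noise signal into clean audio at a target SNR in decibels."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    # Repeat or crop the noise so it covers the whole clean signal.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)  # assumed non-zero
    # Scale the noise so that clean_power / scaled_noise_power == 10^(snr/10).
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise


def box_blur(image, k=3):
    """Blur a 2-D grayscale image with a k x k mean filter (k odd)."""
    image = np.asarray(image, dtype=float)
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.zeros_like(image)
    # Sum the k*k shifted copies of the image, then average.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
    return out / (k * k)


rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 100, 16000))   # stand-in for a speech waveform
noise = rng.normal(size=4000)                # stand-in for a noise recording
noisy = add_noise_at_snr(clean, noise, snr_db=10.0)
measured_snr = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))

frame = rng.uniform(0, 255, size=(8, 8))     # stand-in for a video frame
blurred = box_blur(frame, k=3)
```

Sampling `snr_db` and `k` from a range during training exposes the model to several noise levels rather than a single one.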

Track 3. MER-OV. Emotions are subjective and ambiguous. To increase annotation consistency, existing datasets typically limit the label space to a few discrete labels, employ multiple annotators, and use majority voting to select the most likely label. However, this process may cause some correct but non-candidate or non-majority labels to be ignored, resulting in inaccurate annotations. To this end, we introduce a new track on open-vocabulary emotion recognition. We encourage participants to generate any number of labels in any category, trying to describe the emotional state accurately [4].


Evaluation Metrics. For MER-SEMI and MER-NOISE, we choose two widely used metrics in emotion recognition: accuracy and weighted average F-score (WAF). Considering the inherent class imbalance, we use WAF as the final ranking metric. For MER-OV, we draw on our previous work [4], in which we extend traditional classification metrics (i.e., accuracy and recall) and define set-level accuracy and recall. More details can be found in our baseline code.
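The metrics above can be sketched in a few lines. The baseline code remains the authoritative definition; here, WAF is computed as the support-weighted mean of per-class F1 scores, and the set-level scores follow one plausible reading of the description: accuracy is the share of predicted label groups that appear in the ground truth, and recall is the share of ground-truth groups that are predicted. The synonym mapping `group_of` is a hypothetical hand-written stand-in for whatever label grouping the official evaluation performs.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / n
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        total += n * f1
    return total / len(y_true)


def set_level_scores(true_labels, pred_labels, group_of=None):
    """Set-level accuracy and recall after mapping labels to synonym groups."""
    group_of = group_of or {}  # hypothetical synonym mapping
    true_groups = {group_of.get(label, label) for label in true_labels}
    pred_groups = {group_of.get(label, label) for label in pred_labels}
    overlap = len(true_groups & pred_groups)
    acc = overlap / len(pred_groups) if pred_groups else 0.0
    rec = overlap / len(true_groups) if true_groups else 0.0
    return acc, rec


# Closed-set example (MER-SEMI / MER-NOISE style).
y_true = ["happy", "happy", "sad", "angry"]
y_pred = ["happy", "sad", "sad", "angry"]
acc = accuracy(y_true, y_pred)
waf = weighted_f1(y_true, y_pred)

# Open-vocabulary example (MER-OV style) for a single sample.
ov_acc, ov_rec = set_level_scores(
    true_labels={"happy", "excited"},
    pred_labels={"joyful", "surprised", "excited"},
    group_of={"joyful": "happy"},
)
```

In the open-vocabulary example, predicting an extra label that is not in the ground truth lowers set-level accuracy but leaves recall unchanged, so the two scores together discourage both over- and under-predicting.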

Dataset. Please download the End User License Agreement (EULA), fill it out, and send it to the contact email to access the data. We will review your application and get in touch as soon as possible. The EULA requires participants to use this dataset only for academic research and not to edit or upload samples to the Internet.

Result submission. For MER-SEMI and MER-NOISE, each team should submit the most likely discrete label among the 6 candidate labels (i.e., worried, happy, neutral, angry, surprise, and sad). For MER-OV, each team can submit any number of labels in any category. For all three tracks, participants should predict results for 20,000 samples drawn from the 115,595 unlabeled samples, although we only evaluate a small subset. To focus on generalization performance rather than optimization for a specific subset, we do not provide information about which samples belong to the test subset. For MER-OV, participants can only use open-source LLMs, MLLMs, or other models. Closed-source models (such as GPT series or Claude series models) are not allowed.

CodaLab link for MER2024-SEMI:

CodaLab link for MER2024-NOISE:

CodaLab link for MER2024-OV:

Note: Please register on CodaLab using the email address given on the EULA or the address from which you sent the EULA. The rankings of MER2024-SEMI and MER2024-NOISE are based on the CodaLab leaderboard. For MER2024-OV, however, CodaLab is used only for format checking. Each team can submit up to five times to the official email, and the best result among them is used for the final ranking. Because GPT-3.5 outputs are slightly random, we will run the evaluation code five times and report the average score. The ranking for the MER2024-OV track will be announced a few weeks after the competition ends.

Paper submission. All participants are encouraged to submit a paper describing their solution to the MRAC24 Workshop@ACM Multimedia. The top-5 teams in each track MUST submit a paper. The top-3 winning teams in each track will be awarded a certificate. Paper submission link:

Baseline paper:
Baseline code:
Contact email:

[1] Zheng Lian, Licai Sun, Yong Ren, Hao Gu, Haiyang Sun, Lan Chen, Bin Liu, and Jianhua Tao. Merbench: A unified evaluation benchmark for multimodal emotion recognition. arXiv preprint arXiv:2401.03429, 2024.
[2] Devamanyu Hazarika, Yingting Li, Bo Cheng, Shuai Zhao, Roger Zimmermann, and Soujanya Poria. Analyzing modality robustness in multimodal sentiment analysis. In Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 685–696, 2022.
[3] Zheng Lian, Lan Chen, Licai Sun, Bin Liu, and Jianhua Tao. Gcnet: Graph completion network for incomplete multimodal learning in conversation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(07):8419–8432, 2023.
[4] Zheng Lian, Licai Sun, Haiyang Sun, Hao Gu, Zhuofan Wen, Siyuan Zhang, Shun Chen, Mingyu Xu, Ke Xu, Lan Chen, Jiangyan Yi, Bin Liu, and Jianhua Tao. Explainable Multimodal Emotion Reasoning. arXiv preprint arXiv:2306.15401, 2023.
[5] Zheng Lian, Haiyang Sun, Licai Sun, Zhuofan Wen, Siyuan Zhang, Shun Chen, Hao Gu, Jinming Zhao, Ziyang Ma, Xie Chen, Jiangyan Yi, Rui Liu, Kele Xu, Bin Liu, Erik Cambria, Guoying Zhao, Björn W. Schuller, and Jianhua Tao. MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition. arXiv preprint arXiv:2404.17113, 2024.

MRAC24 Workshop@ACM MM

Besides papers for the MER24 Challenge, we also invite submissions on any aspect of multimodal emotion recognition and synthesis in deep learning. Topics include, but are not limited to:

  • Multimodal modeling of human affect
  • Modality robustness of affect recognition
  • LLM-based and MLLM-based emotion recognition
  • Open-set emotion recognition
  • Low-resource affect recognition
  • Contextualized modeling of affective states
  • Human affect synthesis in multimedia
  • Privacy in affective computing
  • Explainability, fairness, and trustworthiness in affective computing
  • Applications in health, education, entertainment, etc.

Format: Submitted papers (.pdf format) must use the ACM Article Template: paper template. Please use the template in traditional double-column format to prepare your submissions. For example, Word users may use the Word Interim Template, and LaTeX users may use the sample-sigconf-authordraft template. When using the sample-sigconf-authordraft template, please comment out all author information for submission and review of the manuscript, instead of changing the documentclass command to '\documentclass[manuscript, screen, review]{acmart}' as the template instructions suggest. Please ensure that your submission conforms to this format for full consideration during the review process.

Length: The manuscript’s length is limited to one of two options: a) 4 pages plus 1 page of references; or b) 8 pages plus up to 2 pages of references. The reference pages must contain only references. Overlength papers will be rejected without review. Papers should be single-blind. We do not allow an appendix that follows the main paper in the main submission file.

Peer Review and publication in ACM Digital Library: Paper submissions must conform with the “double-blind” review policy. All papers will be peer-reviewed by experts in the field, and each will receive at least two reviews. Acceptance will be based on relevance to the workshop, scientific novelty, and technical quality. The workshop papers will be published in the ACM Digital Library.

Contact email:



April 30, 2024: Data, baseline paper & code available

June 26, 2024: Results submission start

July 10, 2024: Results submission deadline

July 20, 2024: Paper submission deadline

August 5, 2024: Paper acceptance notification

August 18, 2024: Deadline for camera-ready papers

October 28, 2024: MRAC24 workshop@ACM MM

All submission deadlines are at 23:59 Anywhere on Earth (AoE).


Coming soon



Jianhua Tao

Tsinghua University


Zheng Lian

Institute of Automation, Chinese Academy of Sciences (CASIA)


Björn W. Schuller

Imperial College London


Guoying Zhao

University of Oulu


Erik Cambria

Nanyang Technological University


Data Chairs


Bin Liu



Rui Liu

Inner Mongolia University


Kele Xu

National University of Defense Technology

Program Committee

Xiaobai Li

Zhejiang University


Zixing Zhang

Hunan University


Jianfei Yu

Nanjing University of Science and Technology


Ya Li

Beijing University of Posts and Telecommunications

Mengyue Wu

Shanghai Jiao Tong University


Jing Han

University of Cambridge


Jinming Zhao

Renmin University of China


Mingyue Niu

Yanshan University


Yongwei Li

Institute of Automation, Chinese Academy of Sciences


Licai Sun

University of Chinese Academy of Sciences