MER24@IJCAI and MRAC24@ACM MM




Multimodal Emotion Recognition

Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Emotion Recognition



Multimodal emotion recognition is an active research topic in artificial intelligence. Its main goal is to integrate multiple modalities to identify human emotional states. Current works generally assume accurate emotion labels for benchmark datasets and focus on developing more effective architectures. However, existing technologies still struggle to meet the demands of practical applications. To this end, last year we launched MER23@ACM Multimedia and MRAC23@ACM Multimedia. This year, we continue to hold related workshops and challenges that bring together researchers from around the world to further discuss recent research and future directions for robust multimodal emotion recognition.

In this workshop and challenge, we aim to bring together researchers working on multimodal modeling of human affect, modality robustness of affect recognition, low-resource affect recognition, human affect synthesis in multimedia, privacy in affective computing, and applications in health, education, entertainment, etc., to further discuss recent research and future directions for affective computing in multimedia. At the same time, we intend to provide a communication platform for all participants of MER24@IJCAI, systematically evaluate the robustness of emotion recognition systems, and promote applications of this technology in practice.

News

April 30, 2024: We have established the initial website for MER24@IJCAI and MRAC24@ACM MM.

May 7, 2024: For MER-OV, participants may not use closed-source models (such as GPT or Claude).

May 26, 2024: For MER-OV, we have updated the evaluation code, the baseline paper, and EMER.

June 18, 2024: For all tracks, the final submitted labels should be in English.

June 18, 2024: For MER-OV, final-openset-*.csv in the provided dataset contains the labels extracted from EMER descriptions; check-openset.csv on GitHub contains the final checked ground truth.

June 21, 2024: We provide CodaLab links for the three tracks. To reduce the difficulty, we limit the evaluation scope to 20,000 samples and provide pre-extracted audio and subtitles for the unlabeled data in [link].

July 10, 2024: The competition results submission deadline has now passed. We would like to express our sincere gratitude for your support during MER24. We encourage all participants to submit papers to MRAC24@ACM Multimedia.

Sep 17, 2024: All accepted papers are listed on this website.

Oct 10, 2024: We release the workshop schedule. Hope to see you in Melbourne.

Oct 25, 2024: We share the Zoom link for our workshop. We welcome anyone who is interested in attending but cannot join us in person. (Meeting ID: 534 244 0490 Passcode: mer2024)

MER24 Challenge@IJCAI

Compared with MER23, this year we enlarge the dataset by including more labeled and unlabeled samples. Meanwhile, besides MER-SEMI and MER-NOISE, we introduce a new track called MER-OV.


Track 1. MER-SEMI. It is difficult to collect large numbers of samples with emotion labels. To address this data sparsity, researchers have turned to unsupervised or semi-supervised learning, exploiting unlabeled data during training. Furthermore, MERBench [1] points out the necessity of using unlabeled data from the same domain as the labeled data. Therefore, we provide a large number of human-centric unlabeled videos in MER2024 and encourage participants to explore more effective unsupervised or semi-supervised learning strategies for better performance.
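As a deliberately simple illustration of the semi-supervised setting, the sketch below performs one round of pseudo-label self-training on pre-extracted features. It is not part of the official baseline: the `train_model` callable, the feature arrays, and the 0.9 confidence threshold are all assumptions for illustration.

```python
# One round of pseudo-label self-training (illustrative only).
# `train_model` is assumed to return a classifier with predict/predict_proba;
# the 0.9 threshold is an arbitrary choice, not an official baseline setting.
import numpy as np

def self_training_round(labeled_X, labeled_y, unlabeled_X, train_model, threshold=0.9):
    """Fit on labeled features, then adopt confident predictions on
    unlabeled features as pseudo-labels and refit."""
    model = train_model(labeled_X, labeled_y)
    confidence = model.predict_proba(unlabeled_X).max(axis=1)
    pseudo_y = model.predict(unlabeled_X)

    keep = confidence >= threshold  # keep only confident pseudo-labels
    new_X = np.concatenate([labeled_X, unlabeled_X[keep]], axis=0)
    new_y = np.concatenate([labeled_y, pseudo_y[keep]], axis=0)
    return train_model(new_X, new_y), int(keep.sum())
```

Here `train_model` could, for example, wrap a scikit-learn classifier fitted on pre-extracted audio/video embeddings; stronger entries typically pair such pseudo-labeling with self-supervised encoders, but the basic loop is the same.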

Track 2. MER-NOISE. Noise is generally present in video: it is hard to guarantee that every clip is free of audio noise and that every frame is high-resolution. To improve noise robustness, researchers have carried out a series of works on emotion recognition under noisy conditions. However, a benchmark dataset for fairly comparing different strategies is still lacking. Therefore, we organize a track around noise robustness. Although there are many types of noise, we consider the two most common ones: additive audio noise and image blur. We encourage participants to exploit data augmentation [2] or other techniques [3] to improve the noise robustness of emotion recognition systems.
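For intuition, the two corruption types can be simulated in a few lines. The sketch below uses NumPy and OpenCV; the SNR value and blur-kernel size are arbitrary examples, not the official corruption settings used to build the challenge data.

```python
# Simulating the two corruption types considered in MER-NOISE (illustrative;
# the SNR and blur kernel below are arbitrary, not the official settings).
import numpy as np
import cv2

def add_gaussian_noise(waveform: np.ndarray, snr_db: float = 10.0) -> np.ndarray:
    """Add white Gaussian noise to a mono waveform at a given SNR (dB)."""
    signal_power = np.mean(waveform ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=waveform.shape)
    return waveform + noise

def blur_frame(frame: np.ndarray, kernel_size: int = 9) -> np.ndarray:
    """Apply Gaussian blur to a single video frame (H, W, 3 uint8)."""
    return cv2.GaussianBlur(frame, (kernel_size, kernel_size), 0)
```

Applying such corruptions on the fly to clean training data is one common augmentation strategy for improving robustness to these noise types.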

Track 3. MER-OV. Emotions are subjective and ambiguous. To increase annotation consistency, existing datasets typically limit the label space to a few discrete labels, employ multiple annotators, and use majority voting to select the most likely label. However, this process may cause some correct but non-candidate or non-majority labels to be ignored, resulting in inaccurate annotations. To this end, we introduce a new track on open-vocabulary emotion recognition. We encourage participants to generate any number of labels in any category, aiming to describe the emotional state as accurately as possible [4].
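As a toy, text-only illustration of what an open-vocabulary prediction could look like, the sketch below prompts an open-source instruction-tuned LLM (closed-source models are not allowed in this track) to emit free-form emotion labels from a subtitle. The model name, prompt wording, and parsing are assumptions for illustration; real entries would typically also reason over the audio and visual streams with an MLLM.

```python
# Toy open-vocabulary labeling from subtitles with an open-source LLM.
# The model name, prompt, and parsing are assumptions for illustration only.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2-7B-Instruct"  # any open-source instruct model may be substituted

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

def predict_open_vocab_labels(subtitle: str) -> list[str]:
    """Ask the LLM for a comma-separated list of emotion words (any number, any category)."""
    prompt = (
        "Read the subtitle below and list every emotion the speaker may feel, "
        "as comma-separated English words.\n"
        f"Subtitle: {subtitle}\nEmotions:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=32, do_sample=False)
    text = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return [word.strip().lower() for word in text.split(",") if word.strip()]
```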



Evaluation Metrics. For MER-SEMI and MER-NOISE, we choose two widely used metrics in emotion recognition: accuracy and the weighted average F-score (WAF). Considering the inherent class imbalance, we use WAF for the final ranking. For MER-OV, we draw on our previous work [4], in which we extend traditional classification metrics (i.e., accuracy and recall) and define set-level accuracy and recall. More details can be found in our baseline code.
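For reference, here is a minimal sketch of the set-level idea: per-sample accuracy is the fraction of predicted labels that appear in the ground-truth set, and recall is the fraction of ground-truth labels that are recovered. The official evaluation in the baseline code additionally uses GPT-3.5 to handle synonymous labels (see the note under the CodaLab links), so treat this only as a conceptual outline.

```python
# Conceptual sketch of the set-level metrics for MER-OV (the official
# evaluation additionally groups synonymous labels via GPT-3.5, so this
# is only an outline of the idea, not the exact scoring code).
def set_level_metrics(pred_labels: set[str], true_labels: set[str]) -> tuple[float, float]:
    """Per-sample set-level accuracy and recall between predicted and true label sets."""
    overlap = len(pred_labels & true_labels)
    accuracy = overlap / len(pred_labels) if pred_labels else 0.0
    recall = overlap / len(true_labels) if true_labels else 0.0
    return accuracy, recall

# WAF for MER-SEMI and MER-NOISE is the weighted-average F-score, e.g. via scikit-learn:
# from sklearn.metrics import f1_score
# waf = f1_score(y_true, y_pred, average="weighted")
```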


Dataset. Please download the End User License Agreement (EULA), fill it out, and send it to merchallenge.contact@gmail.com to access the data. We will review your application and get in touch as soon as possible. The EULA requires participants to use this dataset only for academic research and not to edit the samples or upload them to the Internet.


Result submission. For MER-SEMI and MER-NOISE, each team should submit the most likely discrete label among the six candidate labels (i.e., worried, happy, neutral, angry, surprise, and sad). For MER-OV, each team can submit any number of labels in any category. For all three tracks, participants should predict results for 20,000 samples drawn from the 115,595 unlabeled data, although we only evaluate a small subset. To focus on generalization performance rather than optimization for a specific subset, we do not reveal which samples belong to the test subset. For MER-OV, participants can only use open-source LLMs, MLLMs, or other models; closed-source models (such as the GPT or Claude series) are not allowed.
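A minimal sketch of preparing a discrete-label prediction file is shown below. The column names ("name", "discrete") are hypothetical placeholders; please follow the exact file format specified in the baseline code and on the CodaLab pages.

```python
# Hedged sketch of writing a discrete-label prediction file. The column
# names ("name", "discrete") are hypothetical; follow the exact format
# required by the baseline code and the CodaLab pages.
import csv

CANDIDATE_LABELS = {"worried", "happy", "neutral", "angry", "surprise", "sad"}

def write_submission(predictions: dict[str, str], path: str = "submission.csv") -> None:
    """`predictions` maps each sample name to one of the six candidate labels."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "discrete"])
        for name, label in predictions.items():
            assert label in CANDIDATE_LABELS, f"unexpected label: {label}"
            writer.writerow([name, label])
```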

CodaLab link for MER2024-SEMI: https://codalab.lisn.upsaclay.fr/competitions/19437

CodaLab link for MER2024-NOISE: https://codalab.lisn.upsaclay.fr/competitions/19438

CodaLab link for MER2024-OV: https://codalab.lisn.upsaclay.fr/competitions/19439

Note: please register on CodaLab using the email provided in the EULA or the email from which you sent the EULA. The rankings of MER2024-SEMI and MER2024-NOISE are based on the CodaLab leaderboard. For MER2024-OV, CodaLab is only used for format checking: each team can submit up to five times to the official email, and the best performance among them is used for the final ranking. Due to the slight randomness of GPT-3.5, we will run the evaluation code five times and report the average score. The ranking of the MER2024-OV track will be announced a few weeks after the competition ends.


Paper submission. All participants are encouraged to submit a paper describing their solution to the MRAC24 Workshop@ACM Multimedia. The top-5 teams in each track MUST submit a paper. The top-3 winning teams in each track will be awarded a certificate. Paper submission link: https://cmt3.research.microsoft.com/MRAC2024


Baseline paper: https://arxiv.org/abs/2404.17113
Baseline code: https://github.com/zeroQiaoba/MERTools/tree/master/MER2024
Contact email: merchallenge.contact@gmail.com



[1] Zheng Lian, Licai Sun, Yong Ren, Hao Gu, Haiyang Sun, Lan Chen, Bin Liu, and Jianhua Tao. Merbench: A unified evaluation benchmark for multimodal emotion recognition. arXiv preprint arXiv:2401.03429, 2024.
[2] Devamanyu Hazarika, Yingting Li, Bo Cheng, Shuai Zhao, Roger Zimmermann, and Soujanya Poria. Analyzing modality robustness in multimodal sentiment analysis. In Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 685–696, 2022.
[3] Zheng Lian, Lan Chen, Licai Sun, Bin Liu, and Jianhua Tao. Gcnet: Graph completion network for incomplete multimodal learning in conversation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(07):8419–8432, 2023.
[4] Zheng Lian, Haiyang Sun, Licai Sun, Hao Gu, Zhuofan Wen, Siyuan Zhang, Shun Chen, Mingyu Xu, Ke Xu, Kang Chen, Lan Chen, Shan Liang, Ya Li, Jiangyan Yi, Bin Liu, and Jianhua Tao. Explainable Multimodal Emotion Recognition. arXiv preprint arXiv:2306.15401, 2023.
[5] Zheng Lian, Haiyang Sun, Licai Sun, Zhuofan Wen, Siyuan Zhang, Shun Chen, Hao Gu, Jinming Zhao, Ziyang Ma, Xie Chen, Jiangyan Yi, Rui Liu, Kele Xu, Bin Liu, Erik Cambria, Guoying Zhao, Björn W. Schuller, and Jianhua Tao. MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition. arXiv preprint arXiv:2404.17113, 2024.

MRAC24 Workshop@ACM MM



Besides papers for the MER24 Challenge, we also invite submissions on any aspect of multimodal emotion recognition and synthesis in deep learning. Topics include but are not limited to:

  • Multimodal modeling of human affect
  • Modality robustness of affect recognition
  • LLM-based and MLLM-based emotion recognition
  • Open-set emotion recognition
  • Low-resource affect recognition
  • Contextualized modeling of affective states
  • Human affect synthesis in multimedia
  • Privacy in affective computing
  • Explainability, fairness, and trustworthiness in affective computing
  • Applications in health, education, entertainment, etc.


Format: Submitted papers (.pdf format) must use the ACM Article Template: paper template. Please use the template in the traditional double-column format to prepare your submissions. For example, Word users may use the Word Interim Template, and LaTeX users may use the sample-sigconf-authordraft template. When using the sample-sigconf-authordraft template, please comment out all author information for submission and review of the manuscript, rather than changing the document class to '\documentclass[manuscript, screen, review]{acmart}' as the template instructions suggest. Please ensure that your submissions follow this format for full consideration during the review process.

Length: The manuscript's length is limited to one of two options: a) 4 pages plus a 1-page reference section; or b) 8 pages plus up to 2 pages of references. The reference pages must contain only references. Overlength papers will be rejected without review. Papers should be single-blind. We do not allow an appendix to follow the main paper in the main submission file.

Peer Review and publication in ACM Digital Library: Paper submissions must conform to the "double-blind" review policy. All papers will be peer-reviewed by experts in the field and will receive at least two reviews. Acceptance will be based on relevance to the workshop, scientific novelty, and technical quality. The workshop papers will be published in the ACM Digital Library.

Accepted Papers:
[Baseline] MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition.
Zheng Lian, Haiyang Sun, Licai Sun, Zhuofan Wen, Siyuan Zhang, Shun Chen, Hao Gu, Jinming Zhao, Ziyang Ma, Xie Chen, Jiangyan Yi, Rui Liu, Kele Xu, Bin Liu, Erik Cambria, Guoying Zhao, Björn W. Schuller, Jianhua Tao

[MER-SEMI] Multimodal Emotion Recognition with Vision-language Prompting and Modality Dropout.
Anbin Qi, Zhongliang Liu, Xinyong Zhou, Jinba Xiao, Fengrun Zhang, Qi Gan, Ming Tao, Gaozheng Zhang, Lu Zhang
[MER-SEMI] Early Joint Learning of Emotion Information Makes MultiModal Model Understand You Better.
Mengying Ge, Mingyang Li, Dongkai Tang, Pengbo Li, Kuo Liu, Shuhao Deng, Songbai Pu, Long Liu, Yang Song, Tao Zhang
[MER-SEMI] Audio-Guided Fusion Techniques for Multimodal Emotion Analysis.
Pujin Shi, Fei Gao
[MER-SEMI] Improving Multimodal Emotion Recognition by Leveraging Acoustic Adaptation and Visual Alignment.
Zhixian Zhao, Haifeng Chen, Xi Li, Dongmei Jiang, Lei Xie
[MER-SEMI] Leveraging Contrastive Learning and Self-Training for Multimodal Emotion Recognition with Limited Labeled Samples.
Qi Fan, Yutong Li, Yi Xin, Xinyu Cheng, Guanglai Gao, Miao Ma

[MER-NOISE] SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition.
Zebang Cheng, Shuyuan Tu, Dawei Huang, Minghan Li, Xiaojiang Peng, Zhi-Qi Cheng, Alexander G. Hauptmann
[MER-NOISE] Early Joint Learning of Emotion Information Makes MultiModal Model Understand You Better.
Mengying Ge, Mingyang Li, Dongkai Tang, Pengbo Li, Kuo Liu, Shuhao Deng, Songbai Pu, Long Liu, Yang Song, Tao Zhang
[MER-NOISE] Multimodal Blockwise Transformer for Robust Sentiment Recognition.
Zhengqin Lai, Xiaopeng Hong, Yabin Wang
[MER-NOISE] Robust Representation Learning for Multimodal Emotion Recognition with Contrastive Learning and Mixup.
Yunrui Cai, Runchuan Ye, Jingran Xie, Yaoxun Xu, Yixuan Zhou, Zhiyong Wu

[MER-OV] Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model.
Mengying Ge, Dongkai Tang, Mingyang Li
[MER-OV] Open Vocabulary Emotion Prediction Based on Large Multimodal Models.
Zixing Zhang, Zhongren Dong, Zhiqiang Gao, Shihao Gao, Donghao Wang, Ciqiang Chen, Yuhan Nie, Huan Zhao
[MER-OV] SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition.
Zebang Cheng, Shuyuan Tu, Dawei Huang, Minghan Li, Xiaojiang Peng, Zhi-Qi Cheng, Alexander G. Hauptmann
[MER-OV] Multimodal Emotion Captioning Using Large Language Model with Prompt Engineering.
Yaoxun Xu, Yixuan Zhou, Yunrui Cai, Jingran Xie, Runchuan Ye, Zhiyong Wu

[Workshop] MicroEmo: Time-Sensitive Multimodal Emotion Recognition with Subtle Clue Dynamics in Video Dialogues.
Liyun Zhang, Zhaojie Luo, Shuqiong Wu, Yuta Nakashima
[Workshop] Learning Noise-Robust Joint Representation for Multimodal Emotion Recognition under Incomplete Data Scenarios.
Qi Fan, Haolin Zuo, Rui Liu, Zheng Lian, Guanglai Gao

Contact email: merchallenge.contact@gmail.com

Schedule

 

April 30, 2024: Data, baseline paper & code available

June 26, 2024: Results submission start

July 10, 2024: Results submission deadline

July 20, 2024: Paper submission deadline

August 5, 2024: Paper acceptance notification

August 18, 2024: Deadline for camera-ready papers

November 1 (PM), 2024: MRAC24 Workshop@ACM MM


All submission deadlines are at 23:59 Anywhere on Earth (AoE).


Each speaker is required to attend in person, and each paper will be presented orally for a total of 10 minutes (8 minutes for the presentation and 2 minutes for questions).

Zoom Meeting ID: 534 244 0490 Passcode: mer2024


Speakers

Zitong Yu

Assistant Professor
Great Bay University

Title: Facial Physiological and Emotional Analysis.

Besides biometric information, human faces contain rich physiological and emotional cues. Thanks to the rapid development of the AI/CV community, many vision foundation models and efficient learning methods have been designed for subtle facial physiological and emotional analysis, replacing handcrafted expert-level features. This talk will explore technological advancements in facial video-based remote photoplethysmography (rPPG), facial action unit (AU) detection, and emotion recognition systems. Furthermore, a downstream real-world application in deception detection will be introduced. Finally, some challenges and future directions will be discussed.

Organisers

 

Jianhua Tao

Tsinghua University

 

Zheng Lian

Institute of Automation, Chinese Academy of Sciences

 

Björn W. Schuller

Imperial College London

 
 

Guoying Zhao

University of Oulu

 

Erik Cambria

Nanyang Technological University

 

Data Chairs

 

Bin Liu

Institute of Automation, Chinese Academy of Sciences

 

Rui Liu

Inner Mongolia University

 

Kele Xu

National University of Defense Technology

Program Committee



Xiaobai Li

Zhejiang University

 

Zixing Zhang

Hunan University

 

Jianfei Yu

Nanjing University of Science and Technology

 

Ya Li

Beijing University of Posts and Telecommunications

Mengyue Wu

Shanghai Jiao Tong University

 

Jing Han

University of Cambridge

 

Jinming Zhao

Qiyuan Lab

 

Mingyue Niu

Yanshan University

 
 

Yongwei Li

Institute of Psychology, Chinese Academy of Sciences

 

Licai Sun

University of Oulu