Institute of Computing Technology,
Chinese Academy of Sciences
Email: yuchenwen1@gmail.com
I am a second-year Ph.D. student in Computer Science and Technology at the Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS) supervised by Prof. Wei Chen and Prof. Keping Bi. Before that, I received my B.Eng. degree in Software Engineering from Nankai University (NKU) in June 2023 as an Outstanding Graduate. I was admitted to the direct Ph.D. program as the highest-ranking student of the whole laboratory then.
Research Interests: My research interests include AI Safety, (Multi-Modal) Large Language Models, and Natural Language Processing. Specifically, my current research focuses on:I am always willing to introduce my research topics and explore new ones! If you are interested in collaboration or have any inquiries about my research, please contact me. 😀
As large language models (LLMs) become an important way of information access, there have been increasing concerns that LLMs may intensify the spread of unethical content, including implicit bias that hurts certain populations without explicit harmful words. In this paper, we conduct a rigorous evaluation of LLMs’ implicit bias towards certain demographics by attacking them from a psychometric perspective to elicit agreements to biased viewpoints. Inspired by psychometric principles in cognitive and social psychology, we propose three attack approaches, i.e., Disguise, Deception, and Teaching. Incorporating the corresponding attack instructions, we built two benchmarks: (1) a bilingual dataset with biased statements covering four bias types (2.7K instances) for extensive comparative analysis, and (2) BUMBLE, a larger benchmark spanning nine common bias types (12.7K instances) for comprehensive evaluation. Extensive evaluation of popular commercial and open-source LLMs shows that our methods can elicit LLMs’ inner bias more effectively than competitive baselines. Our attack methodology and benchmarks offer an effective means of assessing the ethical risks of LLMs, driving progress toward greater accountability in their development. Our code, data and benchmarks are available at https://yuchenwen1.github.io/ImplicitBiasEvaluation/. |
@article{wen2024evaluating, title={Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective}, author={Wen, Yuchen and Bi, Keping and Chen, Wei and Guo, Jiafeng and Cheng, Xueqi}, journal={arXiv preprint arXiv:2406.14023}, year={2024} }
Institute of Computing Technology, Chinese Academy of Sciences, 2023.09 ~ Present
State Key Laboratory of AI Safety
Ph.D. student in Computer Science and Technology, supervised by Prof. Wei Chen and Prof. Keping Bi.
Nankai University, 2019.08 ~ 2023.06
College of Software
B.Eng. in Software Engineering
National Scholarship(top 2%), 2022.12
Tianjin Governmental Scholarship(top 0.5%), 2021.12
Outstanding Undergraduate Thesis of NKU(top 3%), 2023.06
Outstanding Graduate of NKU(top 3%), 2023.06