Ph.D. Student
Institute of Computing Technology
Chinese Academy of Sciences
Email: yuchenwen1@gmail.com
I am a second-year Ph.D. student in Computer Science at the Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS), supervised by Prof. Wei Chen and Prof. Keping Bi. Before that, I received my B.E. degree in Software Engineering from Nankai University (NKU) in June 2023 as an Outstanding Graduate.
Research Interests: My primary research interests include natural language processing (NLP), large language models (LLMs), and AI safety. Specifically, my current research focuses on:
I am happy to collaborate and to answer questions about my research. If you are interested in research collaboration or have any questions about my experience, please don't hesitate to contact me.
As Large Language Models (LLMs) become an important way of information seeking, there have been increasing concerns about the unethical content LLMs may generate. In this paper, we conduct a rigorous evaluation of LLMs' implicit bias towards certain groups by attacking them with carefully crafted instructions to elicit biased responses. Our attack methodology is inspired by psychometric principles in cognitive and social psychology. We propose three attack approaches, i.e., Disguise, Deception, and Teaching, based on which we built evaluation datasets for four common bias types. Each prompt attack has bilingual versions. Extensive evaluation of representative LLMs shows that 1) all three attack methods work effectively, especially the Deception attacks; 2) GLM-3 performs the best in defending our attacks, compared to GPT-3.5 and GPT-4; 3) LLMs could output content of other bias types when being taught with one type of bias. Our methodology provides a rigorous and effective way of evaluating LLMs' implicit bias and will benefit the assessments of LLMs' potential ethical risks. Code and datasets are available at https://github.com/yuchenwen1/ImplicitBiasPsychometricEvaluation.
@article{wen2024evaluating,
  title={Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective},
  author={Wen, Yuchen and Bi, Keping and Chen, Wei and Guo, Jiafeng and Cheng, Xueqi},
  journal={arXiv preprint arXiv:2406.14023},
  year={2024}
}
University of Chinese Academy of Sciences, 2023.09 - Present
Institute of Computing Technology
Ph.D. student in Computer Science and Technology, supervised by Prof. Wei Chen.
Nankai University, 2019.08 - 2023.06
College of Software
B.E. in Software Engineering
National Scholarship (top 2%) - 2022.12
Tianjin Governmental Scholarship (top 0.5%) - 2021.12
Outstanding Undergraduate Thesis of NKU (top 3%) - 2023.06
Outstanding Graduate (top 3%) - 2023.06