Machine Learning Approach to Personality Assessment and Its Application to Personnel Selection: A Brief Review of the Current Research and Suggestions for the Future

As we enter the digital age, new methods of personality testing, namely machine learning-based personality assessment scales, are quickly gaining traction. Because machine learning-based personality assessments rely on algorithms that analyze the digital footprints of people’s online behaviors, they are supposedly less prone to the human biases and cognitive fallacies that are often cited as limitations of traditional personality tests. As a result, machine learning-based assessment tools are becoming increasingly popular in operational settings across the globe, with the anticipation that they can effectively overcome the limitations of traditional personality testing. However, the provision of scientific evidence regarding the validity, psychometric soundness, and fairness of machine learning-based assessment tools has lagged behind their use in practice. The current paper provides a brief review of empirical studies that have examined the validity of machine learning-based personality assessment, focusing primarily on the social media text mining method. Based on this review, we offer some suggestions about future research directions, particularly regarding the important and immediate need to examine whether machine learning-based personality assessment tools comply with the practical and legal standards for use in practice (such as inter-algorithm reliability, test-retest reliability, and differential prediction across demographic groups). Additionally, we emphasize that the goal of machine learning-based personality assessment tools should not be simply to maximize the prediction of personality ratings. Rather, we should explore ways to use this new technology to further develop our fundamental understanding of human personality and to contribute to the development of personality theory.

that take advantage of the richness of big data and the analytical power of machine learning that are touted as being free from human errors and biases.
As test vendors develop and market their own versions of various machine learning-based psychological assessment tools, there is an increasing need for researchers to provide empirical evidence that informs their usability (although ideally, the order would be reversed).
In fact, there is a prevalent concern among measurement researchers that organizations might be overlooking the critical need for empirical evidence that supports the psychometric soundness, construct validity, fairness, and legal defensibility of machine learning-based psychological assessment tools and that informs their use in practice. Thus, we believe it is timely for a review paper to summarize the current state of research and to identify the existing gaps in machine learning-based personality assessment that future research should explore.
The current paper provides a brief review of the current research on social media text mining and its application to personality assessment. Specifically, we integrate major discussions in recent reviews that emphasize the need for psychometric and theoretical validity evidence of big data personality assessment methods (Alexander et al., 2020; Bleidorn & Hopwood, 2019; Tay et al., 2020) with technical issues that researchers and practitioners need to consider in conducting text mining research (e.g., social media text analysis methods, text preprocessing). Additionally, throughout the paper, we provide readers with references to various user-friendly software and guidelines that
for over 70,000 participants and found that they accurately predicted self-ratings of Big Five personality factors measured using the 100-item
Note. k = number of studies included in the meta-analysis; N = total sample size; r = sample-weighted mean correlation between computer-based personality assessment and self-reports; SDr = standard deviation of correlation estimate; 95% CI = 95% confidence interval.
In computer language, uppercase and lowercase words are treated as being distinct. So even though there is no semantic difference between a word that is capitalized and the same word that is not (e.g., "Personality" and "personality"),
phrase "not happy" is clearly different from the case in which the words "not" and "happy" are treated as separate words and interpreted independently. Yet, without proper handling of negation, each instance of "not happy" will be counted the same as an instance of "not" and an instance of "happy", and the analysis will fail to record the proper meaning of the text (Hickman et al., in press). A simple (but effective) way to address this issue is to append a special character to each negation so that text analysis will distinguish between a word with vs. without negation (e.g., replacing "not" with "not_", so that instances of "not happy" will be replaced with "not_happy").
"난다" could mean "to fly" when it is used alone, but it could also mean (roughly translated) "to display" or "to express an emotion" when it is used in a different context (e.g., "눈물이 난다", "화가 난다")
Part of the reason for the interest in and excitement about the machine learning approach to personality assessment is precisely the expectation that it can be used to capture more subtle and even concealed aspects of human personality (e.g., negative personality traits) in a manner that is less prone to human errors, biases, and cognitive fallacies, which should lead to improved prediction. To continue to improve the research on machine learning-based personality assessment, we as a field need to engage in more in-depth theoretical discussions about what should be considered the "ground truth" about one's personality, and how it can be captured and predicted using the machine learning approach.
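To make the text preprocessing steps discussed earlier more concrete (case normalization and negation marking), the following is a minimal Python sketch. It is an illustration only and is not taken from any of the studies reviewed here: the function name `preprocess`, the short `NEGATIONS` list, and the whitespace-based matching rule are all assumptions made for the example, and an actual study would need to choose and justify its own negation list and tokenizer.

```python
import re
from typing import List

# Hypothetical, deliberately short list of negation words (an assumption for
# this sketch; a real analysis would define and justify its own list).
NEGATIONS = ("not", "no", "never")

# Matches a negation word followed by whitespace and the next word.
_NEGATION_PATTERN = re.compile(r"\b(" + "|".join(NEGATIONS) + r")\s+(\w+)")


def preprocess(text: str) -> List[str]:
    """Lowercase the text, mark negations, and split it into word tokens."""
    # Step 1: case normalization, so "Personality" and "personality"
    # are counted as the same word.
    lowered = text.lower()
    # Step 2: negation marking, e.g. "not happy" becomes "not_happy",
    # so the negated word is kept distinct from the plain word.
    marked = _NEGATION_PATTERN.sub(r"\1_\2", lowered)
    # Step 3: simple tokenization; "\w" includes "_", so "not_happy"
    # survives as a single token.
    return re.findall(r"\w+", marked)


if __name__ == "__main__":
    print(preprocess("I am NOT happy. Personality matters, personality!"))
```

With this sketch, "not happy" is counted as a single token rather than as one instance of "not" and one instance of "happy". The rule only joins a negation to the word that immediately follows it, so longer-range negation (e.g., "not at all happy") would require a more elaborate approach.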

Concluding Comments
With the advent of machine learning-based personality assessment tools, we are seeing some of the same issues being raised that were also raised in the "good old daze" of a profuse number of personality constructs and measurements (Hough, 1998). We need history to repeat itself.
The field of psychology has left behind the early days of dust-bowl empiricism, when a measure was deemed useful so long as it predicted any outcome of importance. Now, in the era of theory-driven research, we need to examine whether machine learning-based personality assessment scores can withstand the rigor of a fundamental construct validation process.
Moreover, research needs to quickly catch up to practice by examining whether the applied use of machine learning-based personality assessments (e.g., high-stakes personnel selection) provides results that are effective, fair, and legally appropriate. In that regard, past and present developments in personality theory and measurement can serve as a useful guide for the future direction of machine learning-based personality assessment and its application to practice.
The current paper brings together the latest research from multiple areas of the social media text mining approach to personality assessment.
We hope that readers will find the current paper helpful in developing an integrative understanding of and appreciation for the various issues that should be taken into consideration in both the research and applied aspects of the text mining approach to personality assessment (and the machine learning approach to psychological assessment in general).