This thesis describes our work on automatic personality detection from text along the Big Five personality trait dimensions. We take as input text from conversational and written transcriptions, along with social media data consisting of Facebook status updates and Twitter tweets, where the users are labeled with Big Five scores, and try to identify each user's personality profile. Our model is a deep learning framework consisting of a Convolutional Neural Network (CNN) that takes the vector representations of the words as input, extracts the necessary features, and passes them to a fully connected layer followed by a softmax for binary classification of each of the five traits. In addition to recognizing personality from English text, we develop a bilingual model that classifies personality in two languages, using bilingual embeddings to take advantage of the relatively larger amount of data available in English. We show improvement in our multilingual experiments on Chinese. We further expand our multilingual work to other languages, using a Twitter dataset in four languages (English, Spanish, Dutch, and Italian) consisting of user tweets, with the users labeled with personality scores. However, we find that our previous approach of using multilingual embeddings does not give a substantial improvement in the multilingual results. This shows that words with similar contextual meaning in different languages may not correspond to the same personality traits, since people may express personality using different words depending on their cultural or language differences. Therefore, we propose GlobalTrait, a personality alignment method for multilingual embeddings, such that words that correspond to the same personality trait across languages are closer together in the vector space. By applying this alignment to the embeddings and using them as input to our model, we achieve higher F-scores in our multilingual experiments.
This method enables us to use the relatively larger amount of data available in high-resource languages such as English to help us recognize personality in other low-resource languages.
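
The classification pipeline described in the abstract (word vectors, convolution, fully connected layer, softmax) can be illustrated with a minimal sketch in plain Python. It is not the thesis implementation: the filter widths, the number of filters, the ReLU activation, max-over-time pooling, the random weights, and the choice of one independent binary classifier per trait are all assumptions made for illustration, and the names `conv1d_max_pool`, `classify_trait`, and `random_params` are hypothetical.

```python
import math
import random

TRAITS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]

def conv1d_max_pool(embeddings, filters, bias):
    """Slide each filter over windows of consecutive word vectors, apply
    ReLU, and keep the largest activation per filter (max-over-time)."""
    features = []
    for filt, b in zip(filters, bias):
        width = len(filt)   # number of consecutive words the filter covers
        best = 0.0          # ReLU output is never below zero
        for start in range(len(embeddings) - width + 1):
            window = embeddings[start:start + width]
            act = b + sum(w * x
                          for frow, xrow in zip(filt, window)
                          for w, x in zip(frow, xrow))
            best = max(best, act)
        features.append(best)
    return features

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_trait(embeddings, params):
    """Return [P(low), P(high)] for one trait: CNN features -> FC -> softmax."""
    features = conv1d_max_pool(embeddings, params["filters"],
                               params["filter_bias"])
    logits = [b + sum(w * f for w, f in zip(row, features))
              for row, b in zip(params["fc_weights"], params["fc_bias"])]
    return softmax(logits)

def random_params(rng, emb_dim, widths=(2, 3), filters_per_width=2):
    """Untrained placeholder weights (assumption: a real model learns these)."""
    filters = [[[rng.uniform(-0.5, 0.5) for _ in range(emb_dim)]
                for _ in range(w)]
               for w in widths for _ in range(filters_per_width)]
    n = len(filters)
    return {
        "filters": filters,
        "filter_bias": [0.0] * n,
        "fc_weights": [[rng.uniform(-0.5, 0.5) for _ in range(n)]
                       for _ in range(2)],
        "fc_bias": [0.0, 0.0],
    }

rng = random.Random(0)
emb_dim = 4
# Stand-in for the embedded words of one user's text (6 words, 4 dimensions).
text = [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(6)]
models = {trait: random_params(rng, emb_dim) for trait in TRAITS}
predictions = {trait: classify_trait(text, models[trait]) for trait in TRAITS}
```

In this sketch, swapping monolingual word vectors for bilingual or personality-aligned embeddings would change only the contents of `text`; the classifier itself is unchanged, which is what would let data from a high-resource language inform predictions in another language.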
| Date of Award | 2019 |
|---|---|
| Original language | English |
| Awarding Institution | The Hong Kong University of Science and Technology |
GlobalTrait: recognizing personalities in multiple languages using aligned embeddings
SIDDIQUE, F. B. (Author). 2019
Student thesis: Master's thesis