Thesis - Portfolio

Title: Navigating Ideological Impact: Exploring Negativity Bias on YouTube

My final project at the University of Konstanz was a master’s thesis on the YouTube content posted by U.S. senators. This work, which received a high grade (1.3), allowed me to develop expertise in web scraping, Natural Language Processing (NLP) techniques, and supervised machine learning. Using both R and Python, I collected the data, then analyzed and visualized the relationship between the sentiment of YouTube videos and their viewership. The thesis is available upon request, and the code for the project can be found on my GitHub repository here.

Abstract:
In modern politics, social media has become an indispensable tool for politicians to share their opinions and engage with their constituents. Across disciplines, there is a well-documented negativity bias in the human psyche, meaning that negative information is prioritized in our cognitive processes. Building on this negativity bias, this study delves into the underexplored terrain of YouTube to investigate how U.S. senators employ negativity to broaden their audience. Given YouTube’s reputation as a space for diverse opinions, this research also examines the impact of ideological factors on video views. To this end, a dataset of 44,100 video captions from current U.S. senators is collected and subjected to sentiment analysis using the RoBERTa neural language model. The results of a linear mixed-effects model suggest that senators' videos containing more negative discourse attract more views, a trend that persists even after controlling for individual characteristics. However, the evidence on how ideological factors affect baseline viewership remains inconclusive.

The code itself is in the GitHub repository, but getting there involved intricate challenges and valuable learning experiences. One notable achievement was developing R Markdown files that document the individual project steps. The first page outlines the process of collecting YouTube captions for a given list of videos. I wrote a script that automates a browser, works through a list of URLs one by one, and downloads the available captions. This produced 44,100 text files, each corresponding to a video. Handling pop-ups and advertisements during web scraping proved to be a significant challenge, and one particularly rewarding accomplishment was integrating ad-blocker functionality into my automated browser.
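To give a sense of the collection loop described above, here is a minimal Python sketch. It is not the thesis code, which uses RSelenium in R. The function names `video_id` and `collect_captions`, the `fetch_captions` callback (a stand-in for the browser automation step), and the one-file-per-video naming scheme are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Optional, Tuple
from urllib.parse import parse_qs, urlparse


def video_id(url: str) -> Optional[str]:
    """Return the `v` query parameter of a YouTube watch URL, if present."""
    values = parse_qs(urlparse(url).query).get("v")
    return values[0] if values else None


def collect_captions(
    urls: Iterable[str],
    fetch_captions: Callable[[str], Optional[str]],  # hypothetical scraper hook
    out_dir: str,
) -> Tuple[List[Path], List[str]]:
    """Save one text file per video; return (saved paths, skipped URLs)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved: List[Path] = []
    skipped: List[str] = []

    for url in urls:
        vid = video_id(url)
        if vid is None:
            skipped.append(url)  # not a recognizable watch URL
            continue

        target = out / f"{vid}.txt"
        if target.exists():
            saved.append(target)  # already downloaded on an earlier run
            continue

        try:
            text = fetch_captions(url)
        except Exception:
            text = None  # e.g. a pop-up blocked the page; move on to the next URL

        if not text:
            skipped.append(url)  # no captions available for this video
            continue

        target.write_text(text, encoding="utf-8")
        saved.append(target)

    return saved, skipped
```

Passing the scraper in as a callback keeps the loop independent of the browser tooling, and skipping files that already exist lets a long run be resumed after an interruption.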

Scraping Data with RSelenium (R): Link.
Replication of Figures (R): Link.
Replication of Tables (R): Link.