Timeline
12 weeks
Nov - Jan '24
Disciplines
Data Science
NLP
Sentiment Analysis
Data Visualization
Responsibilities
Project Author
Data Cleaning
Review Analysis
Report Writing
Tech
Python
NLTK
Just how good was The Last of Us Part 2? - January 2024
BACKGROUND
The Last of Us is my all-time favorite game, and when Part II came out it sparked a wave of extreme reactions: praise, hate, celebration, and outrage. I was fascinated by how one game could be so widely acclaimed yet so divisive at the same time. Some called it a masterpiece; others demanded refunds.
I wanted to find a way to get a definitive answer. Sales numbers alone don’t tell the whole story, so I set out to explore what the data really says. Are the 10+ million copies sold a true measure of its success? How much do critical acclaim and user sentiment actually align? And what can we learn from how players responded?
This project was my way of approaching the debate with more than just opinions, using data science to try to answer a question I’ve had since release: Just how good is The Last of Us Part II?
1
Define the Question
How successful is The Last of Us Part II? Success has multiple dimensions: sales, reviews, awards, and sentiment.
2
Collect the Data
Gather publicly available datasets: Metacritic reviews, sales figures, award counts, and performance metrics.
3
Clean & Preprocess
Filter and clean the data using Python, removing null entries, stopwords, punctuation, and non-English reviews.
4
Analyze & Visualize
Perform sentiment analysis using VADER and TextBlob. Create multiple visualizations to illustrate unique findings.
5
Interpret the Results
Compare The Last of Us Part II against similar titles. Evaluate the relationship between sentiment, sales, and awards.
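The cleaning in step 3 can be sketched in a few lines. This is only an illustration, not the project's actual pipeline: the function name `clean_review` and the tiny `STOPWORDS` set are hypothetical stand-ins, and a real run would more likely use a full stopword list such as the one that ships with NLTK.

```python
import string

# Hypothetical, deliberately tiny stopword set for illustration only.
STOPWORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "it", "this"}


def clean_review(text):
    """Lowercase a review, strip ASCII punctuation, and drop stopwords."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = [word for word in no_punct.split() if word not in STOPWORDS]
    return " ".join(tokens)


cleaned = clean_review("The story was a masterpiece, and the ending hurt!")
print(cleaned)  # story masterpiece ending hurt
```

Stripping punctuation before splitting keeps tokens like "masterpiece," from slipping past the stopword check with a trailing comma attached.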
PROGRAMMING
score.py
subjectivity.py
negative.py
sales.py
# score.py
import pandas as pd
import matplotlib.pyplot as plt


def getReviewsDataframe(filepath):
    # Load the reviews and keep only the English ones
    dfStart = pd.read_csv(filepath)
    dfEnglish = dfStart[dfStart['language'] == 'English'].reset_index(drop=True)
    # Drop unused columns (axis=1), then drop rows with no review text
    dfSimplified = dfEnglish.drop(['type_review', 'views', 'votes', 'language'], axis=1)
    dfSimplified = dfSimplified.dropna(subset=['review'])
    return dfSimplified


def getAllScores(df):
    # Count how many reviews gave each score, ordered by score
    scores = df['score'].value_counts()
    scoresDf = pd.DataFrame({'count': scores.values}, index=scores.index)
    scoresDf = scoresDf.sort_index()
    return scoresDf


# Taken from Lázaro's own study
sentimentCSVFilePath = 'tlou2_reviews.csv'
df = getReviewsDataframe(sentimentCSVFilePath)
scores = getAllScores(df)

# Chart styling altered on portfolio output for visual consistency
scores.plot(kind='bar', color='skyblue', rot=0)
plt.xlabel('Score')
plt.ylabel('Number of Reviews')
plt.title('Distribution of Scores')
plt.show()
# subjectivity.py
import pandas as pd
import matplotlib.pyplot as plt
from textblob import TextBlob


def getReviewsDataframe(filepath):
    # Load the reviews and keep only the English ones
    dfStart = pd.read_csv(filepath)
    dfEnglish = dfStart[dfStart['language'] == 'English'].reset_index(drop=True)
    # Drop unused columns, then drop rows with no review text
    dfSimplified = dfEnglish.drop(['type_review', 'views', 'votes', 'language'], axis=1)
    dfSimplified = dfSimplified.dropna(subset=['review'])
    return dfSimplified


def getSubjectivityScores(df):
    # TextBlob subjectivity ranges from 0.0 (objective) to 1.0 (subjective)
    subjectivityScores = []
    for review in df['review']:
        blob = TextBlob(review)
        subjectivityScores.append(blob.sentiment.subjectivity)
    return subjectivityScores


# Taken from Lázaro's own study
sentimentCSVFilePath = 'tlou2_reviews.csv'
df = getReviewsDataframe(sentimentCSVFilePath)
subjectivityScores = getSubjectivityScores(df)

# Chart styling altered on portfolio output for visual consistency
plt.hist(subjectivityScores, bins=20, color='mediumpurple', alpha=0.8)
plt.title('Subjectivity Distribution')
plt.xlabel('Subjectivity Score')
plt.ylabel('Number of Reviews')
plt.grid(True)
plt.show()
# negative.py
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud


def getReviewsDataframe(filepath):
    # Load the reviews and keep only the English ones
    dfStart = pd.read_csv(filepath)
    dfEnglish = dfStart[dfStart['language'] == 'English'].reset_index(drop=True)
    # Drop unused columns, then drop rows with no review text
    dfSimplified = dfEnglish.drop(['type_review', 'views', 'votes', 'language'], axis=1)
    dfSimplified = dfSimplified.dropna(subset=['review'])
    return dfSimplified


def getNegativeStringOfReviews(df):
    # Treat reviews scoring below 5 as negative
    filteredDf = df[df['score'] < 5]
    # Join with spaces so the last word of one review does not
    # fuse with the first word of the next
    return ' '.join(filteredDf['review'].dropna().astype(str))


sentimentCSVFilePath = 'tlou2_reviews.csv'
df = getReviewsDataframe(sentimentCSVFilePath)
negative_text = getNegativeStringOfReviews(df)

# Chart styling altered on portfolio output for visual consistency
wordcloud = WordCloud(
    width=600,
    height=400,
    background_color='black',
    colormap='Reds',
    max_words=100
).generate(negative_text)

plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('Negative Review Keywords')
plt.show()
# sales.py
import pandas as pd
import matplotlib.pyplot as plt


def getGameSalesDataframe(filepath):
    dfStart = pd.read_csv(filepath)
    dfStart = dfStart.set_index('Rank')
    # Drop metadata columns that the sales comparison does not need
    dfSimplified = dfStart.drop(['basename', 'Genre', 'ESRB_Rating', 'Publisher', 'Developer',
                                 'VGChartz_Score', 'Year', 'Last_Update', 'url', 'status',
                                 'Vgchartzscore', 'img_url'], axis=1)
    salesColumns = ['Global_Sales', 'NA_Sales', 'PAL_Sales', 'JP_Sales', 'Other_Sales']
    # Keep games with at least one sales figure above 1 million
    dfFiltered = dfSimplified.dropna(subset=salesColumns, how='all')
    dfFiltered = dfFiltered[(dfFiltered[salesColumns] > 1.0).any(axis=1)]
    return dfFiltered.reset_index(drop=True)


def getGameSalesAbove(dataframe, value):
    # Reset the index so row labels stay contiguous after filtering
    return dataframe[dataframe['Global_Sales'] > value].reset_index(drop=True)


def addTLOU2ToDataframe(dataframe):
    # Append The Last of Us Part II as a manually entered row.
    # Numeric values (not strings) keep the score and sales columns numeric.
    record = {'Name': 'The Last of Us Part II', 'Platform': 'PS4',
              'Critic_Score': 10.0, 'Global_Sales': 10.0}
    return pd.concat([dataframe, pd.DataFrame([record])], ignore_index=True)


gameSalesCSVFilePath = 'game_sales_data.csv'
gameSalesDf = getGameSalesDataframe(gameSalesCSVFilePath)
gameSalesDf = getGameSalesAbove(gameSalesDf, 5.0)
gameSalesDf = addTLOU2ToDataframe(gameSalesDf)
gameSalesDf['Global_Sales'] = pd.to_numeric(gameSalesDf['Global_Sales'], errors='coerce')
# Keep each game's best-selling entry, then sort from highest to lowest sales
gameSalesDf = gameSalesDf.loc[gameSalesDf.groupby('Name')['Global_Sales'].idxmax()]
gameSalesDf = gameSalesDf.sort_values(by='Global_Sales', ascending=False).reset_index(drop=True)

# Chart styling altered on portfolio output for visual consistency
plt.figure(figsize=(14, 10))
plt.plot(gameSalesDf['Name'], gameSalesDf['Global_Sales'], label='Global Sales')

# After the index reset, the row label equals the game's position on the x-axis
highlightIndex = gameSalesDf.index[gameSalesDf['Name'] == 'The Last of Us Part II'][0]
highlightSales = gameSalesDf.loc[highlightIndex, 'Global_Sales']
plt.annotate("The Last of Us Part II",
             xy=(highlightIndex, highlightSales),
             xytext=(highlightIndex, highlightSales + 2),
             arrowprops=dict(facecolor='black', shrink=0.05),
             horizontalalignment='left',
             verticalalignment='top')
plt.title('Global Sales of Games')
plt.xlabel('Game Name')
plt.ylabel('Sales in Millions')
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()
output.py
Analysis
A majority of reviews are clustered at either extreme: 0 or 10. This suggests that players weren’t engaging with the game in a nuanced way; instead, they reacted in absolutes, likely fueled by emotional responses to key narrative moments. The middle ground is noticeably sparse, which aligns with the broader discourse labeling the game as one of the most “polarizing” AAA titles of its time.
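One way to put a number on that clustering is to measure the share of scores sitting at the two extremes. The sketch below is an assumption about how this could be done, not code from the project: `extreme_share` is a hypothetical helper, and the sample scores are made up rather than drawn from the review dataset.

```python
from collections import Counter


def extreme_share(scores, low=0, high=10):
    """Return the fraction of scores equal to the lowest or highest value."""
    if not scores:
        return 0.0
    counts = Counter(scores)
    return (counts[low] + counts[high]) / len(scores)


# Made-up sample scores, not taken from the review dataset.
sample_scores = [0, 0, 10, 10, 10, 5, 7, 0, 10, 3]
share = extreme_share(sample_scores)
print(f"{share:.0%} of sample scores are 0 or 10")
```

With the real data, the same function could be called on `df['score'].tolist()` from score.py.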
Most reviews fall on the higher end of the subjectivity scale, indicating that players wrote in emotional, opinion-driven language rather than fact-based critique. This emphasizes the personal and often passionate tone of the responses, reflecting how the game impacted players on an emotional level more than a mechanical or technical one.
Common words like "Game", "Joel", "Story", and "Character" dominate the negative reviews. This suggests that dissatisfaction was less about gameplay or polish, and more centered on narrative choices, especially character fates and plot decisions. The language used in these reviews also supports the earlier subjectivity findings, reinforcing how emotionally charged the criticism was.
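A word cloud sizes each word by how often it appears, so the same ranking can be checked with a plain frequency count. This is a rough sketch under stated assumptions: `top_words` and its `min_length` cutoff are hypothetical, and the sample sentences are invented rather than quoted from the dataset.

```python
import re
from collections import Counter


def top_words(reviews, n=3, min_length=4):
    """Count lowercase words of at least min_length characters across reviews."""
    counts = Counter()
    for review in reviews:
        words = re.findall(r"[a-z']+", review.lower())
        counts.update(w for w in words if len(w) >= min_length)
    return counts.most_common(n)


# Invented example sentences, not quotes from the dataset.
sample_reviews = [
    "The story ruined Joel.",
    "Joel deserved a better story.",
    "Bad story, great graphics.",
]
top = top_words(sample_reviews, n=2)
print(top)  # [('story', 3), ('joel', 2)]
```

The length cutoff is a crude substitute for stopword removal; it filters short filler words like "the" and "bad" without needing a word list.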
Despite critical acclaim and strong branding, TLOU2’s global sales trail behind its peers. While 10+ million units is objectively a success, the comparison reveals that its commercial performance didn’t reach the heights of other major titles or Sony exclusives. This may point to how controversy affected word-of-mouth traction, or simply reflect broader market conditions at the time.
FINDINGS
Despite controversy, positive reviews outnumbered negative ones nearly 2 to 1
High subjectivity means player reviews were more personal, emotional, and opinion-driven
Dissatisfaction was narrative-based, with "Joel", "Story", and "Abby" among the most common words
Commercial performance fell short, with comparable sequels far outpacing TLOU2’s 10.2 million copies sold
It remains one of the most critically awarded and decorated games of its generation
All findings suggest that making The Last of Us Part III is justifiable, if not inevitable
THE END