Password Dictionary Analysis: Ultimate Wordlist of USA Passwords
In the digital age, where our lives are increasingly connected with technology, the importance of securing our online accounts cannot be overstated. Yet, despite the constant warnings and cybersecurity breaches making headlines, many individuals continue to rely on weak and easily guessable passwords.
In the United States, as in many parts of the world, specific passwords persistently emerge as the most commonly used despite the known risks they pose. This article explores America’s most used passwords, shedding light on the patterns they share and the consequences of using them.
Our Password List
In cybersecurity, password lists stand out as invaluable resources for understanding the dynamics of online security practices. These lists comprise a comprehensive compilation of commonly used words, phrases, and character combinations employed as passwords by individuals across the digital landscape. While security practitioners can learn from these lists and try to limit the usage of such weak passwords, threat actors can utilize such valuable resources to initiate their attacks.
We examined the shortcomings of frequently used passwords in an earlier article. That research gave an overview of the most commonly used passwords worldwide and laid the foundation for further research. The password dictionary, named srwordlist.txt, was compiled by SOCRadar analysts from sources such as dark web forums and Telegram channels. It contains 1,788 of the most frequently used passwords. For a more in-depth look, we extracted the passwords most frequently used in the USA and analyze them in this article.
The dataset for the USA part of our research contains the 250 most popular passwords. It also has a feature named ‘hit_count’, which indicates how often each password occurred.
Both Ends of the List
The list is ordered by the number of occurrences of each password. While the most frequent passwords occurred as many as 66 thousand times, the passwords at the bottom were detected around a thousand times.
It’s important not to mistake the last 10 passwords as more secure than the first 10. They have simply been “caught” fewer times. When an attacker uses a dictionary-based strategy, none of the passwords in a list can be considered strong enough.
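To make this concrete, here is a minimal sketch of why rank does not matter in a dictionary-based attack. The passwords and hit counts below are made-up placeholders, not values from srwordlist.txt, and `is_in_dictionary` is a hypothetical helper introduced only for illustration.

```python
# Hypothetical (password, hit_count) rows; illustrative placeholders only,
# not taken from the real dataset.
sample_wordlist = [
    ("123456", 66000),
    ("password", 40000),
    ("sunshine1", 1000),
]

# A dictionary attack tries every entry, so only membership matters;
# the hit count plays no part in whether a password gets guessed.
dictionary = {password for password, _hit_count in sample_wordlist}


def is_in_dictionary(candidate: str) -> bool:
    """Return True if a dictionary attack using this list would try the candidate."""
    return candidate in dictionary


for password, hit_count in sample_wordlist:
    print(password, hit_count, is_in_dictionary(password))
```

The entry with the lowest hit count is found just as reliably as the one with the highest.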
Password Complexity
The National Institute of Standards and Technology (NIST) recommends a minimum password length of 8 characters, and many password policies add composition rules on top of that. For this analysis, we used the following parameters to gauge password strength:
- Ensuring the password is at least 8 characters long (with longer ones preferable).
- Combining letters and numbers.
- Using both uppercase and lowercase letters.
- Utilizing special characters.
To assess the strength of each password, we scored it by the number of parameters it satisfies, awarding 1 point for each parameter met. However, any password shorter than 8 characters scores 0, no matter how many other parameters it meets. To get our results, we ran the code below in a Jupyter Notebook.
import pandas as pd

# df is assumed to be the DataFrame that already holds the 'password' column

# A function to calculate the score and parameters for each password
def calculate_score_and_parameters(password):
    score = 0
    uppercase = 0
    lowercase = 0
    numbers = 0
    symbols = 0
    # If password has less than 8 characters, return 0 for all parameters
    if len(password) < 8:
        return pd.Series({'score': 0, 'uppercase': 0, 'lowercase': 0, 'numbers': 0, 'symbols': 0})
    # Check if password contains uppercase letters, lowercase letters, numbers, and symbols
    if any(char.isupper() for char in password):
        score += 1
        uppercase = 1
    if any(char.islower() for char in password):
        score += 1
        lowercase = 1
    if any(char.isdigit() for char in password):
        score += 1
        numbers = 1
    if any(not char.isalnum() for char in password):
        score += 1
        symbols = 1
    return pd.Series({'score': score, 'uppercase': uppercase, 'lowercase': lowercase, 'numbers': numbers, 'symbols': symbols})

# Calculate the score and parameters for each password
new_columns = df['password'].apply(calculate_score_and_parameters)
df = pd.concat([df, new_columns], axis=1)
display(df)
The code we used returned more information about the list we have. Now, we can do more analysis with this dataset. First, we can check the average strength of the passwords in this list.
Since many passwords are shorter than 8 characters (which itself says something about the strength of the most common passwords), the average score is closer to 0 than to 4. To see the average score of passwords with at least 8 characters, we can ignore the ones that scored 0 and recalculate.
There were 5 parameters, one of them being a length of at least 8 characters, and 4 points to gather in total. The average score we got is 1.5/4. This alone shows the weakness of the passwords in this list.
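The two averages described above can be sketched as follows. This is a pandas-free illustration using made-up scores rather than the real dataset. It only shows the difference between averaging every password and averaging those that passed the length check.

```python
from statistics import mean

# Hypothetical scores in the same 0-4 range as the 'score' column;
# a 0 marks a password shorter than 8 characters. Placeholder values only.
sample_scores = [0, 0, 0, 1, 2, 1, 0, 2]

# Average over every password, including the zero-scored short ones.
overall_average = mean(sample_scores)

# Average over the passwords that scored above 0, i.e. passed the length check.
nonzero_scores = [score for score in sample_scores if score > 0]
long_enough_average = mean(nonzero_scores)

print(f"all passwords: {overall_average:.2f}/4")
print(f"8+ characters: {long_enough_average:.2f}/4")
```

With the DataFrame built earlier, the equivalent calls would be `df['score'].mean()` and `df.loc[df['score'] > 0, 'score'].mean()`.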
Usage of password parameters
As we mentioned earlier, typically, a strong password includes symbols, numbers, uppercase and lowercase letters, with a minimum length of 8 characters. Let’s examine the distribution of these criteria.
As you can see from the table above, more than half of the passwords in this list are shorter than 8 characters: 134 of the 250. Only 0.8% of them satisfy three parameters at once, and none of them reaches the full 4 points.
While none of the passwords in this list use symbols, a tiny fraction utilize at least one uppercase letter, and approximately 37% have numbers.
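A distribution like this can be tallied in a few lines. The sketch below uses hypothetical rows that mirror the score and flag columns produced earlier. The values are placeholders, so the percentages it prints are not the article’s results.

```python
from collections import Counter

# Hypothetical rows mirroring the columns created by the scoring step;
# placeholder values only.
sample_rows = [
    {"score": 0, "uppercase": 0, "numbers": 0, "symbols": 0},
    {"score": 0, "uppercase": 0, "numbers": 0, "symbols": 0},
    {"score": 0, "uppercase": 0, "numbers": 0, "symbols": 0},
    {"score": 1, "uppercase": 0, "numbers": 0, "symbols": 0},
    {"score": 2, "uppercase": 0, "numbers": 1, "symbols": 0},
]
total = len(sample_rows)

# Share of passwords at each score.
score_counts = Counter(row["score"] for row in sample_rows)
for score in sorted(score_counts):
    print(f"score {score}: {100 * score_counts[score] / total:.1f}%")

# Share of passwords that satisfy each individual criterion.
numbers_share = 100 * sum(row["numbers"] for row in sample_rows) / total
symbols_share = 100 * sum(row["symbols"] for row in sample_rows) / total
print(f"numbers: {numbers_share:.1f}%, symbols: {symbols_share:.1f}%")
```

On the real DataFrame, `df['score'].value_counts(normalize=True)` and `df[['uppercase', 'lowercase', 'numbers', 'symbols']].mean()` would give the same kind of breakdown.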
Distribution of password lengths
Analyzing the length of passwords is crucial because it provides essential information about the strength of the passwords.
import matplotlib.pyplot as plt
import pandas as pd

# Calculate the length of each password
df['password_length'] = df['password'].apply(len)
# Specify the groups for password lengths
groups = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
# Group passwords based on their length; observed=False keeps empty bins
# so the bar count always matches the tick labels
passwords_grouped = df.groupby(pd.cut(df['password_length'], bins=groups, right=False), observed=False).size()
# Plot the distribution of password lengths
plt.bar(range(len(passwords_grouped)), passwords_grouped, align='center', tick_label=[f'{i}' for i in groups[:-1]])
plt.xlabel('Password Length')
plt.ylabel('Frequency')
plt.title('Distribution of Password Lengths')
plt.show()
By understanding the distribution of password lengths, organizations can identify common trends, such as the prevalence of short or easily guessable passwords, and take proactive steps to enforce more robust password policies. We examine the consequences of short passwords in the next section. Using short passwords for your accounts or neglecting to enforce password policies within your organization can lead to catastrophic consequences.
Consequences of Using Weak Passwords
It is a mistake to depend on small tricks or tiny changes in your password, or to think that length alone is enough to ensure its strength. A password that relies on one or two parameters and ignores the rest is weak. And even if a password satisfies all the parameters, once it has been exposed, it is no different from using no password.
To check the strength of your password, you can use services like PasswordMonster to see whether it is strong enough or has already been exposed. When you type in a password, you get an estimate of how long it would take attackers to crack it, along with an explanation of why the password is weak. For maximum security, you are advised to enter a password similar to yours rather than your actual one.
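How PasswordMonster arrives at its estimate is not described here, so the following is only a rough brute-force sketch under stated assumptions. It assumes the attacker tries every combination of the character classes present in the password, at a hypothetical rate of 10 billion guesses per second. The function names and the guess rate are illustrative assumptions. The sketch also ignores dictionary attacks, which would find any password on a wordlist far sooner.

```python
import string

# Assumed attacker speed; purely illustrative, not a measured figure.
GUESSES_PER_SECOND = 10_000_000_000


def charset_size(password: str) -> int:
    """Count the characters an attacker must cover, given the classes used."""
    size = 0
    if any(c.islower() for c in password):
        size += 26
    if any(c.isupper() for c in password):
        size += 26
    if any(c.isdigit() for c in password):
        size += 10
    if any(not c.isalnum() for c in password):
        size += len(string.punctuation)  # ASCII punctuation characters
    return size


def brute_force_seconds(password: str) -> float:
    """Worst-case time to exhaust the keyspace at the assumed guess rate."""
    keyspace = charset_size(password) ** len(password)
    return keyspace / GUESSES_PER_SECOND


for candidate in ["password", "Password1", "correct-horse-battery"]:
    print(candidate, f"{brute_force_seconds(candidate):.3g} seconds")
```

The keyspace grows exponentially with length and only polynomially with the character set, which is why longer passwords are preferable. Even so, a password that already appears in a wordlist gains nothing from this arithmetic.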
Using weak or already exposed passwords in enterprises and organizations can leave systems vulnerable to breaches, resulting in severe consequences. Employees play a pivotal role in the security posture of enterprises and organizations, particularly regarding password management. Using weak passwords can inadvertently open the door to cyber threats, putting the employee and the organization at risk.
Unfortunately, employees may resort to weak or exposed passwords for various reasons, often compromising organizational security. First and foremost, a lack of awareness regarding cybersecurity best practices can lead employees to underestimate the importance of strong passwords and the potential risks associated with weak ones. Moreover, convenience plays a significant role, as individuals generally choose easily memorable passwords to shorten their login procedures.
Resistance to change and overconfidence in one’s immunity to cyber threats also contribute, and they make it easier for attackers to obtain passwords from illicit sources. For example, suppose an employee’s password for their work email address is the same as the password for services they use in their private life. In that case, an attacker can quickly check dark web sources for that password and then try to log in to the employee’s work account.
To prevent such attacks, you can use SOCRadar’s Account Breach Check tool to check for leaks and take preventive measures before attackers reach your organization.
Conclusion
After analyzing srwordlist_us_250.txt, we can conclude that the most popular passwords protecting emails and other personal information are extremely weak. Choosing a secure password, refraining from reusing passwords across different platforms, and activating two-factor authentication are crucial steps for password hygiene. However, two additional aspects also require attention: platform security and the risk of stealer logs.
To address these challenges, organizations must implement comprehensive education and awareness programs coupled with user-friendly security solutions to empower employees to prioritize strong password practices and mitigate the risk of security breaches. Unfortunately, the platforms your organization uses for daily operations might have vulnerabilities, and you might not be able to fix them. But you can do something about stealer logs. Stealer logs contain sensitive data like usernames, passwords, and credit card numbers. With SOCRadar’s Extended Threat Intelligence platform, you can use our Threat Hunting module to sift through extensive volumes of data in stealer logs to detect credentials and neutralize threats before they cause harm to your organization.