SOCRadar® Cyber Intelligence Inc. | Inevitable Tool in Pentesters’ Arsenal: Password Dictionary Lists
Apr 01, 2024

Newbie penetration testers often struggle to grasp the significance of the initial findings reported by most Dynamic Application Security Testing (DAST) tools. For reference, here’s a screenshot of a DAST tool finding, “Login Page Identified”.

A possible login page notification fired by a DAST tool, Invicti

In fact, the login screen serves as the gateway to the kingdom, holding vast treasures within, and passwords are the keys that unlock these doors.

Numerous possibilities arise upon encountering an entry point: Injecting payloads to exploit XSS vulnerabilities, checking for the existence of SQL injection vulnerabilities, triggering Server-Side Request Forgery, and so forth.

However, before contemplating breaking through the door, seasoned veterans often prioritize the potential for easy access. Armed with thousands of potential keys, time for examination, and a touch of luck, winning the game becomes quite feasible!

Readers familiar with Kevin Mitnick’s “The Art of Intrusion” may recall a scene in which a hacker spends countless hours on a brute-force attack, trying each entry of a generated password list in turn, and eventually breaches the system.

Today’s readers are more fortunate. Instead of relying solely on computer-generated password dictionary lists, login attempts can be made using the most common, real, and frequently used passwords obtained from dumped passwords. The positive news is that within the infosec industry, many individuals share categorized password dictionaries, for instance, according to country.

These passwords may be obtained from previously hacked databases, pilfered through stealer logs, and so forth. From a defensive perspective, penetration testers can use these lists to check for breached accounts, identify the use of leaked and common passwords, and report their findings to contribute to a more secure world.

Kaspersky’s password checker shows how many times a password appeared in leaked databases

The effectiveness of password dictionaries hinges on several factors: The prevalence of weak passwords, the tendency to reuse a single password across multiple platforms, storing passwords in clear text within databases, and more.

Additionally, password dictionaries owe much of their success to the absence of rate limits on login operations, or to poorly implemented lockout mechanisms, on many websites. In such cases there is effectively no cap on the number of attempts, allowing one to try logins until the correct password is discovered, constrained only by the available computing power and network throughput.

In this first installment of the article series, we will explore a carefully curated password dictionary sourced from various outlets, cataloging more than 1,700 of the most common passwords along with their respective frequencies.

Furthermore, by employing various data analysis techniques, we will elucidate correlations between password complexity and distribution patterns, and thus, offer mitigation strategies for developers who may approach this topic with trepidation.

A Password Dictionary, Meticulously Curated for Pentesters: srwordlist.txt

The meticulously curated password dictionary, named srwordlist.txt, has been compiled by SOCRadar analysts with rigorous care, drawing from sources such as dark web forums and Telegram channels. This comprehensive list encompasses 1,788 of the most frequently used passwords.

Throughout this groundbreaking expedition, we will employ tools such as Jupyter Notebook and the Python pandas library, leveraging the insights gleaned from the srwordlist.txt password dictionary list to unlock the doors of numerous accounts.

The file consists of two columns, as shown by the shape attribute of the DataFrame

The columns consist of a “password” column, containing unique passwords gathered from various password dictionaries, and a column named “hit,” indicating the frequency of occurrence of each password within the entire collected list.

A sample of password frequency

The list is ordered based on each password’s “hit” value. Upon inspecting the first five records from the list, it becomes evident that the entries are arranged in descending order according to the frequency of each password.
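
The inspection steps above can be sketched with pandas as follows. This is a minimal sketch under stated assumptions: the file is taken to be comma-separated with a `password,hit` header row, which may not match the real srwordlist.txt, and the sample rows and hit counts are invented for illustration. `load_wordlist` is a hypothetical helper name.

```python
import io
import pandas as pd

# Hypothetical stand-in for srwordlist.txt; the real file's delimiter and
# header may differ, and these hit counts are invented for illustration.
SAMPLE = """password,hit
123456,40
password,35
qwerty,30
111111,22
abc123,18
iloveyou,9
"""

def load_wordlist(source):
    """Read a two-column password list, keeping every password as a string."""
    # dtype=str stops pandas from parsing all-digit passwords as integers
    # (which would also strip leading zeros); keep_default_na=False stops
    # passwords such as "null" or "NA" from being read as missing values.
    df = pd.read_csv(source, dtype={"password": str}, keep_default_na=False)
    df["hit"] = df["hit"].astype(int)
    return df

df = load_wordlist(io.StringIO(SAMPLE))
print(df.shape)    # (rows, columns)
print(df.head())   # first five records
# Check the descending order by frequency described in the text
print(df["hit"].is_monotonic_decreasing)
```

With the real file, the same helper would be called as `load_wordlist("srwordlist.txt")`, adjusting the separator if the file is not comma-separated.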

Top 5 password frequencies

Penetration testers may find it difficult to contain their excitement and may be eager to commence testing with this list immediately. However, for readers who wish to embark on this journey with us and explore future projections, we encourage you to stay tuned for upcoming installments.

A Little Bit of Feature Engineering with the Data

Performing some feature engineering on the data could provide valuable insights into the nature of the passwords and the trends of potential victims.

While weak passwords can indeed be easily guessed by brute force attacks given sufficient computing power and time, it’s important to recognize that the reasons for a password appearing on such a list may extend beyond its inherent weakness. In today’s complex cyber landscape, threats such as phishing attacks, web application breaches, and stealer logs can compromise passwords, even if users adhere to best practices in password hygiene.

Password Monster shows how easy it is to guess the password “playboy1” in 0.08 seconds. This password is in the top 5 on our list.

Exploring the relationship between a password’s weakness and how frequently it appears in a dictionary is therefore a crucial aspect of understanding password security.

While weak passwords are more susceptible to being cracked through brute force attacks, their appearance in password dictionaries may also stem from other factors such as common usage patterns, simplicity, or susceptibility to social engineering tactics. Thus, analyzing this relationship can provide valuable insights into the effectiveness of password security measures and inform strategies for enhancing password strength and resilience against various forms of cyber threats.

From NIST Special Publication 800-63B

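To measure each password’s complexity in the list, we use a simple heuristic score. The length bounds follow NIST (National Institute of Standards and Technology) SP 800-63B, which requires at least 8 characters and recommends accepting at least 64. The character-class checks are traditional composition heuristics; note that NIST SP 800-63B itself advises verifiers not to impose such composition rules on users, so they serve here only as a measure of complexity. The criteria are: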

  1. Ensuring the password is at least 8 characters long (with longer passwords preferable).
  2. Including a combination of letters and numbers.
  3. Incorporating both uppercase and lowercase letters.
  4. Utilizing special or complex characters.

In our feature engineering process, we will implement the following Python code to derive new attributes from the existing data:

import re

def is_strong_password(password):
    """Return a heuristic complexity score for a password (maximum 12)."""
    min_length = 8
    max_length = 64
    score = 0
    # Check password length
    if min_length <= len(password) <= max_length:
        score += 1
    # Check for character composition using regular expressions:
    # uppercase, lowercase, digits, and special characters.
    # The hyphen is escaped so that it is matched literally instead of
    # forming the range ")" to "_", which would also match digits and letters.
    for pattern, weight in [(r"[A-Z]", 2), (r"[a-z]", 2), (r"\d", 2), (r"[!@#$%^&*()\-_+=]", 3)]:
        if re.search(pattern, password):
            score += weight
    # Penalty for common patterns (disabled)
    # common_patterns = ["password", "123456", "qwerty"]
    # if any(pattern in password.lower() for pattern in common_patterns):
    #     score -= 5
    # Penalty for passwords consisting only of digits,
    # including a single digit repeated throughout
    if re.match(r"\d+$", password) or re.match(r"(\d)\1+$", password):
        score -= 5
    # Bonus for character variety
    unique_chars = len(set(password))
    if unique_chars >= 10:
        score += 2
    elif unique_chars >= 5:
        score += 1
    return score

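The function checks each condition in turn and adds the corresponding weight: 1 point for length, 2 points each for uppercase letters, lowercase letters, and digits, 3 points for special characters, and up to 2 points for character variety. The highest complexity score a password can attain is therefore 12 points.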

Given the clarity of the code accompanied by explanatory comments, we’ll proceed directly to the feature engineering steps:

Our focus now shifts to deriving two additional attributes from our existing data: “score” and “pwd_len” (password length).

The “score” attribute will be computed using the previously defined function, mapping each password to its corresponding score.

Upon executing the following feature extraction operations, our dataset gains further granularity, rendering it more descriptive:
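
A minimal sketch of how the two attributes might be derived with pandas. `add_features` is a hypothetical helper, the sample rows are invented, and `toy_scorer` is only a stand-in so that the snippet runs on its own; in the notebook, the scoring function defined earlier would be passed instead.

```python
import pandas as pd

def add_features(df, scorer):
    """Return a copy of df with 'score' and 'pwd_len' columns added."""
    out = df.copy()
    out["score"] = out["password"].map(scorer)    # one score per password
    out["pwd_len"] = out["password"].str.len()    # length in characters
    return out

def toy_scorer(password):
    # Stand-in scorer (number of distinct characters), for illustration only.
    return len(set(password))

# Invented sample rows; not taken from srwordlist.txt.
df = pd.DataFrame({"password": ["123456", "qwerty", "Passw0rd!"],
                   "hit": [40, 30, 2]})
features = add_features(df, toy_scorer)
print(features)
```

In the article's setting the call would be `add_features(df, is_strong_password)`.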

Frequency mapping in password dictionary lists

Using the corr() method of the pandas DataFrame, we can quickly examine the pairwise relationships between the columns of our dataset and spot patterns and dependencies among the attributes.
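
As a rough illustration, the correlation matrix could be computed as below. The values in this toy DataFrame are invented to show the mechanics and do not reproduce the figures obtained from srwordlist.txt.

```python
import pandas as pd

# Invented values for illustration only; not taken from srwordlist.txt.
df = pd.DataFrame({
    "hit":     [50, 40, 20, 10, 5],
    "score":   [-2, 3, 5, 8, 11],
    "pwd_len": [6, 6, 8, 9, 11],
})

# Pearson correlation (the default) between every pair of numeric columns
corr = df[["hit", "score", "pwd_len"]].corr()
print(corr.round(2))
```

Each cell lies between -1 and 1; the diagonal is always 1 because every column correlates perfectly with itself.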

Correlations among the attributes 'hit' (indicating password occurrence in different lists), 'score', and 'pwd_len' (password length).

A negative correlation between ‘pwd_len’ and ‘hit’ means that as one attribute increases, the other tends to decrease. In other words, longer passwords are less likely to appear frequently across the various lists, while shorter passwords are more common across different sources, potentially due to their simplicity or widespread use.

Conversely, a positive correlation emerges between ‘pwd_len’ and ‘score’. This finding suggests that as password length increases, so does its complexity score. This correlation aligns with intuition, as longer passwords typically offer more security. However, it’s noteworthy that the correlation value isn’t exceptionally high. This can be attributed to our complexity calculation function, which recognizes that password length alone doesn’t guarantee complexity. For instance, a password consisting of eleven identical digits (e.g., ‘11111111111’) triggers the digits-only penalty, which lowers the overall complexity score by 5 points. Hence, while length contributes to complexity, it’s not the sole determinant.

The code sections which apply a penalty for repeating patterns of digits

Now that we’ve computed our scores, it’s time to revisit our tables, focusing particularly on the top 10 most frequently occurring passwords across different lists.

Top 10 occurring passwords

It comes as no surprise to find that the top positions in the list are dominated by less complex passwords. Examining the distribution of passwords based on complexity paints a more detailed picture. While the spectrum is undoubtedly varied, the overarching trend remains consistent.

Password Length-Frequency distribution

Following our exploratory data analysis of the curated password dictionary list, it’s imperative to discuss how penetration testers can effectively harness this resource and outline measures to address or mitigate potential dictionary attacks.

How Can Pentesters Make Use of Password Dictionary Lists?

Pentesters can leverage password dictionary lists as powerful tools in their arsenal for assessing the security of web applications. Upon identifying a login form within the tested web application, pentesters can initiate various strategies to evaluate its robustness and potential vulnerabilities.

After detecting a login form in the tested web application, pentesters can look for breached accounts belonging to the domain. SOCRadar’s free-to-use Account Breach Check service can help penetration testers with this.

Account Data Breach Check service by SOCRadar

Before attempting to brute-force a registered account in the tested application using a password dictionary list, a pentester must first ascertain whether the login form enforces a rate limit.

Rate limiting means imposing a threshold on the number of login attempts a client may make within a given time frame. Typically, such limits are implemented to thwart volumetric login attempts on the page. Fortunately, web applications often reveal whether rate limiting is in effect after a certain number of failed attempts, usually through error messages.
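
A tester's check for these signals could look roughly like the sketch below. `looks_rate_limited` is a hypothetical helper that inspects a response the tester has already received; the message phrases are assumptions, since every application words its errors differently. Status code 429 ("Too Many Requests") and the Retry-After header are standard HTTP.

```python
def looks_rate_limited(status_code, headers, body):
    """Heuristically decide whether a login response signals rate limiting."""
    # Header names are case-insensitive in HTTP, so normalize them first.
    names = {name.lower() for name in headers}
    if status_code == 429:          # "Too Many Requests" (RFC 6585)
        return True
    if "retry-after" in names and status_code in (403, 503):
        return True
    # Hypothetical phrases; real applications word this message differently.
    hints = ("too many", "rate limit", "try again in", "temporarily locked")
    text = body.lower()
    return any(hint in text for hint in hints)
```

For example, `looks_rate_limited(200, {}, "Too many login attempts.")` returns `True`, while an ordinary "Invalid credentials" response with status 401 does not.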

A screenshot from Laravel application which shows rate limiting

Lockouts can take two forms: soft lock and hard lock. A soft lock blocks further attempts for a specified time interval, whereas a hard lock imposes stricter measures, often requiring users to contact the web application’s administrators to regain access to their account or unblock their IP address.

An example of a soft lock mechanism can be found on the Disruptive Technologies support page

While some web applications implement rate limiting based on IP addresses, others apply it to individual user accounts, requiring account owners to take further actions through different channels. An example of a hard lockout mechanism can be observed on Microsoft’s support page.

An example of a hard lock out mechanism

In scenarios where rate limiting is based on IP addresses, attackers may attempt to bypass this restriction by setting the X-Forwarded-For request header themselves. This header is commonly added by proxies and load balancers to convey the client’s original IP address to the server behind them. If the application trusts the header without checking that it was set by a known proxy, an attacker can send a different spoofed address with each request, appear as a new client every time, and continue the attempts indefinitely.
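
The difference between a vulnerable and a more careful way of deriving the client address might be sketched as follows. Both function names and the `TRUSTED` proxy range are hypothetical; a real deployment would rely on its framework's or proxy's own trusted-proxy configuration.

```python
import ipaddress

def naive_client_ip(remote_addr, xff_header):
    """Vulnerable: trusts whatever the client put in X-Forwarded-For."""
    if xff_header:
        return xff_header.split(",")[0].strip()
    return remote_addr

def safer_client_ip(remote_addr, xff_header, trusted_proxies):
    """Honor X-Forwarded-For only when the direct peer is a trusted proxy."""
    peer = ipaddress.ip_address(remote_addr)
    if not xff_header or not any(peer in net for net in trusted_proxies):
        return remote_addr
    # Walk the chain right to left and return the first hop that is not
    # one of our own proxies; entries further left are client-controlled.
    for hop in reversed([h.strip() for h in xff_header.split(",")]):
        try:
            addr = ipaddress.ip_address(hop)
        except ValueError:
            return remote_addr      # malformed entry: fall back to the peer
        if not any(addr in net for net in trusted_proxies):
            return hop
    return remote_addr

TRUSTED = [ipaddress.ip_network("10.0.0.0/8")]   # hypothetical proxy range
```

A rate limiter keyed on `naive_client_ip` sees a new client whenever the attacker changes the header, whereas `safer_client_ip` keeps returning the address the trusted proxy actually observed.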

Sophisticated attackers may also use datacenter or residential proxies to rotate their IP addresses, enabling them to circumvent IP-based rate limiting.

Penetration testers must utilize these methods when testing web applications, not only to assess resistance against brute force attacks, but also to evaluate the system’s resilience against IP spoofing techniques.

You can download the password dictionary file here.

Be One Step Ahead of Attackers by Applying Countermeasures Beforehand

Developers must stay proactive in safeguarding their systems against evolving attack techniques by implementing preemptive countermeasures.

In the ongoing battle between attackers and system owners, it’s unrealistic, if not impossible, to anticipate every tactic employed by malicious actors. Rather than engaging in a futile cat-and-mouse game with hackers, developers can fortify their web applications’ security posture by adhering to established best practices.

One fundamental measure is ensuring that passwords are not stored in clear text. Instead, developers should store passwords as salted hashes produced by proven password-hashing algorithms such as bcrypt, scrypt, Argon2, or PBKDF2. Additionally, passwords should be protected in transit by using TLS to encrypt the connection.
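
As an illustration of salted hashing, here is a minimal sketch using PBKDF2 from Python's standard library. The function names are hypothetical and the iteration count is an illustrative default that should be tuned to current guidance and hardware; production systems usually rely on a maintained password-hashing library instead of hand-rolled code.

```python
import hashlib
import hmac
import os

def hash_password(password, iterations=600_000):
    """Return (salt, iterations, digest) for storage instead of the password."""
    salt = os.urandom(16)                      # unique random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest

def verify_password(password, salt, iterations, expected):
    """Recompute the hash with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(digest, expected)
```

Because each password gets its own random salt, two users with the same password end up with different stored digests, which defeats precomputed lookup tables.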

For login forms and any operation that alters the application’s state, developers should implement rate limiting based on the client’s real IP address, taken from the connection itself or from proxy headers that are known to be trustworthy. While attackers may possess the capability to rotate IP addresses using botnets or rented proxies, developers can counter such tactics by instituting hard locking mechanisms after a specified number of failed login attempts. In such instances, account owners would be required to take further actions via an alternative channel to regain access.
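
The combination of an IP-based limit and an account-based hard lock could be sketched as below. `LoginThrottle` and its thresholds are hypothetical, and the state lives in memory only; a real application would persist it in a shared store.

```python
import time
from collections import defaultdict, deque

class LoginThrottle:
    """In-memory sketch: per-IP sliding window plus per-account hard lock."""

    def __init__(self, max_per_window=5, window_seconds=60, lock_after=10,
                 clock=time.monotonic):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self.lock_after = lock_after
        self.clock = clock
        self.attempts = defaultdict(deque)   # ip -> timestamps of recent failures
        self.failures = defaultdict(int)     # account -> consecutive failures
        self.locked = set()                  # accounts awaiting out-of-band unlock

    def allow(self, ip, account):
        """Return True if a login attempt may proceed."""
        if account in self.locked:
            return False
        now = self.clock()
        recent = self.attempts[ip]
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()                 # drop failures outside the window
        return len(recent) < self.max_per_window

    def record_failure(self, ip, account):
        self.attempts[ip].append(self.clock())
        self.failures[account] += 1
        if self.failures[account] >= self.lock_after:
            self.locked.add(account)         # hard lock: survives IP rotation

    def record_success(self, account):
        self.failures[account] = 0
```

The per-IP window acts as the soft lock and expires on its own, while the per-account counter triggers the hard lock regardless of which address the failures came from. One trade-off to weigh is that a hard lock also lets an attacker lock legitimate users out on purpose.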

Promoting secure password practices among users is paramount. Developers should guide users in selecting secure passwords by suggesting criteria for strong passwords during registration and password change processes. Encouraging the use of password managers can also enhance password hygiene, as these tools facilitate the adoption of unique and complex passwords for each platform. Developers should not get in the way of password managers; in particular, password fields should allow pasting.

Furthermore, developers should proactively prevent the use of breached passwords by users. This can be achieved by integrating data breach services into the registration and password change workflows, enabling real-time checks against breach databases. It’s essential to continuously monitor such breach databases and promptly inform users in the event of any appearance of leaked credentials, thereby fostering a culture of transparency and accountability in password security.
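
At its simplest, such a check can be run locally against a list of known-compromised passwords, as in this minimal sketch. `load_blocklist`, `is_acceptable`, and the three sample entries are hypothetical; a real deployment would load a much larger list or query a breach-checking service.

```python
def load_blocklist(lines):
    """Build a set of known-compromised passwords from an iterable of lines."""
    return {line.rstrip("\r\n") for line in lines if line.strip()}

def is_acceptable(password, blocklist, min_length=8):
    """Reject passwords that are too short or appear in the blocklist."""
    if len(password) < min_length:
        return False
    return password not in blocklist

# Tiny illustrative blocklist; in practice this would be read from a file.
BLOCKLIST = load_blocklist(["123456\n", "playboy1\n", "password\n"])
```

Calling `is_acceptable(candidate, BLOCKLIST)` during registration and password changes rejects any candidate that is already on the list, and the set lookup keeps the check fast even for large lists.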