SOCRadar® Cyber Intelligence Inc. | 23 Billion Rows of Stolen Records: What You Need to Know?

Feb 26, 2025

Update: What Does the Alleged Leak Data Contain?

Infostealer malware continues to pose a severe threat, with billions of stolen records circulating in cybercriminal markets. A recent analysis of a stealer log showed the vast scale of these breaches, exposing sensitive user credentials and personal data worldwide.

Troy Hunt has ingested 1.5TB of stealer logs, known as “ALIEN TXTBASE,” into Have I Been Pwned. These logs contain 23 billion rows, including 493 million unique website and email address pairs, affecting 284 million unique emails.

SOCRadar has been actively monitoring the Telegram channel where these logs were shared and continues to track its activity. The channel has already triggered numerous alerts for relevant users in the past.

Sign up for SOCRadar Free CTI to check whether your IP address, domain, or email appears in these leaks, see where the leaks occurred, and receive alerts as a SOCRadar user for similar incidents in the future.

Where Do the 23 Billion Stolen Records Come From?

Security researcher Troy Hunt recently examined 23 billion rows of data extracted from infostealer logs, unveiling a staggering amount of compromised credentials and personal information. This dataset, originating from the “Alien” TXTBase collection, shows the vast volume of data siphoned by various malware strains. Infostealers, such as RedLine and Raccoon, continue to be widely used by cybercriminals to harvest login credentials, autofill data, and even cryptocurrency wallet information from infected devices.

An example ALIEN TXTBASE dump

The leaked records provide a detailed insight into how infostealer malware operates. Rather than relying on large-scale breaches of corporate databases, these threats infect individual devices, silently collecting stored passwords, session cookies, and other sensitive data. The stolen information is then sold or shared within cybercriminal communities, fueling a cycle of account takeovers and fraud.

What is the ‘Alien’ TXTBase Dataset?

The dataset does not originate from a single breach, as one might assume. It is a collection of 744 individual files shared within a Telegram channel, with each row formatted as URL:login:password.

The dataset analyzed by Troy Hunt was the result of consolidating these files into a single dataset. Initially, only two of the files were examined, but the full dataset was later compiled from the entire collection shared within the channel. This approach highlights how cybercriminals systematically aggregate and distribute stolen credentials over time, expanding the scope and impact of infostealer malware.
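To make the URL:login:password format and the consolidation step more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method used by Troy Hunt or SOCRadar: the function names (parse_line, site_of, consolidate) and the sample rows are hypothetical, and the parser assumes the password contains no colon, which real stealer logs can violate.

```python
from urllib.parse import urlsplit


def parse_line(line):
    """Split one 'URL:login:password' row; return None if it does not fit."""
    # Split from the right, because the URL itself contains ':' (as in 'https://').
    # Assumption: the password holds no ':'; real logs can break this rule.
    parts = line.strip().rsplit(":", 2)
    if len(parts) != 3 or not all(parts):
        return None
    return tuple(parts)


def site_of(url):
    """Return the lower-cased host name of a URL, tolerating a missing scheme."""
    if "://" not in url:
        url = "//" + url
    try:
        return (urlsplit(url).hostname or "").lower()
    except ValueError:
        return ""  # malformed URL, for example unbalanced IPv6 brackets


def consolidate(lines):
    """Collect unique (site, email) pairs and unique emails from raw rows."""
    pairs, emails = set(), set()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        url, login, _password = parsed
        if "@" not in login:
            continue  # keep only logins that look like email addresses
        email = login.lower()
        pairs.add((site_of(url), email))
        emails.add(email)
    return pairs, emails


# Hypothetical sample rows; none of these come from the real dataset.
sample = [
    "https://example.com/login:alice@example.org:hunter2",
    "https://example.com/account:alice@example.org:hunter2",
    "https://shop.example.net/:alice@example.org:letmein",
    "https://example.com/login:bob@example.org:qwerty",
    "not a credential row",
]
pairs, emails = consolidate(sample)
print(len(pairs), len(emails))  # prints: 3 2
```

Counting pairs by host name rather than full URL is a design choice made for this sketch; it shows why the number of unique website and email pairs can be far smaller than the raw row count.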

Telegram channel monitored by SOCRadar

As mentioned, the dataset is a collection of stolen data sold through a Telegram channel, which might not only distribute real credentials but also engage in scams. The channel may generate fake or non-existent emails and phone numbers to inflate the value of the data, potentially using these fake entries to deceive buyers.

For another in-depth analysis, explore SOCRadar’s whitepaper, featuring 70 million stealer log snapshots that provide insights into stealer malware and support proactive defense strategies.

What Does the Alleged Leak Data Contain?

The cybersecurity community has been scrutinizing the ALIEN TXTBASE data leak, and a detailed analysis is warranted because there are significant doubts about the authenticity and reliability of this dataset. While the unique counts and Troy Hunt’s article indicate how much of the allegedly leaked data is reliable, we will dig deeper into the data and show concrete examples.

First of all, SOCRadar has been monitoring the channel for a while and has generated relevant alerts for its users before. The Telegram channel in question is not new, but its recent surge in popularity and the inclusion of its combined dataset in Have I Been Pwned have spread the data further than usual.

The combolist data shared in popular hacker forums

As noted above, with a dataset this large, the channel owner may generate fake credentials to inflate it and keep up with demand. That is what we observe: much of the data appears inflated, and some credentials are made up outright.

If you received a “Have I Been Pwned” notification, a SOCRadar alert, or a similar alert from another source regarding an “ALIEN TXTBASE Stealer Logs” breach claiming that your email and password were compromised, there’s a likelihood that it’s a false positive.

The seller of this dataset fabricates account and password combinations from old email addresses exposed in past breaches, but that does not mean an alert is safe to ignore. The leak also contains valid credentials captured by stealer malware, so anyone who receives an alert should check and verify it.

Moreover, even if there is no active stealer malware infection in a user’s current environment, a single valid credential could still be valuable to attackers attempting to gain access to accounts.

Cybercriminals can leverage such credentials in credential stuffing attacks, where they try reused passwords across multiple platforms, or in password spraying, where they test commonly used passwords against many accounts. Brute force attacks can also be used to systematically guess login details, which increases the risk for users who rely on weak or recycled passwords. Even partially leaked credentials, such as an email address paired with an old password, can give attackers a useful starting point.

Let’s examine the data more closely. When a random email address from the dataset is searched using SOCRadar’s Threat Hunting, it appears in multiple Telegram groups throughout 2024, indicating that it is simply a reshared entry rather than newly compromised information.

SOCRadar’s Threat Hunting is available for free users

Many similar credentials in the dataset appear to have been taken from old breaches or past combolists.
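The reshared-entry check described above can be sketched in a few lines of Python. This is only an illustration: the sightings index, the function names, and the dates are hypothetical placeholders and do not represent SOCRadar’s Threat Hunting API or its data. The idea is simply that an email first observed before the dump surfaced is a reshared entry rather than a new compromise.

```python
from datetime import date


def first_seen(email, sightings):
    """Return the earliest date this email was observed, or None if never."""
    dates = sightings.get(email.lower(), [])
    return min(dates) if dates else None


def is_reshared(email, sightings, dump_date):
    """True when the email was already observed before the dump appeared."""
    earliest = first_seen(email, sightings)
    return earliest is not None and earliest < dump_date


# Hypothetical index of earlier observations, keyed by lower-cased email.
sightings = {
    "alice@example.org": [date(2024, 3, 14), date(2024, 9, 2)],
    "bob@example.org": [date(2025, 2, 20)],
}
dump_date = date(2025, 2, 1)  # placeholder date, not taken from the article

for email in ("alice@example.org", "bob@example.org", "carol@example.org"):
    print(email, is_reshared(email, sightings, dump_date))
```

In this sample, only the first address counts as reshared; the second was first seen after the placeholder date, and the third has no earlier sighting at all.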

So, the dataset mostly appears to contain previously shared or leaked information, although some entries are highly likely to be fake.

Some email domains have no MX records and are non-existent.
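A check of this kind can be sketched in Python as follows. This is a hedged example, not SOCRadar’s tooling: has_mx and default_mx_lookup are hypothetical names, and the default lookup assumes the third-party dnspython package is installed. Note also that a missing MX record is a strong hint rather than proof, because mail servers may fall back to a domain’s address record.

```python
def default_mx_lookup(domain):
    """Return MX host names for a domain using dnspython (third-party)."""
    import dns.resolver  # assumption: installed via 'pip install dnspython'

    try:
        answers = dns.resolver.resolve(domain, "MX")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []  # domain does not exist, or exists without MX records
    # Timeouts and other resolver errors propagate: they mean "unknown".
    return [str(record.exchange) for record in answers]


def has_mx(email, mx_lookup=default_mx_lookup):
    """True when the domain part of the email address has at least one MX record."""
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or not domain:
        return False  # not shaped like an email address
    return bool(mx_lookup(domain.lower()))


# Offline demonstration with a stub lookup, so that no DNS query is sent.
fake_zone = {"example.org": ["mail.example.org."]}


def stub_lookup(domain):
    return fake_zone.get(domain, [])


print(has_mx("alice@example.org", mx_lookup=stub_lookup))           # True
print(has_mx("bob@no-such-domain.invalid", mx_lookup=stub_lookup))  # False
```

Passing the lookup as a parameter keeps the check testable without network access; calling has_mx with only an email address performs a live DNS query.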

In the large random sample we examined, old records outnumber fake ones.

In conclusion, while the ALIEN TXTBASE dataset has generated significant attention, a deeper analysis reveals that much of the data is either recycled from old breaches or fabricated. Despite the alarming numbers, many of the compromised credentials have already been exposed in previous leaks or appear to be made up entirely.

However, it’s important to note that some valid credentials do exist within the leak, which could still pose a threat. Even without active malware infections, these valid credentials can be exploited in credential stuffing, password spraying, or brute force attacks.

How Is the Threat of Stealer Logs and Cybercrime Distribution Evolving?

The Alien TXTBase dataset, which contains 23 billion stolen records, underscores the vast scale and growing impact of infostealer malware. While the primary focus remains on the stolen credentials and personal data, it’s important to recognize that these records are part of a much broader cybercrime ecosystem. Cybercriminals, relying on various tools and communication channels, continue to expand the reach of these attacks.

Though Telegram once played a pivotal role in distributing stealer logs and other stolen data, its influence has waned with the growing use of alternative platforms. As mentioned in recent discussions on Telegram’s shifting role in cybercrime, Telegram’s peak may have passed due to stricter policies and increased monitoring. However, the platform remains an influential and significant avenue for cybercriminals who still rely on it for sharing stolen data, including the Alien TXTBase leak. Despite its decline, Telegram is still a key player in the underground market for now, and the data shared within these channels can have serious consequences.

Despite this shift to other channels, such as hacker forums, dark web markets, or legitimate platforms like Discord, X (Twitter), and Signal, stolen data is still being shared and sold, maintaining the pressure on security professionals to adapt.

What Are the Best Defense Mechanisms Against Stealer Logs?

Enable Multi-Factor Authentication (MFA): Always activate MFA for sensitive accounts to protect against stolen credentials, adding an essential layer of security even if login information is compromised.
Update and Strengthen Passwords Regularly: Ensure passwords are unique and complex for each service to minimize the impact of a single leak. Regularly changing them reduces the window of opportunity for attackers.
Utilize Password Managers: Use password managers to securely store and manage your credentials, making it less likely that stealer malware can access them.
Monitor for Leaked Credentials: Leverage services like SOCRadar’s Dark Web Monitoring to detect if your credentials appear in compromised logs, enabling proactive measures.
Ensure Strong Endpoint Security: Regularly scan devices for malware, especially from suspicious downloads, phishing attempts, or other malicious sources, and ensure that endpoint security software is kept up to date to detect and block evolving threats.
User Education and Awareness: Regularly train employees on identifying phishing attempts, malicious downloads, and other attack vectors commonly used by infostealer malware.
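As one illustration of the credential monitoring step in the list above, the Python sketch below checks a password against the Pwned Passwords range endpoint of Have I Been Pwned. Only the first five characters of the SHA-1 hash are sent, so the password itself never leaves the machine. This is a sketch under assumptions: the function names and the User-Agent string are hypothetical, and it assumes this public service suits your environment. It is not a substitute for a monitoring product.

```python
import hashlib
import urllib.request


def split_hash(password):
    """Return (first 5, remaining 35) hex characters of the upper-case SHA-1."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(range_body, suffix):
    """Find the suffix in a 'SUFFIX:COUNT' listing; return 0 when it is absent."""
    for row in range_body.splitlines():
        candidate, _, count = row.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0


def fetch_range(prefix):
    """Download the list of hash suffixes that share the given 5-character prefix."""
    request = urllib.request.Request(
        "https://api.pwnedpasswords.com/range/" + prefix,
        headers={"User-Agent": "stealer-log-check-sketch"},  # hypothetical name
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def pwned_count(password, fetch=fetch_range):
    """Return how many times the password appears in the Pwned Passwords corpus."""
    prefix, suffix = split_hash(password)
    return count_in_range(fetch(prefix), suffix)


# Offline demonstration with a canned response instead of a network call.
prefix, suffix = split_hash("password")
canned = suffix + ":42\r\n" + "0" * 35 + ":1"
print(prefix, pwned_count("password", fetch=lambda _prefix: canned))
```

Calling pwned_count with only a password performs the live lookup. A result above zero means the password has appeared in known breach data and should be replaced.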

The danger posed by infostealers and data leaks remains ever-present. Cybercriminals will continue to exploit any available platform to maximize their reach. It’s critical to remain vigilant and implement proactive defense strategies to protect sensitive data from being exposed and exploited.