SOCRadar® Cyber Intelligence Inc. | The Black Box of GitHub Leaks: Analyzing Companies’ GitHub Repos
Jul 25, 2023
The Black Box of GitHub Leaks: Analyzing Companies’ GitHub Repos

This research investigated files that companies may have accidentally uploaded to GitHub and looked for sensitive information in those projects. We focused on popular Technology Consulting, Finance, Software, Marketing, Information Security, Cyber Security, and Telecommunication companies with high commit counts.

To examine the GitHub accounts, we first prepared 91 different GitHub search queries designed to identify sensitive data that could have been shared on GitHub. We then compiled the list of public repositories belonging to the selected companies. Finally, we ran each query on GitHub’s advanced search page, scoped to those companies (e.g., ‘org:SOCRadar AND (query)’).
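The scoping step described above can be sketched in a few lines of Python. This is only an illustration: the 91 queries used in the research are not listed here, so the two example queries below are hypothetical placeholders, and the helper names `build_org_queries` and `search_url` are our own.

```python
from urllib.parse import quote_plus

# Hypothetical example queries; the research's actual query list is not published here.
EXAMPLE_QUERIES = ['"BEGIN RSA PRIVATE KEY"', "filename:.htpasswd"]


def build_org_queries(org, queries):
    """Scope each query to one organization, following the 'org:NAME AND (query)' form."""
    return [f"org:{org} AND ({query})" for query in queries]


def search_url(query):
    """Build a GitHub code-search URL for a query string."""
    return "https://github.com/search?q=" + quote_plus(query) + "&type=code"


if __name__ == "__main__":
    for scoped in build_org_queries("SOCRadar", EXAMPLE_QUERIES):
        print(scoped)
        print(search_url(scoped))
```

Generating the scoped strings programmatically keeps the query list and the company list independent, so either can grow without manual editing.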

While exploring the GitHub repositories, we encountered numerous secret keys, passwords, and other sensitive data. These accidental disclosures highlight the potential security risks companies may face if they unknowingly expose their valuable information on public platforms like GitHub.

Beyond examining the repositories hosted on GitHub, client-side tools can scan projects before they are pushed. For example, Talisman, Git hooks, or IDE extensions can strengthen security measures.

Hidden Treasures on GitHub: Accidental Disclosure of Company Data by Companies

Talisman is a pre-commit hook framework that can be integrated into your Git workflow, providing automated checks for sensitive information before it is committed. Git hooks, including pre-commit and post-commit hooks, allow you to run custom scripts or actions during various stages of the Git process. 
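The kind of check a pre-commit hook performs can be sketched as follows. This is a minimal illustration and not how Talisman itself is implemented; the three patterns and the function names (`find_secrets`, `staged_diff`, `main`) are assumptions chosen for the example, and a real rule set would be much larger.

```python
import re
import subprocess

# Illustrative patterns only; real scanners ship far more rules.
PATTERNS = {
    "private key block": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "hard-coded credential": re.compile(
        r"(?i)(?:api_?key|secret|token|password)\w*\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def find_secrets(diff_text):
    """Return (line_number, label) pairs for added diff lines that match a pattern."""
    hits = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Only inspect added lines; skip the '+++ b/file' header lines.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label))
    return hits


def staged_diff():
    """Return the diff of everything currently staged for commit."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def main():
    """Print findings and return a non-zero status so Git aborts the commit."""
    hits = find_secrets(staged_diff())
    for number, label in hits:
        print(f"possible {label} on diff line {number}")
    return 1 if hits else 0

# To use this as .git/hooks/pre-commit, end the script with: raise SystemExit(main())
```

Because Git aborts a commit when the pre-commit hook exits with a non-zero status, the secret never enters the local history, which is cheaper than cleaning it up after a push.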

Furthermore, IDE extensions, such as the GitHub Security Code Scanning extension for Visual Studio Code, offer additional capabilities to identify and address potential security vulnerabilities directly within your integrated development environment. 

It is also essential to mention GitHub dorks, specialized search queries that attackers can use to quickly identify repositories with specific patterns or vulnerabilities. By being aware of these tools, techniques, and best practices, you can fortify your GitHub repositories and actively protect against potential data breaches.

Incidents like the SolarWinds breach show how serious the consequences of such weaknesses can be. In that 2020 attack, which reportedly involved credentials exposed on GitHub, large organizations and government agencies were targeted through malware (Sunburst) planted in SolarWinds’ Orion Platform. It highlights the need to address the risks associated with accidental file disclosures on GitHub.

GitHub repositories can become entry points for unauthorized access and data breaches if not adequately protected. Evaluating the security vulnerabilities of these repositories and analyzing companies’ information security policies and measures related to GitHub usage can help organizations strengthen their cybersecurity framework and mitigate the potential impact of such disclosures.

Big Companies, Small Mistakes: Accidental Data Exposures on GitHub

The SolarWinds supply chain backdoor attack reminded the whole world how important it is to carefully check data shared in public environments. Today, some tools can detect whether the data we upload to public repositories contains private information. One example is the Push Protection for Secrets feature that GitHub released in early 2022. When a project or individual files are pushed to a repository, this feature adds protection by scanning the content for keys and tokens.

GitHub repository security features

However, in some cases detection algorithms fail to catch keys that must be kept private because their structure differs from the expected patterns. The same problem arises when push protection is simply not enabled. For large companies with many repositories, such an error carries significant risk and can cause financial and reputational damage.
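One common complement to pattern matching, for keys whose structure no rule anticipates, is an entropy heuristic: long strings that look random are flagged regardless of their format. The sketch below illustrates that general idea and is not a description of how GitHub’s push protection works; the token pattern, the 20-character minimum, and the 4.0-bit threshold are assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Candidate tokens: long runs of characters typical of keys (assumed character set).
TOKEN_RE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")


def shannon_entropy(text):
    """Return the Shannon entropy of a string in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def high_entropy_strings(text, threshold=4.0):
    """Return candidate tokens whose entropy is at or above the threshold."""
    return [t for t in TOKEN_RE.findall(text) if shannon_entropy(t) >= threshold]


if __name__ == "__main__":
    sample = "key = 'Zx9Qw3Lm7Tp2Vb8Nc4Rj6Hy1Kd5'\npadding = 'aaaaaaaaaaaaaaaaaaaaaaaa'"
    print(high_entropy_strings(sample))
```

The trade-off is noise: hashes, UUIDs, and minified assets also look random, so entropy findings usually need a manual review or an allowlist.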

In addition, SOCRadar offers Enterprise customers real-time scanning through its Source Code Leakage Monitoring service, which helps protect public repositories even if push protection is inactive. We aim to minimize potential risks by regularly checking our customers’ data and sending near real-time notifications.

Sensitive information found within the GitHub repositories

GitHub Breach Warning: Companies Information Disclosure Risks

API keys and Tokens

During our research, we made a remarkable discovery within a bank’s GitHub repository: an API key for phone verification, which serves as an access point to the bank’s secure system. Our tests revealed that the key was still active and operational. The image below shows a script that includes an unmasked API key for a Telegram bot and its corresponding chat ID.

Unmasked Telegram bot API key and chat ID

In corporate environments, where organizations strive to maximize security investments in their infrastructure, a seemingly small oversight can quickly turn into a battleground outside the institution’s protective walls. In addition to the exposed API key mentioned earlier, we uncovered 13 distinct instances of API keys and tokens. Even if some of these keys have since expired, a targeted attack or a chance search during the period in which they were uploaded and still valid could easily have found them.

One noteworthy discovery pertains to a bank’s OTP (One-Time Password) application, which relied on communication with a Telegram bot’s API. This integration introduced a significant security vulnerability, allowing unauthorized individuals to access and view sensitive data. The exposed API key jeopardized the bank’s SMS services and compromised the confidentiality of users’ PINs exchanged during chat interactions. Such an incident undermines the integrity of the bank’s services and exposes end users’ private data, making it an urgent concern that demands immediate action.

The banking OTP screen
The banking OTP screen

This situation underscores the critical importance of securely managing and safeguarding API keys. It is a stark reminder that even the slightest mishandling of such keys can have severe consequences for organizations and their customers. To prevent unauthorized access and mitigate potential breaches, it is essential to implement robust security measures and access controls. By doing so, organizations can ensure the confidentiality and integrity of sensitive data, maintaining the trust of their customers while upholding their commitment to data protection.

Exposed bot token

Cloud Servers and Credentials

As the research continued, we discovered 37 separate database schema files and their accompanying banner information. In the image below, crucial details such as the URL, port number, username, and password for a MySQL cloud server are visible.

This finding raises substantial concerns as the exposure of such sensitive database credentials can lead to severe security risks. Unauthorized individuals who gain access to these credentials may infiltrate the database, manipulate or steal valuable data, or even execute malicious actions that compromise the integrity and confidentiality of the entire system.

Exposed credentials

Furthermore, we made an additional discovery involving 13 different authentication keys, secret keys, and credentials. These findings expose vulnerabilities in the authentication mechanisms used by organizations. If these keys and credentials fall into the wrong hands, attackers can abuse them to impersonate legitimate users, gain unauthorized access to systems, and potentially carry out unauthorized activities with elevated privileges. This poses a significant risk to the confidentiality, integrity, and availability of sensitive information and critical systems.

More database credentials that were exposed

GitHub in the Dark: Overlooked Risks in Shared Data

As the GitHub research continued, we observed that various vulnerability scanning tools flagged several files containing private keys as high-risk findings. Organizations may not regard these files as highly risky, but the presence of an id_rsa file (which contains an SSH private key), for instance, can lead to unauthorized access, impersonation attacks, or even the decryption of encrypted communication if the private key was used for establishing end-to-end security. Consequently, such findings reveal significant risks that organizations should not take lightly.

We identified significant data exposures in the GitHub repositories of several large companies that serve numerous organizations. One specific finding involves an Azure Django web service whose secret_key was uploaded to GitHub without proper masking or protection.

The secret_key is critical in securing the Django app because Django uses it for cryptographic signing, which protects sessions, password reset tokens, and other security-sensitive values. If the secret_key becomes publicly available on a platform like GitHub, it poses a significant risk of unauthorized access to the Django app. Malicious actors could use this key to forge or tamper with signed data, compromise user sessions, steal sensitive data, or perform unauthorized actions.

Azure Django Web Service’s secret_key was exposed

To mitigate such risks, it is crucial to prevent uploading sensitive information like the secret_key to version control platforms like GitHub. Instead, secure practices and robust security management policies should be implemented. Storing sensitive values, such as the secret_key, in secure server environment variables or configuration files accessible only to authorized users is recommended. By following these practices, companies can protect their critical data, maintain the integrity of their applications, and ensure compliance with security standards and regulations.
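As a minimal sketch of the environment-variable approach, a settings module can read the key at startup and refuse to run without it. The variable name `DJANGO_SECRET_KEY` and the helper `load_secret_key` are assumptions made for this example, not names taken from the affected project.

```python
import os


def load_secret_key(environ=None, name="DJANGO_SECRET_KEY"):
    """Read the secret key from the environment and fail loudly if it is missing.

    The variable name is an assumption for this sketch; use whatever your
    deployment platform exposes.
    """
    env = os.environ if environ is None else environ
    key = env.get(name, "")
    if not key:
        raise RuntimeError(f"{name} is not set; refusing to start without a secret key")
    return key

# In a Django settings.py this would be used as:
# SECRET_KEY = load_secret_key()
```

Failing at startup is deliberate: a missing key then surfaces as a deployment error instead of tempting someone to commit a fallback value to the repository.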

In addition to the measures mentioned above, it is imperative to either mask such configuration files or explicitly exclude them from version control using the .gitignore file. Masking sensitive information within configuration files involves replacing the actual values with placeholders or encrypting the data to prevent unauthorized access. Alternatively, explicitly excluding these files from version control ensures they are not inadvertently shared or exposed in the repository. By implementing these practices, organizations can significantly reduce the risk of sensitive information leakage and enhance the overall security of their applications deployed on platforms like GitHub.
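Masking with placeholders can be sketched as below: a sanitized copy of the configuration is what gets committed, while the real file stays listed in .gitignore. The list of sensitive key hints, the placeholder text, and the function name `mask_config` are assumptions for this illustration.

```python
# Substrings that mark a configuration key as sensitive (assumed list, extend as needed).
SENSITIVE_HINTS = ("secret", "password", "passwd", "token", "api_key", "apikey")


def mask_config(config, placeholder="<REDACTED>"):
    """Return a copy of a nested config dict with sensitive values replaced."""
    masked = {}
    for key, value in config.items():
        if isinstance(value, dict):
            # Recurse into nested sections such as database settings.
            masked[key] = mask_config(value, placeholder)
        elif any(hint in key.lower() for hint in SENSITIVE_HINTS):
            masked[key] = placeholder
        else:
            masked[key] = value
    return masked


if __name__ == "__main__":
    real = {"DEBUG": False, "SECRET_KEY": "abc", "db": {"host": "h", "password": "p"}}
    print(mask_config(real))
```

The masked copy documents which settings exist without revealing their values, so new team members can still see what they need to configure.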

Hidden Treasures on GitHub

During the research, several additional findings were obtained, indicating the presence of various sensitive information within the GitHub repositories. These findings include 1,336 private keys in the form of .pem files, which can provide unauthorized access to systems and services. Moreover, 37 occurrences of database files or database banners were identified, potentially exposing critical information about the database structure and configuration.

Furthermore, the research uncovered 13 authentication keys, secret keys, and credentials used to access and secure various systems and services. Additionally, 6,796 BASH files and DNSSEC trusted keys were discovered, potentially containing scripts and configurations with significant implications for system security.

Exposed DNSKEY

Two instances of .htpasswd files were identified among the findings, which could grant unauthorized access to protected resources. Moreover, 21 API keys were detected; because they provide access to third-party services, they should be safeguarded to prevent misuse. The research also yielded nine instances of cloud application or database server information, which could reveal critical details about the organization’s infrastructure.

Exposed API keys

One instance of customer data was identified, highlighting the importance of protecting customer information from unauthorized access and potential data breaches. Additionally, one application production configuration file was discovered, emphasizing the significance of securing production environments and safeguarding sensitive configurations.

Lastly, two instances of Linux OS secrets were found, highlighting the need for robust security practices and secure management of operating system credentials and configuration files. 

These findings underscore the importance of thorough security assessments, regular audits, and robust security controls to prevent unauthorized access, data breaches, and potential exploitation of sensitive information.
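A simple audit for several of the file types listed above can be sketched by matching file names in a local checkout. The pattern table is an assumption: `*.pem`, `id_rsa`, and `.htpasswd` come from the findings in this article, while `*.sql` is our own stand-in for database files. The function names `classify` and `find_risky_files` are likewise illustrative.

```python
import fnmatch
import os

# File-name patterns and what they usually indicate (assumed mapping for this sketch).
RISKY_NAME_PATTERNS = {
    "*.pem": "PEM-encoded key or certificate",
    "id_rsa": "SSH private key",
    ".htpasswd": "HTTP basic-auth password file",
    "*.sql": "database dump or schema",
}


def classify(filename):
    """Return a label if the file name matches a risky pattern, otherwise None."""
    for pattern, label in RISKY_NAME_PATTERNS.items():
        if fnmatch.fnmatch(filename, pattern):
            return label
    return None


def find_risky_files(root):
    """Walk a checkout and return sorted (relative_path, label) pairs."""
    findings = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git's internal object store.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            label = classify(name)
            if label is not None:
                path = os.path.relpath(os.path.join(dirpath, name), root)
                findings.append((path, label))
    return sorted(findings)


if __name__ == "__main__":
    for path, label in find_risky_files("."):
        print(f"{path}: {label}")
```

A name-based pass only inspects the current working tree; files that were committed and later deleted remain in the repository history and need a separate history scan.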

Always Be Vigilant with SOCRadar

SOCRadar’s Source Code Leakage Monitoring

Organizations can enhance their data security by utilizing SOCRadar’s Source Code Leakage Monitoring feature. This proactive solution scans public and private repositories, including GitHub, to identify potential code leaks and accidental disclosure of sensitive information.

SOCRadar employs advanced algorithms and machine learning techniques to detect exposed private keys, authentication credentials, API keys, and other critical data in real-time. By receiving alerts from SOCRadar, organizations can promptly respond to potential risks, take necessary mitigation steps, and prevent malicious actors from exploiting vulnerabilities.

By leveraging SOCRadar’s Source Code Leakage Monitoring, organizations can proactively protect sensitive data, strengthen their security posture, and ensure compliance with data protection regulations.

Receive alarms about code repositories on SOCRadar

In conclusion, adopting secure practices when using GitHub is paramount for protecting sensitive data and mitigating potential risks. While minor mistakes or oversights may appear insignificant, they can accumulate and create significant issues. 

To illustrate this, let’s consider a puzzle. Each small piece may seem unimportant on its own, but together the pieces form a coherent picture. Similarly, each small oversight or vulnerability in a GitHub repository may not seem concerning in isolation.

However, when they combine, they can create a chain reaction that leads to significant security breaches and data compromises. Therefore, it is crucial to treat even minor security vulnerabilities seriously and maintain a proactive and vigilant approach to the security of GitHub repositories. By doing so, organizations can ensure the integrity of their code, protect sensitive information, and prevent potential chain reactions of security incidents.