SOCRadar® Cyber Intelligence Inc. | Tools and Features That Can Be Used To Detect Sensitive Data Leaks From Github – Part 1


Apr 30, 2020

Before GitHub existed, developers shared code on the company's local servers, or even on flash drives. But imagine you are outside the company's LAN and cannot reach the code. GitHub was founded to make developers' lives easier and to save them a lot of time.

GitHub is used by developers of all kinds, from independent developers to employees of large companies. Sometimes developers forget to remove important credentials, such as database passwords and similar sensitive information, from the code they publish. This is a key reason why GitHub has become one of the most important OSINT sources when searching for company information.

This blog series is divided into two parts, in which we discuss some popular OSINT tools. Some of them target a specific person or organization; others are combined with dorks to detect leaked accounts.

Before getting into the main topic, a brief update: GitHub's core features are now free for everyone.


Developed in Python 3, gitGraber is a tool that processes the most recently indexed files from current repos on GitHub. It digs for sensitive data used by online services such as Google, Amazon (AWS), PayPal, GitHub, Mailgun, Facebook, Twitter, Heroku, Stripe, and Twilio. It prints the information it considers significant (token, git URL, and so on) to the screen, or sends notifications to a Slack channel.

It checks not only the repo we provide as input when running the script, but also other related repos. For this reason, scans are somewhat slow.

A set of keyword lists is available. One can search whole file contents with the standard keyword file, search file names, or search with the nullenc0de keyword list (for instance, BROWSER_STACK_ACCESS_KEY). These keywords can be extended at any time. Any word given in quotation marks ("") is treated as a new keyword, so a leak in the repos related to that word will be detected.

Because of how the regexes in the script are written, false positives may also appear, and the tool's notes mention that you can add your own regexes if you want. As a test, a scan with default values was run for "x-company"; it took 2 hours and detected no leak.
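To illustrate why regex-based detection produces false positives, here is a minimal sketch of the general technique, not the tool's actual code. The two patterns, the `find_candidates` function, and the sample text are assumptions made up for illustration; only the overall idea (a keyword narrows the search, regexes pick out token-like strings) comes from the description above.

```python
import re

# Illustrative patterns only; these are assumptions, not the tool's own rules.
PATTERNS = {
    # Common shape of an AWS access key ID: "AKIA" plus 16 uppercase letters/digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Loose pattern for "api_key = '...'" style assignments.
    "generic_secret": re.compile(
        r"(?i)\b(?:api[_-]?key|secret|token)\b\s*[:=]\s*['\"]([^'\"]{8,})['\"]"
    ),
}


def find_candidates(text, keyword=None):
    """Return (pattern_name, matched_text) pairs found in text.

    If keyword is given, text that does not mention it is skipped,
    which mimics a target-oriented search.
    """
    if keyword and keyword.lower() not in text.lower():
        return []
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits


# Hypothetical file content; the AWS value is the well-known documentation example.
sample = '''
# config for x-company staging
aws_key = "AKIAIOSFODNN7EXAMPLE"
api_key = "changeme-placeholder"
'''

for name, value in find_candidates(sample, keyword="x-company"):
    print(name, value)
```

The second hit is a placeholder value rather than a real secret, which is the kind of false positive that a loose pattern produces and that has to be reviewed by hand.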

This tool is suitable for target-oriented searches.

The search can be run with the following commands:
python3 -k wordlists/keywordsfile.txt -q ""
python3 -k wordlists/filename_keywords.txt -q ""
python3 -k wordlists/nullenc0de_keywords.txt -q ""
The keyword Yahoo was used to verify that the tool actually works. The git URLs it returned contained the search keyword (yahoo), a token, and the online service (Twilio and others).

There is a possibility of getting false positives, so every detected URL should be reviewed.


This tool makes it possible to search GitHub and GitLab by organization name, username, and similar parameters. It also supports private repo scans and repos that require key-based authentication.

This tool is well suited to checking directly whether the organization's repos contain information that would cause a leak. However, it cannot identify organization-related repos that users have created themselves.

A sample scan was carried out on a few x-company repos, but no leak was detected. To use this tool, one needs the GitHub usernames of the organization and its employees. For this reason, the GitHub usernames of the organization and all of its employees should be collected first.

The following example uses one of x-company's usernames.

The search can be run with the following parameters:
--repo Takes the repo URL.
--github-user Takes the GitHub username.
--github-org Takes the organization's GitHub username.


Compared to the other tools, Gitminer is more content-rich and performs more advanced searches of content on GitHub. However, if the search criteria are not set well, it can also return a lot of irrelevant data.

Unlike the other tools, it does not limit its search to repo names or URLs; Gitminer searches everywhere. The modules it supports are: Asterisk, Gmail, WordPress, Senhas, passwords, root, joomla, ssh, mup.

Important data, both general and company-specific, can be obtained by using these modules. For example, a search for passwords in files with the config extension returned a leak about x-company.

In another example, user account leaks were detected. A simple regex was used to detect those leaks (‘(||’), and the search was set to look for the keyword "x-company" in file contents.

A file name, file extension, and keyword can be given when searching, and they can be combined using logical operators such as AND and OR. For company-related searches a regex should be used, because the output usually runs to about 100 pages of URLs; results matched by the regex are more precise.
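The narrowing effect of a regex on a long result list can be sketched as follows. This is an illustration of the idea only, not Gitminer's implementation: the `split_by_regex` function, the record layout (URL plus file text), and the sample results are all hypothetical.

```python
import re


def split_by_regex(results, pattern):
    """Split search results into regex matches and everything else.

    results is a list of (url, file_text) pairs; the pattern is applied
    to the file text, ignoring case.
    """
    compiled = re.compile(pattern, re.IGNORECASE)
    matched = [url for url, text in results if compiled.search(text)]
    others = [url for url, text in results if not compiled.search(text)]
    return matched, others


# Hypothetical results, made up for illustration.
results = [
    ("https://github.com/someuser/portal/blob/master/config.php",
     "$db_host = 'db.x-company.example'; $db_pass = 'hunter2';"),
    ("https://github.com/other/dotfiles/blob/master/.bashrc",
     "export PASSWORD_STORE_DIR=~/.password-store"),
]

matched, others = split_by_regex(results, r"x-company")
print("matched:", matched)
print("others:", others)
```

Only the first URL ends up in `matched`, so the reviewer can start with the results that mention the company and skim the rest later.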

Here are some sample dorks that can be used:

** python3 --query 'extension:php "root" in:file AND "" in:file' -m passwords -c cookie.txt → This dork searches for passwords related to a university: it looks in files with the php extension that contain both "root" and the term given in the second pair of quotes.

** python3 --query '"dbpasswd" in:file OR "password" in:file' -m passwords -c cookie.txt -r '(company domain)' → This dork searches for the words "dbpasswd" or "password" inside files. If you give the company name as a regex, matches for it in the file are included as well. URLs identified by the regex can easily be spotted in the output.

** python3 --query 'filename:configuration extension:php "public password" in:file' -m joomla -c cookie.txt → This dork searches for passwords in Joomla configuration files.
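The query strings in the dorks above follow a regular pattern, so they can be composed programmatically. The helper below is a hypothetical sketch (the name `build_query` and its parameters are not part of any tool); it only assembles the query text and does not run a search.

```python
def build_query(terms, operator="AND", filename=None, extension=None):
    """Compose a code-search query string from its parts.

    terms are quoted and searched in file contents; operator joins them.
    """
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    qualifiers = []
    if filename:
        qualifiers.append(f"filename:{filename}")
    if extension:
        qualifiers.append(f"extension:{extension}")
    keyword_part = f" {operator} ".join(f'"{t}" in:file' for t in terms)
    parts = qualifiers + ([keyword_part] if keyword_part else [])
    return " ".join(parts)


print(build_query(["dbpasswd", "password"], operator="OR"))
print(build_query(["public password"], filename="configuration", extension="php"))
```

The two calls reproduce the query strings of the second and third dorks; the resulting text would then be passed to the tool's --query option.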


Gitrob downloads the repo from GitHub and searches its commits for sensitive files, looking at names such as key, password, config, and log. We can also set a parameter defining how many commits to go back. Gitrob checks not only the organization's repos but also employees' repos.
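The idea of flagging files by name over a limited number of commits can be sketched like this. It is a simplified illustration under stated assumptions, not Gitrob's code: the matching rule, the `flag_sensitive_files` function, and the commit history are hypothetical, and only the name fragments come from the text above.

```python
import re

# Name fragments mentioned in the text; the matching rule itself is an assumption.
SENSITIVE_NAME = re.compile(r"(key|password|config|log)", re.IGNORECASE)


def flag_sensitive_files(commits, depth):
    """Return (commit_id, path) pairs whose file name looks sensitive.

    commits is ordered newest first; each item is (commit_id, [paths]).
    Only the newest `depth` commits are inspected.
    """
    findings = []
    for commit_id, paths in commits[:depth]:
        for path in paths:
            filename = path.rsplit("/", 1)[-1]  # ignore directory names
            if SENSITIVE_NAME.search(filename):
                findings.append((commit_id, path))
    return findings


# Hypothetical commit history, made up for illustration.
history = [
    ("c3", ["README.md", "src/app.py"]),
    ("c2", ["deploy/id_rsa.key", "docs/changelog.md"]),
    ("c1", ["app/config.yml", "server.log"]),
]

for commit_id, path in flag_sensitive_files(history, depth=2):
    print(commit_id, path)
```

With a depth of 2, only commits c3 and c2 are inspected, so the files in c1 are missed. Note that docs/changelog.md is flagged because its name contains "log", which shows why flagged files still need to be opened and checked.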

The files considered important for x-company are as follows.

When a file's contents are opened, we can see what was considered important and why the file was flagged. This is an advantage when eliminating false positives.