Tracking the Cookies: The World of Data Brokers
Data brokers are companies that specialize in collecting, processing, and selling large amounts of personal and business data. They gather information from numerous sources, employ data mining and analytics techniques to extract valuable insights and create detailed profiles which they then sell to various entities for purposes such as risk management, marketing, and targeted advertising.
How Do Data Brokers Work?
In a world where data privacy is becoming increasingly significant, many individuals may wonder about the methods used to collect vast amounts of personal data. Data brokers employ a variety of techniques to compile detailed profiles of individuals, and understanding these methods is crucial for individuals who wish to protect their personal information.
Data brokers utilize several key methods to collect data. One major source is public records, which generally means information from government databases. The availability of this data varies from region to region, and because many laws now restrict obtaining personal information directly from public sources, these records usually provide only a general picture. Obtaining personal information directly is much harder today. At a time when it could be obtained more easily from data brokers, however, a tragic incident occurred: a young woman named Amy Boyer lost her life after her personal information was sold to her assailant.
Furthermore, online activity tracking plays a significant role in data collection. Through tools such as cookies, tracking pixels, and web beacons, data brokers monitor users’ online behavior, capturing details like browsing history, search queries, email behavior, and social media interactions. This method allows them to gather insights into consumer preferences and habits, location of the users and, in some cases, their profession as well.
Scraping is another prevalent technique where brokers collect data from social media platforms or other websites to harvest large amounts of information quickly. This method capitalizes on the vast amount of personal content shared on these platforms. Social media platforms also provide their data to marketers and advertisers. That’s why it is important to track whether your confidential company information has become public. It can be hard to keep track of all of your employees, and in these cases, you can utilize a monitoring tool like SOCRadar Surface Web Monitoring to see whether your confidential information is publicly exposed.
Unfortunately, data brokers are not the only entities scraping information. Threat actors also collect data from victims’ devices after infecting them with stealer malware. After the credentials are extracted from the device, they sell the data on various dark web forums. You can check our blog post to find out more about their workings or download our Whitepaper on stealer logs to find out how these stealers target organizations.
In addition to these methods, data brokers often purchase third-party data from other companies, such as marketing firms, grocery stores, or e-commerce sites, which have already collected information from their customers. This practice allows brokers to enhance their databases with additional insights and connect the dots between different data points which eventually increases the quality of the data they store. We’ve all experienced situations where, after making a purchase, we’re bombarded with discounts and offers on related products. When you receive offers from firms other than the one you bought the initial item from, you can understand that your data is being collected and sold.
Mobile apps also serve as a significant source of data. Many apps request permission to access users’ location, contacts, and other personal information, which brokers can then utilize for their profiles. Just like e-commerce websites, these apps also sell the data they collect. According to new research by NetzPolitik, the data they gathered reveals the movement of tens of thousands of phones in military and intelligence agency areas. In various samples, they were able to examine the routes of numerous people who apparently work for intelligence agencies, security authorities, federal ministries, or the military.
Reading about these cases definitely raises concerns about privacy. There are data protection laws designed to regulate how organizations should treat the data they collect. These laws vary by region but generally aim to enhance consumer privacy and control over personal data. These controls range from allowing individuals to choose which types of data they consent to share to specifying how long organizations can retain that data.
These laws are intended to safeguard consumer privacy and ensure that individuals have greater control over their personal information. However, the enforcement and scope of these regulations change depending on the region, and data brokerage companies usually benefit from loopholes in these regulations when collecting the data they want.
How Do Data Brokers Stay Legal?
Data anonymization is a step employed by data brokers to protect personal information and ensure compliance with regulations concerning Personally Identifiable Information (PII). By anonymizing data, these organizations assert that they are adhering to legal standards designed to safeguard individual privacy. However, the effectiveness of this practice can be compromised by sophisticated re-identification techniques that can sometimes reverse the anonymization process. According to research from 2019, over 99% of Americans could be correctly re-identified from any dataset using 15 demographic attributes, including age, gender, and marital status.
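To make the re-identification risk concrete, here is a minimal Python sketch. It is not the method used in the cited research; it only counts how many records in a small, hypothetical dataset become unique as more demographic attributes are combined. The records, the attribute names, and the extra zip attribute are all assumptions made for illustration.

```python
from collections import Counter

def unique_fraction(records, attributes):
    """Return the share of records whose attribute combination is unique."""
    keys = [tuple(r[a] for a in attributes) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)

# Hypothetical "anonymized" rows: no names, only demographic attributes.
records = [
    {"age": 34, "gender": "F", "marital": "single", "zip": "10001"},
    {"age": 34, "gender": "F", "marital": "married", "zip": "10001"},
    {"age": 34, "gender": "M", "marital": "single", "zip": "10002"},
    {"age": 51, "gender": "M", "marital": "married", "zip": "10002"},
    {"age": 51, "gender": "M", "marital": "married", "zip": "10003"},
    {"age": 34, "gender": "F", "marital": "single", "zip": "10003"},
]

# The more attributes are combined, the more records stand alone.
for attrs in (["gender"], ["age", "gender"], ["age", "gender", "marital", "zip"]):
    print(attrs, round(unique_fraction(records, attrs), 2))
```

In this toy dataset nobody is unique by gender alone, yet every record is unique once all four attributes are combined, which is the intuition behind re-identifying people in data that carries no names.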
Beyond this statistic about the U.S., you can also visit our United States Threat Landscape Report for additional intelligence about the threat landscape targeting the U.S.
Weaknesses in data anonymization are not the only issue. Data brokers often exploit legal loopholes within existing regulations. These ambiguities can arise when certain types of data or specific business practices are not explicitly covered by laws. For example, while collecting personal or medical data directly is not allowed, these companies can use third-party cookies to record the sites you visit and see whether you are visiting any website to get information about prescription glasses. When they have enough data points about you and your search history, they can put a tag on your ID number in their dataset.
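The tagging step described here can be pictured with a short Python sketch. Everything in it is hypothetical: the site names, the category mapping, the threshold, and the ID format are invented for illustration and do not describe any real broker's system.

```python
from collections import defaultdict

# Hypothetical mapping from visited site to interest category.
SITE_CATEGORIES = {
    "glasses-shop.example": "vision",
    "eye-clinic.example": "vision",
    "news.example": "news",
}

def tag_profiles(events, threshold=2):
    """Tag each ID with every category seen at least `threshold` times.

    events: iterable of (cookie_id, visited_site) pairs.
    Returns a dict mapping cookie_id to a set of category tags.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for cookie_id, site in events:
        category = SITE_CATEGORIES.get(site)
        if category:
            counts[cookie_id][category] += 1
    return {
        cid: {cat for cat, n in cats.items() if n >= threshold}
        for cid, cats in counts.items()
    }

events = [
    ("id-123", "glasses-shop.example"),
    ("id-123", "eye-clinic.example"),
    ("id-123", "news.example"),
    ("id-456", "news.example"),
]
print(tag_profiles(events))
```

No medical record is collected at any point in this sketch, yet the ID that visited two vision-related sites ends up tagged with an inferred health-related interest.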
Another tactic employed by data brokers is using “offshore companies”. By operating in regions with weaker data protection laws, brokers can circumvent stricter regulations. For example, they can collect the same data from another region of the world in order to evade the legal burden. This practice allows them to maintain their data collection and processing activities with minimal oversight. This is what we see with the majority of the data collected from the Global South.
The way consent is obtained for data collection offers another route. Many brokers utilize complex or opaque consent mechanisms that are difficult for consumers to understand, which leads to less informed consent. We generally see this approach with mobile applications. Even though deleting your data is possible, the process is tiring and difficult.
Moreover, data brokers often engage in indirect data collection, as we mentioned earlier, by acquiring information from third-party sources or aggregating publicly available data. This allows them to sidestep direct data collection and, therefore, the related regulations, since they are not the organization that initially has to adhere to privacy laws. They can gather this data from e-commerce sites and mobile applications. Even though they are not the party that initially collects the data, they are still responsible for the data they store. For this reason, they might engage in other activities to minimize their responsibility.
Investigating Third-Party Cookies
To see the impact of third-party cookies, we wanted to check the cookies in a browser. In order to do that, we first located the cookies, turned them into a CSV file and then analyzed it.
First, you need to get the cookies. In order to do that, you can visit the cookie folder. We did our little experiment with Google Chrome but you can use any browser you want.
macOS:
~/Library/Application Support/Google/Chrome/Default/Cookies
Windows:
C:\Users\user_name\AppData\Local\Google\Chrome\User Data\Default\Network
Note: You need to find the profile you use in Google Chrome. In our case, the necessary cookies were stored in the “Default” folder.
The file you will find is an SQLite3 database. In our data, there are 7,556 distinct cookies associated with our activities, which are connected to various services. It’s important to highlight that these cookies were accumulated in just over a month.
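The step of turning the cookie database into a CSV file could look roughly like the Python sketch below. It is not the exact script we used. It assumes the database contains a table named cookies (the name Chrome used at the time of writing, to the best of our knowledge), and it works on a copy of the file because the browser may keep the original locked while it is running. The function name and the output file name are illustrative.

```python
import csv
import shutil
import sqlite3
import tempfile
from pathlib import Path

def export_cookies_to_csv(db_path, csv_path, table="cookies"):
    """Copy the cookie database, then dump every row of `table` to a CSV file.

    `table` is interpolated into the SQL, so pass only a trusted table name.
    Returns the number of rows written.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Work on a copy so the original file is never touched.
        copy_path = Path(tmp) / "cookies_copy.sqlite"
        shutil.copy2(db_path, copy_path)
        conn = sqlite3.connect(copy_path)
        try:
            cursor = conn.execute(f"SELECT * FROM {table}")
            headers = [col[0] for col in cursor.description]
            rows = cursor.fetchall()
        finally:
            conn.close()
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(headers)  # column names come from the database itself
        writer.writerows(rows)
    return len(rows)

# Example call (adjust the path to your own profile folder):
# export_cookies_to_csv(
#     Path.home() / "Library/Application Support/Google/Chrome/Default/Cookies",
#     "cookies.csv",
# )
```

Because the header row is read from the database, the sketch does not depend on the exact set of columns, which can change between browser versions.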
Before diving into the explanation of the table above, first, we need to understand Source and Total Amount.
Source is the number of unique values. You will find various cookies for different purposes under the same name.
Total Amount on the other hand is the total number of cookies. One “Top Frame Site”, for example socradar.io, can set several cookies.
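As a small illustration of the difference between the two figures, the Python sketch below computes both for one column. The rows are hypothetical stand-ins for the exported CSV, and the column names host_key and top_frame_site_key are assumptions about how the export is labeled.

```python
def summarize(rows, column):
    """Return (unique_values, total_rows) for one column of the cookie table."""
    values = [row[column] for row in rows]
    return len(set(values)), len(values)

# Hypothetical rows standing in for the exported CSV.
rows = [
    {"host_key": ".socradar.io", "top_frame_site_key": "https://socradar.io"},
    {"host_key": ".socradar.io", "top_frame_site_key": "https://socradar.io"},
    {"host_key": "tracker.example", "top_frame_site_key": "https://socradar.io"},
]

unique, total = summarize(rows, "host_key")
print(f"Source: {unique}, Total Amount: {total}")  # Source: 2, Total Amount: 3
```

With a real export, the rows could be loaded with csv.DictReader and passed to the same function.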
Host Key
In the context of cookies, the host key refers to the domain or subdomain that the cookie is associated with. The host key is essentially this domain information and determines where the browser sends the cookie.
For example, if a cookie has a host key of .socradar.io, it will be sent to socradar.io and all its subdomains, like sub.socradar.io.
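The matching rule can be sketched in a few lines of Python. This is a simplified illustration, not the browser's actual implementation. It assumes the storage convention in which a leading dot marks a cookie valid for the domain and its subdomains, while a host key without a dot is valid only for that exact host. The function name is made up for the example.

```python
def cookie_applies_to(host, host_key):
    """Return True if a cookie stored under `host_key` would be sent to `host`."""
    host = host.lower().rstrip(".")
    if host_key.startswith("."):
        domain = host_key[1:].lower()
        # A leading dot covers the domain itself and every subdomain.
        return host == domain or host.endswith("." + domain)
    # Without a leading dot the cookie is host-only: exact match required.
    return host == host_key.lower()

print(cookie_applies_to("sub.socradar.io", ".socradar.io"))  # True
print(cookie_applies_to("notsocradar.io", ".socradar.io"))   # False
```

The check on "." + domain matters: a plain suffix comparison would wrongly match an unrelated domain such as notsocradar.io.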
In our case, there are a total of 7,556 cookies on the device. Of these, 1,216 are from unique domains and the rest are different variations of some of them.
Top Frame Site
A “top frame site” refers to the main website that is currently being displayed in the browser’s top-level window or tab. Third-party cookies are cookies set by domains other than the top frame site, often used for tracking and advertising purposes across multiple sites.
In our case, all the cookies tracking our activities came from 464 different websites.
These numbers show that cookies from 1,216 distinct domains were set on our device after we visited 464 different websites.
Has Cross Site Ancestor
If a cookie “has a cross-site ancestor,” it means that the cookie is associated with a web page or resource that is embedded within another website from a different domain. In other words, the cookie belongs to a page that is being displayed within a frame or other embedded content on a site that is not its own domain.
In our case, 1,178 cookies were connected to websites that the device didn’t actually visit.
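A count like this can be taken directly from the SQLite database. The Python sketch below assumes a table named cookies with a 0/1 column named has_cross_site_ancestor. Both names are assumptions that may differ between browser versions, and the in-memory rows are hypothetical.

```python
import sqlite3

def count_flag(conn, column, table="cookies"):
    """Return (rows with the flag set, total rows) for a 0/1 column.

    `column` and `table` are interpolated into the SQL, so pass trusted names only.
    """
    flagged, total = conn.execute(
        f"SELECT COALESCE(SUM({column} = 1), 0), COUNT(*) FROM {table}"
    ).fetchone()
    return flagged, total

# Hypothetical in-memory data standing in for the real cookie database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cookies (host_key TEXT, has_cross_site_ancestor INTEGER)")
conn.executemany(
    "INSERT INTO cookies VALUES (?, ?)",
    [(".a.example", 1), (".b.example", 0), (".c.example", 1)],
)
print(count_flag(conn, "has_cross_site_ancestor"))  # (2, 3)
```

The same helper works for any other 0/1 column in the table.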
Is Secure
A cookie with the Secure attribute is only sent to the server with an encrypted request over the HTTPS protocol. It’s never sent with unsecured HTTP (except on localhost), which means man-in-the-middle attackers can’t access it easily. Insecure sites (with http: in the URL) can’t set cookies with the Secure attribute.
When we look at unique cookies, we see that approximately 81% of them use the is_secure attribute. But when we look at cookies in general, we see that this rate drops to approximately 66%.
Before examining the cookies, we considered two possible explanations for the gap between these rates.
- Initial cookies’ “is_secure” attributes are set to 1, but the subsequent cookies were sent without that attribute.
- Some sources set secure cookies and some sources set non-secure cookies depending on the product or website.
But when we analyzed the cookies, we found that both of these assumptions were wrong. Some sources do not set any secure cookies at all, others set only secure cookies, and some set both.
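One way to check this per source is to group the cookies by host key and look at which is_secure values each source uses. The Python sketch below is an illustration under assumptions: the rows are hypothetical, and the column names host_key and is_secure are assumed to match the export.

```python
from collections import defaultdict

def classify_sources(rows):
    """Split host keys into secure-only, insecure-only, and mixed sources."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["host_key"]].add(int(row["is_secure"]))
    groups = {"secure_only": [], "insecure_only": [], "mixed": []}
    for host, flags in sorted(seen.items()):
        if flags == {1}:
            groups["secure_only"].append(host)
        elif flags == {0}:
            groups["insecure_only"].append(host)
        else:
            groups["mixed"].append(host)
    return groups

# Hypothetical rows standing in for the exported CSV.
rows = [
    {"host_key": ".a.example", "is_secure": 1},
    {"host_key": ".a.example", "is_secure": 1},
    {"host_key": ".b.example", "is_secure": 0},
    {"host_key": ".c.example", "is_secure": 1},
    {"host_key": ".c.example", "is_secure": 0},
]
print(classify_sources(rows))
```

In this toy data each of the three groups contains exactly one source, mirroring the three behaviors we observed.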
Is HTTP Only
HttpOnly is an additional flag included in a Set-Cookie HTTP response header. Using the HttpOnly flag when generating a cookie helps mitigate the risk of client side script accessing the protected cookie (if the browser supports it).
If the optional HttpOnly flag is included in the HTTP response header, the cookie cannot be accessed through client-side script (again, if the browser supports this flag). As a result, even if a cross-site scripting (XSS) flaw exists and a user accidentally follows a link that exploits this flaw, a supporting browser will not reveal the cookie to a third party.
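To show what such a header looks like from the server side, here is a minimal Python sketch using the standard library's http.cookies module. The cookie name and value are hypothetical, and the example also sets the Secure attribute discussed in the previous section.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session_id"] = "abc123"          # hypothetical cookie name and value
cookie["session_id"]["httponly"] = True  # hide the cookie from client-side scripts
cookie["session_id"]["secure"] = True    # send the cookie only over HTTPS

# Render the cookie as a Set-Cookie response header line.
header = cookie.output()
print(header)
```

The printed line is what a server would send in its response. The attributes are instructions to the browser, so the protection depends on the browser honoring them.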
When we look at unique cookies, we see that approximately 37% of them use the HttpOnly flag. But when we look at cookies in general, we see that this rate drops to approximately 27%.
Additionally, we found no correlation between the “HttpOnly” flag and the “is_secure” attribute.
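One possible way to test for such a relationship is the phi coefficient, a correlation measure for two yes/no columns. The Python sketch below is an illustration and not necessarily the measure behind the statement above. The column names is_secure and is_httponly and the sample rows are assumptions.

```python
from collections import Counter
from math import sqrt

def phi_coefficient(rows, a="is_secure", b="is_httponly"):
    """Phi coefficient between two 0/1 columns; values near 0 mean little association."""
    n = Counter((int(r[a]), int(r[b])) for r in rows)
    n11, n10, n01, n00 = n[(1, 1)], n[(1, 0)], n[(0, 1)], n[(0, 0)]
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # If one column never varies, the coefficient is undefined; report 0.0.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

# Hypothetical rows: every combination of the two flags appears once.
rows = [
    {"is_secure": 1, "is_httponly": 1},
    {"is_secure": 1, "is_httponly": 0},
    {"is_secure": 0, "is_httponly": 1},
    {"is_secure": 0, "is_httponly": 0},
]
print(phi_coefficient(rows))  # 0.0
```

A value close to 1 or -1 would indicate that the two flags tend to be set together or in opposition, while a value near 0 matches the absence of a relationship.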
Some examples
We decided to check some of the cookies to see where we ended up. As a result, we want to show three of them:
mc.yandex.ru: This cookie comes from Yandex Metrika (Яндекс Метрика), which is an analytics platform and advertising service similar to Google Analytics. The device these cookies were extracted from never visited this website, but we have used various services provided by Yandex, such as their search engine, and we probably got this cookie from there.
lga-bh.contextweb.com: This cookie leads us to a healthcare marketing firm named PulsePoint, whose website we’ve never visited. It is a good example of the case we mentioned earlier: they can’t collect medical data directly, but they can track whether you are looking for prescription glasses.
krk2.kargo.com: This cookie is again from an advertisement/marketing firm we’ve never visited before.
Disclaimer: We are not trying to target any company. These are just a few examples from our data set. All of the websites we use in daily life employ these types of cookies and this is how different companies track your actions online. This situation is not exclusive to the companies we have given as examples.
Conclusions and Recommendations
In conclusion, data brokers operate in a complex and often opaque ecosystem where vast amounts of personal information are collected, analyzed, and traded, often without individuals’ explicit consent. While these practices can offer benefits, such as personalized services and targeted marketing, they also raise significant concerns about privacy, data security, and the ethical use of information.
Here is how to minimize those concerns:
Stop Third-Party Cookies
One of the primary ways data brokers track your online behavior is through third-party cookies. These small pieces of data are stored on your device by websites you visit, allowing advertisers to follow your browsing habits across different sites. To combat this, consider adjusting your browser settings to block third-party cookies. Most modern browsers offer privacy settings that allow you to restrict or eliminate these cookies.
Share Less Personal Information Online
In the age of social media and online interactions, it’s easy to overshare personal information. Be mindful of the details you share on platforms like Facebook, Instagram, and Twitter. Limit the amount of personal data you provide in your profiles and avoid posting sensitive information such as your phone number, address, or financial details. The less information you make available, the harder it is for data brokers to compile a comprehensive profile on you.
High-profile individuals in particular face increasing risks of cyberattacks, impersonation scams, and reputational damage online. As AI-enhanced threats like deepfakes increase, executives, celebrities and influencers require robust digital protection. SOCRadar’s VIP Protection tool provides a comprehensive solution to monitor, detect and eliminate threats targeting your management’s online presence.
Don’t Keep the Apps You Don’t Use
While apps can enhance your digital experience, downloading random or unverified applications can expose you to significant privacy risks. Many apps request access to personal data, including contacts, location, and more, which can be sold to data brokers. Stick to reputable apps from trusted sources, and always review the permissions they request before installation. If an app seems to ask for more information than necessary, it’s best to avoid it. If you are not using an app, it’s better to delete it.
SOCRadar offers support to organizations facing threats from rogue mobile applications. If your company is experiencing issues with fraudulent accounts or apps that mimic your brand, our Brand Protection module can assist in identifying and addressing these threats.
Consider Paid Services
For those looking for ways to disappear from the web, paid services can be effective solutions. These services specialize in removing your information from various data broker databases. By subscribing to one of these services, you can benefit from professional assistance in managing your online presence and ensuring that your personal data is not sold without your consent.
Opt-Out from Data Brokers Yourself
If you prefer a DIY approach, you can opt out of data brokers on your own. Many data brokers provide options for individuals to request the removal of their information from their databases. This process typically involves visiting the broker’s website, filling out a form, and verifying your identity. While it may require some time and effort, taking this step can significantly reduce the amount of personal data available to brokers. Privacy Rights has a great list of data brokers you can use.