Microsoft AI Repository Exposes 38TB of Data: A Tale in AI and Cloud Security

Wiz Research recently unveiled a startling incident involving Microsoft’s AI research team: an accidental exposure of 38 terabytes of sensitive data. This case brings forth essential questions and lessons about data security, especially when operating in the realm of artificial intelligence and cloud storage.

The Configuration Mishap

The issue arose when Microsoft’s AI research team shared a bucket of open-source training data through GitHub. They used Azure’s Shared Access Signature (SAS) tokens to generate a URL for data sharing.

*Definition of Shared Access Signature (SAS) from Azure’s website.*

Normally, these tokens can limit access to specific files or folders, but in this instance, the team misconfigured the SAS token. Instead of sharing just the intended data, the URL granted access to the entire Azure storage account, revealing an extra 38 terabytes of confidential information.
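How broadly a SAS URL is scoped can usually be read from its query string. The sketch below is a minimal illustration, not Microsoft's or Wiz's tooling: it assumes the standard SAS query parameters (`ss` and `srt` on an account-level token, `sr` on a service-level token), and the URLs, storage account name, and signatures are hypothetical placeholders.

```python
from urllib.parse import urlparse, parse_qs


def sas_scope(url: str) -> str:
    """Classify how broadly a SAS URL grants access, based on its query string."""
    params = parse_qs(urlparse(url).query)
    # Account SAS tokens carry signed services (ss) and signed resource types (srt).
    if "ss" in params and "srt" in params:
        return "account"
    # Service SAS tokens carry a signed resource (sr): c = container, b = blob.
    resource = params.get("sr", [""])[0]
    if resource == "c":
        return "container"
    if resource == "b":
        return "blob"
    return "unknown"


# Hypothetical URLs; the account name and signatures are placeholders.
account_url = (
    "https://example.blob.core.windows.net/models"
    "?sv=2021-08-06&ss=b&srt=sco&sp=rwdlac&sig=PLACEHOLDER"
)
blob_url = (
    "https://example.blob.core.windows.net/models/model.ckpt"
    "?sv=2021-08-06&sr=b&sp=r&sig=PLACEHOLDER"
)

print(sas_scope(account_url))  # account
print(sas_scope(blob_url))     # blob
```

A link classified as `account` deserves a second look before it is published, since it is not limited to the container or file named in the URL path.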

*Inadvertent disclosure of 38TB of confidential information via a SAS token (Source: Wiz Research)*

The Stakes Were High

What makes this incident particularly concerning is the type of data exposed: disk backups of two Microsoft employees’ workstations, private keys, passwords, and a massive collection of 30,000 internal Microsoft Teams messages from 359 Microsoft employees.

Worse yet, the SAS token configuration allowed for “full control,” which means an attacker could not only read but also write, delete, or overwrite files.
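The difference between a read-only link and a "full control" link shows up in the token's `sp` (signed permissions) parameter. The following sketch, with a hypothetical helper name and placeholder URLs, lists the permissions in a SAS URL that would let a holder change or remove data. It covers only a subset of the permission letters Azure defines.

```python
from urllib.parse import urlparse, parse_qs

# Subset of SAS permission letters that allow modifying or removing data.
MUTATING = {"w": "write", "d": "delete", "a": "add", "c": "create"}


def mutating_permissions(url: str) -> list[str]:
    """Return the names of data-changing permissions granted by a SAS URL."""
    params = parse_qs(urlparse(url).query)
    granted = params.get("sp", [""])[0]
    return [name for letter, name in MUTATING.items() if letter in granted]


# Hypothetical URLs; the account name and signatures are placeholders.
full_control = "https://example.blob.core.windows.net/models?sp=rwdlac&sig=PLACEHOLDER"
read_only = "https://example.blob.core.windows.net/models?sp=rl&sig=PLACEHOLDER"

print(mutating_permissions(full_control))  # ['write', 'delete', 'add', 'create']
print(mutating_permissions(read_only))     # []
```

For data that is only meant to be downloaded, an empty result is the expected outcome; anything else means the link lets a stranger alter what other users will fetch.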

*A limited selection of confidential documents discovered in the computer backups (Source: Wiz Research)*

What Could Have Gone Wrong?

While the team’s original aim was to share AI models for image recognition, the slip could have been disastrous. The AI models are stored in the CKPT file format used by TensorFlow and serialized using Python’s pickle format, which is known to be susceptible to arbitrary code execution. Because the misconfigured SAS token granted the rights to write or overwrite files, an attacker could have injected malicious code into the AI models, compromising every user who trusts Microsoft’s GitHub repository for these models.
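To see why pickle-based model files are risky, consider the minimal sketch below. It is an illustration under stated assumptions, not a reproduction of the exposed files: `Payload` is a hypothetical stand-in for a tampered model, and it calls the harmless `print` where an attacker could name any callable. The `RestrictedUnpickler` follows the pattern described in Python's pickle documentation and refuses to resolve any global at load time.

```python
import io
import pickle


class Payload:
    """Hypothetical stand-in for a tampered model object."""

    def __reduce__(self):
        # Tells pickle to call print(...) on load; an attacker could
        # substitute any importable callable here.
        return (print, ("code ran during unpickling",))


tampered = pickle.dumps(Payload())


class RestrictedUnpickler(pickle.Unpickler):
    """Refuses to resolve any global, so pickles that reference callables fail."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data: bytes):
    """Load a pickle that contains only plain data (dicts, lists, numbers, strings)."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


pickle.loads(tampered)  # prints "code ran during unpickling"

try:
    safe_loads(tampered)
except pickle.UnpicklingError as exc:
    print("rejected:", exc)  # rejected: blocked global: builtins.print
```

Blocking every global is deliberately strict and would also reject many legitimate model files, so in practice it is more a demonstration of the risk than a drop-in defense. The safer habit is to load pickled models only from sources whose integrity can be verified.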

On surface-level inspection, Microsoft’s Azure storage account would appear private. The SAS tokens provided a false sense of security, as they made the data look inaccessible while masking the very real exposure.

Cloud Security: The Importance of Properly Configured Buckets

The evolution of technology has propelled the vast majority of organizational assets to the cloud. From services, databases, and IT tools to applications, the digital realm is now heavily cloud-centric. With this shift, the attack surface has also broadened, especially when the migration process isn’t executed with the utmost precision. Among the resulting vulnerabilities, misconfigured or publicly accessible cloud buckets have emerged as a significant threat.

Securing vital assets like confidential data, source code, and databases is paramount. Yet, incidents persist. Notably, in July, Proud Makatizen’s website, a platform run by the City of Makati in the Philippines, experienced a leak. This was attributed to an Amazon Web Services S3 bucket misconfiguration, which led to the exposure of over 600,000 files.

Similarly, a 2020 study found that 6% of all Google Cloud buckets were misconfigured and open to the public, underlining the gravity of the situation given Google Cloud’s extensive global usage.

Another pertinent case is the BlueBleed leak, discovered by SOCRadar. Described as a monumental B2B breach, it exposed sensitive data from over 65,000 entities across 111 countries, all traced back to a single misconfigured Azure Blob Storage bucket.

Fortifying Cloud and Supply Chain Security with SOCRadar

As organizations increasingly rely on cloud infrastructures and third-party components, the need for vigilant security measures escalates. SOCRadar offers two targeted modules to meet these demands.

SOCRadar’s Cloud Security Module (CSM) detects new user-owned cloud storage, monitors bucket statuses, and sends real-time alerts for any changes. This proactive approach ensures your cloud assets are continuously monitored, reducing the risk of vulnerabilities.

*SOCRadar Cloud Security Module (CSM) under Attack Surface Management*

Complementing cloud security, the Supply Chain Intelligence module keeps a watchful eye on third-party components used by your organization. It provides up-to-date news and sends alerts for potential threats, enabling quick risk assessment and mitigation.