AI researchers at Microsoft have made a huge mistake. According to a new report from cloud security company Wiz, the Microsoft AI research team accidentally leaked 38TB of the company’s private data. 38 terabytes. That’s a lot of data. The leak included full backups of two employees’ computers, and those backups contained sensitive personal data, including passwords to Microsoft services, secret keys, and more than 30,000 internal Microsoft Teams messages from more than 350 Microsoft employees.

So, how did the AI data breach happen?
The report explains that the leak originated when the team published a bucket of training data containing open-source code and AI models for image recognition. Users who came across the GitHub repository were given a link to Azure, Microsoft’s cloud storage service, in order to download the models.
One problem that led to the AI data breach
The link provided by Microsoft’s AI team gave visitors complete access to the entire Azure storage account. Not only could visitors view everything in the account, they could also upload, overwrite, or delete files.
Wiz says this occurred because of an Azure feature called Shared Access Signature (SAS) tokens, which it describes as “a signed URL that grants access to Azure Storage data.” A SAS token can be scoped so that it only grants access to a specific file or set of files. This particular link, however, was configured to grant full access to the storage account, which is what made the breach possible.
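For context, here is a minimal sketch of how a SAS link can be scoped down, using the azure-storage-blob Python SDK. This is not Microsoft’s actual configuration; the account, container, and blob names are placeholders, and in practice the account key would come from a secure secret store.

```python
# Sketch: generate a read-only, short-lived SAS URL for a single blob,
# instead of a token that exposes the whole storage account.
# Names and key below are hypothetical placeholders.
from datetime import datetime, timedelta, timezone
from azure.storage.blob import BlobSasPermissions, generate_blob_sas

ACCOUNT_NAME = "examplestorageacct"        # hypothetical storage account
ACCOUNT_KEY = "<account-key>"              # load from a secret store, never hard-code
CONTAINER = "models"
BLOB = "image-recognition/model.onnx"

sas_token = generate_blob_sas(
    account_name=ACCOUNT_NAME,
    container_name=CONTAINER,
    blob_name=BLOB,
    account_key=ACCOUNT_KEY,
    permission=BlobSasPermissions(read=True),               # read-only: no write/delete
    expiry=datetime.now(timezone.utc) + timedelta(days=7),  # short-lived link
)

download_url = f"https://{ACCOUNT_NAME}.blob.core.windows.net/{CONTAINER}/{BLOB}?{sas_token}"
print(download_url)
```

A link built this way lets a visitor download one model file for a limited time, rather than browse, overwrite, or delete everything in the account.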
Was the AI data breach due to the shareable link?
Adding to the potential issues, according to Wiz, the data appears to have been exposed since 2020.
The link itself was deliberately included with the files so that interested researchers could download pretrained models; that part was no accident. Microsoft’s researchers used SAS tokens to create shareable links that give other people access to data in their Azure Storage account.
Wiz contacted Microsoft earlier this year, on June 22, to warn the company about its discovery. Two days later, Microsoft invalidated the SAS token, closing off access. Microsoft completed an investigation into the potential impact in August.
Microsoft on the AI data leak
In a statement, Microsoft said that “no customer data was exposed, and no other internal services were put at risk because of this issue.”
Microsoft also explained that it regularly rescans all its public repositories, but its system had marked this particular link as a “false positive.” The company has since fixed the issue so that its system can detect SAS tokens that are more permissive than intended. While the specific link Wiz detected has been fixed, improperly configured SAS tokens can still lead to data leaks and serious privacy problems. Microsoft acknowledges that “SAS tokens need to be created and handled appropriately” and has published a list of best practices for using them, which it presumably (and hopefully) follows itself.
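As a toy illustration (not Microsoft’s internal scanner), an overly permissive SAS URL can often be flagged just by inspecting its standard query parameters: “sp” (signed permissions) and “se” (signed expiry).

```python
# Toy example: flag SAS URLs that grant write/delete access, have no expiry,
# or expire years in the future. This is an illustrative sketch only.
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse

RISKY_PERMISSIONS = set("wdca")  # write, delete, create, add

def audit_sas_url(url: str) -> list[str]:
    params = parse_qs(urlparse(url).query)
    findings = []

    permissions = params.get("sp", [""])[0]
    if RISKY_PERMISSIONS & set(permissions):
        findings.append(f"grants more than read/list access (sp={permissions})")

    expiry = params.get("se", [""])[0]
    if not expiry:
        findings.append("has no expiry parameter")
    else:
        expires_at = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if expires_at - datetime.now(timezone.utc) > timedelta(days=365):
            findings.append(f"expires far in the future (se={expiry})")

    return findings
```

A link that grants full write and delete access and does not expire for decades, as in the incident Wiz describes, would trip every one of these checks.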



