This is “A MoFo Privacy Minute,” where we will answer the questions our clients are asking us in sixty seconds or less.
Question: What risks should companies consider when storing unstructured data?
Answer: Storing unstructured data introduces both new and old data privacy and security risks, including: (1) heightened data breach risk and legal exposure; (2) unintended disclosures via AI; and (3) difficulty complying with data transfer requirements.
Background
Unstructured data refers to data that lacks a predefined format, such as email communications, Slack messages, PDFs, shared folders, video recordings, and presentation decks. This data tends to be generated continuously, is often untagged or uncategorized, and is largely unmanaged. Despite their informal nature, unstructured data repositories often contain the same types of sensitive information as structured systems, to which data security and access controls are rigorously applied.
While unstructured data has historically been a frequent target of data breaches, the prevalence of generative AI and new U.S. data transfer regulations are introducing new risks. Companies may wish to consider addressing these risks as part of a comprehensive data governance strategy.
Unpacking the Risks
As noted above, unstructured data poses the following risks:
- Heightened Data Breach Risk and Legal Exposure: Unstructured data often contains personal data, trade secrets, material non-public information, or other confidential business information (“important data”). A breach affecting unstructured data often exposes information that was never identified as sensitive and thus was not properly protected. In response to a cybersecurity incident, companies are forced to devote time and resources to investigating the affected data to determine whether there are statutory or contractual obligations to provide notice to individuals, customers, business partners, and regulators, and to addressing reputational damage caused by the exposure of sensitive information.
- Unintended Disclosures via AI: Generative AI tools integrated into company systems may unintentionally expose important data when trained on or given access to unstructured data. If unstructured data repositories are not properly permissioned, an AI tool may pull important data from those repositories when responding to a prompt, exposing sensitive personal information or company trade secrets in ways that were not intended or anticipated. This exposure may lead to breaches of confidentiality agreements, insider-trading risks, or violations of data privacy laws.
- Compliance with Data Transfer Requirements: Recent U.S. Department of Justice regulations restrict access to U.S. sensitive personal data by persons or entities linked to China, Russia, and other countries of concern. To comply, companies are mapping their data flows and conducting due diligence on customers and vendors who may access bulk U.S. sensitive personal data. Unstructured data complicates this diligence: companies may not know whether such data contains regulated information and therefore may not be able to determine whether they are in compliance with the regulations.
Recommended Steps for Companies
Companies may wish to take a proactive, risk-based approach to managing unstructured data as part of a broader data governance and compliance strategy:
- Implement Monitoring and Usage Guidelines: Establish internal guidance discouraging employees from storing or transmitting sensitive data, particularly data that is subject to legal or contractual obligations, in unstructured formats.
- Conduct Data Due Diligence: Identify sources of unstructured data (e.g., cloud collaboration platforms, email servers, shared drives) and evaluate their contents to determine if they contain confidential information or personal data subject to data privacy, cybersecurity, or national security laws or regulations.
- Apply Tiered Access Controls: Where eliminating unstructured data repositories is not possible, enforce appropriate permissions and role-based access, and use data classification tools to apply restrictions based on content sensitivity.
- Enforce Document Retention and Deletion Policies: Many organizations retain unstructured data far longer than necessary, expanding the attack surface and increasing legal risk. Implement policies, similar to those applied to structured, managed systems, that ensure timely deletion of non-essential data in unstructured repositories, consistent with applicable retention schedules.
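
For teams that want to see what the due diligence and retention steps above might look like in practice, the following is a minimal, illustrative sketch only. It assumes documents have already been exported as plain text with a last-modified date. The `Document` type, the two regular-expression patterns, and the 365-day retention period are hypothetical placeholders, not a recommended standard; a real program would likely rely on a vetted data classification tool and the company's own retention schedule.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical, deliberately simple patterns for illustration only.
# They will produce false positives and miss many kinds of sensitive data.
PATTERNS = {
    "email_address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass
class Document:
    """Hypothetical stand-in for a file pulled from a shared drive or mailbox."""
    name: str
    text: str
    last_modified: date


def classify(doc: Document) -> list[str]:
    """Return the sorted labels of every pattern found in the document text."""
    return sorted(label for label, pattern in PATTERNS.items()
                  if pattern.search(doc.text))


def past_retention(doc: Document, today: date, retention_days: int) -> bool:
    """True if the document is older than the assumed retention period."""
    return today - doc.last_modified > timedelta(days=retention_days)


def review(docs: list[Document], today: date, retention_days: int = 365) -> list[dict]:
    """Build a simple report a reviewer could use to prioritize follow-up."""
    return [
        {
            "name": doc.name,
            "labels": classify(doc),
            "past_retention": past_retention(doc, today, retention_days),
        }
        for doc in docs
    ]


if __name__ == "__main__":
    sample = [
        Document("notes.txt", "Contact jane@example.com, SSN 123-45-6789", date(2020, 1, 1)),
        Document("agenda.txt", "Lunch at noon", date(2024, 6, 1)),
    ]
    for row in review(sample, today=date(2024, 7, 1)):
        print(row)
```

A report like this only flags candidates for human review. Decisions about access restrictions or deletion would still need to account for legal holds, contractual obligations, and the applicable retention schedule.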