A MoFo Privacy Minute: Managing Cybersecurity Concerns When Fine-Tuning LLMs
This is A MoFo Privacy Minute, where we answer the questions our clients are asking us in sixty seconds or less.
Question: Our organization wants to fine-tune a Large Language Model (LLM) for a specific domain or area of interest. Does this create any additional cybersecurity risks, and how should we approach risk mitigation?
Answer:
LLMs have emerged as key tools for organizations seeking to enhance operational efficiency and bolster innovation. Powered by advanced machine learning algorithms, these models can process and generate content across modalities, including text, images, and video, making them invaluable in a wide range of sectors.
Fine-tuning adapts a pre-trained LLM so that it develops a more nuanced understanding of a particular problem or domain. This is typically achieved by further training the pre-trained model on a smaller, domain-specific dataset, which refines its ability to understand and generate more relevant, targeted output. For example, in the healthcare industry, a fine-tuned LLM could analyze patient data more effectively to assist with specific tasks, such as diagnosing medical conditions or enhancing the quality of care.
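By way of illustration only, the sketch below shows what that further training step can look like using the open-source Hugging Face transformers and datasets libraries. The base model name, the domain_corpus.jsonl file, and the hyperparameters are hypothetical placeholders, not recommendations; real deployments would involve data governance, evaluation, and safety testing well beyond this outline.

```python
# Minimal, illustrative sketch of supervised fine-tuning with Hugging Face
# transformers/datasets. Model name, data file, and hyperparameters are
# placeholders (assumptions), not recommendations.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

base_model = "gpt2"  # placeholder for any pre-trained causal LM
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base_model)

# Smaller, domain-specific dataset; "domain_corpus.jsonl" (with a "text"
# field) is a hypothetical file, e.g., de-identified clinical notes.
dataset = load_dataset("json", data_files="domain_corpus.jsonl")["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True,
                        remove_columns=dataset.column_names)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()  # further training refines the model on the target domain
```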
However, research indicates that fine-tuning an LLM can increase its susceptibility to security incidents.
Research conducted by Cisco[1] revealed that fine-tuned LLMs are more likely to be vulnerable to prompt injection attacks,[2] which can compromise the safety guardrails built into base LLMs. Especially in regulated sectors like financial services or healthcare, the consequences of such manipulation could be severe.
Fine-tuned LLMs are calibrated to follow instructions relevant to the domain or area of interest for which they have been adjusted. Because they are optimized for particular types of interactions rather than a broad variety of novel inputs, they are less resilient to unexpected or malicious prompts. Base LLMs, by contrast, are trained on broader datasets and designed to handle a wider range of inputs. For example, an LLM fine-tuned on financial records is more likely to recognize and respond to a threat actor's prompt about financial records, because the fine-tuning has biased the model to assess the prompt as legitimate, whereas the base LLM would be less likely to interpret such a specific prompt and therefore less likely to provide the requested records.
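As a purely illustrative example of the kind of additional input screening a fine-tuned deployment may warrant, the sketch below places a naive pattern-based check in front of the model. The SUSPICIOUS_PATTERNS list, the screen_prompt helper, and the call_finetuned_model callable are hypothetical, and a simple filter like this is not a substitute for layered guardrails, access controls, and monitoring.

```python
# Illustrative-only sketch of a naive input screen in front of a
# fine-tuned model; patterns and function names are hypothetical.
import re
from typing import Callable

SUSPICIOUS_PATTERNS = [
    r"ignore (all|any|previous) instructions",
    r"reveal (the )?(system prompt|training data)",
    r"disclose .*financial records",
]

def screen_prompt(prompt: str) -> bool:
    """Return True if the prompt resembles a known injection pattern."""
    lowered = prompt.lower()
    return any(re.search(p, lowered) for p in SUSPICIOUS_PATTERNS)

def handle_request(prompt: str,
                   call_finetuned_model: Callable[[str], str]) -> str:
    if screen_prompt(prompt):
        # Block and escalate rather than letting the fine-tuned model decide.
        return "Request blocked pending security review."
    return call_finetuned_model(prompt)
```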
Organizations should consider the following steps to mitigate cybersecurity risk:[3]
Regardless of whether an organization chooses retrieval-augmented generation (RAG) or fine-tuning to tailor an LLM to a specific domain or area of interest, it should continue to invest in cybersecurity governance alongside its innovation efforts; otherwise, it risks creating further legal and regulatory issues.
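For context, RAG keeps domain documents in an external index and supplies relevant passages to an unmodified base model at query time, rather than changing the model's weights. The sketch below is a minimal outline under that assumption; the retrieve and generate callables are hypothetical stand-ins for whatever search index and model interface an organization already uses.

```python
# Minimal, illustrative sketch of retrieval-augmented generation (RAG):
# domain documents stay in an external index and are passed to an
# unmodified base model as context. retrieve() and generate() are
# hypothetical placeholders.
from typing import Callable, List

def build_prompt(question: str, passages: List[str]) -> str:
    context = "\n\n".join(passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )

def answer_with_rag(question: str,
                    retrieve: Callable[[str], List[str]],
                    generate: Callable[[str], str]) -> str:
    passages = retrieve(question)   # top-k domain passages from an index
    prompt = build_prompt(question, passages)
    return generate(prompt)         # base LLM; no weight changes involved
```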
[1] Cisco, 2025 Annual Report – The State of AI Security.
[2] Prompt injection attacks occur when threat actors use input prompts to alter the model’s behavior or outputs. This can result in the extraction of sensitive information or the production of harmful outputs (such as malware).
[3] Specific steps to mitigate the risks of deploying AI have also been set out in the Joint High-Level Risk Analysis on AI co-signed by a consortium of international cybersecurity bodies at the Paris AI Summit (February 2025).
[4] Isabel Barberá, AI Privacy Risks & Mitigations – Large Language Models (LLMs), European Data Protection Board Support Pools of Experts Program (March 2025).