DeepSeek’s Flagship AI Model Under Fire for Security Vulnerabilities

31.01.2025

0 97 5 minutes read

R1, the latest large language model (LLM) from Chinese startup DeepSeek, is under fire for multiple security weaknesses.

The company’s spotlight on the performance of its reasoning LLM has also brought scrutiny. A handful of security research reports released in late January have highlighted flaws in the model.

Additionally, the LLM critically underperforms in a newly launched AI security benchmark designed to help security practitioners and developers test LLM applications for prompt injection attacks that can lead to exploitation.

DeepSeek-R1: Top Performer with Security Issues

Like OpenAI’s o1, DeepSeek-R1 is a reasoning model, an AI trained with reinforcement learning to perform complex reasoning.

As of January 31, 2025, R1 is ranked sixth on the Chatbot Arena benchmark, one of the most recognized methods to evaluate the performance of LLMs.

This means R1 performs better than leading models such as Meta’s Llama 3.1-405B, OpenAI’s o1 and Anthropic’s Claude 3.5 Sonnet.

However, DeepSeek’s latest model performs poorly in WithSecure’s Simple Prompt Injection Kit for Evaluation and Exploitation (Spikee), a new AI security benchmark.

Read more: Chinese GenAI Startup DeepSeek Sparks Global Privacy Debate

WithSecure Spikee Benchmark

This benchmark, launched on January 28, is designed to test AI models for their resistance to prompt injection attacks with real AI workflow use cases.

In practice, researchers at WithSecure Consulting assessed the susceptibility of LLMs and their applications to targeted prompt injection attacks, analyzing their ability to distinguish between data and instructions.

Speaking with Infosecurity, Donato Capitella, AI Security Researcher at WithSecure Consulting, explained: “Unlike existing tools that focus on broad jailbreak scenarios (e.g. asking an LLM to build a bomb), Spikee prioritizes cybersecurity threats such as data exfiltration, cross-site scripting (XSS), and resource exhaustion, based on real-world outcomes and pentesting practices.”

“Instead of focusing on broad prompt injection scenarios, we try to evaluate how a hacker can target an organization or a tool that an organization has built or relies on, with an LLM,” he added.

At the time of writing, the WithSecure Consulting team has tested 19 LLMs against an English-only dataset of 1912 entries built in December 2024, including common prompt injection patterns observed in its pentesting and security assurance practice.

The above table shows the attack success rate (ASR) in different scenarios (bare prompting, w/ system message, w/ spotlighting). A lower ASR indicates an LLM that is more proficient at distinguishing data from instructions and, thus, less vulnerable to the prompt injection patterns in the dataset. Source: WithSecure Consulting

The researchers evaluated LLM use cases based on four scenarios:

Bare Prompt: The LLM is used in a workflow or application without being given any instructions
With System Message: The LLM used in a workflow or application is provided with specific rules intended to protect prompt injection attacks
With Spotlighting: The LLM used in a workflow or application is provided with data markers indicating where to apply the task given in the original prompt
With System + Spotlighting: The LLM used in a workflow or application is given specific rules intended to protect prompt injection attacks and data markers to instruct it where to apply the task given in the original prompt

Capitella noted that adding specific rules and data markers can help protect LLM workflows or applications from prompt injection attacks that would otherwise succeed when the LLM is used alone.

DeepSeek-R1 ranks 17^th out of the 19 tested LLMs when used in isolation – with an attack success rate (ASR) of 77% – and 16^th when used alongside pre-defined rules and data markers, with an ASR of 55%.

By comparison, OpenAI’s o1-preview ranks fourth when used in isolation – with an ASR of 27% – and tops the ranking when used alongside pre-defined rules and data markers, the tests showing no successful attacks against the LLM.

According to Capitella, a poor score means that DeepSeek’s team responsible for building R1 “neglected safety and security training to make the model resistant to types of attacks that we’ve observed.”

Instead, they likely focused on achieving certain scores in specific LLM performance benchmarks.

“Organizations willing to use DeepSeek-R1 in their workflows should carefully consider which use cases they want to use it for, which data they plan to give it access to and what they might be exposing this data to,” the researcher added.

Security Reports Highlight DeepSeek-R1’s Vulnerabilities

Additionally, security reports have started to show that R1 also has many security weaknesses that could expose any organizations deploying the LLM.

According to a January 27 report by consultancy Kela Cyber, DeepSeek-R1 is highly susceptible to cyber threats, making it an easy target for attackers exploiting AI vulnerabilities.

Kela Cyber’s testing revealed that the model can be easily jailbroken using a variety of techniques, including via the “Evil Jailbreak’ method, which exploits the model by prompting it to adopt an ‘evil’ persona.

Red teamers were able to jailbreak OpenAI’s GPT 3.5 using this technique in 2023. OpenAI has since implemented the proper guardrail to render Evil Jailbreaks inefficient on later models, including GPT-4 and GPT-4o.

The output generated by DeepSeek explains how to distribute malware for execution on victim systems. Source: Kela Cyber

The Palo Alto Networks research team, Unit 42, has found that DeepSeek’s R1 and V3 models are vulnerable to three distinct jailbreaking techniques: Crescendo, Deceptive Delight and Bad Likert Judge.

Crescendo is a well-known jailbreaking technique leveraging an LLM’s own knowledge by progressively prompting it with related content, subtly guiding the conversation toward prohibited topics until the model’s safety mechanisms are effectively overridden.

Deceptive Delight and Bad Likert Judge are two novel techniques developed by Unit 42.

The former is a straightforward, multi-turn jailbreaking technique where an attacker bypasses an LLM’s safety measures by embedding unsafe topics among benign ones within a positive narrative.

The latter jailbreaking technique manipulates an LLM by asking it to evaluate the harmfulness of responses using a Likert scale, a measurement of agreement or disagreement toward a statement. The LLM is then prompted to generate examples aligned with these ratings, with the highest-rated examples potentially containing the desired harmful content.

Unit 42 shared their findings in a report published on January 30.

AI security firm EnkryptAI performed a red teaming exercise on several LLMs using three security frameworks: OWASP Top 10 for LLMs, MITRE ATLAS and the US National Institute of Standards and Technology’s AI Risk Management Framework (NIST AI RMF).

The LLMs tested by EnkryptAI included DeepSeek-R1, Open AI’s o1, OpenAI’s GPT-4o and Anthropic’s Claude-3-opus.

The red teamers found that compared to OpenAI’s o1 model, R1 was four times more vulnerable to generating insecure code and 11 times more likely to create harmful outputs.

A fourth report by AI security firm Protect AI saw no vulnerabilities in the official version of DeepSeek-R1 as uploaded on the AI repository HuggingFace. However, the researchers found unsafe fine-tuned variants of DeepSeek models that have the capability to run arbitrary code upon model loading or have suspicious architectural patterns.

Infosecurity reached out to DeepSeek for comments, but the company has not responded as of publication time.

Photo credit: Michele Ursi/Robert Way/Shutterstock

Read more: DeepSeek Exposed Database Leaks Sensitive Data

31.01.2025

0 97 5 minutes read