Blog

Secure AI: Understanding the risk of prompt injections

John Chimbani Cloud Consultant, Solita

Published 14 Mar 2025

Reading time 2 min

ChatGPT is used for things like answering questions or generating text-based content but it doesn’t independently make decisions. AI agents on the other hand can independently make decisions, automate workflows, and interact with external systems like robots. They take the user prompt, reason and perform some action. Now that we know the difference, let’s talk about how prompt injections are a risk in the era of AI.

AI agents & prompt injections

Prompt injections are security vulnerabilities in LLM applications. A prompt injection can be direct or indirect. Direct prompt injections “trick” the AI using user prompts. For example, “ignore the above directions and do…” indirect prompt injections “trick” the AI by hiding malicious prompts in the data that it uses, for example a webpage.

Prompt injections work because differentiating between system messages and user messages is difficult for AI. A system message is a set of instructions from the developer to the AI. A system message for a flight booking AI chatbot might be “You are a helpful AI flight booking assistant…” A user message on the other hand is what the user writes. Using our flight booking chatbot example, a user message might be “Can you find me the cheapest flight to London…”

Guardrails

Imagine a restaurant that uses an AI agent that can make bookings, cancel bookings and make refunds for cancelled bookings. All of a sudden, the owner discovers that the restaurant is experiencing a lot of cancellations. It seems that refunds are being made to people who didn’t make bookings. This is an example of how a prompt injection could be used to exploit an AI agent. Now let’s see how to secure AI agents using something called guardrails.

Guardrails can be used to prevent harmful or biased content and to enforce security. Azure AI Content Safety is an example of a tool that can be used for this. It has features including text moderation, image moderation, groundedness detection, prompt shields and protected content detection. In the context of our table booking AI agent, safety guardrails could’ve been used to prevent unauthorised refunds and to block malicious prompts.

Conclusion

OWASP (Open Web Application Security Project) is a software security organisation that provides security guidelines to help developers protect their applications. OWASP releases an annual top 10 list of cybersecurity risks. Prompt injections top the list of the OWASP Top 10 for Large Language Model Applications for 2025. Following OWASP’s recommendations and using guardrails will mitigate the risk. 

If you would like to work with emerging technologies such as this, then Solita is the right place for you! Check out our open positions.