AI agents & prompt injections
Prompt injections are security vulnerabilities in LLM applications. A prompt injection can be direct or indirect. Direct prompt injections “trick” the AI using user prompts. For example, “ignore the above directions and do…” indirect prompt injections “trick” the AI by hiding malicious prompts in the data that it uses, for example a webpage.
Prompt injections work because differentiating between system messages and user messages is difficult for AI. A system message is a set of instructions from the developer to the AI. A system message for a flight booking AI chatbot might be “You are a helpful AI flight booking assistant…” A user message on the other hand is what the user writes. Using our flight booking chatbot example, a user message might be “Can you find me the cheapest flight to London…”
Guardrails
Imagine a restaurant that uses an AI agent that can make bookings, cancel bookings and make refunds for cancelled bookings. All of a sudden, the owner discovers that the restaurant is experiencing a lot of cancellations. It seems that refunds are being made to people who didn’t make bookings. This is an example of how a prompt injection could be used to exploit an AI agent. Now let’s see how to secure AI agents using something called guardrails.
Guardrails can be used to prevent harmful or biased content and to enforce security. Azure AI Content Safety is an example of a tool that can be used for this. It has features including text moderation, image moderation, groundedness detection, prompt shields and protected content detection. In the context of our table booking AI agent, safety guardrails could’ve been used to prevent unauthorised refunds and to block malicious prompts.
Conclusion
OWASP (Open Web Application Security Project) is a software security organisation that provides security guidelines to help developers protect their applications. OWASP releases an annual top 10 list of cybersecurity risks. Prompt injections top the list of the OWASP Top 10 for Large Language Model Applications for 2025. Following OWASP’s recommendations and using guardrails will mitigate the risk.
If you would like to work with emerging technologies such as this, then Solita is the right place for you! Check out our open positions.