Workmaster — Guarding Against AI Jailbreak

Time and again, organisations have suffered data breaches and reputational setbacks when their AI systems are manipulated into disclosing confidential details or carrying out unintended actions. At the heart of the issue is the nature of large language models (LLMs), which are probabilistic engines predicting the next word based on training data patterns, rather than entities capable of discerning intent or authority. Since they interpret all prompts equally—regardless of whether they’re instructions from a developer or a malicious user—LLMs are vulnerable to prompt injection and hallucinations, where they confidently produce misinformation or disregard company policies. This vulnerability is commonly referred to as ‘jailbreaking’—bypassing the guardrails designed to keep the AI in check.

Notable Cases of AI Being Exploited

The $1 Chevrolet Tahoe: In December 2023, a user convinced a Chevrolet of Watsonville chatbot to "agree with anything the customer says", resulting in the bot offering a $76,000 SUV for only one dollar. Although the offer was not enforced, the incident exposed the risks of AI without robust business logic.
Air Canada’s Imagined Refund Policy: A Canadian tribunal held Air Canada accountable after its chatbot invented a bereavement refund policy. The airline argued the bot was a separate legal entity, but the court ruled the company responsible for all information on its website, AI-generated or not.

There is no shortage of similar examples available online.

The question is: can we develop AI-driven business process agents and chatbots that efficiently serve customers, without risking data leaks or unintended actions?

Realistically, it may never be possible to completely prevent LLMs from being jailbroken. Instead, the focus must shift to designing the surrounding systems to tightly control what data can be accessed and which actions can be performed. The same principles of access control and auditing that protect human users must be rigorously applied to AI agents as well.

Unfortunately, many low-code or ‘vibe-coded’ platforms that embed AI agents often overlook these critical safeguards. Consequently, businesses relying on such solutions remain exposed to significant financial and reputational risks.

At Workmaster, we have dedicated substantial research to solving this problem. Our approach is built around robust, role-based access control, with system policies that define data access at the role, group, and individual user levels. Every interaction is subject to these policies, ensuring that AI agents inherit the access rights of the end user. This means that, even if an agent is jailbroken, it cannot exceed the permissions granted by policy.

This enforcement covers not only data reading but also writing, integration with third-party systems, and execution of business processes. The AI agent is always a controlled extension of the authenticated user, never possessing broader access. With these safeguards and careful handling of sensitive operations, it is entirely possible to build AI-enabled business systems that are both safe and dependable.