CalypsoAI Inference Red-Team

Reports are ready for their close-up

We’re continuing to refine and improve the Red-Team report to make it more accurate and informative:

Clearer summary and scoring at the top, including number of recommendations.
Consistent presentation of data: All attack results show the number (or presence) of vulnerabilities compared to the total number of attacks.
Improved presentation of the data on vulnerabilities by intent category.
Visibility into results from agent attack prompts (these are agentic single-turn attacks based on custom intents).
Detail view of results for each custom intent.
Mitigation actions are now “recommendations”.
New print button.
Relocation of “View raw data” button to the top to increase the scroll area.

Printable report

Sometimes the best way to communicate is still with paper or PDF. With that in mind, we’ve created a printer-friendly version of the Red-Team report that you can print or save to share with others.

Every attack in a single campaign

In earlier versions, users had to choose between Standard campaigns with Signature attacks and Custom campaigns with Agentic Warfare. Now every attack type is available in a single campaign.

There are 4 attack types available:

Signature attacks—out-of-the-box, curated and tested single-turn prompts based on common malicious intents. Every month we release a new pack of 10,000+ signature attacks in a prompt pack.
Operational attacks—traditional denial-of-service and denial-of-wallet application attacks reformulated for AI.
Agentic Warfare—dynamic, multi-turn attacks based on user-provided custom prompts that learn from and adjust to the model’s responses.
Agent attack prompts—dynamic, single-turn attacks based on user-provided custom intents that leverage the same attack vectors used in Signature attacks. (These were confusingly also called “Signature attacks” in earlier versions, so we’ve changed the name.)

When creating an attack campaign:

By default, the latest signature prompt pack is selected, along with all attack vectors and converters in that pack.
Operational attacks are not enabled by default and must be manually selected.
Agentic Warfare and agent attack prompts are disabled until at least one custom intent has been provided.
As you make selections, a summary appears at the bottom of the panel.
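
The selection rules above can be sketched in a few lines of Python. This is an illustration only, not the product's API: the `CampaignSelection` class, its field names, and the `enabled_attack_types` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CampaignSelection:
    """Hypothetical model of the campaign panel (not the product's API)."""
    signature_pack: Optional[str] = "latest"  # latest prompt pack is preselected
    operational: bool = False                 # off until manually selected
    custom_intents: List[str] = field(default_factory=list)


def enabled_attack_types(sel: CampaignSelection) -> List[str]:
    """Return the attack types that are active for this selection."""
    types = []
    if sel.signature_pack is not None:
        types.append("Signature attacks")
    if sel.operational:
        types.append("Operational attacks")
    # Both intent-driven attack types stay disabled until an intent exists.
    if sel.custom_intents:
        types.append("Agentic Warfare")
        types.append("Agent attack prompts")
    return types
```

With no changes to the defaults, only Signature attacks are active; adding one custom intent makes both intent-driven attack types available.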

Two new Agentic Warfare attacks

With Agentic Warfare, the system adopts the behavior of a human adversary, trying different techniques to achieve a malicious intent, and fine-tuning prompts based on the LLM's responses. Like human adversaries, Agentic Warfare keeps trying until it achieves its goal, and it will backtrack and re-calibrate when it perceives it’s going down a dead end.

In this release, we’re launching two sophisticated new multi-turn attacks that exploit the LLM’s human-like cognitive abilities and flaws in reasoning: FRAME and Trolley.

FRAME (find rational arguments and make excuses). This attack bypasses model safeguards with rational arguments and excuses that surface a cognitive dissonance in the model—the belief that the intent is beneficial and should not be blocked by safeguards.
Trolley. Drawing on the famous “trolley problem” from ethics, this attack presents the model with the dilemma of having to choose between two bad outcomes, one or both of which might bypass its safeguards.

Cover more use cases in Agentic Warfare

Now you can add up to 5 custom intents in the same Agentic Warfare campaign. Combined with the new attacks (above), this means you can address more use cases, evaluate more possible attack vectors, and get greater value out of each attack run.

New attack vector: Refusal suppression

The April prompt pack includes a new attack vector: refusal suppression. This attack puts constraints on how the model can respond when refusing to fulfil the original prompt, which makes harmful or unauthorized responses more likely. Refusal suppression attacks are available with the April prompt pack and in the Agent Attack Prompts section in this release.

Report scheduling

Red-Team users can now create a recurring schedule for attack runs and reports. This is particularly useful for:

Model governance. Teams can run the “All attacks” campaign monthly on the model they’re using to see if it’s resilient to the latest attacks.
Build red-teaming into standard application testing. Run a custom suite of attacks on your AI system as part of your sprint or testing cycles.
App vulnerability reporting. Set up the system to run scheduled vulnerability reports on all your employee- or customer-facing AI applications.

With Red-Team scheduling you can:

Select a date in the future to run a single report.
Set up a recurring schedule of reports with the same campaign and targets.
Set a daily or weekly schedule, or select your own weekly interval up to 52 weeks.
Limit runs to a set number or select an end date.
See a summary of the schedule before you save.
Cancel and update an existing schedule.
Filter reports by scheduling status.
See all the scheduled or completed reports in a scheduling series by using the “related reports” filter.
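
To make the interval and limit options concrete, here is a minimal Python sketch of how a recurring series could be expanded into run dates. It is not the product's scheduler: the `scheduled_runs` function and its parameters are hypothetical, and it assumes the end date is inclusive and that at least one limit (run count or end date) is set.

```python
from datetime import date, timedelta
from typing import List, Optional


def scheduled_runs(start: date, interval_days: int,
                   max_runs: Optional[int] = None,
                   end_date: Optional[date] = None) -> List[date]:
    """Expand a recurring schedule into a list of run dates (hypothetical helper)."""
    # Daily is 1 day, weekly is 7 days; the longest interval is 52 weeks.
    if not 1 <= interval_days <= 52 * 7:
        raise ValueError("interval must be between 1 day and 52 weeks")
    # Assumption: this sketch requires a limit so the series always ends.
    if max_runs is None and end_date is None:
        raise ValueError("provide max_runs or end_date")

    runs: List[date] = []
    current = start
    while ((max_runs is None or len(runs) < max_runs)
           and (end_date is None or current <= end_date)):
        runs.append(current)
        current += timedelta(days=interval_days)
    return runs
```

For example, `scheduled_runs(date(2025, 5, 1), interval_days=14, max_runs=3)` returns runs on May 1, May 15, and May 29, 2025.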

CalypsoAI Inference Defend

5x improvement in scanner latency

If you’ve ever used a slow chat interface, or had to sign the check for out-of-budget GPUs, then you’ve felt the pain of latency. CalypsoAI’s research team went on a mission to make our scanners faster and came back with a massive 5x speed boost, while still maintaining the same or better levels of accuracy.

New PII and prompt injection packages

As part of the 5x latency improvement we’re releasing all-new versions of our out-of-the-box PII and prompt injection packages. These scanners have been re-engineered to maximize efficiency and speed with accuracy of at least 90%.

On-prem users will need to update their version to get the new scanners; SaaS users will be updated automatically. If individual scanners are active or inactive in existing deployments, those settings will be preserved upon upgrade, with one exception: In the new package, the address and full name scanners have been combined into one (“Full name and postal address”). On upgrade, this scanner will be inactive by default and customers will need to activate it manually if they want to use it.

Perceptive users may notice that the “Adversarial attacks” scanner is missing from the new Prompt injection package. In the course of improving scanner latency, the research team was able to include this coverage in the package as a whole without reducing fidelity.

Bug fixes

In Attack campaigns, prompt packs now correctly act as a filter on the list of attack vectors below. When selecting an individual prompt pack, only the attack vectors present in that pack are shown below.
In some cases, switching between Standard and Agentic Warfare tabs when creating a new campaign caused incorrect selections to be shown in the Standard tab. This has been fixed.
We fixed a problem with Prompt History downloads, where a Project filter was not reflected in the download contents.
We improved the visibility of collapsed sections on the Scanners page.
Custom GenAI scanners weren’t showing a description in their corresponding Playground card. This has been fixed.
Red-Team report names now have a maximum length of 100 characters.
We fixed a couple of UI issues that caused the navigation to stay in, or switch to, the collapsed state unexpectedly.
We made the button labels on the Scanners screen consistent.
In some cases, users could not save a custom scanner after switching between scanner types. This is fixed.
We’ve made the instructions in the “Upload a file” function for GenAI custom scanners easier to understand.
Users can now cancel an in-progress attack run.
We uncovered and fixed a few edge cases in Custom Roles behavior.
Editing one campaign unintentionally put all other campaigns into edit mode as well. This has been corrected so that only the selected campaign is editable; all others are in view mode, as expected.
We rewrote the incorrect explanatory text for what were previously called “Signature attacks,” now called “Agent attack prompts,” in the Agentic Warfare campaign tab.

Known issues

Some customers may experience slow load times on the Red-Team Reports page and the Custom Roles page.
The CalypsoAI scanner packages table still shows headings for “Scan,” “Status,” and “Enforce” even though those controls have moved away from packages onto individual scanners.
While editing a Campaign, if you leave without saving, changes to custom intents are saved anyway.
Double-clicking on the Create Campaign button will create two identical campaigns. Some might consider this a feature, but we’ll be fixing it soon.
Users creating a custom provider will encounter an error when they try to add a logo to that provider.
The Campaign description has a character limit, but it’s not enforced on save.
App projects are missing the API key icon in the tile UI.
When creating a custom regex scanner, especially when pasting in code, invisible spaces at the end of the string are saved instead of trimmed automatically. This can make it hard to debug malfunctioning regex.
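
Until the regex issue above is fixed, you can check a pattern for invisible trailing characters before pasting it in. The Python sketch below is a workaround suggestion, not part of the product: `trailing_invisible` is a hypothetical helper that reports whitespace and zero-width format characters at the end of a string.

```python
import re
import unicodedata


def trailing_invisible(pattern: str) -> str:
    """Return the run of whitespace or invisible format characters at the end."""
    end = len(pattern)
    while end > 0:
        ch = pattern[end - 1]
        # isspace() covers spaces, tabs, newlines, and non-breaking spaces;
        # Unicode category "Cf" covers zero-width characters such as U+200B.
        if ch.isspace() or unicodedata.category(ch) == "Cf":
            end -= 1
        else:
            break
    return pattern[end:]


# A pattern pasted with a trailing space and a zero-width space.
pasted = r"\d{3}-\d{4}" + " \u200b"
tail = trailing_invisible(pasted)
cleaned = pasted[:len(pasted) - len(tail)]

# The pasted pattern fails to match; the cleaned one matches.
pasted_matches = re.fullmatch(pasted, "555-1234") is not None
cleaned_matches = re.fullmatch(cleaned, "555-1234") is not None
```

Review the reported characters before trimming: a pattern can end in whitespace on purpose (for example, an escaped space), and removing it would change what the regex matches.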