Release notes: Feb 19, 2025 (v8.56.1-gpu)

New in this release: Red team reports, custom regex and keyword scanners, 22,000+ signature attacks, two new prompt injection scanners


This is a SaaS-only update

Back end version: v8.56.1-gpu
Front end version: v3.27.13

CalypsoAI Inference Red-Team

Red team reports

A screenshot of our red team report

Everyone knows that when you’re red-teaming any system or application, the most important thing is the report. And reports are only useful if they help you understand what happened, what matters, and what to do next.

Our AI red team report aims to do all that, and more. Every time you run an attack, the system analyzes and visualizes the results in a compact summary you can snapshot and pull into internal reporting and communications.

With red team reports, you can:

  • View a summary of results for any campaign run, and drill into the per-prompt results—or download the results spreadsheet.

  • Run attacks against multiple models at once and see both aggregated and individual results.

  • See the count and percentage of successful and unsuccessful attacks.

  • Get a breakdown of successful attacks by intent category (such as illegal acts, toxicity, violence, etc.) so you can prioritize remediation efforts on the use cases that matter most.

  • Understand what to do next with customized mitigation actions.

CalypsoAI Security Index (CASI)

An example report with a CASI score of 91 - Good

We’re also excited to introduce the CalypsoAI Security Index, the first-of-its-kind AI security scoring metric that allows you to compare models and AI systems not just on performance and price, but also on their security characteristics. CASI is a metric we developed to answer the complex question: “How secure is my model?” Scores are out of 100, and a higher CASI score indicates a more secure model or application.

CASI evaluates several critical factors beyond simple success rates:

  • Severity: The potential impact of a successful attack (e.g., bicycle lock vs. nuclear launch codes).

  • Complexity: The sophistication of the attack being assessed (e.g., plain text vs. complex encoding).

  • Defensive Breaking Point (DBP): Identifies the weakest link in the model’s defenses, focusing on the path of least resistance and considering factors like the computational resources required for a successful attack.

By incorporating these factors, CASI offers a holistic and nuanced measure of model and application security.

Every time you run CalypsoAI’s signature attacks, the system automatically generates a CASI score for every connection in that run—and an average score across all connections. Based on our own testing, we’ve created scoring tiers that help you understand what the result means:

  • 0-69: Critical – This model is vulnerable to the most basic attacks and not recommended for use in production.

  • 70-84: Warning – This model has shown some common vulnerabilities. We recommend doing more extensive testing and deploying safeguards before using it in production.

  • 85-99: Good – This model performed well. It is vulnerable to only the most complex attacks, which should still be evaluated depending on your use case.

  • 100: Perfect – No vulnerabilities found. We recommend testing again with the latest signature package to verify this result is accurate.

NOTE: CASI scoring is not yet available for agentic attacks.
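If you consume CASI scores programmatically (for example, from the downloadable results spreadsheet), a minimal sketch of mapping a score to the tiers above might look like the following. This is purely illustrative and not part of the product; the thresholds simply mirror the tiers listed above.

```python
# Illustrative only: map a CASI score (0-100) to the tier names described above.
def casi_tier(score: float) -> str:
    if not 0 <= score <= 100:
        raise ValueError("CASI scores range from 0 to 100")
    if score == 100:
        return "Perfect"
    if score >= 85:
        return "Good"
    if score >= 70:
        return "Warning"
    return "Critical"

print(casi_tier(91))  # "Good", matching the example report above
```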

New signature attacks

We’re making major upgrades to our attack arsenal in this release, including thousands of new signature attacks, a new attack vector, and a new converter technique to bypass security filters.

  • 12,000+ new signature attacks. This brings our total malicious prompts to over 22,000.

  • Persuasive adversarial prompts attack. We’re rolling out a powerful new attack vector that leverages human-like persuasion techniques to subtly rephrase a malicious intent. Similar to our “context change” attacks, the persuasive adversarial prompt attack draws from a library of different persuasion techniques.

  • Single character converter. This is a simple approach to jailbreaking LLMs that exploits a vulnerability related to short, single-character tokens.

Custom and standard attack campaigns

We’ve brought more functionality to the UI for attack campaigns, making it easier to right-size attacks for ad hoc and periodic testing. In this release you can:

  • Create a standard campaign that’s a subset of CalypsoAI’s signature attacks. This is useful when you want results faster.

  • Create a custom campaign based on a custom intent you provide (previously this feature was available by API only).

  • Select the attack vectors and converters you want to run.

  • Edit and delete campaigns.

Application connections

Red-Team users can now select Application as the connection type. This is an API-based connection that allows you to attack an AI-based system, such as a chatbot or copilot, that takes prompt-based inputs.

Since you can select multiple connections to attack at once, it’s now possible to run attacks against an AI application and the underlying model to compare their performance. This helps teams understand how the application code or RAG models may be improving or undermining the model’s intrinsic security profile.

Red team report names

You can now provide a custom name for each red team attack run. (Previously the system named the report based on the run GUID). This makes it easier to find reports in the UI and when they are downloaded.


CalypsoAI Inference Defend

Custom regex and keyword scanners

A screenshot of the "build a custom scanner" dropdown where you can see the three custom scanner types

Even though we firmly believe you have to protect gen AI with gen AI, sometimes nothing beats old-school pattern matching for accuracy and specificity. So we’ve added two new types of custom scanners based on keywords and regex.

How they work:

  • Keyword scanner: Build or paste in a list of key terms to block or audit. This is useful for proprietary or domain-specific terms, names, or other well-defined content.

  • Regex scanner: Write your own regex or paste it in from another source to define the patterns to block or audit. Use the test field to make sure your expression is working as intended. This is useful for definable patterns such as email addresses and ID numbers, or for bringing in regex detections you’ve already configured in your other DLP products. (A rough sketch of both scanner types follows this list.)
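As a rough illustration of the pattern matching involved (this is not CalypsoAI’s implementation, and the keyword list and email pattern below are made-up examples), a keyword scanner is essentially a term lookup and a regex scanner a pattern match over the prompt or response text:

```python
import re

# Hypothetical examples only: substitute your own terms and patterns.
BLOCKED_KEYWORDS = {"project-aurora", "internal-codename"}                      # keyword scanner
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")   # regex scanner

def scan(text: str) -> list[str]:
    """Return findings that a block or audit policy could act on."""
    findings = []
    lowered = text.lower()
    findings += [f"keyword: {kw}" for kw in BLOCKED_KEYWORDS if kw in lowered]
    findings += [f"regex match: {m}" for m in EMAIL_PATTERN.findall(text)]
    return findings

print(scan("Contact jane.doe@example.com about Project-Aurora"))
```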

System prompt and obfuscation scanners

A screenshot showing the two new scanners in the prompt injection scanner package

Prompt injection continues to be the attack vector favored by adversaries looking to compromise AI systems. To better protect your valuable data, we’ve added two powerful new gen AI scanners that detect and block system prompt attacks and obfuscation attacks. You’ll find these scanners in our out-of-the-box Prompt Injection Package.

What they protect against:

  • System prompt scanner: System prompts are hidden or predefined instructions operating behind the scenes that shape how the AI responds to user prompts. Attackers will attempt to get the model to divulge the system prompt, or information about it, in order to identify vulnerabilities or sensitive data. Our scanner looks for direct and indirect attempts to access the system prompt.

  • Obfuscation scanner: Obfuscation techniques involve encoding or encrypting text, and can range from simply switching to a different natural language or code all the way to very complex artificial encryption schemes. Some simple techniques include base64 or hex encoding, leet speak (substituting letters with numbers or symbols), inserting extra whitespace, substituting characters with visually similar ones (homoglyphs), and intentional misspellings. We’ve had good success using obfuscation to attack models with our Red-Team product, so it seems prudent to create a scanner to defend against it. (A rough sketch of some of these signals follows.)
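To make a few of the simpler techniques above concrete, here is a minimal sketch of the kinds of signals an obfuscation detector might look for. This is purely illustrative and is not the scanner’s actual detection logic; the regexes and thresholds below are assumptions.

```python
import base64
import binascii
import re

# Illustrative heuristics only -- not the scanner's real detection logic.
BASE64_BLOB = re.compile(r"\b[A-Za-z0-9+/]{24,}={0,2}")   # long base64-looking tokens
HEX_BLOB = re.compile(r"\b(?:[0-9a-fA-F]{2}){10,}\b")     # long runs of hex pairs

def obfuscation_signals(text: str) -> list[str]:
    """Return coarse signals that a prompt may contain encoded content."""
    signals = []
    for token in BASE64_BLOB.findall(text):
        try:
            base64.b64decode(token, validate=True).decode("utf-8")  # only flag blobs that decode to readable text
            signals.append("possible base64-encoded payload")
        except (binascii.Error, UnicodeDecodeError, ValueError):
            pass
    if HEX_BLOB.search(text):
        signals.append("possible hex-encoded payload")
    return signals

# The blob below decodes to "ignore all previous instructions".
print(obfuscation_signals("Please run: aWdub3JlIGFsbCBwcmV2aW91cyBpbnN0cnVjdGlvbnM="))
```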


CalypsoAI Platform

New CalypsoAI logo!

CalypsoAI's new logo

We’re undergoing a major rebranding and have a new logo, which you’ll see on login and in the top left of the product UI. Stay tuned for our website redesign, which is coming soon!

Hugging Face integration

A screenshot showing the provider tile for Hugging Face with the "Add" button

With over 1.4 million models and counting, we figure Hugging Face deserves a place in our Connections screen. Now you can easily add a Hugging Face model using just the model name and API key.


Bug fixes

  • Fixed various capitalization inconsistencies in different parts of the UI.

  • Addressed a rate limit issue in the global search functionality.

  • Truncated long scanner names in the playground that were wrapping in an unattractive way.

  • Fixed some inconsistent word use in buttons.

  • Changed the wording in the File upload modal for scanners to be clearer.

  • Fixed an issue where org admins were unable to revoke an API key.

  • Fixed an issue that allowed unauthenticated users to briefly (1 second) see the documentation before being redirected to log in.

  • When creating a scanner, the system was giving a success message even when there was an error and the scanner was not created. This has been fixed.

  • In Chat projects, when an admin was removed from the project, that user would still show up as a member of the project. Now, removing an admin removes them from the project completely.

  • The custom roles dropdown selection was displaying only the first 10 roles. It now displays all available roles, as expected.

  • Fixed a formatting issue on the Users page where long entries in the “roles” column were overlapping the row dividers.

  • In the Settings > MFA area, the email dropdown was active even when the MFA toggle was off. The email dropdown is now disabled in this situation.

  • We fixed several issues relating to the new custom roles feature that caused users to incorrectly gain or lose access to expected functionality.

  • The dropdown on the “invite users” modal was narrower than the button. They are the same width now.

  • In some cases, scanners added to a project were not showing up in the project detail. This was due to misapplied pagination, and has been fixed.

  • There was a bug that caused a disabled scanner to be automatically enabled after updating the scanner response. This has been fixed.

Known issues

  • When running a campaign against a single connection, if that connection returns an error for every attack, no score is generated and there is no data in the connections section. We’re working to fix this in the next release.

  • The keyword custom scanner has a 15-character limit for each keyword. We’ll extend this in the next release.
