Data Governance & Protection
The CalypsoAI platform is designed to secure bi-directional traffic between applications or users and generative AI (GenAI) models. It can also be used as an asynchronous API call for scanning content without involving a model.
Scanners on the CalypsoAI platform perform bi-directional scanning of both content and data: Outbound prompts are scanned, as are the LLM provider’s returned responses. The scanners can also be deployed inline or out of band.
Each scanner is built to detect specific types of content or data for specific purposes, as shown below. Large Language Models (LLMs) and GenAI comprise a new, continually changing threat vector, and it is important to understand how they are used. CalypsoAI is committed to ensuring that our scanners are consistently updated to provide the most comprehensive protection.
CalypsoAI Out-of-the-Box Scanners
Our Out-of-the-Box Scanners include “packages” of multilayered, GenAI-powered scanners that search in CalypsoAI’s GenAI Scanning Engine for very specific, related content, as well as individual scanners that focus on a single element or topic.
When CalypsoAI is activated, these Out-of-the-Box Scanners are on by default and provide immediate protection that delivers approximately 80% security across standard needs. These scanners can be turned off or on at any time according to the user’s preference.
Scanner Packages
We offer two scanner packages, the Prompt Injection Package and the PII Package, as shown below. Each scanner in a package searches for very precise content.
The scanners in the Prompt Injection Package filter for indirect prompt injection attempts, adversarial attacks, and jailbreak attempts.
The scanners in the PII Package filter for a variety of information described as personally identifiable information (PII), including credit card numbers, birth dates, driver license numbers, email addresses, Internet Protocol (IP) addresses, passport numbers, telephone numbers, and Social Security Numbers.
Including street addresses is an option.
Combining these related scanners into suites enables them to deliver a comprehensive set of nuanced results. These scanners can be set to either block or audit content in prompts and responses.
Block mode prevents the scanned and flagged content from being sent to external models.
Audit mode reviews and notes the content, and sends it to the external models.
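The difference between the two modes can be sketched in a few lines of Python. This is an illustration only, not the platform's implementation: the names Action, ScanOutcome, and apply_action are hypothetical, and the assumption that a blocked finding is also logged is ours.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    BLOCK = "block"
    AUDIT = "audit"


@dataclass
class ScanOutcome:
    forwarded: bool  # True if the content may go on to the external model
    logged: bool     # True if the finding was recorded for later review


def apply_action(flagged: bool, action: Action) -> ScanOutcome:
    """Decide what happens to content once a scanner has inspected it."""
    if not flagged:
        # Nothing was detected, so the content passes through untouched.
        return ScanOutcome(forwarded=True, logged=False)
    if action is Action.BLOCK:
        # Block mode: flagged content is never sent to the external model.
        return ScanOutcome(forwarded=False, logged=True)
    # Audit mode: the finding is noted and the content is still sent.
    return ScanOutcome(forwarded=True, logged=True)
```

In this sketch, the only behavioral difference between the modes is whether flagged content is forwarded.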
Individual Scanners
Our individual content scanners detect and block sensitive information and malicious content in prompts, and protect against malicious content in responses. Each scanner addresses a specific type of content, as shown below and detailed in the following subsections.
Blocked Term Policy
The Blocked Term Policy Scanner allows organizations to establish and manage a customized, editable list of terms (banned words, sentences, and Regular Expressions [Regex]) to be blocked when included in a prompt. This feature will scan user prompts submitted to LLMs, detect any content from the banned list, and block the prompts accordingly to ensure compliance with the organization’s policies and prevent the generation of undesirable content.
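As a rough illustration of how a list mixing literal terms and Regex entries might be matched against a prompt, consider the minimal sketch below. The function names and the sample terms are hypothetical, and case-insensitive matching is an assumption rather than documented platform behavior.

```python
import re


def compile_blocklist(terms: list[str], regexes: list[str]) -> list[re.Pattern]:
    """Turn literal terms and raw Regex entries into case-insensitive patterns."""
    # Literal terms are escaped and must not be embedded inside a longer word.
    patterns = [
        re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
        for term in terms
    ]
    # Regex entries are used as written.
    patterns += [re.compile(rx, re.IGNORECASE) for rx in regexes]
    return patterns


def find_blocked(prompt: str, patterns: list[re.Pattern]) -> list[str]:
    """Return every substring of the prompt that matches a blocked pattern."""
    hits = []
    for pattern in patterns:
        hits.extend(match.group(0) for match in pattern.finditer(prompt))
    return hits


# Hypothetical list: one banned phrase and one Regex for ticket identifiers.
patterns = compile_blocklist(["Project Falcon"], [r"\bACQ-\d{4}\b"])
hits = find_blocked("Summarize project falcon status and ticket ACQ-2031", patterns)
# A non-empty result means the prompt would be blocked.
```

Escaping the literal terms keeps characters such as "+" or "." in a banned word from being interpreted as Regex syntax.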
DLP: Personally Identifiable Information (PII) Policy
The DLP: PII Policy Scanner allows thresholds to be set for identifying and acting on PII included in a prompt. By default, the scanner covers names, phone numbers (U.S., UK, Canada), email addresses, Social Security Numbers (U.S.), and credit card information. Including physical/geographical addresses (U.S., UK, Canada) is optional.
The scanner inspects all prompts for potential PII content. The scanner can be configured to block or audit the prompt when PII is detected. In the event of a block action, the prompt is prohibited from being sent to the LLM's API. Simultaneously, an alert will be sent to the user, informing them of the potential violation. The event is logged for later review or audit purposes.
The DLP: PII Policy Scanner is composed of two detectors: a Regex-based detector and a custom Named Entity Recognition (NER) model.
The Regex-based detector can be considered a rule-based model and is used to detect entities with highly specific patterns, such as email addresses and credit card numbers. When a Regex pattern is identified, the detector searches for additional words that provide entity-specific context and assigns a confidence score based on a range of factors, such as the pattern’s uniqueness or its likelihood of occurring in a specific context.
The custom NER model detects PII types with diverse formats not consistently captured using Regex patterns. Our scanner uses the custom NER model to detect street addresses, but the model could be expanded to detect a wide range of entities.
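The Regex-plus-context approach described above can be sketched as follows. This is a simplified illustration under stated assumptions: the email pattern, the context word list, the base score, the boost, and the window size are all hypothetical values chosen for the example, not the scanner's actual configuration.

```python
import re
from dataclasses import dataclass

# Simplified email pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

CONTEXT_WORDS = {"email", "e-mail", "contact", "mail"}  # hypothetical context list
BASE_SCORE = 0.6     # hypothetical confidence for a bare pattern match
CONTEXT_BOOST = 0.3  # hypothetical boost when context words appear nearby
WINDOW = 40          # characters inspected on either side of the match


@dataclass
class Finding:
    entity: str
    text: str
    score: float


def detect_emails(prompt: str) -> list[Finding]:
    """Find email-like strings and score them using nearby context words."""
    findings = []
    for match in EMAIL_RE.finditer(prompt):
        before = prompt[max(0, match.start() - WINDOW):match.start()]
        after = prompt[match.end():match.end() + WINDOW]
        nearby = set(re.findall(r"[a-z-]+", f"{before} {after}".lower()))
        boost = CONTEXT_BOOST if nearby & CONTEXT_WORDS else 0.0
        findings.append(Finding("EMAIL", match.group(0), round(BASE_SCORE + boost, 2)))
    return findings
```

A threshold applied to the resulting score would then determine whether the finding triggers the configured block or audit action.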
DLP: Source Code Policy
The DLP: Source Code Policy Scanner allows overall sensitivity thresholds to be set for content and allows users to identify the source code languages and import statements to be blocked from inclusion in prompts. The scanner is designed to detect and address source code included in user prompts to or responses from an LLM. This feature of the CalypsoAI platform has a dual purpose:
Protecting proprietary code: The scanner identifies instances of proprietary code included in the prompts or responses. It prevents such code from leaving or entering the organization’s controlled environment and provides insight into the code specifications, such as the language used and the type of libraries included.
Safeguarding against malicious code: The scanner detects potentially harmful code elements, such as rogue import statements, outdated libraries, and other potential Common Vulnerabilities and Exposures (CVEs) that might be returned by the LLM.
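To illustrate the import-statement check, the sketch below looks for Python-style import lines in a block of text and reports any that name a module on a blocked list. It is a minimal example: the function name and the blocked list are hypothetical, only Python syntax is covered, and for a comma-separated import only the first module is captured.

```python
import re

# Matches "import x" and "from x import y" at the start of a line.
IMPORT_RE = re.compile(r"^\s*(?:import|from)\s+([A-Za-z_][\w.]*)", re.MULTILINE)


def blocked_imports(text: str, blocked: set[str]) -> list[str]:
    """Return the top-level modules imported in `text` that are on the blocked list."""
    found = []
    for module in IMPORT_RE.findall(text):
        top_level = module.split(".")[0]
        if top_level in blocked and top_level not in found:
            found.append(top_level)
    return found


sample = "import os\nfrom subprocess import run\nimport numpy as np\n"
flagged = blocked_imports(sample, {"os", "subprocess"})
```

The same function can be applied to an LLM response as well as a prompt, since the scanner covers both directions.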
Legal Content Detection
The Legal Content Detection Scanner allows thresholds to be set for identifying and acting on a customizable array of sensitive legal information and intellectual property (IP) that must remain within the boundaries of the organization. Unauthorized access or exposure to such sensitive content, especially to external parties like LLM providers, could result in serious privacy breaches, legal disputes, or loss of competitive advantage. The scanner weighs component words and their frequencies, so it can detect snippets of legal code that, taken together, constitute a large part of the overall prompt. Snippets of legal code that constitute a small part of a large prompt will not be detected. The scanner supports the following content classes:
Contracts
Legal Law
Terms of Service and End-User License Agreement (EULA)
Non-Disclosure Agreements
The flagging mechanism adds the probabilities for all flagging classes and compares this sum against the threshold. Three sensitivity levels (Low, Medium, and High) correspond to how sensitive the scanner is to the class of data it's trying to detect. The Low sensitivity setting uses a high threshold, and the Medium and High settings use progressively lower thresholds.
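The flagging rule can be written out as a short sketch. The threshold values and class keys below are hypothetical placeholders; only the structure (sum the flagging-class probabilities, compare against a threshold that drops as sensitivity rises) comes from the description above.

```python
# Hypothetical thresholds: higher sensitivity means a lower bar for flagging.
THRESHOLDS = {"low": 0.9, "medium": 0.7, "high": 0.5}

# Hypothetical keys for the four supported content classes.
FLAGGING_CLASSES = ("contracts", "legal_law", "tos_eula", "nda")


def is_flagged(class_probs: dict[str, float], sensitivity: str) -> bool:
    """Sum the probabilities of all flagging classes and compare to the threshold."""
    total = sum(class_probs.get(name, 0.0) for name in FLAGGING_CLASSES)
    return total >= THRESHOLDS[sensitivity]


# A prompt judged 40% contract-like and 20% NDA-like; the rest is non-legal.
probs = {"contracts": 0.4, "nda": 0.2, "other": 0.4}
```

With these example values, the combined probability is about 0.6, so the prompt is flagged only at the High sensitivity setting.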
Malware: Source Code Policy
The Malware: Source Code Policy Scanner allows overall sensitivity thresholds to be set for content and allows users to identify the source code languages and import statements to be blocked from inclusion in prompts to or responses from an LLM.
Prompt Injection Policy
The Prompt Injection Policy Scanner allows thresholds to be set for identifying and acting on language in a prompt sent by malicious users attempting to exploit the system. The malicious actors place harmful content or code in the prompt with the goal of bypassing internal safety controls, which poses a security risk and can compromise the integrity and functionality of the LLM. This scanner uses the latest Natural Language Processing (NLP) technologies to scan prompts and detect patterns indicative of prompt injections. This scanner identifies and blocks such attempts, thereby maintaining the security and integrity of the LLM. The Prompt Injection Policy Scanner is composed of two detectors: a Regex-based detector and a Transformer-based Classification model.
The Regex-based detector identifies prompt-injection style exploits.
The Transformer-based Classification model classifies each prompt as either a prompt injection or a benign prompt, backing up the Regex-based detector where needed. The scanner raises a flag when the Regex-based detector finds a match or when the model classifies the prompt as a prompt injection.
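The way the two detectors combine can be sketched as a simple "either one flags" rule. In this illustration the Regex patterns are hypothetical examples, and the classifier is passed in as a plain function returning an injection probability, standing in for the Transformer-based model, whose interface is not described here.

```python
import re
from typing import Callable

# Hypothetical examples of injection-style phrasing.
INJECTION_PATTERNS = [
    re.compile(r"ignore (?:all |any )?(?:previous|prior) instructions", re.IGNORECASE),
    re.compile(r"disregard (?:the )?system prompt", re.IGNORECASE),
]


def regex_flag(prompt: str) -> bool:
    """Return True if any known injection-style pattern appears in the prompt."""
    return any(pattern.search(prompt) for pattern in INJECTION_PATTERNS)


def is_prompt_injection(
    prompt: str,
    classify: Callable[[str], float],  # stand-in for the classification model
    threshold: float = 0.5,            # hypothetical decision threshold
) -> bool:
    """Flag the prompt if either the Regex detector or the classifier fires."""
    if regex_flag(prompt):
        return True
    # The classifier backs up the Regex detector for phrasing it does not cover.
    return classify(prompt) >= threshold
```

Running the inexpensive Regex check first means the model is consulted only for prompts the patterns do not already catch.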
Secret Detection Policy
The Secret Detection Policy Scanner mitigates the risk of deliberate leaking or inadvertent sharing of confidential company information typically known as “secrets.” It does so by scanning prompts for specific content before the prompts are sent to an LLM. The scanner detects 158 known patterns, such as API keys, passkeys, and other information typically restricted to internal, authorized company use. The list of secrets detected, shown below, cannot be edited by users; however, the scanner response can be customized.
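Pattern-based secret detection of this kind can be illustrated with two widely recognized formats. The patterns below are examples chosen for this sketch and are not drawn from the scanner's own list of 158 patterns; the function name is hypothetical.

```python
import re

# Illustrative patterns only; not the platform's pattern list.
SECRET_PATTERNS = {
    # AWS access key IDs commonly start with "AKIA" followed by 16 characters.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # PEM-encoded private keys begin with a recognizable header line.
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def find_secrets(prompt: str) -> list[str]:
    """Return the names of every secret pattern found in the prompt."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(prompt)]
```

Returning the pattern name rather than the matched text keeps the secret itself out of logs and alerts.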
Toxicity Policy
The Toxicity Policy Scanner allows thresholds to be set for identifying and acting on a default set of content types included in a prompt. Toxic content in user prompts can create a negative user experience or even cause harm to users, and risks violating organizational policies. The categories of toxic content are:
Toxicity: Negative or harmful content displayed, including hate speech, cyberbullying, or harassment
Severe Toxicity: A level of harmful behavior in online comments that poses a significant threat to an individual's well-being or safety
Obscene: Use of vulgar or offensive language
Sexually Explicit: Depiction or description of sexual activity or content in a detailed or graphic manner
Insult: A disrespectful or offensive remark toward an individual or group
Threat: An expression of intent to harm or cause damage to someone
Identity Attack: An attempt to harm someone's reputation or sense of self by attacking their personal or professional identity
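Per-category thresholds of this kind could be applied as in the sketch below. It assumes an upstream classifier has already produced a score between 0 and 1 for each category; the category keys, the default threshold, and the function name are hypothetical.

```python
DEFAULT_THRESHOLD = 0.5  # hypothetical default applied when no override is set


def flagged_categories(
    scores: dict[str, float], thresholds: dict[str, float]
) -> list[str]:
    """Return, in alphabetical order, the categories whose score meets its threshold."""
    return sorted(
        category
        for category, score in scores.items()
        if score >= thresholds.get(category, DEFAULT_THRESHOLD)
    )


# Example scores from a hypothetical classifier, with a stricter bar for "obscene".
scores = {"insult": 0.8, "threat": 0.2, "obscene": 0.55}
flagged = flagged_categories(scores, {"obscene": 0.7})
```

Here only "insult" is flagged: "threat" falls below the default threshold, and "obscene" falls below its raised threshold.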
Audit Scanners
Audit Terms
The Audit Terms Scanner allows organizations to establish and manage a customized, editable list of terms (banned words, sentences, and Regular Expressions [Regex]) to be audited when included in a prompt. This feature will scan user prompts submitted to LLMs, detect any content from the list, and flag the prompts accordingly to ensure compliance with the organization’s policies.
Demographic Auditing (Not Customizable)
The Demographic Auditing Scanner analyzes user prompts and tracks the inclusion of terms or topics that identify demographic characteristics or are typically associated with bias or stereotypes. This scanner is not customizable and audits, but does not block, content. This feature detects various types of content, including, but not limited to, terms related to cultures, ethnicity, gender, sex, sexual orientation, age and generation, disability, politics, nationality, and religion that violate an organization’s policy. This feature enables organizations to better understand, moderate, and cater to preferences when using LLMs, ultimately enhancing the user experience and ensuring compliance.
Name Entity Auditing
The Name Entity Auditing Scanner analyzes user prompts and tracks the inclusion of names or other terms associated with specific entities (persons, organizations, locations, etc.). The scanner detects users’ interests, but can also detect private or confidential information that could reveal details about the organization’s relationship with the named entity, in violation of organizational policy. This scanner audits, but does not block, content. This scanner enables organizations to better understand, moderate, and cater to preferences when using LLMs, ultimately enhancing the user experience and ensuring compliance.
Sentiment Recording
The Sentiment Recording Scanner, where allowed, analyzes user prompts and tracks the emotional polarity and sentiment of prompt content (e.g., whether the tone of the prompt is positive, negative, sad, etc.). The scanner can derive a user’s current mood and identify content that could indicate potential workplace issues. This scanner enables organizations to better understand, moderate, and cater to preferences when using LLMs, ultimately enhancing the user experience. This scanner audits, but does not block, content.
Topic Auditing
The Topic Auditing Scanner tracks the inclusion in prompts of terms or topics not typically associated with business functions. The scanner categorizes user prompts and LLM responses into distinct topics to allow two functionalities:
User Insight Generation: Categorizing prompts into distinct topics can provide valuable insights into how users interact with the LLM, what topics are most popular, and how these trends change over time. This data can be used to optimize the LLM to better serve the organization’s needs and to provide a more tailored user experience.
Topic Moderation: The scanner can be configured to allow or disallow identified topics based on the organization’s specific needs or requirements. This helps maintain the appropriateness of LLM interactions within the organization’s controlled environment.
This scanner audits, but does not block, content. It can provide reports and analytics based on the topics identified.
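The two functions above can be sketched together: tallying topics for insight and marking prompts whose topic is on a disallowed list. The sketch assumes each prompt has already been assigned a single topic label by an upstream classifier; the function name and the labels are hypothetical.

```python
from collections import Counter


def audit_topics(
    prompt_topics: list[str], disallowed: set[str]
) -> tuple[Counter, list[int]]:
    """Count topic frequencies and list the positions of disallowed-topic prompts."""
    counts = Counter(prompt_topics)
    flagged = [i for i, topic in enumerate(prompt_topics) if topic in disallowed]
    return counts, flagged


# Topics assigned to three consecutive prompts by a hypothetical classifier.
counts, flagged = audit_topics(["finance", "sports", "finance"], {"sports"})
```

Because the scanner audits rather than blocks, the flagged positions feed reports and analytics instead of stopping the prompts.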
Custom Scanners
Custom Scanners are created by the customer and tailored to specific business needs, use cases, or time-limited situations. In creating their custom scanners, companies can dial up their security from the initial 80% achieved with the Out-of-the-Box Scanners to achieve optimal control. Each scanner can be created, published, activated, and then used for as long as necessary. For example, a scanner created to detect the name of a confidential project, competitor, or acquisition can be activated and remain in use until the information is no longer considered confidential, at which time the scanner can be deleted or unpublished. Alternatively, a scanner created to detect specific company terminology can be activated and remain activated for the system’s lifespan.
Scanners are created and tested in the Playground before being activated to ensure they are fit for purpose and successfully detect the specified content. Custom Scanners can be duplicated, edited, or deleted at any time. When the edit feature is enabled, Custom Scanners are automatically deactivated, unpublished, and moved to the Playground.
Using the CalypsoAI Playground
The Playground is a secure space for developing and experimenting with custom scanners. The user creating the scanner names it and describes it according to the specific prompt content it is intended to block or flag.
The user creating the scanner must test it against a model to ensure it detects the identified content according to the user’s intent, and to ensure the scanner is fit for purpose. When the scanner blocks a prompt or response while in the Playground, a message is displayed. The user creating the scanner can customize this message.
Publishing a Custom Scanner
After thorough testing, the user can publish the Custom Scanner, moving it from the Playground to the production environment. During the deployment phase, the user configures the scanner action to Block or Audit, customizes the scanner response message, and then activates the scanner, enabling identified users (individuals, named groups, or the entire organization) to use it. Active Custom Scanners can be edited after unpublishing them, which returns them to the Playground.
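The Custom Scanner lifecycle described in this section can be modeled as a small state machine. This is an interpretive sketch: the state names, the event names, and the treatment of "published" and "active" as separate states are assumptions made for illustration.

```python
from enum import Enum


class State(Enum):
    PLAYGROUND = "playground"  # being created, tested, or edited
    PUBLISHED = "published"    # moved to production, not yet active
    ACTIVE = "active"          # in use by the identified users


# Allowed transitions: (current state, event) -> next state.
TRANSITIONS = {
    (State.PLAYGROUND, "publish"): State.PUBLISHED,
    (State.PUBLISHED, "activate"): State.ACTIVE,
    # Editing deactivates and unpublishes the scanner, returning it to the Playground.
    (State.PUBLISHED, "edit"): State.PLAYGROUND,
    (State.ACTIVE, "edit"): State.PLAYGROUND,
}


def step(state: State, event: str) -> State:
    """Apply an event to a scanner state, rejecting transitions that are not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"cannot {event!r} a scanner in state {state.value!r}"
        ) from None
```

Under this model, a scanner cannot be activated directly from the Playground; it must be published first.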
Summary of Methods