Individual Scanners
Our individual content scanners detect and block sensitive information and malicious content in prompts, and protect against sensitive or undesirable content in responses. Each scanner addresses a specific type of content, as shown below and detailed in the following subsections.
Policy Scanners
Status
Our policy scanners can be set to block or audit the prompts and/or responses.
Block: The prompt is prohibited from being sent to the LLM's API and the user receives a message explaining that the prompt content violates the policy. The event is logged for later review or audit purposes.
Audit: The prompt is reviewed and, if content in violation of the policy is present, the event is logged for later review or audit purposes. The prompt is allowed to be sent to the LLM's API. The user is not notified of any violation.
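The Block and Audit semantics described above can be sketched as a small decision function. All names here (`ScanResult`, `enforce`) are hypothetical, not the product's API; this is only a sketch of the behavior as documented.

```python
from dataclasses import dataclass

@dataclass
class ScanResult:
    violated: bool
    policy: str

def enforce(result: ScanResult, mode: str):
    """Apply Block or Audit semantics to a scan result.

    Returns (forward_to_llm, user_message, audit_event), where
    audit_event is None when nothing is logged.
    """
    if not result.violated:
        return True, None, None
    event = {"policy": result.policy, "action": mode}
    if mode == "block":
        # Prompt is withheld and the user is told why.
        return False, f"Blocked: prompt violates the {result.policy} policy.", event
    # Audit mode: log the event silently and let the prompt through.
    return True, None, event
```

In Block mode a violation stops the prompt and notifies the user; in Audit mode the same violation is only logged and the prompt proceeds.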
Sensitivity
Some scanners enable admins to set a sensitivity threshold. The flagging mechanism adds the probabilities for all flagging classes and compares this sum against the threshold. Three sensitivity levels (Low, Medium, and High) correspond to how readily the scanner flags the class of data it detects: Low sensitivity uses a high threshold, and Medium and High use progressively lower thresholds.
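The flagging mechanism above can be sketched in a few lines. The threshold values here are illustrative assumptions; only the relationship matters: Low sensitivity means a high bar to flag, High sensitivity a low one.

```python
# Hypothetical thresholds: Low sensitivity means a HIGH bar to flag.
THRESHOLDS = {"low": 0.9, "medium": 0.7, "high": 0.5}

def is_flagged(class_probs: dict, flagging_classes: set, sensitivity: str) -> bool:
    """Sum the probabilities of all flagging classes and compare
    the total against the threshold for the chosen sensitivity."""
    total = sum(p for cls, p in class_probs.items() if cls in flagging_classes)
    return total >= THRESHOLDS[sensitivity]
```

With class probabilities of 0.4 and 0.35 for two flagging classes (sum 0.75), the prompt is flagged at High and Medium sensitivity but not at Low.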
Active
A scanner can be activated or deactivated by an admin with the appropriate permissions. When activated at the global level, the scanner will run on every interaction with every model across all groups and projects. Shift the toggle to the right to activate and to the left to deactivate.
Action
Clicking the three dots icon on the line for a scanner displays the options available to the admin(s).
The option to edit the scanner response is the same for each scanner. The Scanner Response is the message a user sees when their prompt is blocked. The admin can use the default message for the scanner, shown below, or change the message to be more specific.
The advanced configuration options for each scanner differ and will be presented in the detailed explanations below.
Blocked Term Policy
The Blocked Term Policy Scanner allows organizations to establish and manage a customized, editable list of banned words, sentences, and regular expressions (regex). It scans user prompts submitted to LLMs, detects banned content, and blocks non-compliant prompts to uphold organizational policies and prevent undesirable interactions.
Click the three dots icon and then Advanced Configuration to edit the list. Add or delete terms in the text box, then click Save.
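A blocked-term check of this kind (plain terms plus regex patterns) can be sketched as follows. The function name and matching rules are assumptions for illustration, not the scanner's implementation.

```python
import re

def matches_blocked_list(prompt: str, terms: list, patterns: list) -> bool:
    """Return True if the prompt contains any banned term
    (case-insensitive substring) or matches any banned regex pattern."""
    lowered = prompt.lower()
    if any(term.lower() in lowered for term in terms):
        return True
    return any(re.search(p, prompt, re.IGNORECASE) for p in patterns)
```

Regex entries let admins block structured identifiers (for example, internal ticket numbers) that a fixed word list cannot express.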
DLP: Personally Identifiable Information (PII) Policy
The DLP: PII Policy Scanner allows admins to set thresholds for identifying and acting on PII included in a prompt. The default PII the scanner acts on includes names, birth dates, phone numbers (U.S., UK, Canada), email addresses, Social Security Numbers (U.S.), and credit card information. Including physical/geographical addresses (U.S., UK, Canada) is optional.
The scanner inspects all prompts for potential PII content. The scanner can be configured to block or audit the prompt when PII is detected. The admin can set the scanner sensitivity to Standard or Sensitive, according to the organization’s preferred level of scrutiny.
Click the three dots icon and then Advanced Configuration. Select the sensitivity level and indicate whether addresses should be included in the scan, then click Save.
This scanner includes two detectors: a regex-based detector and a custom Named Entity Recognition (NER) model.
The regex-based detector can be considered a rule-based model and is used to detect entities with highly specific patterns, such as email addresses and credit card numbers. When a regex pattern is identified, the detector searches for additional words that provide entity-specific context and assigns a confidence score based on a range of factors, such as the pattern’s uniqueness or its likelihood of occurring in a specific context.
The custom NER model detects PII types with diverse formats not consistently captured using regex patterns. Our scanner uses the custom NER model to detect street addresses, but the model could be expanded to detect a wide range of entities.
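A minimal sketch of the rule-based detector described above: a regex match establishes a base confidence, and nearby context words raise it. The pattern, context words, and scoring weights are all illustrative assumptions.

```python
import re

# Hypothetical context words that raise confidence for a card-number match.
CARD_CONTEXT = {"card", "credit", "visa", "mastercard", "payment"}
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def detect_credit_card(text: str) -> float:
    """Return a confidence score in [0, 1] that the text contains a
    credit card number: the regex alone gives a base score, and each
    nearby context word adds to it."""
    if not CARD_PATTERN.search(text):
        return 0.0
    words = set(re.findall(r"[a-z]+", text.lower()))
    hits = len(words & CARD_CONTEXT)
    return min(1.0, 0.5 + 0.2 * hits)
```

A bare digit string scores lower than the same digits surrounded by words like "credit card", which is how entity-specific context sharpens a purely pattern-based match.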
DLP: Source Code Policy
The DLP: Source Code Policy Scanner allows overall sensitivity thresholds to be set for content and allows users to identify the source code languages and import statements to be blocked from inclusion in prompts. The scanner is designed to detect and address source code included in user prompts to or responses from an LLM.
Click the three dots icon and then Advanced Configuration. Select the sensitivity level and select the languages to be included in the scan. Indicate whether import statements should be included in the scan, then click Save.
This policy scanner serves dual purposes:
Protect proprietary code: The scanner identifies instances of proprietary code included in the prompts or responses. It prevents such code from leaving or entering the organization’s controlled environment and provides insight into the code specifications, such as the language used and the type of libraries included.
Safeguard against malicious code: The scanner detects potentially harmful code elements, such as rogue import statements, outdated libraries, and other potential Common Vulnerabilities and Exposures (CVEs) that might be returned by the LLM.
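Import-statement detection of the kind described above can be sketched with per-language regex rules. The patterns here cover only two languages and are assumptions for illustration.

```python
import re

# Hypothetical import-statement patterns for a few languages.
IMPORT_PATTERNS = {
    "python": re.compile(r"^\s*(?:import\s+\w+|from\s+\w+\s+import\b)", re.M),
    "javascript": re.compile(r"^\s*import\s+.+\s+from\s+['\"]", re.M),
}

def find_imports(text: str, blocked_languages: set) -> list:
    """Return the blocked languages whose import syntax appears in the text."""
    return [lang for lang in blocked_languages
            if lang in IMPORT_PATTERNS and IMPORT_PATTERNS[lang].search(text)]
```

Matching on import statements gives the scanner insight into which libraries a code snippet pulls in, which supports both the proprietary-code and malicious-code use cases.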
Legal Content Detection
The Legal Content Detection Scanner allows thresholds to be set for identifying and acting on a customizable array of sensitive legal information and intellectual property (IP) that must remain within the boundaries of the organization. Unauthorized access or exposure to such sensitive content, especially by external parties like LLM providers, could result in serious privacy breaches, legal disputes, or loss of competitive advantage. The scanner weighs component words and their frequencies, so it can detect snippets of legal text that, taken together, constitute a large part of the overall prompt. Snippets of legal text that constitute a small part of a large prompt will not be detected. The scanner supports the following content classes:
Contracts
Legal Law
Terms of Service and End-User License Agreement (EULA)
Non-Disclosure Agreements
Click the three dots icon and then Advanced Configuration. Select the sensitivity level and select the types of documentation to be included in the scan, then click Save.
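The proportional weighting described above (a legal snippet dominating a short prompt scores high, the same snippet buried in a long prompt scores low) can be sketched like this. The marker terms and weights are invented for illustration.

```python
# Hypothetical weighted marker terms for one content class (contracts).
CONTRACT_TERMS = {"hereinafter": 3, "indemnify": 3, "whereas": 2,
                  "party": 1, "agreement": 1}

def legal_fraction(prompt: str) -> float:
    """Weight marker-term frequencies against prompt length, so legal
    snippets dominating a short prompt score high while the same
    snippet inside a long prompt scores low."""
    words = prompt.lower().split()
    if not words:
        return 0.0
    score = sum(CONTRACT_TERMS.get(w.strip(".,;:"), 0) for w in words)
    return score / len(words)
```

Because the score is normalized by prompt length, the same legal snippet produces a much lower score when surrounded by a large amount of unrelated text, matching the documented behavior.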
Malware: Source Code Policy
The Malware: Source Code Policy Scanner allows overall sensitivity thresholds to be set for content and allows users to identify the source code languages and import statements to be blocked from inclusion in prompts to or responses from an LLM.
Click the three dots icon and then Advanced Configuration. Select the sensitivity level and select the languages to be included in the scan. Indicate whether import statements should be included in the scan, then click Save.
Prompt Injection Policy
The Prompt Injection Policy Scanner allows sensitivity thresholds to be set for identifying and acting on language in a prompt sent by malicious users attempting to exploit the system. The malicious actors place harmful content or code in the prompt with the goal of bypassing internal safety controls. This poses a security risk and can compromise the integrity and functionality of the LLM. This scanner uses the latest Natural Language Processing (NLP) technologies to scan prompts and detect patterns indicative of prompt injections. This scanner identifies and blocks such attempts, thereby maintaining the security and integrity of the LLM.
Click the three dots icon and then Advanced Configuration. Select the sensitivity level, then click Save.
The Prompt Injection Policy Scanner is composed of two detectors: a regex-based detector and a transformer-based classification model.
The regex-based detector identifies prompt-injection style exploits.
The transformer-based classification model classifies each prompt as an injection attempt or benign. A flag is raised when either the regex detector matches or the model classifies the prompt as an injection, so the model backs up the regex rules when a novel phrasing evades them.
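The two detectors can be combined with a simple OR, as sketched below. The regex patterns are illustrative examples of common injection phrasings, and the classifier is passed in as a callable stand-in for the transformer model.

```python
import re

# Hypothetical regex rules for common injection phrasings; a real
# deployment pairs these with a trained transformer classifier.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all|any|previous|prior) instructions", re.I),
    re.compile(r"you are now (in )?developer mode", re.I),
]

def is_injection(prompt: str, classifier=None) -> bool:
    """Flag if either detector fires: a regex rule matches OR the
    classifier labels the prompt as an injection."""
    regex_hit = any(p.search(prompt) for p in INJECTION_PATTERNS)
    model_hit = classifier(prompt) == "injection" if classifier else False
    return regex_hit or model_hit
```

The OR combination means the classifier catches attacks the fixed rules miss, while the rules give deterministic coverage of known exploit phrasings.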
Secret Detection Policy
The Secret Detection Scanner mitigates the risk of deliberate leaking or inadvertent sharing of confidential company information typically known as "secrets." It does so by scanning prompts for specific content before the prompts are sent to an LLM. The Secret Detection Policy Scanner detects 158 known patterns, for example, API keys, passkeys, and other information typically restricted to internal, authorized company use. The list of secrets detected, shown below, cannot be edited by users; however, the scanner response can be customized.
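Pattern-based secret detection works by matching well-known credential formats. The sketch below shows two such formats (AWS access key IDs begin with `AKIA`, GitHub personal access tokens with `ghp_`); the real scanner's 158 patterns are fixed and not shown here.

```python
import re

# Two illustrative patterns; the shipped scanner has 158 and they
# are not user-editable.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}

def find_secrets(prompt: str) -> list:
    """Return the names of known secret patterns found in the prompt."""
    return [name for name, pat in SECRET_PATTERNS.items() if pat.search(prompt)]
```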
Toxicity Policy
The Toxicity Policy Scanner allows thresholds to be set for identifying and acting on a default set of content types included in a prompt. Toxic content in user prompts can create a negative user experience or even cause harm to users, and risks violating organizational policies. The categories of toxic content are:
Toxicity: Negative or harmful content, including hate speech, cyberbullying, or harassment
Severe Toxicity: A level of harmful behavior in online comments that poses a significant threat to an individual's well-being or safety
Obscene: Use of vulgar or offensive language
Sexually Explicit: Depiction or description of sexual activity or content in a detailed or graphic manner
Insult: A disrespectful or offensive remark toward an individual or group
Threat: An expression of intent to harm or cause damage to someone
Identity Attack: An attempt to harm someone's reputation or sense of self by attacking their personal or professional identity
Click the three dots icon and then Advanced Configuration. Configure the sensitivity threshold for each category, then click Save.
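Per-category thresholds of the kind configured above can be applied as sketched below. The threshold values are hypothetical admin settings, and the category scores stand in for a toxicity model's output.

```python
# Hypothetical per-category thresholds set by an admin.
CATEGORY_THRESHOLDS = {
    "toxicity": 0.8, "severe_toxicity": 0.5, "obscene": 0.8,
    "sexually_explicit": 0.6, "insult": 0.8, "threat": 0.4,
    "identity_attack": 0.5,
}

def violated_categories(scores: dict) -> list:
    """Return categories whose model score meets or exceeds the
    admin-configured threshold for that category."""
    return [c for c, s in scores.items()
            if s >= CATEGORY_THRESHOLDS.get(c, 1.0)]
```

Setting a lower threshold for a category such as Threat makes the scanner stricter about that category than about, say, general Toxicity.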
Audit Scanners
Each Audit Scanner detects specific content in prompts, as shown below and detailed in the following subsections. The events are flagged for later review or audit purposes, but the prompts are not blocked. The user is not notified of a violation.
Audit Terms
The Audit Terms Scanner allows admins to establish and manage a customized, editable list of banned words, sentences, and regex to be audited when included in a prompt. This feature will scan user prompts submitted to LLMs, detect any content from the list, and flag the prompts accordingly to ensure compliance with the organization’s policies.
Click the three dots icon and then Advanced Configuration to edit the list. Add or delete terms in the text box, then click Save.
Demographic Auditing
The Demographic Auditing Scanner analyzes prompts and tracks the inclusion of terms or topics that identify demographic characteristics or are typically associated with bias or stereotypes. This feature detects various types of content, including, but not limited to, terms related to cultures, ethnicity, gender, sex, sexual orientation, age and generation, disability, politics, nationality, and religion that violate an organization’s policy. This feature enables organizations to better understand, moderate, and cater to preferences when using LLMs, ultimately enhancing the user experience and ensuring compliance.
Named Entity Auditing
The Named Entity Auditing Scanner analyzes prompts and tracks the inclusion of names or other terms associated with specific entities, such as persons, organizations, locations, etc. The scanner detects users' interests, but can also detect private or confidential information that could reveal details about the organization's relationship with the named entity, in violation of organizational policy. This scanner enables organizations to better understand, moderate, and cater to preferences when using LLMs, ultimately enhancing the user experience and ensuring compliance.
Sentiment Recording
The Sentiment Recording Scanner, where allowed, analyzes prompts and tracks the emotional polarity and sentiment of prompt content (e.g., whether the tone of the prompt is positive, negative, sad, etc.). The scanner can derive a user’s current mood and identify content that could indicate potential workplace issues. This scanner enables organizations to better understand, moderate, and cater to preferences when using LLMs, ultimately enhancing the user experience.
Topic Auditing
The Topic Auditing Scanner tracks the inclusion in prompts of terms or topics not typically associated with business functions. The topics include: arts & culture, business & entrepreneurs, celebrity & pop-culture, diaries & daily life, family, fashion & style, film, TV & radio, food & dining, gaming, learning & educational, music, news & social concern, other hobbies, relationships, science & technology, travel & adventure, youth & student life.
The scanner categorizes prompts and LLM responses into distinct topics to allow two functionalities:
User Insight Generation: Categorizing prompts into distinct topics can provide valuable insights into how users interact with the LLM, what topics are most popular, and how these trends change over time. This data can be used to optimize the LLM to better serve the organization’s needs and to provide a more tailored user experience.
Topic Moderation: This helps maintain the appropriateness of LLM interactions within the organization’s controlled environment.
This scanner can provide reports and analytics based on the topics identified.
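A topic report of the kind described can be produced by aggregating labeled prompts into a frequency ranking. The function name and input shape are assumptions for illustration.

```python
from collections import Counter

def topic_report(labeled_prompts: list) -> list:
    """Aggregate (topic, prompt) pairs into a most-common ranking,
    the kind of summary a topic-auditing report could surface."""
    return Counter(topic for topic, _ in labeled_prompts).most_common()
```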