At Halodoc, Slack serves as our essential communication and collaboration platform, facilitating smooth teamwork and effective information exchange. However, ensuring the security of sensitive data within Slack channels poses a significant challenge. The problem lies in effectively monitoring sensitive information shared in Slack channels to prevent unauthorised access, data breaches, and insider threats.
When Slack channels lack proper monitoring, they become vulnerable to various security risks that can potentially harm an organisation's data and reputation. One significant concern is the exposure and leakage of sensitive information. Users may unintentionally share confidential data or personally identifiable information (PII), leading to potential data breaches and privacy violations.
What is ‘Slack PII Monitoring’?
Slack PII Monitoring is a Python program designed to interact with Slack bots. It monitors Slack's public channels and excludes private channels and direct messages. Its primary objective is to uphold user privacy by identifying inadvertent exposure of Personally Identifiable Information (PII). By examining messages, code snippets, and shared files, the tool helps protect data security and user confidentiality.
By spotlighting the impact of the "Slack PII Monitoring" project, the blog aims to raise awareness about the risks associated with PII exposure and credential leaks. It showcases how this monitoring solution empowers organisations to protect sensitive information, strengthen data privacy, and create a more secure digital workspace for Slack users.
Implementation of ‘Slack PII Monitoring’
This flowchart illustrates the process of detecting PII in Slack messages using a program that utilises multiple PII keywords and detectors. The program begins by establishing communication with the Slack API, enabling it to access and process messages. Next, it retrieves messages containing the specified PII keywords, filtering relevant messages for further analysis.
For each retrieved message, the program employs detectors to identify instances of PII. If a detector confirms the presence of PII, the program triggers an alert, sending a message to a designated Slack channel informing the incident response team of the detected PII information. If no PII is detected, the program simply proceeds to the next message.
Establishing a connection to the Slack API
To establish a connection with Slack using Python and OAuth 2.0, the first step is to create a Slack app and define the required scopes. Then, set up OAuth 2.0 for the app and obtain the Slack bot token from the Slack app dashboard. In the Python project, install the slack_sdk library using the package manager of your choice. Once the library is installed, create a client object with the bot token.
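A minimal sketch of the client setup described above. The environment variable name `SLACK_BOT_TOKEN` and both helper names are assumptions made for illustration; only `WebClient` and `auth_test()` come from slack_sdk.

```python
import os


def load_bot_token(env_var: str = "SLACK_BOT_TOKEN") -> str:
    """Read the bot token from the environment instead of hard-coding it."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} to the bot token from the Slack app dashboard")
    return token


def build_client(token: str):
    """Create a slack_sdk WebClient; requires `pip install slack_sdk`."""
    # Imported lazily so load_bot_token() works without the dependency installed.
    from slack_sdk import WebClient

    return WebClient(token=token)


# Typical usage (needs a real token and network access):
#   client = build_client(load_bot_token())
#   client.auth_test()  # cheap call that confirms the token is valid
```

Reading the token from the environment keeps it out of source control, which matters for a tool whose purpose is to catch leaked secrets.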
Integrating Slack App with Slack channels
To read messages in Slack public channels, it is necessary to install the Slack app in the channels that you want to scan. To streamline this process across the entire workspace, we have developed a method that retrieves a list of all public channels created within the last 24 hours. This method automatically adds the Slack App to these channels, allowing seamless access to messages within the designated channels.
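The auto-join step could look roughly like the sketch below. It assumes a slack_sdk `WebClient` is passed in as `client`; the function name and the optional `now` parameter (useful for testing) are hypothetical. `conversations_list` and `conversations_join` are the slack_sdk methods for listing and joining channels.

```python
import time

DAY_SECONDS = 24 * 60 * 60


def join_recent_public_channels(client, now=None):
    """Join every public channel created in the last 24 hours; return the joined IDs."""
    cutoff = (time.time() if now is None else now) - DAY_SECONDS
    joined = []
    cursor = None
    while True:
        page = client.conversations_list(
            types="public_channel", exclude_archived=True, limit=200, cursor=cursor
        )
        for channel in page["channels"]:
            # "created" is a Unix timestamp; skip channels the app is already in.
            if channel.get("created", 0) >= cutoff and not channel.get("is_member"):
                client.conversations_join(channel=channel["id"])
                joined.append(channel["id"])
        # An empty or missing cursor means there are no more pages.
        cursor = (page.get("response_metadata") or {}).get("next_cursor")
        if not cursor:
            return joined
```

Running this once a day, for example from a scheduler, would keep the app present in newly created channels.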
Retrieving messages containing PII keywords
To retrieve messages containing Personally Identifiable Information (PII), we have identified specific PII keywords. Using the search messages method provided by the Slack API, we run a recursive search for each keyword to retrieve messages sent in Slack public channels within the last 24 hours.
- List of PII keywords used to retrieve messages: We have included English and Indonesian PII keywords by observing previously shared data in the Slack workspace.
- Extracting Messages from Channels: To extract messages from Slack public channels, we utilise the search_messages() method by providing a keyword as the query value.
Example: When the Slack API is queried with "DateOfBirth," it also returns messages containing related variants such as "Date of Birth," "Date-of-Birth," and "Date_of_Birth."
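The keyword search can be sketched as follows. The keyword list is a hypothetical sample (the real list is not reproduced here), and the function name, the `now` parameter, and the de-duplication by channel and timestamp are assumptions. `search_messages()` is the slack_sdk method, and the response fields used (`messages.matches`, `messages.paging.pages`) follow Slack's `search.messages` response format.

```python
import time

# Hypothetical sample of English and Indonesian keywords, for illustration only.
PII_KEYWORDS = ["DateOfBirth", "phone number", "email", "tanggal lahir", "alamat"]


def search_recent_messages(client, keywords, now=None, window_seconds=24 * 60 * 60):
    """Return messages from the last 24 hours that match any keyword, de-duplicated."""
    cutoff = (time.time() if now is None else now) - window_seconds
    found = {}
    for keyword in keywords:
        page = 1
        while True:
            response = client.search_messages(
                query=keyword, sort="timestamp", sort_dir="desc", count=100, page=page
            )
            messages = response["messages"]
            stale = False
            for match in messages["matches"]:
                if float(match["ts"]) < cutoff:
                    # Results are newest first, so everything after this is older.
                    stale = True
                    break
                # The same message can match several keywords; keep one copy.
                found.setdefault((match["channel"]["id"], match["ts"]), match)
            if stale or page >= messages["paging"]["pages"]:
                break
            page += 1
    return list(found.values())
```

Sorting by timestamp in descending order lets the loop stop paging as soon as it reaches a message outside the 24-hour window.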
Detecting Personally Identifiable Information in Messages
PII detection in messages happens at two levels: the first is keyword-based detection, and the second is regular expression-based detection.
Level - 1: Keyword-based PII detection
In keyword-based detection, the algorithm creates a list of all the personally identifiable information (PII) keywords present in the message. For every detected keyword, a weightage of one is assigned.
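A minimal sketch of level 1, assuming a small hypothetical keyword map (the canonical names and spellings below are illustrative, not the production list). Each keyword found in the message gets a weightage of one.

```python
import re

# Hypothetical keyword map: canonical keyword -> spellings to look for.
KEYWORD_VARIANTS = {
    "date of birth": ["date of birth", "dateofbirth", "dob", "tanggal lahir"],
    "phone number": ["phone number", "phone", "nomor telepon"],
    "email": ["email"],
    "name": ["name", "nama"],
    "address": ["address", "alamat"],
}


def detect_keywords(message):
    """Return {keyword: 1} for every PII keyword mentioned in the message."""
    # Treat "Date_of_Birth" and "Date-of-Birth" like "date of birth".
    normalised = re.sub(r"[-_]+", " ", message.lower())
    weights = {}
    for keyword, variants in KEYWORD_VARIANTS.items():
        for variant in variants:
            # Word boundaries avoid matching "name" inside "username".
            if re.search(rf"\b{re.escape(variant)}\b", normalised):
                weights[keyword] = 1
                break
    return weights
```

The returned dictionary is the keyword list with weights that level 2 then refines.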
Level - 2: Regular Expression-Based PII Detection
We have categorised PII keywords into two types:
Unstructured PII: These are keywords for which regex cannot be applied to identify whether the message contains actual PII values. Examples include names and addresses.
Structured PII: These are PII categories for which regex can be applied to identify whether the message contains PII values. Examples include phone numbers, email addresses, and dates of birth.
At this stage, we will pass the message and the list of PII keywords detected from level 1. We will focus on structured PII keywords and check if the regex pattern corresponding to each keyword is present in the message. If a regex pattern is found in the message, we will increase the weightage of that specific keyword to two. For instance, if 'phone number' is a PII keyword in the list for a message, and we discover the corresponding regex pattern in the message, the weightage assigned to 'phone number' will be increased to two.
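Level 2 might be sketched like this. The regex patterns are simplified illustrations (an Indonesian-style mobile number, a basic email shape, and two common date layouts), not the patterns used in production, and the names are hypothetical.

```python
import re

# Illustrative patterns only; real-world detectors would need to be stricter.
STRUCTURED_PATTERNS = {
    "phone number": re.compile(r"(?<!\d)(?:\+62|62|0)8\d{8,11}(?!\d)"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "date of birth": re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{4}\b|\b\d{4}-\d{2}-\d{2}\b"),
}


def apply_regex_detection(message, weights):
    """Raise a structured keyword's weightage to two when its pattern appears."""
    updated = dict(weights)  # leave the level-1 result untouched
    for keyword, pattern in STRUCTURED_PATTERNS.items():
        # Only keywords already found at level 1 are checked.
        if keyword in updated and pattern.search(message):
            updated[keyword] = 2
    return updated
```

For a message that mentions "phone number" and contains `081234567890`, the weightage of "phone number" rises from one to two, while unstructured keywords such as "name" stay at one.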
To determine the sensitivity of a message, we rely on the list of keywords and their respective weights. If all structured keywords have a weightage of one and the total weightage of all structured and unstructured keywords is less than four, the message is classified as non-sensitive. However, if any single structured keyword has a weightage of two and the total weightage of all keywords is greater than or equal to four, the message is considered sensitive. Additionally, if two structured keywords have a weightage of two, the message is also treated as sensitive. This approach helps us efficiently assess the sensitivity of the messages we handle.
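The rules above can be expressed as a short function. One assumption is labelled in the code: combinations the rules do not mention (for example, no regex match but a total of four or more) are treated as non-sensitive here. The set of structured keywords is a hypothetical sample.

```python
# Hypothetical set of structured keywords (those with a regex detector).
STRUCTURED_KEYWORDS = {"phone number", "email", "date of birth"}


def is_sensitive(weights):
    """Classify a message from its {keyword: weightage} dictionary."""
    total = sum(weights.values())
    confirmed = sum(
        1 for keyword, weight in weights.items()
        if keyword in STRUCTURED_KEYWORDS and weight == 2
    )
    if confirmed >= 2:
        return True  # two structured keywords confirmed by regex
    if confirmed >= 1 and total >= 4:
        return True  # one confirmed keyword plus enough supporting keywords
    # Assumption: every combination not covered by the stated rules is non-sensitive.
    return False
```

As a worked example, a message with "phone number" at weightage two plus "name" and "address" at one each totals four and is sensitive, whereas "phone number" at two with only "name" at one totals three and is not.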
Sending a Slack notification to the security team
If a message is identified as sensitive, the program will send a Slack notification to the incident response team, providing details about the user, channel, and the message itself. Sending a Slack message is accomplished using the chat_postMessage() method.
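A sketch of the notification step, assuming a slack_sdk `WebClient` is passed in as `client`. The function names, the alert wording, and the 300-character excerpt limit are hypothetical; `chat_postMessage()` and the `<@USER>` and `<#CHANNEL>` mention syntax are standard Slack features.

```python
def build_alert(user_id, channel_id, text, permalink=None, max_chars=300):
    """Format the alert body using Slack mrkdwn mentions for user and channel."""
    # Truncate long messages so the alert stays readable.
    excerpt = text if len(text) <= max_chars else text[:max_chars] + "..."
    lines = [
        ":rotating_light: *Possible PII detected*",
        f"*User:* <@{user_id}>",
        f"*Channel:* <#{channel_id}>",
        f"*Message:* {excerpt}",
    ]
    if permalink:
        lines.append(f"<{permalink}|Open message>")
    return "\n".join(lines)


def send_alert(client, alert_channel, user_id, channel_id, text, permalink=None):
    """Post the alert to the incident response channel."""
    body = build_alert(user_id, channel_id, text, permalink)
    return client.chat_postMessage(channel=alert_channel, text=body)
```

Because the alert repeats part of the flagged message, the alert channel itself should be restricted to the incident response team.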
A typical Slack alert looks like this:
Incident Handling Procedures for PII Detection
In response to a detected Personally Identifiable Information (PII) alert, our system provides four distinct response buttons:
Critical Button: Triggered when the message contains more than five hundred real-world PII records. Sends a warning notification to the user, updates the database as a critical message, and deletes the message.
Sensitive Button: Activated when the message contains fewer than five hundred real-world PII records. Sends a warning notification to the user, and updates the database as a sensitive message.
Non-sensitive Button: Utilised when the message includes any number of PII records, but the values are dummy or staging data. Updates the database as a non-sensitive message.
False Positive Button: Employed when the message lacks any PII values. Updates the database with the message labelled as a false positive. False positive alerts are analysed to enhance the detection algorithm.
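The four buttons map naturally onto a small dispatch table. The sketch below only plans the steps for each button; the identifiers and step names are hypothetical, and the actual warning, database update, and deletion calls are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriageAction:
    label: str            # value stored in the database
    warn_user: bool       # send a warning notification to the user
    delete_message: bool  # remove the original Slack message


# One entry per response button, following the descriptions above.
TRIAGE_ACTIONS = {
    "critical": TriageAction("critical", warn_user=True, delete_message=True),
    "sensitive": TriageAction("sensitive", warn_user=True, delete_message=False),
    "non_sensitive": TriageAction("non-sensitive", warn_user=False, delete_message=False),
    "false_positive": TriageAction("false positive", warn_user=False, delete_message=False),
}


def plan_response(button):
    """Return the ordered steps to run when a responder clicks a button."""
    action = TRIAGE_ACTIONS[button]
    steps = []
    if action.warn_user:
        steps.append("warn_user")
    steps.append(f"record:{action.label}")
    if action.delete_message:
        steps.append("delete_message")
    return steps
```

Keeping the behaviour in one table makes it easy to audit which buttons warn users or delete messages.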
Upon receiving a PII detection alert, the incident response team manually triages it and determines whether it is critical, sensitive, non-sensitive, or a false positive. This approach ensures a thorough analysis of each alert and contributes to the continuous improvement of our detection algorithm.
Each alert is stored in our database with its classification: critical, sensitive, non-sensitive, or false positive. We use this data to monitor user activity, refine our algorithms, and generate monthly statistics. It improves our understanding of user behaviour and contributes to the continuous improvement of our systems.
In summary, the "Slack PII Monitoring" initiative at Halodoc represents a proactive approach to enhancing data security on Slack channels. The Python program, through its monitoring and response mechanisms, detects and addresses instances of Personally Identifiable Information (PII) exposure. The incident handling procedures, marked by distinct response buttons, ensure swift and accurate categorisation, contributing to continuous improvement. This solution not only safeguards sensitive information but also underscores Halodoc's commitment to creating a secure digital workspace for its users.
Scalability, reliability, and maintainability are the three pillars that govern what we build at Halodoc Tech. We are actively looking for engineers at all levels and if solving hard problems with challenging requirements is your forte, please reach out to us with your resumé at email@example.com
Halodoc is the number one all-around healthcare application in Indonesia. Our mission is to simplify and bring quality healthcare across Indonesia, from Sabang to Merauke. We connect 20,000+ doctors with patients in need through our Tele-consultation service. We partner with 3500+ pharmacies in 100+ cities to bring medicine to your doorstep. We've also partnered with Indonesia's largest lab provider to provide lab home services, and to top it off we have recently launched a premium appointment service that partners with 500+ hospitals and allows patients to book a doctor appointment inside our application. We are extremely fortunate to be trusted by our investors, such as the Bill & Melinda Gates Foundation, Singtel, UOB Ventures, Allianz, GoJek, Astra, Temasek, and many more. We recently closed our Series D round and in total have raised around USD 100+ million for our mission. Our team works tirelessly to make sure that we create the best healthcare solution personalised for all of our patients' needs, and we are continuously on a path to simplify healthcare for Indonesia.