
When, in June 2022, an employee of BIPROGY fell asleep on a street in Osaka and lost a USB memory stick containing COVID-19 tax relief data, many of us laughed, but some shivered in terror. What seemed funny to many was terrifying to owners and managers of sensitive data because they realized that it could have been their data that got lost. And they knew the consequences of such a loss.
Data loss has been with us for a long time, even before the advent of computers. Blueprints, recipes, documents containing sensitive information – all of them could be misplaced or stolen. But nobody doubts that the more digital our society becomes, the greater the chance that something that should be kept confidential will be lost. And although what is lost is usually just a copy, not the original as in the case of physical information storage, the consequences may be even worse.
Faced with this grim reality, the market needed a solution that would complement other cybersecurity measures and would focus on preventing data loss specifically. Enter DLP – data loss prevention solutions.
What Is DLP and How Does DLP Work?
According to Wikipedia, “data loss prevention (DLP) software detects potential data breaches/data exfiltration transmissions and prevents them by monitoring, detecting, and blocking sensitive data while in use, in motion, and at rest.” Gartner provides a similar definition: “data loss protection (DLP) describes a set of technologies and inspection techniques used to classify information content contained within an object – such as a file, email, packet, application, or data store – while at rest (in storage), in use (during an operation) or in transit (across a network). DLP tools also have the ability to dynamically apply a policy – such as log, report, classify, relocate, tag and encrypt – and/or apply enterprise data rights management protections.”
In simple terms, DLP software performs two very important functions:
- Helps you identify what data could be considered sensitive (detection)
- Prevents accidental or intentional loss of this data (monitoring/blocking)
Data loss prevention solutions may seem closely related to many other types of cybersecurity tools, and that is not surprising, because other cybersecurity tools greatly help with data loss prevention. For example:
- Antivirus software helps you avoid malicious programs such as trojans that would let attackers access your sensitive data.
- Web application security software helps you detect and fix security vulnerabilities that could let malicious hackers access your sensitive information.
- Intrusion detection systems help you detect malicious activity in your networks and cut off attackers before they would reach confidential data.
Almost every cybersecurity solution helps in some way to keep your sensitive information secure. However, only DLP, whether as a separate product or integrated with other tools, can actually help you automatically detect sensitive data, and it is DLP technology that prevents data loss when other solutions fail or do not apply.
History and Evolution
While the topic of data security has been around for ages, at least since the early 1980s, the first concepts of specialized data loss prevention solutions go back only to the beginning of the current century. According to a SANS Institute paper from 2008, “The term DLP, which stands for data loss prevention, first hit the market in 2006 and gained some popularity in the early part of 2007.” However, this is only when the term was actually defined, and some of the functionality was available in other software even earlier.
One of the first information security solutions focused on data protection and considered a foundation stone of data encryption was Pretty Good Privacy (PGP), developed by Phil Zimmermann in 1991. One of the primary goals of this software, still available today along with other solutions based on the OpenPGP standard, was to increase the security of email messaging. By using asymmetric encryption, the author provided the means to ensure privacy and establish the authenticity of emails.
Specialized DLP solutions that appeared around 2006/2007 were either immediately acquired by the software giants of the time or quickly grew into huge, complex monoliths that attempted to cover every single aspect of DLP. At that time, machine learning was mostly a scientific concept and rarely used by commercial software. Therefore, sensitive data identification was based on huge sets of complex regular expressions, which were the source of many headaches for security teams. Also, from a technology point of view, protecting data in motion across many different protocols and monitoring it on both servers and workstations required many different technologies and specializations.
The early 2010s were, therefore, the age of DLP monoliths. Big market players offering big products, often as part of big bundles. Big enterprises were buying these big products thanks to big marketing, and most of them had no idea how to use them effectively. Default settings would not always fit diverse environments, and there was just too much to take care of and configure. As a result, these products were often purchased and used, but a lot of sensitive data could slip through their fingers.
When the market realized that this was not the way, cybersecurity software houses and new players took one of two approaches:
- Some decided to include DLP in specialized cybersecurity products that cover just a narrow part of the spectrum. For example, an email security manufacturer could easily add some DLP to make sure that sensitive data does not leak through email. However, you could expect data identification to be rather inferior as that is not the specialization of an email-focused software house.
- Others decided to try to tackle just specific DLP technologies. For example, focus on endpoints and user activity instead of analyzing traffic going through the network. This way, specialized manufacturers could ensure that sensitive data identification would be top-notch and that this identified data would be monitored on workstations in order to prevent its transmission via the operating system clipboard or via USB.
Types of Sensitive Data
So, what data is DLP supposed to protect? That question has been a headache for many because the definition of sensitive data can be very different, depending on the use cases. For example, the source code of a commercial application written in C++ is highly sensitive and critical data, but another piece of code, also in C++, but belonging to an open-source application, is not sensitive at all. This is a cause of many false positives.
Luckily, there are certain types of sensitive data that are universal and often specified by national, international, or industry laws. The following are some examples:
- Personally identifiable information (PII) is defined in the United States as “Information that can be used to distinguish or trace an individual’s identity, such as name, social security number, biometric data records, either alone or when combined with other personal or identifying information that is linked or linkable to a specific individual (e.g., date and place of birth, mother’s maiden name, etc.).”
- Personal data (often referred to as personal information in the US) is a term that is broader than PII. Europe’s GDPR defines it as “any information relating to an identified or identifiable natural person; an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.”
- Sensitive personal information (SPI) is a term used in the California Privacy Rights Act (CPRA). While the act does not exactly define this term, it applies to any kind of personal information collected by businesses and may extend to information such as, for example, IP addresses that would not be considered PII or personal data.
- Nonpublic personal information (NPI) is a term coming from the Gramm-Leach-Bliley Act (GLBA), defined as follows: “Nonpublic personal information may include individual items of information as well as lists of information. For example, nonpublic personal information may include names, addresses, phone numbers, social security numbers, income, credit score, and information obtained through Internet collection devices (i.e., cookies).”
In addition to personal information as defined above, some businesses may need to protect specific types of intellectual property, which may be in the form of source code, formulas, diagrams, videos, and more. As you can see, DLP software does not have it easy because it must take into consideration different definitions and different scopes of data depending on the geographical location of the business (due to local laws) and the type of business as well as many other factors. This is why fixed rulesets don’t always work, and machine learning is very helpful to help identify sensitive data.
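To make the idea of a fixed ruleset concrete, here is a minimal sketch of rule-based detection in Python. The two patterns (a US social security number and a payment card number confirmed with the Luhn checksum) and the names `find_sensitive` and `luhn_valid` are assumptions chosen for illustration, not the rules of any particular DLP product.

```python
import re

# Hypothetical baseline rules; a real ruleset would be far larger.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 13:
        return False
    checksum = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        checksum += digit
    return checksum % 10 == 0


def find_sensitive(text: str) -> list[tuple[str, str]]:
    """Return (rule name, matched text) pairs found in `text`."""
    findings = [("ssn", m.group()) for m in SSN_PATTERN.finditer(text)]
    for m in CARD_PATTERN.finditer(text):
        # The checksum filters out most digit runs that merely look like cards.
        if luhn_valid(m.group()):
            findings.append(("card", m.group()))
    return findings


print(find_sensitive("SSN 123-45-6789, card 4111 1111 1111 1111"))
```

Rules like these catch well-structured identifiers, but they say nothing about source code, formulas, or data formats from other jurisdictions, which is where a fixed ruleset starts to fall short.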
Data Loss, Data Leaks, Data Breaches: What’s the Difference?
DLP software focuses on preventing data loss. However, there are two other common cybersecurity terms that are often used interchangeably with data loss: data leak and data breach. The terms are quite similar and have no single formal definition. Here is how we understand the difference between these three terms.
Data loss may apply to one of two situations:
- When sensitive information is taken from your computer system and moved outside of your controlled environment, such as your corporate network. It simply means that someone shared or sent sensitive data outside of the systems controlled by your business. This doesn’t have to be due to a cyberattack; it may be a simple mistake or an intentional action by an insider. It is sometimes also called data exposure.
- When sensitive data is lost at the source. For example, when your hard drive fails, there is no backup, and there is no way to retrieve this information. However, preventing such occurrences is not the job of DLP software.
Data leaks (data leakages) have several definitions:
- When sensitive information is available in your systems and accessed by an external malicious actor through hacking or vulnerabilities.
- When sensitive information reaches an external destination or recipient intentionally. For example, when someone in your organization sends your sensitive information to a competitor or when someone from the outside manages to get hold of your sensitive information and store it on their computer.
Data breaches also have several definitions, but this one feels the most appropriate:
- A data breach is a cyberattack in which sensitive, confidential, or otherwise protected data has been accessed or disclosed in an unauthorized fashion.
In simple terms, these three terms are used depending on the intent, source, and effect, but all of them apply to exactly the same situation: when sensitive data “gets out.” Data loss focuses on the source of the loss, data leak on the destination that the lost data reaches, and data breach on the clearly malicious intent and scale of data loss. All in all, there’s no need to be worried if you use these terms interchangeably.
DLP Core Features
As we mentioned earlier, the primary function of DLP software is identifying types of data that are considered sensitive data for your organization, monitoring the data in use in real-time, and preventing the data from leaving your systems or being transmitted legally but in an insecure way. Since there are many channels that could be involved in data loss, different types of DLP software focus on these specific channels.
- The core function of all DLP software is to identify sensitive information (data at rest) with as little hassle to the business as possible. Old-school software relying on fixed sets of rules would require DLP administrators to continuously modify these rules and add new ones as new types of sensitive data are identified manually. Modern solutions do it with the help of statistical analysis and machine learning. They use some baseline rules and manual input and, with time, learn to automatically identify data better and better, also helping you to gradually limit false positives.
- Data loss prevention software that focuses on endpoints prevents any data loss initiated at user workstations (laptops, personal computers). For example, it monitors the operating system clipboard to identify sensitive data through data matching and then monitors the target location that the user is trying to paste to. If this is, for example, an instant messenger or a social media site, such an attempt is blocked immediately, generating an alert. However, if it is an internal application, pasting is allowed. Additionally, such software prevents the transmission of data to external media such as USB sticks or enforces the encryption of such media.
- DLP systems that focus on data transmission monitor network traffic in real-time through multiple protocol gateways. They often come with hardware that is part of the internal network and act in a way similar to firewalls. If any kind of sensitive information is detected in HTTP, SMTP, FTP, or other Internet traffic, it is immediately blocked with an alert. Modern transmission-focused DLP solutions focus strongly on cloud DLP and form part of the cloud storage environment instead of the local network.
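The clipboard scenario described above can be sketched as a small policy decision in Python. This is only an illustration under stated assumptions: the allowlist `INTERNAL_APPS`, the `Decision` type, the function `check_paste`, and the single social security number pattern used as a stand-in detector are all hypothetical, and a real endpoint agent would hook into the operating system rather than receive strings.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical allowlist of internal applications; the names are placeholders.
INTERNAL_APPS = {"crm.internal.example", "erp.internal.example"}

# Stand-in detector: a single pattern for US social security numbers.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class Decision:
    allowed: bool
    alert: Optional[str] = None


def check_paste(clipboard_text: str, destination: str) -> Decision:
    """Decide whether pasting `clipboard_text` into `destination` is allowed."""
    if not SSN_PATTERN.search(clipboard_text):
        # Nothing sensitive on the clipboard, so there is nothing to block.
        return Decision(allowed=True)
    if destination in INTERNAL_APPS:
        # Sensitive data, but it stays inside the controlled environment.
        return Decision(allowed=True)
    # Sensitive data heading to an external destination: block and alert.
    return Decision(False, f"blocked paste of sensitive data into {destination}")


print(check_paste("SSN 123-45-6789", "chat.example"))
```

The same allow-or-block shape would apply to a network gateway inspecting outbound traffic; only the source of the text and the notion of destination change.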
DLP Policy Framework
With all these definitions, you may be under the impression that data loss prevention is all about technology and software. The time has come to clear that misconception. While DLP solutions are an important part of a DLP policy framework, they should not be its basis but rather an enabler.
The DLP policy framework is your strategic starting point that describes your approach to DLP in different phases. It goes along with your DLP program that describes the tactical aspect of preventing data loss. Such a framework should consider the following aspects:
- DLP planning. Before even starting to think about purchasing a DLP solution, you must thoroughly plan for its introduction as well as plan the supporting trainings, educational sessions, exercises such as drills, webinars, and more. You must also plan and execute thorough data classification. While good DLP tools can help with that and classify data and metadata automatically, they won’t, for example, be able to evaluate the level of sensitivity of specific data or define who should have access to it. Last but not least, at the planning stage, you should consider any regulatory obligations and compliance requirements such as HIPAA, PCI DSS, GDPR, and more, as well as any internal and external audits as required.
- DLP implementation. This step includes more than just implementing and configuring DLP software. Before that, sensitive data identified during the planning stage should be sanitized, redacted, and, if necessary, retired. You can prevent a lot of data loss by removing sensitive elements from digital assets that don’t need to contain them or deleting sensitive data from locations where it’s not required. This is also the right time to set up data access and exchange permissions and controls, as well as the time to analyze any data access logs you may already have in place to identify those that require access.
- DLP maintenance. Unfortunately, even after implementing a top-class DLP tool, you can’t just sit idly and hope that it will prevent all data loss incidents. You should continuously monitor data access to cut off unnecessary permissions, maintain user accounts (especially in the case of any layoffs), monitor for new types of sensitive data, and last but not least, be well-prepared in case of an incident with a suitable reaction and remediation plan.
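As a rough illustration of the redaction step mentioned in the implementation phase, the Python sketch below replaces sensitive elements in a piece of text with placeholders. The two patterns and the placeholder labels are assumptions for the example; which elements count as sensitive is exactly what the planning phase has to decide.

```python
import re

# Hypothetical patterns for two kinds of sensitive elements.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text: str) -> str:
    """Replace social security numbers and email addresses with placeholders."""
    text = SSN_PATTERN.sub("[REDACTED-SSN]", text)
    return EMAIL_PATTERN.sub("[REDACTED-EMAIL]", text)


print(redact("Contact jane@example.com, SSN 123-45-6789."))
```

Running such a pass over documents that do not need the original values means there is less sensitive data left for the DLP software to guard.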
Building Your Successful DLP Program
A DLP program is the tactical counterpart of your DLP strategy. Of course, such programs greatly differ from organization to organization due to different structures, requirements, and types of data. However, all of them have some aspects in common, which you should consider including in your program to give it the best chance of success:
- Education and awareness. Data loss prevention begins with employees being aware of what is considered sensitive data and what they are allowed to do with it. Without training, many employees, for example, won’t realize that sending a spreadsheet with sensitive data to their spouse to help with formatting is considered data loss and may cost the company a huge fine. Boeing learned that the hard way in 2017, when the personal data of 36,000 employees was exposed.
- Following general cybersecurity best practices. For example, the principle of least privilege is a good place to start. It helps you make sure that only the right people have access to sensitive information, and limiting access limits potential losses. It’s much easier to grant access when a currently unauthorized user requests it than to prevent a harmful action by someone who already has extensive access rights.
- Combining with other security policies. DLP must be part of a thorough security policy framework, not just a document on its own. For example, even the best DLP solutions won’t help much if your web apps are full of vulnerabilities and malicious hackers can easily get to your sensitive information through an SQL injection. In such cases, data leaks and data breaches are just a matter of time.
Common Trends and Reasons for DLP Adoption
In the digital age, almost every organization deals with some kind of sensitive data in their electronic systems. Even the smallest businesses may be faced with having to protect customer information such as credit card numbers or even something as simple as email addresses and first/last names.
At the same time, information is becoming more valuable to cybercriminals. As more and more systems work digitally, the potential for identity theft or for using sensitive information to access other systems grows quickly. And it’s not just criminal organizations that are after your sensitive data. Easy-to-use ransomware packages and bitcoin payments give amateur would-be hackers a perfect means to try to use your data to make a quick buck.
No wonder cybersecurity, in general, is becoming a necessity for any organization. At the same time, for many years now, we’ve been experiencing something called the cybersecurity talent gap – there simply aren’t enough cybersecurity specialists on the job market and in training to satisfy the needs. That’s why organizations are desperately looking for software that could support the limited cybersecurity resources that they manage to scrape together.
While large organizations usually handle cybersecurity on their own, SMBs often opt to employ the services of MSSPs (managed security service providers). However, MSSPs face exactly the same challenges – skills gaps, limited budgets, competition, and customers wanting more for less. All in all, more and more organizations are looking for easy-to-use, effective, and financially sound solutions not just for DLP but for all things cybersecurity.
The situation in the DLP market at the moment is quite dynamic because many organizations, especially the largest enterprises, are still trying to figure out how to get effective DLP with limited teams. Traditional DLP solutions have proven insufficient, as confirmed by Gartner’s decision to retire the Magic Quadrant for Enterprise Data Loss Prevention and switch to a Market Guide on that topic. The Leaders of the Magic Quadrant have not provided anything significantly new for years, and the Challengers field is empty.
Why Endpoint Protector?
Leaders are not leaders? Enter the true Challenger – Endpoint Protector. CoSoSys has been in the DLP Magic Quadrant previously but was classified as a Niche Player in the last edition in 2017. Since then, it has established itself on the market as a specialist player that provides best-in-class endpoint DLP.
Endpoint Protector leaves cloud DLP to specialized cloud DLP providers and network DLP to integrated DLP solutions. It focuses on the most sensitive area of potential data loss – the end users. In addition to offering modern data classification technologies with statistical analysis and machine learning, Endpoint Protector helps you make sure that your users don’t share your sensitive data outside and don’t transport it in an insecure way. This helps you protect against data loss due to the activity of malicious insiders as well as unintentional cases due to errors, falling for phishing scams, and negligence.
See for yourself. Give Endpoint Protector a try.