Endpoint-Based OCR for DLP is Closing the Gap in Enterprise Data Protection

Chris Roney | October 19, 2023 | January 30, 2024 | Data Loss Prevention, Endpoint Protection

When sensitive data is contained within an email or other file types, data loss prevention (DLP) solutions are able to use text inspection and deep packet inspection to identify and block its transfer. However, when that same data is embedded within an image, Optical Character Recognition (OCR) must first be used for text extraction before the data can be assessed by the DLP policy.
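As a rough illustration of the second half of that pipeline, the sketch below checks text that has already been extracted from an image against a simple card-number rule. It is a minimal example under stated assumptions, not how any particular DLP product implements its policies: the OCR step itself is omitted, and the names CARD_PATTERN, find_card_numbers, and should_block are hypothetical.

```python
import re

# Candidate card numbers: 13-19 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(extracted_text: str) -> list[str]:
    """Return Luhn-valid, card-like numbers found in OCR output (hypothetical helper)."""
    matches = []
    for match in CARD_PATTERN.finditer(extracted_text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            matches.append(digits)
    return matches


def should_block(extracted_text: str) -> bool:
    """Hypothetical policy: block the transfer if any card number is present."""
    return bool(find_card_numbers(extracted_text))
```

The Luhn check is there to cut false positives: a 16-digit invoice or order number will usually fail the checksum, whereas a real card number will pass it.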

Instances of sensitive data being exfiltrated through image files (e.g., JPEGs, TIFFs, BMPs, and PDFs) are on the increase. These range from accidental over-sharing of scanned documents, such as invoices, credit card data, and healthcare insurance applications, to malicious attempts to photograph or screenshot confidential on-screen data.

Unfortunately, OCR has been a thorn in the side of cybersecurity teams and compliance officers for years, with many DLP solutions, including Symantec and Microsoft Purview, relying on server-based processing for text extraction. This approach can impact employee productivity, introduce unnecessary data security risks, and even leave DLP policies unable to protect all potential exit points.

However, there are alternatives. DLP solutions such as Endpoint Protector by CoSoSys approach the challenge differently. Rather than relying on server-based processing, they handle OCR text extraction and policy enforcement directly on the endpoint.

Let’s look at why that matters.

1. Speed and Real-Time Processing

Traditional server-based OCR solutions require sending documents to remote servers for processing. This inevitably leads to latency on the employee endpoint as the files must first travel over the internet for inspection. Endpoint-based OCR, on the other hand, processes data locally on the user’s endpoint. This real-time processing significantly reduces wait times, enabling almost instantaneous text recognition and DLP policy matching.

2. The Ability to Scan all Exit Points

Server-based OCR has a major problem: it can only scan images for sensitive text if they pass through a server. This means that DLP solutions relying on this approach are limited to protecting server-mediated exit points such as email and web uploads. This limitation creates massive gaps in any data protection, compliance, and DLP strategy, since all the other exit points on the endpoint remain unprotected. Endpoint Protector’s OCR is endpoint-based, which means it can identify and protect text in images across all exit points, including server uploads, network shares, USB drives and removable media, network printers, local/home printers, copy/paste, email, URL uploads, LAN shares, and cloud services, to name a few.

3. Privacy and Security

Server-based OCR solutions involve transmitting sensitive documents and extracted text over the internet to external servers. This process raises potential security risks for organizations that do not allow, or want, their data to be shared outside of their organizational controls. Endpoint-based OCR and DLP mitigate these risks by keeping the data localized on the user’s endpoint. This approach ensures maximum privacy and security, giving organizations complete control over their sensitive information. Endpoint Protector shares event information from the endpoint client to the server through encrypted transit to ensure no sensitive data leaves the endpoint.
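To make the distinction between event information and sensitive content concrete, here is a minimal sketch of an endpoint-side report that describes a policy match without carrying the matched text. It does not describe Endpoint Protector's actual event format; the build_event function and every field name are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone


def build_event(file_name: str, exit_point: str, policy: str, matches: list[str]) -> dict:
    """Describe a policy match using metadata only (hypothetical event shape)."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "file_name": file_name,
        "exit_point": exit_point,
        "policy": policy,
        "action": "blocked",
        # Only the number of matches is reported; the matched text stays local.
        "match_count": len(matches),
    }


event = build_event(
    file_name="scan_0042.png",
    exit_point="usb",
    policy="Payment card numbers",
    matches=["4111111111111111"],
)
payload = json.dumps(event)  # this is what would be sent over an encrypted channel
```

The design choice is that the server learns that a match happened, where, and under which policy, while the extracted text itself is never serialized into the payload.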

4. Cost-effectiveness

Traditional server-based OCR solutions often involve subscription fees or usage-based pricing models, which can accumulate significant costs, especially for businesses processing large volumes of documents. For example, Microsoft Purview users are billed $1 for every 1,000 scanned items (it’s worth noting that every page of a .pdf is counted as a single item and is subject to a charge). No such fees are incurred with endpoint-based OCR processing, including solutions such as Endpoint Protector.
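A quick back-of-the-envelope calculation shows how per-page billing adds up. The sketch below uses the $1 per 1,000 items rate quoted above and assumes charges are pro-rated rather than rounded up to whole blocks of 1,000; the function name and the example volumes are hypothetical.

```python
def estimate_ocr_cost(
    documents: int,
    pages_per_document: int,
    rate_per_1000_items: float = 1.00,
) -> float:
    """Estimate usage-based OCR charges when every page counts as one item."""
    items = documents * pages_per_document
    return items / 1000 * rate_per_1000_items


# Hypothetical volume: 50,000 scanned PDFs averaging 12 pages each.
monthly_cost = estimate_ocr_cost(documents=50_000, pages_per_document=12)
```

Under those assumptions, 600,000 billable items cost $600 for that batch, and the figure scales linearly with page count.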

5. Offline Accessibility

A server-based OCR solution depends heavily on internet connectivity, making it unsuitable for scenarios where users need to work offline. Because endpoint-based OCR operates directly on the user’s endpoint, image files, including .jpg, .bmp, and .png, can be processed, have their text extracted, and be checked against an active DLP policy, even without an internet connection. This offline accessibility ensures uninterrupted productivity and data protection, irrespective of the user’s location or internet availability.

6. Integration and Customization

Endpoint-based OCR solutions offer greater flexibility when it comes to integration with other applications and customization according to specific business needs. Users can integrate endpoint OCR, including through an API, into their existing formats, workflows, and applications, enhancing overall efficiency. Additionally, developers have the freedom to customize and fine-tune the OCR algorithms to meet the unique requirements of their applications, which can improve accuracy and reliability.

7. OCR Improvements Coming From Endpoint Hardware

Advancements in CPU performance, as well as hardware-based security frameworks, are quickly making endpoint-based OCR the fastest, most accurate approach to delivering effective DLP. In fact, Endpoint Protector is the first DLP solution to leverage Apple’s Vision Framework, taking a hardware approach to OCR that delivers a 10x improvement in processing speed and accuracy.

The shift from traditional server-based DLP OCR to endpoint-based OCR for DLP marks a significant milestone in the evolution of document processing and information protection. Embracing this innovative approach not only enables better, more accurate OCR recognition but also empowers users with greater control over their data and enhances overall productivity in today’s digital landscape.

For more insights, check out our webinar below, where CoSoSys CMO Tim Deluca-Smith and Director of North American Sales Chris Roney discuss how endpoint OCR is transforming Enterprise Data Loss Prevention.