When you think of data protection and data privacy, you might think of hackers trying to get past your company's firewall and into your computer to steal your data. But by far the main reason why data breaches are so rampant today has little to do with external hackers. The main cause of data breaches is insider threats. Insider threats are trusted employees, contractors, suppliers and partners who leak private data into the wrong hands. Sometimes insider threats leak intentionally, but the vast majority of the time, it's just people innocently leaking your data without even knowing it.
Because insiders - your employees, contractors, suppliers, vendors - have access to data to do their jobs, it is really hard to prevent them from leaking it! Few good solutions exist today, but the race is on to solve the insider threat problem. The key is to first deeply understand the roots of the insider threat problem.
Although billions have been poured into solving data breaches over just the last few years, Forrester's top cyber prediction for 2021 is that insider threats will be on the rise by 25%. This is due to a rapid evolution of remote working during the COVID-19 pandemic, fear of job loss, and the ease with which data can be moved.
It's So Easy to be an Accidental Insider Threat
When we hear "insider threat" we think of an inside job: a malicious employee decides to go rogue and exfiltrate valuable data for personal gain. This was certainly the case when Uber and Google made headlines after a former Google engineer exfiltrated 9 GB of source code and hundreds of Waymo trade secrets, or when Tesla was breached by an employee who exfiltrated over 300,000 files for personal gain. But for every one of these malicious incidents, there are hundreds of incidents where a trusted employee or contractor simply put data in a vulnerable place by accident.
It's so easy to misplace data by accident. Consider if an employee:
That's a LOT of ways to accidentally be an insider threat! In all of these cases, the employee is simply unaware, or makes an honest mistake, or worst case is being a bit negligent in the name of productivity. No wonder there is a push to educate employees on security best practices and create a "culture of security". Education is a good step, though not sufficient in and of itself.
How Do Companies Protect Their Private Data Today?
Ask any cyber security consultant or MSSP and they will explain the current best-practice data protection technology stack:
But there are gaps in this technology stack, allowing accidental and malicious insider threats to leak data regularly.
Data Classification Is Error-Prone
The gaps in today's data protection stack start from the very beginning of the stack - discovering and classifying data. When classification products attempt to detect confidential data automatically, whether using tried-and-true keyword matching and regular expressions or more advanced machine learning techniques, at best they can protect credit card numbers and SSNs that follow a strict format. But the vast majority of confidential data follows no such format. Think intellectual property like free-form documents that describe patented processes, recipes, secret business plans. Think unstructured personal health information, confidential product specs and designs, etc. Maybe one day technology will be able to detect all that stuff automatically, but today, programmatically detecting what is and is not intellectual property is still far off.
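To make the limitation concrete, here is a minimal sketch of pattern-based classification in Python. The function names, labels and patterns are illustrative assumptions, not taken from any particular product; real classifiers are more elaborate, but they share the same basic dependence on a recognizable format.

```python
import re

# Candidate card numbers: 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")
# US SSNs written in the strict NNN-NN-NNNN layout.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum used by payment cards."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:       # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(text: str) -> set:
    """Return the set of sensitive-data labels detected in the text (hypothetical labels)."""
    labels = set()
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_ok(digits):
            labels.add("credit_card")
    if SSN_RE.search(text):
        labels.add("ssn")
    return labels


print(classify("Card on file: 4111 1111 1111 1111"))
print(classify("Our patented process heats the resin before the curing step."))
```

The first call is labeled because the text contains a well-formed test card number; the second returns an empty set even though a sentence like that could describe a trade secret. Nothing in its format tells the classifier it matters.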
The other approach is to ask users to manually classify their data, as with Microsoft AIP (originally Secure Islands classification), Titus, Boldon James and McAfee. For every single e-mail an employee or contractor sends and every single file they create or receive, they must take the time to fill out a small form that identifies whether the data is confidential, secret, restricted, HR related, financial related, health related, etc. While the manual classification approach is growing in popularity, the error rate of asking busy employees to manually classify every e-mail is predictably high, not to mention the nuisance you impose on staff.
The result is that data is so often misclassified that all the next steps - encryption, DLP, CASB - are not effective at protecting the right data.
Encryption Means Nothing
I encrypt an e-mail or a file and send it to you. You decrypt it. Now you have a decrypted copy and you can do whatever you want with it, including accidentally putting it on an unsecured cloud share, sharing it with a friend who shares it with another friend, or even directly sharing it with a malicious user. You can see how this can quickly get out of hand for a company. Think of HBO encrypting the scripts of the Game of Thrones finale. The script is encrypted, but then the writers need to share that script with hundreds of colleagues, partnering companies and individuals that need to decrypt and review the script: the network executives, the actors, the company that designs the costumes, the company that makes the fake blood, etc. All it takes is one bad apple to accidentally misplace the data, and the finale is leaked!
You can also see how the problem is increased many-fold in COVID-19 times when everyone is working from home and decrypting sensitive data on their personal machines. Now every one of the company's employees and contractors could leak the data, and someone will!
And again, with cloud services so ubiquitous today, it might not even be intentional "misplacement of the data". An employee might intentionally put the script into a sanctioned cloud repository, but then the cloud provider is hacked and the data is stolen anyway.
Encryption Needs to be Persistent in the Supply-Chain
Encryption in-and-of-itself means nothing if the person who can decrypt the data is careless or malicious with the decrypted data they now have in their possession, or places it into a cloud application or cloud service that ends up being hacked. Of course, a company can educate its employees and contractors on how to handle sensitive data once it is decrypted, which is why the concept of developing a strong "security culture" is trending.
But educating your vendors, suppliers and partners, and ensuring they treat your sensitive data carefully, is another story. I challenge you to look at the details of just about any public data breach and you will find that very often it is a vendor, supplier or partner that accidentally leaked the data (see some of the most egregious examples in my Market Focus: Supply-Chain Security article, including the recent SolarWinds hack that led to a 6-month breach of the US Treasury Department).
Two further examples, Deloitte and Accenture, highlight that while the weak-link can be your mom-and-pop HVAC supplier, large suppliers are also a risk. Deloitte and Accenture are widely regarded as experts in data protection and are both paid handsomely to advise on security. The egregious breaches that they suffered, allowing hackers to access their clients' highly sensitive data, were the pinnacle of embarrassment in supply-chain security.
DLP Is Left in "Monitoring Mode"
Data Loss Prevention (DLP) is a massive enterprise-scale technology that you program with intricate rules about what types of data can be shared, when, and under what circumstances (your "data protection policies"). DLP is like a traffic cop that is constantly scanning data traffic within your company, as well as what is coming in and going out, and stops users from doing something if it's against the rules.
But as we saw higher up in the data protection technology stack, DLP relies on error-prone forms of data analysis, data discovery and data classification to identify what data should and should not be protected. The security team can crank up the DLP (i.e. get more restrictive on data sharing, even when it makes a mistake about what data should be shared) or turn down the DLP (i.e. get more relaxed about data sharing rules, understanding that it may be making mistakes). When DLP is cranked up, it results in lots of false positives, where employees can't get their jobs done because the DLP blocks them from rightly sending some data to their colleague, or their boss, etc. Employees inevitably complain to the point where DLP gets turned down, and turned down further, until it is ultimately turned down to "monitoring mode". This means the DLP doesn't actually attempt to block anything for fear of getting it wrong and hampering employee productivity. So instead, it just monitors data sharing behind the scenes, knowing that at a minimum if a major breach occurs, it can refer to its monitoring logs to see what happened. Meanwhile, sensitive data leaks freely and the DLP just watches it happen.
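The trade-off between "cranked up" and "monitoring mode" can be sketched in a few lines of Python. Everything here is an assumption for illustration: the `DlpGateway` class, the `mode` and `threshold` settings, and the sensitivity scores stand in for whatever a real product's classifier and policy engine would produce.

```python
from dataclasses import dataclass, field


@dataclass
class DlpGateway:
    mode: str = "block"        # "block" enforces policy; "monitor" only records
    threshold: float = 0.5     # score at or above which a transfer is flagged
    log: list = field(default_factory=list)

    def handle(self, user: str, filename: str, score: float) -> bool:
        """Record the transfer and return True if it is allowed to proceed."""
        flagged = score >= self.threshold
        allowed = not (flagged and self.mode == "block")
        self.log.append(
            {"user": user, "file": filename, "flagged": flagged, "allowed": allowed}
        )
        return allowed


# Cranked up: a harmless file that the (error-prone) classifier scored too high is blocked.
strict = DlpGateway(mode="block", threshold=0.5)
print(strict.handle("alice", "lunch_menu.docx", score=0.55))    # False: a false positive

# Monitoring mode: a genuinely sensitive file is flagged in the log but still goes out.
passive = DlpGateway(mode="monitor", threshold=0.5)
print(passive.handle("bob", "merger_plan.docx", score=0.90))    # True: the leak proceeds
print(passive.log[-1]["flagged"])                               # True: visible only after the fact
```

In this sketch the monitoring-mode gateway never returns `False`, whatever the score. All it can offer is the log entry, which is exactly the position the paragraph above describes.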
DLP Misses Most Non-Office File Formats
Classification, encryption and DLP don't miss all sensitive data. To be fair, it's still better to have them than not, just as it's better to wear shorts and a T-shirt when it's snowing outside than no clothes at all! But one real failing of the data protection tech stack is files that are not simply office data, i.e. anything that isn't an e-mail, Word document, Excel file or PowerPoint.
The industry has focused a lot on generic office data. However, talk to any enterprise CISO and you quickly learn that their most valuable data is in non-office formats:
Even worse, the majority of large enterprises also rely heavily on line of business ERPs like SAP, as well as their own legacy or home-grown line of business applications at the core of their operations. When data is exported from those applications whether for sharing internally or externally, that is an immediate threat to the business.
Imagine a legacy CAD tool that produces an enterprise’s key industrial designs, but whose editor is no longer supported by the vendor. Or a home-grown content authoring tool that no longer has an in-house development team. These legacy applications are so entrenched in business workflows that changing to another application for security reasons is unrealistic, so the enterprise has no choice but to find a data protection solution… or simply operate with no protection.
A DLP might be able to spot highly structured, pattern-oriented data like credit card numbers and social security numbers (though even that is not always true). But DLP will miss most forms of intellectual property like product designs, manufacturing blueprints, corporate IP, employee personal information, HR information, etc. This is because IP is often not in a machine-detectable format like a credit card number, and is furthermore often housed in non-office formats like CAD, PSD, image files, screenshots, source code, as well as legacy and proprietary formats that DLP doesn’t handle.
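A rough way to see the size of this blind spot is to split a batch of outbound files into those a scanner can parse and those it cannot. The sketch below is an assumption-laden illustration: the `INSPECTABLE` set and the file names are hypothetical, and a real product's list of supported formats will differ.

```python
from pathlib import Path

# Hypothetical set of formats this toy scanner knows how to open and inspect.
INSPECTABLE = {".docx", ".xlsx", ".pptx", ".eml", ".txt"}


def coverage(filenames):
    """Split file names into (inspected, uninspected) based on their extension."""
    inspected, uninspected = [], []
    for name in filenames:
        suffix = Path(name).suffix.lower()
        (inspected if suffix in INSPECTABLE else uninspected).append(name)
    return inspected, uninspected


outbound = ["plan.docx", "engine.dwg", "mockup.PSD", "main.c", "screenshot.png"]
seen, unseen = coverage(outbound)
print(seen)     # ['plan.docx']
print(unseen)   # ['engine.dwg', 'mockup.PSD', 'main.c', 'screenshot.png']
```

Only one of the five files is ever examined. The other four pass through untouched, and those are the designs, source code and images most likely to contain the intellectual property.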
What About Digital Rights Management?
Digital rights management (DRM) comes in different flavours but the concept is that the company that owns the data will embed some information into the data that they encrypt such that even when the data leaves their building and goes into an external user's hands, the company still has control over it. Think of my earlier example of when HBO shares the script of the Game of Thrones finale with some of its partners. HBO owns that data, even when it is in the partner's hands. With persistent encryption, HBO could still have control over the data even when it has gotten passed around from partner to partner outside their building. They can turn access to that data on or off. They can see what users are doing with the data, how it's being passed around outside their four walls. They don't lose control of the data just because it's now in an external employee, contractor or partner's hands.
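The core mechanism, that the owner keeps the key and releases it only at open time, can be sketched as follows. This is a toy model under loud assumptions: `PolicyServer`, `open_file` and the file and user names are hypothetical, and the XOR keystream is a deliberately simplistic stand-in for real encryption that must not be used to protect anything.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR data with a SHA-256 counter keystream. NOT secure; illustration only."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class PolicyServer:
    """Hypothetical owner-side service that holds keys and decides who may open a file."""

    def __init__(self):
        self._keys = {}       # file_id -> key
        self._allowed = {}    # file_id -> set of user names
        self.audit = []       # (user, file_id, granted) for every open attempt

    def protect(self, file_id, plaintext, users):
        key = secrets.token_bytes(32)
        self._keys[file_id] = key
        self._allowed[file_id] = set(users)
        return keystream_xor(key, plaintext)

    def request_key(self, user, file_id):
        granted = user in self._allowed.get(file_id, set())
        self.audit.append((user, file_id, granted))
        if not granted:
            raise PermissionError(f"{user} may not open {file_id}")
        return self._keys[file_id]

    def revoke(self, user, file_id):
        self._allowed.get(file_id, set()).discard(user)


def open_file(server, user, file_id, ciphertext):
    """What a DRM-aware viewer would do: fetch the key at open time and decrypt in memory."""
    return keystream_xor(server.request_key(user, file_id), ciphertext)


server = PolicyServer()
blob = server.protect("finale-script", b"INT. THRONE ROOM - DAY", {"costume_vendor"})
print(open_file(server, "costume_vendor", "finale-script", blob))  # owner can see this access
server.revoke("costume_vendor", "finale-script")                   # owner turns access off
```

After the `revoke` call, the same `open_file` request raises `PermissionError`, and the attempt still lands in `server.audit`. The sketch also shows the catch: it only works if the viewing application cooperates by asking for the key each time and never saving the decrypted bytes.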
This is the Digital Rights Management dream. However, in practice, DRM still has an incredible number of usability problems at scale. For one thing, most file editors don't have this capability. Microsoft Office and Adobe do, but not Google Docs, nor CAD editors, nor image editors, nor Visio or MS Project, nor media file editors, nor ERP software, nor custom tools.
DRM really only works on MS Office or Adobe files today. To encrypt an office file, you can use Microsoft's built-in capabilities, or a 3rd party tool like Vera or Seclore. These tools are all proprietary though, so if you encrypted with Microsoft, then your partner can only decrypt with Microsoft. If you encrypted with Vera, then your supplier needs to install Vera to decrypt it. One day in the distant future perhaps there will be some open standard where all DRMs can encrypt and decrypt files interchangeably, but that would be very far off indeed.
Microsoft has made its own digital rights management capabilities natively available in Microsoft Office using Microsoft RMS (aka Azure Information Protection), yet few enterprises are able to use the technology at scale and so it remains largely unused. Just consider: what if you encrypt your data with RMS internally but your external partner does not have RMS to decrypt? What if your external partner has RMS but doesn't have federated identity, or a Microsoft-compatible identity? And even within your own organization, between internal employees, who determines who has access to what data? What if there's a mistake in the security policies? Suddenly you can't e-mail your boss because the file is locked down too strongly with persistent protection, and it's not clear how to resolve it. Imagine this problem at the scale of thousands or millions of workers. It's a massive usability problem involving both cooperating underlying technologies and sensible visual cues to give the user context on how to resolve data access problems.
There is one more major problem with encrypting data using any DRM tool. Encrypting your data effectively locks your data in with that vendor. As described in Microsoft RMS: Standard or Stranglehold, the Microsoft AIP product is a "closed modular system" (see graphic from the article re-printed below). If you are no longer happy with the RMS/AIP product, there is no decommission option to decrypt your data and migrate to another vendor. You are effectively paying Microsoft (or any DRM vendor) for the privilege of unlocking your most sensitive data every month in perpetuity. Not a good deal.
Solutions To The Insider Threat Problem?
Here's what's on the horizon, and the opportunities presented to software vendors and IT professionals as a result:
Education and Security Culture
Since so much of the insider threat problem is accidental (a trusted employee, contractor or supplier simply putting data where it shouldn't go accidentally or negligently because they are in a hurry), security education is an obvious first line of defence.
Tightening Cloud Security
So much sensitive data now lives in the cloud, whether in a cloud repository like Box or Amazon S3, a cloud application like Salesforce, Dynamics or JIRA, or cloud e-mail like Gmail and O365, and the list goes on. Another first line of defence is simply ensuring industry-wide baselines that all cloud vendors and cloud users must follow, at a contractual level, technology level (e.g. CASB and native cloud controls), and user awareness level. This will take years to level-set on.
Beyond the basics of encrypting data at rest and in transit, persistent encryption has been on the rise to solve supply-chain security and other "last mile" data protection problems. An obvious use case is when data is exported (or egressed) from a cloud service. While the industry is very focused on protecting data in the cloud itself, data can be exported out of the cloud with little protection today!
Persistent protections would encrypt and protect sensitive e-mail, files and other transient data wherever they go. Even if they fall into the wrong hands - say, a supplier's hands, or a hacker's hands - they are still protected.
Microsoft and other large players that dominate business data creation tools, like Adobe, IBM and Google, will over time mature their DRM capabilities to be more user-friendly so that the use of their built-in encryption will become widespread. When you own the data creation tool (O365, Gmail, Acrobat, etc.) and your encryption is built-in and proprietary, that's a huge advantage. It may just take many years for the solution to improve sufficiently that it can be usable on a wide scale.
Even further out might be the implementation of a persistent encryption open standard, where Microsoft, Adobe, Google, IBM as well as any small player like Vera and Seclore can encrypt and interchange data following a common, open standard, solving many of today's scalability and usability issues. This was already done for the persistent protection of media files in particular with the new W3C Encrypted Media Extensions (EME) standard. EME is now embedded in multiple applications and platforms, and allows content distributors to manage protected video files using one or a combination of key management and delivery systems (Google Widevine, Apple FairPlay, Microsoft PlayReady, Adobe Primetime, etc.). But a widespread data protection standard for all data types seems far off since the big players like Microsoft have little incentive to participate in a common standard and erode their advantage.
In the meantime, for companies that need to protect their data TODAY and can't wait a decade for the big players to get it right, the door is still open to a technology innovator who could provide persistent encryption in an innovative way that spares the problems of DRM. One promising company I have worked with is SecureCircle, a zero-trust solution with a novel approach to persistent encryption of any type of data.
Persistent encryption can protect data that flows into suppliers' hands, but it doesn't solve all forms of data leaks from remote workers and suppliers. A more hard-line approach that some enterprises take is to ship secured laptops to their third parties and require the third parties to only work on those laptops. Others require that their suppliers work in a locked down virtual space using VDI. These solutions are burdensome, slow down productivity, and have a hard time scaling across thousands of suppliers, especially in a COVID-driven world of all remote work. But new forms of secure virtual machines and VM orchestration are surfacing, such as Tehama.io, that promise to make widespread use of locked down virtual machines scalable, usable and secure.
Solving The Insider Threat Problem
For security software vendors and IT/IS professionals alike, the insider threat, and especially the "accidental" insider threat vector, represents a massive untapped market opportunity. In today's world of data freely flowing to cloud services and cloud applications, even the most innocent actions such as sending an e-mail, uploading a file to a sanctioned cloud application, or sharing data with a trusted partner can accidentally leak data into the wrong hands. The industry will take a decade at least to catch up on many fronts, including education, awareness, and technology.
In the meantime, companies can't wait that long to protect their data, and high-profile breaches are happening daily. Innovative solutions that help educate, secure the cloud, and persistently protect data, without introducing usability and productivity drag, will get traction in the market.