Content Aware and DLP

Definition

A set of technologies used to protect confidential information. Content-aware Data Loss Prevention (DLP) technology takes a data-centric approach to preventing inadvertent or accidental leaks or exposure of sensitive enterprise information outside authorized channels. It uses monitoring, filtering, blocking and remediation features throughout the data lifecycle, with minimal impact on business processes. DLP technologies can identify confidential content contained within an object (for example, a file, an e-mail message, a packet, an application, or a data store) while it is at rest (in storage), in use (during an operation) or in motion (across a network), and can dynamically apply a policy (for example, by logging, reporting, classifying, relocating, tagging, encrypting or applying enterprise digital rights management protections).

DLP technologies can include hardware and software solutions that are deployed at the endpoint (desktops and servers), at the network boundary and within the enterprise for data discovery purposes, and they perform deep content inspection using sophisticated detection techniques. Mechanisms for classifying information content may include exact data matching, structured data fingerprinting, statistical methods (such as Bayesian and machine learning), rule and regular expression matching, published lexicons, conceptual definitions, keywords, and watermark recognition. DLP products maintain detailed logs that can be used to support investigations.

User Benefits

DLP technologies give organizations tools to develop, teach and enforce better business practices for handling and transmitting sensitive data. The technology is designed to be effective in preventing theft of intellectual property and accidental disclosure of regulated information. Used to its full capability, however, DLP is a nontransparent control: it is intentionally visible to the end user, and its primary value proposition is changing user behavior. Nontransparent controls can change the business culture of an organization, so it is critical to involve the business in requirements planning and implementation of DLP controls.

Inadvertent data loss is the real problem in practice, so these automated controls are proving useful. However, motivated insiders will always find ways to steal data, and no technology will ever be able to fully control this.

As DLP technologies have matured, organizations should expect a DLP product to offer capabilities such as:

  • Detection of sensitive content using sophisticated content-aware detection techniques, including partial and exact document matching, structured data fingerprinting, statistical analysis, extended regular expression matching, and conceptual and lexicon analysis
  • Detection of sensitive content (structured and unstructured data) in any combination of network traffic, data at rest or endpoint operations
  • Blocking capabilities for data at rest, in motion and in use.

Business Impact

This technology is not foolproof or fully developed, but it effectively addresses more than 95% of the loss that is due to accidents and ignorance. Organizations are finding that this technology does indeed meet their expectations and significantly reduces non-deliberate outflows of sensitive data.

The benefits of using a DLP product include:

  • Risk Reduction: By knowing where your data is stored and how it’s being used, you can reduce your overall exposure to potential loss.
  • Cost Savings: DLP may help reduce other costs associated with data management and security.
  • Compliance Support: DLP helps reduce the direct costs associated with some regulatory compliance, can ease audits, and reduces the risks of certain compliance-related incidents.
  • Policy Enforcement: Many data management policies in enterprises are difficult or impossible to enforce. DLP supports enforcement of acceptable use of information, not just security controls.
  • Data Security and Threat Management: While no security tool stops all threats, DLP reduces the risk of certain malicious activity.

Products supporting this technology

  • Forcepoint
  • General Dynamics Fidelis Cybersecurity Solutions
  • McAfee
  • Quarri

DLP technologies perform content inspection of data at rest, in use or in motion to identify sensitive content, and can execute responses ranging from simple notification to active blocking, based on policy settings. The technology must support sophisticated detection techniques that extend beyond simple keyword matching and regular expressions.
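
The idea of policy-driven responses can be illustrated with a small sketch. It is only an illustration under stated assumptions: the `Policy` fields, the `Action` names and the match-count thresholds are hypothetical and do not describe any particular product's policy model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    """Hypothetical response levels, from least to most intrusive."""
    NOTIFY = "notify"
    BLOCK = "block"


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: thresholds are counts of sensitive matches."""
    name: str
    notify_at: int  # match count that triggers a notification
    block_at: int   # match count that triggers active blocking


def decide(policy: Policy, match_count: int) -> Optional[Action]:
    """Return the response for a match count, or None if below all thresholds."""
    if match_count >= policy.block_at:
        return Action.BLOCK
    if match_count >= policy.notify_at:
        return Action.NOTIFY
    return None
```

With a policy such as `Policy("pci", notify_at=1, block_at=5)`, a message containing two matches would only raise a notification, while five or more would be blocked.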

Data Loss Prevention is one of the most controversial security tools on the market. With many different names and even more technology approaches, it can be hard to judge the real value of the tools and which products best suit which enterprise. DLP is still not a fully mature technology, and all the vendors are developing new features requested by the market. Even so, a DLP product provides significant value for the organizations that need it. The market currently contains many products that claim to be DLP products but have only some content-aware features.

The first problem in understanding DLP is figuring out what DLP really means. Many names are used to describe the same market: Data Loss Prevention/Protection, Data Leak Prevention, Extrusion Prevention, Content Monitoring and Filtering, Content Monitoring and Protection, and possibly others. DLP seems to be the most common term and the easiest to use.

A short definition of DLP is "products that, using content analysis, are able to identify, monitor, and protect data at rest, in motion and in use based on central policies".

The market offers both DLP as a feature and DLP as a product. A number of products provide some basic DLP functions but aren't necessarily DLP products. The difference is clear:

  • A DLP product has centralized management, policy creation, and enforcement workflow, and is dedicated to protecting data based on its content, using specific dedicated techniques.
  • DLP features include some of the detection and enforcement capabilities of DLP products, but are not dedicated to the task of protecting content and data.

This difference matters when a customer chooses a solution, because DLP products solve a specific business problem that may or may not be managed by the same business unit or user responsible for other security functions. Sometimes non-technical users, such as a legal department, a human resources department or a compliance officer, are responsible for protecting content. Some organizations consider DLP policies very sensitive and require that they be managed by business unit leaders outside of security, which also argues for a dedicated product.

Because DLP is dedicated to a clear business problem (protecting content), a DLP customer should look for a dedicated solution. A dedicated DLP product gives the customer central policy creation and management, and its workflow should be dedicated to the DLP problem and isolated from other security functions.

To understand the concept, it is necessary to define what is meant by protecting data at rest, data in motion and data in use.

  • The data-at-rest component scans storage and all content repositories to identify where sensitive information is located. This is also called content discovery.
  • The data-in-motion component sniffs traffic on the network (passively, or inline via a proxy) to identify content across communications channels.
  • The data-in-use component is typically an endpoint solution that monitors data as the user interacts with it.

It is important to keep in mind that DLP technology is highly effective against bad business processes (for example, unencrypted FTP exchange of confidential information) and mistakes.

Another widely used term that is important in defining DLP technology is content awareness. One distinguishing feature of DLP solutions is that they look at the content itself, not just the context. Context covers the sender, the recipient and the path. Content techniques dig into a file embedded in a Word file, which is embedded in a .zip file and then in a .pdf, and detect that one paragraph matches a protected document. A product is considered content aware if it uses one or more content analysis techniques. Many of the products on the market today support more than 300 file types, embedded content, multiple languages and double-byte character sets (for Asian languages), and can pull plain text from unidentified file types. Some support analysis of encrypted data if they have the recovery keys for enterprise encryption, and most can identify standard encryption and use it as a contextual rule to block or quarantine content.
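
The layered inspection described here can be pictured as a recursive unpacking step that runs before any content analysis. The following is a minimal sketch, assuming only ZIP containers and UTF-8 text; the function name `extract_text_layers` and the depth limit are hypothetical, and real products handle far more file types than this.

```python
import io
import zipfile


def extract_text_layers(data: bytes, name: str = "root", max_depth: int = 5):
    """Yield (path, text) pairs for every leaf member found in nested ZIP files.

    Assumption: only ZIP containers are unpacked, and every leaf is decoded
    as UTF-8 text (undecodable bytes are replaced rather than rejected).
    """
    buffer = io.BytesIO(data)
    if max_depth > 0 and zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as archive:
            for member in archive.namelist():
                if member.endswith("/"):
                    continue  # skip directory entries
                inner = archive.read(member)
                # Recurse so that a ZIP inside a ZIP is unpacked as well.
                yield from extract_text_layers(
                    inner, f"{name}/{member}", max_depth - 1
                )
        return
    # Leaf object (or depth limit reached): hand the text to content analysis.
    yield name, data.decode("utf-8", errors="replace")
```

Each yielded pair keeps the full path through the containers, so a later match can be reported against the exact embedded object rather than only the outer file.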

Some content analysis techniques used today are:

1. Rules-Based/Regular Expressions: This is a basic technique that analyzes content against specific rules (for example, 16-digit numbers that meet credit card checksum requirements, or other textual patterns). Most DLP solutions enhance basic regular expressions with additional analysis rules (for example, a name in proximity to a credit card number). In practice this technique is a first-pass filter that can be easily configured.

2. Database Fingerprinting or Exact Data Matching: This technique looks for exact matches against a database dump or live data (via an ODBC connection) from a database. More advanced tools can search for combinations of information (first name or initial, last name, and credit card number).

3. Exact File Matching: This technique takes a hash of a file and monitors for any files that match that exact fingerprint. It is useful for media files and other binaries where textual analysis isn't necessarily possible.

4. Partial Document Matching: This technique looks for a complete or partial match on protected content. Using it together with Exact File Matching, a DLP solution will look for both the complete text of a document and small pieces of it (such as a few sentences). Top vendors have added other analysis on top of the cyclical hashing that this technique relies on, such as removing whitespace, looking at word proximities, and other linguistic analysis. This technique is used to protect unstructured data.

5. Statistical Analysis: This technique uses machine learning, Bayesian analysis, and other statistical methods to analyze content and find policy violations in content that resembles the protected content. These methods are very similar to those used to block spam, and they require a large amount of source content to be effective.

6. Conceptual/Lexicon: This technique uses combinations of dictionaries, rules, and other analysis to protect an "idea". It is used for unstructured ideas that defy simple categorization based on matching known documents, databases, or other registered sources.

7. Categories: These are pre-built categories with rules and dictionaries for common types of sensitive data, such as credit card numbers/PCI protection, HIPAA, etc.
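
The rules-based technique in item 1 can be sketched in a few lines: a regular expression finds 16-digit candidates, a Luhn checksum discards most random numbers, and a proximity rule checks for context words nearby. This is a minimal sketch rather than any vendor's rule set; the pattern, the `CONTEXT_WORDS` list and the 40-character window are assumptions chosen for illustration.

```python
import re

# Candidate pattern: 16 digits, optionally split into groups of four
# by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")

# Hypothetical context keywords; a real policy would use a tuned lexicon.
CONTEXT_WORDS = ("card", "visa", "mastercard", "expiry", "cvv")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str, window: int = 40):
    """Return (number, has_context) tuples for Luhn-valid candidates.

    has_context is True when a context word appears within `window`
    characters of the match (the proximity rule).
    """
    hits = []
    lowered = text.lower()
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if not luhn_valid(digits):
            continue
        start = max(0, match.start() - window)
        nearby = lowered[start:match.end() + window]
        hits.append((digits, any(word in nearby for word in CONTEXT_WORDS)))
    return hits
```

Returning the context flag separately, instead of dropping matches without context, lets a policy treat a bare Luhn-valid number as low severity and a number near the word "card" as high severity.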

Not all products include every technique, but most can combine the techniques they support with contextual analysis.
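
Of the techniques listed above, partial document matching (item 4) is perhaps the easiest to picture in code. The sketch below hashes overlapping runs of words (word shingles) after normalizing case and whitespace. This is a simplified stand-in for the cyclical hashing the text mentions, not a reproduction of any product's algorithm; the function names and the window size are assumptions.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so reformatting does not defeat matching."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def fingerprint(text: str, window: int = 8) -> set:
    """Hash every run of `window` consecutive words (overlapping shingles)."""
    words = normalize(text).split(" ")
    if len(words) < window:
        return set()  # too short to fingerprint at this window size
    return {
        hashlib.sha256(" ".join(words[i:i + window]).encode("utf-8")).hexdigest()
        for i in range(len(words) - window + 1)
    }


def overlap_ratio(protected: set, candidate_text: str, window: int = 8) -> float:
    """Fraction of the candidate's shingles that also occur in the protected set."""
    candidate = fingerprint(candidate_text, window)
    if not candidate:
        return 0.0
    return len(candidate & protected) / len(candidate)
```

A protected document is fingerprinted once with `fingerprint`, and outgoing text is then scored with `overlap_ratio` using the same window size; a policy would compare that score against a threshold. Only the hashes need to be stored, not the protected text itself.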

Central management is the most important component of a DLP solution; it is used to define policies and manage workflow. DLP is focused on a business problem (protecting sensitive information) as opposed to a technical problem (network attacks), so it is very important for DLP products to support both non-technical and technical users. Policy creation should be easy, and incident management should be simple enough to be handled even by a human resources manager.

Other important features of a DLP solution include detailed reporting, integration with user directories to tie policies to users and groups, and hierarchical management of multiple systems in large environments.
