Create Data Loss Prevention policies
The Halon platform features a Data Loss Prevention (DLP) engine that you can use to comply with DLP policy requirements. The engine operates on what is called "data in motion", that is, data (e-mail) in transit between two endpoints (clients and/or mail servers). It offers several techniques for detecting policy violations, all covered below. Once a violation is detected, the administrator may choose an appropriate action, such as quarantining, logging or rejecting the message.
Implementation
The Data Loss Prevention (DLP) engine is implemented by a process called halon-dlpd. It is used from within the EOD context through the dlp client plugin. It behaves very much like an anti-virus engine in the sense that it operates on (user-defined) patterns, unpacks compressed archives, searches for violations and, once done, returns them to the EOD context so that an action may be taken.
- Different parts of the organization may have different policies.
- It should primarily be used to detect outbound violations.
Filter types
When creating a DLP policy, you get to select the policy type, described in the following sections.
Content scanning
"Content" scanning allows for user-defined rules (regular expressions) that detect well-known patterns, such as credit card numbers or "secret" project names. This is useful when you know that no such information should leave the organization. Matching is case-insensitive.
The following example rules, one per line, may detect credit card numbers.
\b4\s?-?\s?(?:\d{4}\s?-?\s?){3}\b
\b6011\s?-?\s?(?:\d{4}\s?-?\s?){3}\b
\b4\d{3}\s?-?\s?(?:\d{4}\s?-?\s?){3}\b
\b3(?:0[0-5]|6\d|8\d)\d\s?-?\s?\d{6}\s?-?\s?\d{4}\b
\b(?:213\s?-?\s?1|180\s?-?\s?0)\d{3}\s?-?\s?(?:\d{4}\s?-?\s?){2}\b
\b3[47]\d{2}\s?-?\s?\d{6}\s?-?\s?\d{5}\b
\b5[1-5]\d{2}\s?-?\s?(?:\d{4}\s?-?\s?){3}\b
\b35\d{2}\s?-?\s?(?:\d{4}\s?-?\s?){3}\b
You can add a comment by appending it to a row using the following syntax (without a blank space in between):
(?# Your comment goes here)
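Before deploying content rules, it can help to try them against sample text. The following is a minimal sketch in Python, not the Halon engine itself: it assumes that Python's re module behaves closely enough to the engine's regular expression dialect for these patterns, and the names RULES and find_matches are hypothetical. It uses two of the patterns listed above, each with an appended comment.

```python
import re

# Two of the patterns from the list above, each with a (?# ...) comment
# appended directly after the pattern (no blank space in between).
# Assumption: Python's re module is used here only for illustration;
# it is not the engine that halon-dlpd uses.
RULES = [
    r"\b4\d{3}\s?-?\s?(?:\d{4}\s?-?\s?){3}\b(?# 16 digits starting with 4)",
    r"\b3[47]\d{2}\s?-?\s?\d{6}\s?-?\s?\d{5}\b(?# 15 digits starting with 34 or 37)",
]


def find_matches(text, rules=RULES):
    """Return every substring of text matched by any rule (case-insensitive)."""
    hits = []
    for rule in rules:
        for match in re.finditer(rule, text, flags=re.IGNORECASE):
            # The patterns allow a trailing separator, so trim whitespace.
            hits.append(match.group(0).strip())
    return hits


if __name__ == "__main__":
    print(find_matches("Please charge 4111-1111-1111-1111 today."))
    print(find_matches("Nothing sensitive in this message."))
```

A message body that produces an empty list here would not trigger either of these two rules.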
File type and MIME type
"File name" and "MIME type" detection may not be a true DLP feature, but a software company may, for example, have a filter that detects source code files (text/x-c or .cpp) and quarantines them until an administrator or senior developer has cleared the intent.
Our engine implements a technology called "magic", which inspects the beginning of a file to detect the appropriate MIME type for that file. A tool for detecting file types (regardless of extension) is available in almost every Unix installation and is called "file". To detect the MIME type of a file, run file --mime-type filename.ext. The result shown is what should be used in your rules.
# file --mime-type main.cpp
main.cpp: text/x-c
To match this file type, add ^text/x-c$ on a single line. Matching is done using regular expressions, so the start (^) and end ($) should be marked, as in ^text/x-c$. For file extensions you should also escape the dot (.) and mark the end, as in \.cpp$. This matters because an unescaped, unanchored .cpp would also match a file name like acpp-report.doc. Matching is case-insensitive. For example, in order to block Windows executable (.exe) files, even if zipped, use the following file name pattern:
\.exe$
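The effect of escaping and anchoring can be checked with a few lines of Python. This is only a sketch, assuming that Python's re module treats these simple patterns the same way as the engine's regular expression dialect; the helper name rule_matches is hypothetical.

```python
import re


def rule_matches(pattern, value):
    """Case-insensitive regular expression search against a name or MIME type."""
    return re.search(pattern, value, flags=re.IGNORECASE) is not None


# Escaped and anchored: only names that end in ".cpp" match, in any case.
assert rule_matches(r"\.cpp$", "main.cpp")
assert rule_matches(r"\.cpp$", "MAIN.CPP")
assert not rule_matches(r"\.cpp$", "acpp-report.doc")

# Unescaped and unanchored: "." matches any character, so "acpp" matches.
assert rule_matches(r".cpp", "acpp-report.doc")

# The same reasoning applies to MIME types.
assert rule_matches(r"^text/x-c$", "text/x-c")
assert not rule_matches(r"^text/x-c$", "text/x-c++")
assert rule_matches(r"text/x-c", "text/x-c++")

# The executable rule from the text.
assert rule_matches(r"\.exe$", "Setup.EXE")
```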
Document fingerprinting
"MD5 fingerprint", "SHA1 fingerprint" and "SHA2 fingerprint" allow for exact file matching. They should primarily be used on files that are static by nature, such as images and binaries, because even the smallest change will alter the document fingerprint. MD5, SHA1 and SHA2 are all one-way hash algorithms: they take any data or document as input and output a fixed-length string that is, in practice, unique to that document. They are (for this purpose) equally good.
Tools to generate these hashes are available on all operating systems. On Linux, the tools for MD5 and SHA1 are called "md5sum" and "sha1sum".
# md5sum document.ext
b07a682853e7bbafea145fa189dc7444 document.ext
# sha1sum document.ext
0cd377adf7ebbef00d7e4b0b388c05e21cfda9c7 document.ext
Add b07a682853e7bbafea145fa189dc7444 on a single line in an MD5 fingerprint rule.
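Where the command-line tools are not at hand, the same fingerprints can be computed with Python's standard hashlib module. The sketch below is an illustration only: the function names are hypothetical, and SHA-256 is assumed as the SHA2 variant because the text does not say which one the engine expects.

```python
import hashlib


def fingerprints(data):
    """Return hex digests of the given bytes for the three fingerprint types."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        # Assumption: SHA-256 stands in for "SHA2" in this sketch.
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def file_fingerprints(path):
    """Fingerprint a file on disk; the MD5 and SHA1 values equal md5sum/sha1sum output."""
    with open(path, "rb") as handle:
        return fingerprints(handle.read())


if __name__ == "__main__":
    original = fingerprints(b"quarterly report")
    changed = fingerprints(b"quarterly report.")
    # A one-byte change yields a different fingerprint.
    print(original["md5"] != changed["md5"])
```

Because a single added byte changes every digest, a fingerprint rule only catches byte-for-byte copies of the file.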
Testing
You may now test the rule by sending a message with a ZIP file containing source code (.c or .cpp) files. The message should end up in the quarantine, from where it may be released or deleted.
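A test attachment can be built with Python's standard zipfile module. The following is a rough sketch under stated assumptions: the file names and contents are made up, the helper names are hypothetical, and the final check only mimics conceptually what a file name rule such as \.cpp$ would see inside the archive. It does not reproduce how halon-dlpd unpacks or scans archives.

```python
import io
import re
import zipfile


def build_test_zip():
    """Create an in-memory ZIP with one source file and one harmless file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("main.cpp", "int main() { return 0; }\n")
        archive.writestr("README.txt", "Test attachment for a DLP rule.\n")
    return buffer.getvalue()


def names_matching(zip_bytes, pattern):
    """List archive members whose names match the pattern (case-insensitive)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [
            name
            for name in archive.namelist()
            if re.search(pattern, name, flags=re.IGNORECASE)
        ]


if __name__ == "__main__":
    data = build_test_zip()
    # Write the archive to disk so it can be attached to a test message.
    with open("dlp-test.zip", "wb") as handle:
        handle.write(data)
    print(names_matching(data, r"\.cpp$"))
```

Attaching the resulting dlp-test.zip to an outbound test message exercises both the archive unpacking and the file name rule.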