For a recent client project I was tasked with performing a card-data discovery audit, which was intended to cover the usual Windows workstation/server estate as well as a Linux environment. One of the conditions of scanning the Linux environment was that neither symlinks nor mounted drives were to be scanned.
Nessus was an obvious choice for the Windows systems as Tenable offers the File Content audit Compliance Policy, which was easily configurable (file types, amount of file data audited, regular expressions for the card-matching check).
However, there seemed to be no comparably useful tools for the Linux systems, and given my predicament, the best option was to build my own.
The tool was built in Python 2.7 and does not rely on any obscure modules, so it should run on any system with that version of Python installed.
Only one input is required to start the scanner, as most of the default settings are configured for a quick, simple, local scan. That input is the list of file extensions to target. As an example, to search only text files you would use:
Multiple file extensions can be given, separated by spaces on the command line. As an example, for common document files:
-e txt doc docx rtf.
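As a rough illustration of how a space-separated extension list like this can be collected, here is a minimal sketch using `argparse`. It is not taken from the tool's source: only the `-e` flag comes from the examples above, while `build_parser` and the `extensions` destination name are assumptions made for this example.

```python
import argparse

def build_parser():
    # Hypothetical parser: only the -e flag is taken from the article.
    parser = argparse.ArgumentParser(description="Extension list sketch")
    # nargs="+" collects one or more space-separated values into a list.
    parser.add_argument("-e", dest="extensions", nargs="+", required=True,
                        help="file extensions to target, e.g. txt doc docx rtf")
    return parser

# Equivalent to passing "-e txt doc docx rtf" on the command line.
args = build_parser().parse_args(["-e", "txt", "doc", "docx", "rtf"])
print(args.extensions)  # ['txt', 'doc', 'docx', 'rtf']
```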
Two options can be set to configure the depth of the scan from the starting path, although both have default values if the input arguments are not used.
The first option relates to the Minimum Scan Depth, set by using either of these arguments:
--min-depth. The argument must be an integer; for example, to scan from the first layer of child folders I would use:
-d 1. By default this value is set to
0 to scan from the default
The second option sets the Maximum Scan Depth, using a similar argument:
--max-depth, with an integer again required for the argument. This option is useful for stopping the scanner from delving too deep into the file system, and is one of several options that affect how long the scan runs. By default the value of this option is set to
3, so as to scan only three folders deep from the scan path.
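The depth limits can be pictured with a short sketch built on `os.walk`. This is an assumption-laden illustration rather than the tool's actual code: `find_files` is a hypothetical helper, and it assumes the depth convention used by `find`, where the root path is depth 0 and a file directly inside it is at depth 1.

```python
import os

def find_files(root, extensions, min_depth=0, max_depth=3):
    """Yield matching files whose depth lies within [min_depth, max_depth].

    Assumed convention: root is depth 0, a file directly inside it is depth 1.
    """
    root = os.path.abspath(root)
    suffixes = tuple("." + ext.lower() for ext in extensions)
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        dir_depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        file_depth = dir_depth + 1
        if file_depth >= max_depth:
            # Anything deeper would exceed max_depth, so stop descending.
            dirnames[:] = []
        if min_depth <= file_depth <= max_depth:
            for name in filenames:
                if name.lower().endswith(suffixes):
                    yield os.path.join(dirpath, name)
```

Under that assumed convention, `find_files("/opt", ["txt", "csv"], min_depth=0, max_depth=1)` would return only the matching files that sit directly in `/opt`.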
Another option that affects scan time, and thereby the performance of the scan itself, is the lines input argument. This option configures the amount of data (i.e. lines of a file) that the tool audits for potential card data. The value is set to
50 by default, with the input arguments as follows:
--lines, followed by the number of lines to search as an integer.
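To show why limiting the number of lines keeps scans quick, the sketch below reads only the first N lines of a file and checks them against a pattern. The function name `scan_head` and the regular expression are assumptions for illustration only; the pattern is a deliberately loose 13 to 16 digit match and is not the expression the tool uses.

```python
import io
import re
from itertools import islice

# Illustrative pattern only: 13-16 digits, optionally separated by spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")

def scan_head(path, max_lines=50):
    """Return (line_number, matched_text) pairs from the first max_lines lines."""
    hits = []
    # errors="replace" avoids decode failures on files that are not clean text.
    with io.open(path, "r", errors="replace") as handle:
        # islice stops after max_lines, so large files are never read in full.
        for number, line in enumerate(islice(handle, max_lines), start=1):
            for match in CARD_PATTERN.finditer(line):
                hits.append((number, match.group(0)))
    return hits
```

A real audit would normally add further validation, such as a Luhn check, to cut down on false positives.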
The scan begins by searching for viable files to audit. As noted previously, several variables affect scan performance, and one of the larger contributors is the set of candidate files, specifically their sizes.
This tool can be configured to restrict the minimum and maximum file sizes, so that any files outside these limits are not subjected to the content comparisons later in the script.
By default, the minimum size of files that will be audited is set to
16c, which is 16 bytes. The maximum size of files to be returned is set to
100k, which is 100 kilobytes.
The default values can be changed on the command line. To alter the minimum size, use the input argument
-min [int] or
--min-size [int], where [int] is the desired value. The maximum size can be changed using a similar input argument
-max [int] or
The units for the size arguments are as follows:
Bytes = c
Kilobytes = k
Megabytes = M
Gigabytes = G
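The unit suffixes can be turned into byte counts with a small helper. The following is a sketch under stated assumptions, not the tool's implementation: `parse_size` and `within_limits` are hypothetical names, the multipliers are assumed to be binary (1k = 1024 bytes), and both bounds are assumed to be inclusive.

```python
import os

# Assumed binary multipliers for the unit suffixes listed above.
UNIT_MULTIPLIERS = {"c": 1, "k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def parse_size(text):
    """Convert a value such as '16c' or '100k' into a number of bytes."""
    number, unit = text[:-1], text[-1:]
    if unit not in UNIT_MULTIPLIERS or not number.isdigit():
        raise ValueError("expected an integer followed by c, k, M or G: %r" % text)
    return int(number) * UNIT_MULTIPLIERS[unit]

def within_limits(path, min_size="16c", max_size="100k"):
    """Return True if the file size falls inside the (inclusive) limits."""
    size = os.path.getsize(path)
    return parse_size(min_size) <= size <= parse_size(max_size)

print(parse_size("16c"), parse_size("100k"))  # 16 102400
```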
Originally the tool was created to ignore symlinks; however, it was also necessary to exclude mounted remote file systems from the scan, so these are skipped by default. By using the
--scan-mount arguments it is possible to scan these mounted file systems.
Note: Enabling this option will scan _all_ mounted directories unless they are excluded using the
--exclude options (see below for more information).
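One way to picture the symlink and mount handling is the sketch below. It is an assumption about how such filtering could be done with the standard library, not a copy of the tool's logic; `walk_local` and its `scan_mounts` parameter are hypothetical names standing in for the behaviour described above.

```python
import os

def walk_local(root, scan_mounts=False):
    """Yield file paths under root, skipping symlinks and (by default) mount points."""
    # followlinks=False means symlinked directories are never descended into.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        if not scan_mounts:
            # Prune mount points in place so os.walk never enters them.
            dirnames[:] = [d for d in dirnames
                           if not os.path.ismount(os.path.join(dirpath, d))]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                yield path
```

Note that `os.path.ismount` is a heuristic based on device and inode comparisons, so unusual setups may need a check against the system's mount table instead.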
Root Path to Scan
Paths can be scanned individually. For example, it is possible to scan only
/opt/, without its child or parent directories, by using the following command:
./cardscan4linux.py -e txt csv -p /opt/ -d 0 -D 1
Here, -d 0 -D 1 states that the
/opt/ directory will be scanned for files, and that only files ending in
txt or csv directly within that directory (hence the
-D 1) will be scanned.
The -p [path] or
--path [path] arguments set the ‘root’ path to scan from. This can be any directory path; if no argument is given, the path
/ will be scanned by default.
Often it may be required to exclude paths from being scanned, and this is of course possible by using the
--exclude arguments, followed by the directory path (e.g.
Wildcards can be used to exclude mid-level directories. For example
-x /*/path/* will exclude from the search anything under a second-level directory named path.
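Wildcard exclusions of this kind can be approximated with shell-style matching from Python's `fnmatch` module. This is a sketch only, and `is_excluded` is a hypothetical helper; how the tool itself evaluates the patterns is not shown here.

```python
from fnmatch import fnmatchcase

def is_excluded(path, patterns):
    """Return True if path matches any shell-style exclusion pattern."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)

patterns = ["/*/path/*"]
print(is_excluded("/home/path/notes.txt", patterns))   # True
print(is_excluded("/home/other/notes.txt", patterns))  # False
```

One caveat with this approach: in `fnmatch`, `*` also matches the `/` separator, so `/*/path/*` would match `/a/b/path/c` as well. Restricting a pattern strictly to the second level would need an extra check on the number of path components.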
An alternative to outputting the results to the terminal screen is to use the
--output options. Using this argument will save
This prints the full path of each file in real time as the tool scans it. Warning: the output can be overwhelming, and the option is intended mainly for debugging.
Below you can find some example outputs:
Running a default scan for the TXT file extension.
Scanning the /opt/ directory (only) for TXT and CSV file types, with default configurations for all other options.
Outputting results to a file.