DiskBoss is optimized for modern multi-core and multi-CPU systems and can parallelize the file classification process to speed up file classification
operations. DiskBoss provides a number of performance optimization options that control how many parallel threads are used to scan directories and
how many parallel threads are used to classify files.
In order to customize file classification performance options, open the file classification dialog, press the 'Options' button and select the 'Advanced' tab.
The 'Max Dir Scan Threads' option sets the maximum number of parallel threads used to scan input disks, directories and/or network shares. This option is
especially useful when processing a large number of network shares, because scanning several shares in parallel helps mitigate network latency and
slow-responding servers and NAS storage devices. The 'Classification Threads' option sets the number of parallel threads used to classify files.
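The split between directory scanning threads and file classification threads can be pictured as a two-stage pipeline. The following Python sketch is only an illustration of that general idea and is not DiskBoss code: the function classify_tree and its parameters scan_threads and classify_threads are hypothetical names, and "classification" is reduced to counting files per extension.

```python
import os
import queue
import threading
from collections import Counter

_DONE = object()  # sentinel telling a classifier thread to stop


def classify_tree(roots, scan_threads=2, classify_threads=4):
    """Count files per extension using separate scanner and classifier threads.

    Hypothetical illustration only; the thread counts loosely mirror the
    'Max Dir Scan Threads' and 'Classification Threads' options.
    """
    root_q = queue.Queue()   # input directories or shares waiting to be scanned
    file_q = queue.Queue()   # discovered files waiting to be classified
    counts = Counter()
    lock = threading.Lock()

    for root in roots:
        root_q.put(root)

    def scanner():
        # Each scanner takes whole roots (for example one share each) until none remain.
        while True:
            try:
                root = root_q.get_nowait()
            except queue.Empty:
                return
            for dirpath, _dirs, files in os.walk(root):
                for name in files:
                    file_q.put(os.path.join(dirpath, name))

    def classifier():
        # Each classifier pulls files until it receives the stop sentinel.
        while True:
            path = file_q.get()
            if path is _DONE:
                return
            ext = os.path.splitext(path)[1].lower() or "(none)"
            with lock:
                counts[ext] += 1

    scanners = [threading.Thread(target=scanner) for _ in range(scan_threads)]
    classifiers = [threading.Thread(target=classifier) for _ in range(classify_threads)]
    for t in scanners + classifiers:
        t.start()
    for t in scanners:
        t.join()
    for _ in classifiers:
        file_q.put(_DONE)  # one sentinel per classifier thread
    for t in classifiers:
        t.join()
    return counts


if __name__ == "__main__":
    print(classify_tree(["."], scan_threads=2, classify_threads=4))
```

In this sketch a scanner thread handles one input root at a time, so adding scanner threads only helps when there are several inputs to scan, which matches the case of many network shares described above.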
Another option that significantly affects the performance of file classification operations is the 'Show Files User Names' option, which is located on the 'General'
tab of the file classification options dialog. When this option is enabled, DiskBoss queries the user name for each processed file and saves the user names in the file
classification report, making it possible to show file classification statistics per user. Querying a user name for a file is a relatively slow operation,
especially when performed over the network, so for performance reasons this option is disabled by default. If this option needs to be enabled,
it is highly recommended to configure the file classification operation to use at least 4 parallel file classification threads, even on single-core or dual-core systems.
File Classification Performance Results
The performance of file classification operations depends heavily on the type of the storage device, the number of available CPUs and, for file
classification operations performed over the network, the speed of the network. For example, when classifying files located on a local SSD disk (without querying file user names), the performance
of file classification operations can reach up to 50,000 files per second. As shown on the example performance graph, the maximum file classification performance
is reached with 4 parallel file classification threads.
On the other hand, when the same file classification operation is performed with the 'Show Files User Names' option enabled, the single-CPU performance drops significantly
from 31,500 Files/Sec to 4,900 Files/Sec, while the multi-CPU performance continues to scale very well up to 8 parallel file classification threads and reaches 23,000 Files/Sec
when all 8 CPUs are used to classify files in parallel.
Almost the same level of multi-threaded performance scaling is observed when classifying files with the 'Show Files User Names' option enabled on a system with a small
number of physical CPU cores. In general, querying a user name for a file requires almost no CPU resources: for each processed file DiskBoss simply
waits for the operating system to return a user name. This makes the operation scale well when a large number of parallel processing threads query user names for many
files simultaneously.
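The reason a wait-dominated operation scales beyond the number of CPU cores can be demonstrated with a small simulation. The following Python sketch is an assumption-laden illustration, not DiskBoss code: lookup_owner is a hypothetical stand-in that only sleeps for an assumed latency instead of calling a real operating system API, and the latency value is invented for the demonstration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LOOKUP_LATENCY_SEC = 0.005  # assumed per-file wait; not a DiskBoss measurement


def lookup_owner(path):
    """Hypothetical stand-in for an owner query: it only waits and uses no CPU."""
    time.sleep(LOOKUP_LATENCY_SEC)
    return "owner-of-" + path


def lookup_all(paths, threads):
    """Resolve owners for all paths using the given number of worker threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        owners = list(pool.map(lookup_owner, paths))  # results keep input order
    return owners, time.perf_counter() - start


if __name__ == "__main__":
    paths = [f"file{i}.dat" for i in range(40)]
    for threads in (1, 4, 8):
        _owners, elapsed = lookup_all(paths, threads)
        print(f"{threads} thread(s): {len(paths) / elapsed:,.0f} files/sec")
```

Because each simulated lookup spends its time waiting rather than computing, the measured rate grows with the thread count even on a machine with one or two cores.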
File Classification Performance Results Over the Network
From a performance point of view, classifying files located on multiple network shares is a slightly different operation, which depends on the number
of processed network shares, the speed and latency of the network and the type of processed storage devices. For example, when classifying files over a high-speed,
low-latency local network, the performance of file classification operations scales from 6,250 Files/Sec for a single network share classified using a single CPU to
23,800 Files/Sec when 4 network shares are processed simultaneously using 8 parallel file classification threads.
In such a configuration, the performance of file classification operations depends heavily on the number of parallel threads used to scan network shares and the number
of parallel threads used to classify files. For high-speed, low-latency networks, the number of parallel threads used to scan directories should be equal to the number
of parallel file classification threads. For slow, high-latency networks, a high file classification speed can still be reached by classifying a large number of
network shares simultaneously and using a large number of parallel directory scanning threads.
When the same file classification operation is performed with the 'Show Files User Names' option enabled, the performance of the file classification operation drops
dramatically to just 150 Files/Sec for a single-threaded operation and scales up to 1,136 Files/Sec when files are classified using 8 parallel file classification threads.
In this case, the performance bottleneck is the operation of querying a user name for a file over the network. To increase the performance of such
operations, it is recommended to use a large number of parallel processing threads so that user names for many files are queried simultaneously.
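The throughput figures quoted in this section can be compared by computing the speedup and the per-thread efficiency of each multi-threaded run. The following Python sketch uses only the numbers stated in the text; the helper names speedup and efficiency are illustrative.

```python
def speedup(single, parallel):
    """Ratio of multi-threaded throughput to single-threaded throughput."""
    return parallel / single


def efficiency(single, parallel, threads):
    """Speedup divided by the number of threads (1.0 means perfectly linear scaling)."""
    return speedup(single, parallel) / threads


# (label, single-thread Files/Sec, multi-thread Files/Sec, threads), as quoted in the text.
# Note: the "network, user names off" row also goes from 1 share to 4 shares.
RESULTS = [
    ("local SSD, user names on", 4_900, 23_000, 8),
    ("network, user names off", 6_250, 23_800, 8),
    ("network, user names on", 150, 1_136, 8),
]

if __name__ == "__main__":
    for label, single, parallel, threads in RESULTS:
        s = speedup(single, parallel)
        e = efficiency(single, parallel, threads)
        print(f"{label}: {s:.2f}x speedup, {e:.0%} efficiency on {threads} threads")
```

With these figures, the network run with user names enabled shows a speedup of about 7.6x on 8 threads, which is close to linear scaling and consistent with a workload dominated by waiting rather than by CPU work.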