Prerequisites
To avoid misconfigurations interrupting sensor deployment and configuration, we recommend enabling resource limiting features only after the final stages of testing.
Overview
Capsule8’s Sensor can run with customized limits on resource utilization, in order to prioritize resources for production applications over security data collection.
The sensor also employs a circuit breaker capability which, when the sensor comes under heavy load, sheds security data collection to maintain host performance.
(Note that these features limit the volume of telemetry being processed, not the number of alerts being generated.)
Hard Resource Limits
Quick Start
To enable the resource limiter and restrict the sensor's CPU and memory usage, add the following block to your sensor's configuration file, which by default is located at /etc/capsule8/capsule8-sensor.yaml:
use_supervisor: true
use_resource_limits: true
This will enable the resource limiter with its default thresholds of 5% CPU and 1024MB memory. To apply these changes, restart your sensor.
Additional Detail
This section describes the design, implementation, and usage of the sensor's hard resource limiting capabilities. This feature allows you to set exact limits for CPU and memory resources. It is implemented using the CPU and memory subsystems of Linux cgroups, and the sensor uses a cgroup named capsule8-sensor. The implementation requires a supervisor process that executes and monitors the actual sensor. This design accomplishes several goals. First, it forces all routines of the sensor process to reside in the cgroup. Second, because the supervisor process must run as the root user, the design allows the sensor's privileges to be dropped by executing it as a child process under a separate user. Finally, it enables the supervisor process to restart the sensor child process when it exits and to monitor it for performance and violations.
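To make the cgroup mechanics concrete, here is a minimal Python sketch of what a supervisor running as root might do: create a cgroup named capsule8-sensor under the CPU and memory controllers, write the limits, and then move the sensor's process ID into it. It assumes cgroup v1 mounted at /sys/fs/cgroup; the function name place_in_cgroup and its root parameter are hypothetical, and the sensor's own implementation may differ. Pointing root at a scratch directory lets you try it without root privileges.

```python
from pathlib import Path
from typing import List

CGROUP_NAME = "capsule8-sensor"  # cgroup name given in the documentation

def place_in_cgroup(pid: int, quota_us: int, period_us: int, memory_bytes: int,
                    root: str = "/sys/fs/cgroup") -> List[Path]:
    """Create the sensor's CPU and memory cgroups, set limits, then add `pid`.

    Hypothetical helper. Assumes cgroup v1 with the cpu and memory
    controllers mounted at <root>/cpu and <root>/memory; -1 means
    "unlimited" for both limits. Returns the control files written, in order.
    """
    cpu_dir = Path(root, "cpu", CGROUP_NAME)
    mem_dir = Path(root, "memory", CGROUP_NAME)
    cpu_dir.mkdir(parents=True, exist_ok=True)
    mem_dir.mkdir(parents=True, exist_ok=True)
    writes = [
        (cpu_dir / "cpu.cfs_period_us", period_us),
        (cpu_dir / "cpu.cfs_quota_us", quota_us),
        (mem_dir / "memory.limit_in_bytes", memory_bytes),
        # Limits are set before any process joins the group. Writing a PID
        # to cgroup.procs moves every thread of that process at once.
        (cpu_dir / "cgroup.procs", pid),
        (mem_dir / "cgroup.procs", pid),
    ]
    for path, value in writes:
        path.write_text(f"{value}\n")
    return [path for path, _ in writes]
```

Because membership is per process in cgroup.procs, threads the sensor starts later inherit the same group, which is what keeps all of its routines under the limits.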
Usage
The resource configurations are read from the sensor's configuration file, which by default is at /etc/capsule8/capsule8-sensor.yaml. The path to the configuration file may be overridden by setting the CAPSULE8_CONFIG environment variable. The following section describes the hard resource limiter configuration fields.
Configuration
The following fields are set in the Capsule8 sensor configuration file. They are also bound to environment variables.
use_supervisor
- Boolean value determining whether or not to use the supervisor and, therefore, the hard resource limits.
- Environment Variable: CAPSULE8_USE_SUPERVISOR
- Type: boolean
- Example: true, false
- Default: false
use_resource_limits
- Determines whether or not to use the hard resource limiter functionality of the supervisor.
- Environment Variable: CAPSULE8_USE_RESOURCE_LIMITS
- Type: boolean
- Example: true, false
- Default: false
memory_limit
- The maximum amount of memory that the sensor process is allowed to consume. The string must end in G (gigabyte) or M (megabyte). The special value 0 indicates no limit.
- Environment Variable: CAPSULE8_MEMORY_LIMIT
- Type: String
- Example: 512M, 1G, 0
- Default: 1024M
cpu_limit
- The percentage of total CPU time for which the sensor is allowed to be scheduled, adjusted for multi-core processors (that is, measured across all cores). The special value 0 indicates no limit. Note: Avoid also limiting the sensor's resources through another supervisor such as systemd, as this can cause unpredictable behavior on multi-core processors.
- Environment Variable: CAPSULE8_CPU_LIMIT
- Type: Integer
- Example: 10, 15, 20, 0
- Default: 5 (i.e., 5% across all cores)
sensor_user
- The user that the sensor process will run as.
- Environment Variable: CAPSULE8_SENSOR_USER
- Type: String
- Example: myuser, root, grant
- Default: capsule8
log_cgroup_metrics
- Specifies whether or not to log cgroup metrics to stderr on a 2-minute interval.
- Environment Variable: CAPSULE8_LOG_CGROUP_METRICS
- Type: boolean
- Example: true, false
- Default: false
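The memory_limit rules above (a number ending in M or G, with 0 meaning no limit) can be expressed as a small parser. This Python sketch is illustrative only: parse_memory_limit is a hypothetical name, and it assumes M and G are binary multipliers (MiB and GiB), which the documentation does not state.

```python
from typing import Optional

# Assumption: M and G are binary multipliers (MiB and GiB). The documentation
# does not say whether the sensor uses 1024- or 1000-based units.
UNITS = {"M": 1024 ** 2, "G": 1024 ** 3}

def parse_memory_limit(value: str) -> Optional[int]:
    """Convert a memory_limit string such as '512M' or '1G' into bytes.

    Returns None for the special value '0' (no limit) and raises
    ValueError for anything that is not digits followed by M or G.
    """
    value = value.strip()
    if value == "0":
        return None
    number, unit = value[:-1], value[-1:]
    if unit not in UNITS or not number.isdigit():
        raise ValueError(f"memory_limit must end in M or G: {value!r}")
    return int(number) * UNITS[unit]
```

Under this assumption the default 1024M and the example 1G describe the same limit, 1,073,741,824 bytes.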
Verification
You can confirm that the cgroup configuration is working by using the top utility. While the sensor runs, top shows its memory and CPU usage as percentages of total resources. For CPU, the sensor should never exceed the configured CPU limit multiplied by the number of cores on the machine (the shell utility nproc prints the number of cores). For memory, you can calculate the configured limit as a percentage of the machine's total memory, which top displays in KiB by default.
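The checks above are simple arithmetic. The following Python sketch computes the expected ceilings; the function names are hypothetical, and it assumes, as described above, that top reports 100% per fully used core.

```python
import os

def max_cpu_percent_in_top(cpu_limit: int, cores: int) -> int:
    """Ceiling for the sensor's %CPU in top, where 100% means one full core."""
    return cpu_limit * cores

def limit_as_percent_of_total(limit_kib: int, total_kib: int) -> float:
    """Express the memory limit as a share of total memory, for comparison
    with the %MEM column that top shows for the sensor."""
    return 100.0 * limit_kib / total_kib

def total_memory_kib(meminfo_path: str = "/proc/meminfo") -> int:
    """Read MemTotal from /proc/meminfo; the kernel reports it in kB (KiB)."""
    with open(meminfo_path) as meminfo:
        for line in meminfo:
            if line.startswith("MemTotal:"):
                return int(line.split()[1])
    raise RuntimeError("MemTotal not found in " + meminfo_path)

def usable_cores() -> int:
    """CPUs this process may run on, which is what nproc normally prints (Linux only)."""
    return len(os.sched_getaffinity(0))
```

For example, with the default 5% limit on a 4-core host, top should never show the sensor above 20% CPU, and a 1 GiB limit on a 16 GiB host corresponds to 6.25% of total memory.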
Event Limiter
This section describes the design, implementation, and usage of the sensor’s soft resource limiting capabilities. Event limiting allows you to set rate limits on telemetry collection and to customize the backoff policy.
Event limiting is implemented as follows:
- Telemetry subscription events are fed through a circuit breaker. When throughput exceeds a predefined rate (with the default settings described below, 3500 events per second for 30 seconds), telemetry collection is disabled, or rendered dormant, for a period of time (group_request_duration).
- Once collection is disabled, detections flush their cache and enter a dormant state, and the backoff is logged.
- After the dormancy period expires, collection automatically resumes and detections are once again able to instrument and monitor the host.
- Upon resumption, the dormancy duration is doubled and the system continues to monitor the telemetry collection rate, triggering a backoff as described above. This cycle continues until the max_retries ceiling is reached, at which point the event limiter exits, logging an error. Note that while the sensor is throttled, telemetry may be delayed or shed.
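The trip, dormancy, and doubling cycle described above can be modeled as a small state machine. This Python sketch is a hypothetical illustration, not the sensor's code: EventLimiterSketch and its fields are invented names, it consumes one event count per grouping bucket, and how its fields map onto the limiter.* settings below is an assumption.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class EventLimiterSketch:
    """Hypothetical model of the trip / dormancy / backoff cycle.

    Consumes one event count per grouping bucket. The mapping of these
    fields onto the sensor's limiter.* settings is an assumption.
    """
    events_per_second: float = 3500
    bucket_seconds: float = 1.0   # cf. limiter.group_request_duration
    sustain_buckets: int = 30     # over-limit buckets needed to trip (30 s at 1 s buckets)
    dormant_buckets: int = 1      # first dormancy length, doubled on each trip
    max_retries: int = 3
    trips: int = field(default=0, init=False)
    log: List[str] = field(default_factory=list, init=False)
    _over: int = field(default=0, init=False)
    _dormant_left: int = field(default=0, init=False)

    def feed(self, events_in_bucket: int) -> str:
        """Return 'collect', 'trip', 'dormant', or 'exit' for this bucket."""
        if self._dormant_left > 0:
            # Collection is disabled: events in this bucket are shed.
            self._dormant_left -= 1
            return "dormant"
        rate = events_in_bucket / self.bucket_seconds
        self._over = self._over + 1 if rate > self.events_per_second else 0
        if self._over < self.sustain_buckets:
            return "collect"
        self._over = 0
        self.trips += 1
        if self.trips > self.max_retries:
            self.log.append("error: retries exhausted, exiting with status 1")
            return "exit"
        # Dormancy doubles on every trip: base, 2 * base, 4 * base, ...
        self._dormant_left = self.dormant_buckets * 2 ** (self.trips - 1)
        self.log.append(f"backoff {self.trips}: dormant for {self._dormant_left} buckets")
        return "trip"
```

Feeding a sustained overload with a two-bucket trip threshold and max_retries of 2 yields one dormant bucket, then two, then an exit on the third trip, which mirrors the doubling and the max_retries ceiling described above.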
Usage
The resource configurations are read in from the sensor’s configuration file which, by default, is at /etc/capsule8/capsule8-sensor.yaml. The path to the configuration file may be overridden by setting the CAPSULE8_CONFIG environment variable. The following section describes the event limiter configuration fields.
Configuration
The following fields are set in the Capsule8 sensor configuration file. They are also bound to environment variables.
limiter.enabled
- Boolean value indicating whether or not the event limiter is enabled.
- Environment Variable: CAPSULE8_EVENT_LIMITER_ENABLED
- Example: true, false
- Default: false
limiter.events_per_second
- The sustained number of events per second above which, in combination with limiter.duration, the circuit breaker trips.
- Environment Variable: CAPSULE8_EVENT_LIMITER_EVENTS_PER_SECOND
- Example: 7000
- Default: 3500
limiter.duration
- The period of time for which collection is disabled after the event rate continuously exceeds the limit.
- Environment Variable: CAPSULE8_EVENT_LIMITER_DURATION
- Example: 30s, 10s, 60s
- Default: 30s
limiter.max_retries
- The number of times to go dormant and back off before exiting the sensor with an error (status code 1). The exit is logged.
- Environment Variable: CAPSULE8_EVENT_LIMITER_MAX_RETRIES
- Example: 3, 6, 9
- Default: 3
limiter.group_request_duration
- The period granularity in which to group event counts.
- Environment Variable: CAPSULE8_EVENT_LIMITER_GROUP_REQUEST_DURATION
- Example: 1s, 10s, 2s
- Default: 1s
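One way to picture group_request_duration is as the width of the windows in which event counts are tallied before being compared with the per-second threshold. The Python sketch below assumes that interpretation; rates_per_group is a hypothetical name, and the sensor's actual grouping may differ.

```python
from collections import Counter
from typing import Dict, Iterable

def rates_per_group(timestamps: Iterable[float], group_seconds: float) -> Dict[int, float]:
    """Group event timestamps (seconds) into windows of `group_seconds`
    and return each window's average events-per-second rate, keyed by
    window index (window i covers [i * group_seconds, (i + 1) * group_seconds)).
    """
    counts = Counter(int(ts // group_seconds) for ts in timestamps)
    return {window: n / group_seconds for window, n in sorted(counts.items())}
```

A coarser grouping smooths out short bursts: 7000 events in one second followed by a quiet second averages to 3500 events per second over a 2s window.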
Alert Limiter
This section describes the usage of the sensor’s soft resource limiting capabilities. Alert limiting allows you to set rate limits on alert output to limit the alert volume the sensor will transmit to a SIEM, logging stack, or webhook.
Usage
The resource configurations are read in from the sensor’s configuration file which, by default, is at /etc/capsule8/capsule8-sensor.yaml. The path to the configuration file may be overridden by setting the CAPSULE8_CONFIG environment variable. The following section describes the alert limiter configuration fields.
Configuration
Alert limiting is specified on a per-output basis, which allows configurations where certain outputs have higher limits than others. By default, no limits are applied and an output receives all alerts. To configure alert limiting, add the following keys to an output entry under alert_output:
limit_period
- Duration value indicating the period over which to limit alerts.
- Example: 60s, 2m
- Default: none
limit_per_period
- The number of alerts per period above which, in combination with limit_period, the circuit breaker trips and further alerts are discarded.
- Example: 100
- Default: none
The following example configures at most 5 alerts per minute to be written to standard output:
alert_output:
  outputs:
    - type: stdout
      enabled: true
      limit_per_period: 5
      limit_period: '60s'
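A per-output limiter of this kind can be sketched as a fixed window that admits at most limit_per_period alerts and discards the rest until the window expires. The Python below is illustrative only: AlertLimiterSketch is a hypothetical name, limit_period is taken in seconds (so '60s' becomes 60.0), and whether the sensor uses fixed or sliding windows is not documented.

```python
import time
from typing import Callable, Optional

class AlertLimiterSketch:
    """Fixed-window limit for one output: at most `limit_per_period` alerts
    in each window of `limit_period` seconds; extra alerts are discarded.

    Assumption: a window starts at the first alert after the previous window
    expired. The sensor's actual windowing is not documented.
    """

    def __init__(self, limit_per_period: int, limit_period: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit_per_period = limit_per_period
        self.limit_period = limit_period
        self.clock = clock
        self.window_start: Optional[float] = None
        self.sent_in_window = 0

    def allow(self) -> bool:
        """Return True if the next alert may be sent to this output."""
        now = self.clock()
        if self.window_start is None or now - self.window_start >= self.limit_period:
            self.window_start = now
            self.sent_in_window = 0
        if self.sent_in_window < self.limit_per_period:
            self.sent_in_window += 1
            return True
        return False
```

Because limits are per output, each limited output would get its own instance (for example, a dict keyed by output type), while outputs without limit keys bypass limiting entirely.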
Violations and Monitoring
The cgroups for memory and CPU handle violations differently. When the sensor process runs out of memory, it is killed by the kernel and restarted by the supervisor process. The CPU cgroup uses a concept of periods and quotas: the period is a configured amount of time, and the quota is the number of microseconds of CPU time the cgroup may use within each period. The sensor uses a period of one second, and the quota is based on the configured percentage. When the sensor process has used up its quota of CPU time, it is throttled, meaning it will not be scheduled on the CPU until the end of the period. Both of these behaviors affect the sensor's coverage of telemetry events.
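The quota arithmetic can be sketched as follows. This Python function is hypothetical; it assumes cpu_limit is a share of all cores (as the Verification section implies), so the quota scales with the core count, and it uses -1, cgroup v1's "no quota" value, for a limit of 0.

```python
PERIOD_US = 1_000_000  # one-second CFS period, as described above

def cpu_quota_us(cpu_limit: int, cores: int, period_us: int = PERIOD_US) -> int:
    """Microseconds of CPU time per period for a cpu_limit percentage.

    Assumes cpu_limit is a share of *all* cores, so the quota scales with
    the core count. A limit of 0 maps to -1, cgroup v1's "no quota" value.
    """
    if cpu_limit == 0:
        return -1
    return period_us * cpu_limit * cores // 100
```

On a 4-core host with the default 5% limit, this gives 200,000 microseconds (0.2 seconds) of CPU time per one-second period. If the sensor's threads together use that up halfway through a period, they are throttled for the remaining half second.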
The cgroup exposes statistics about CPU throttling, which the supervisor process logs to stderr. This logging must be enabled with the log_cgroup_metrics configuration option.
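If you want to inspect throttling directly, the kernel's cpu.stat file for the cgroup holds the underlying counters. This Python sketch parses it; the function names are hypothetical, the path assumes a cgroup v1 cpu hierarchy at /sys/fs/cgroup/cpu, and the sensor's own log format is not shown here.

```python
from pathlib import Path
from typing import Dict

def parse_cpu_stat(text: str) -> Dict[str, int]:
    """Parse the "key value" lines of a cgroup cpu.stat file."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].lstrip("-").isdigit():
            stats[parts[0]] = int(parts[1])
    return stats

def throttled_fraction(stats: Dict[str, int]) -> float:
    """Share of elapsed CFS periods in which the quota ran out."""
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0

def read_sensor_cpu_stat(cgroup_dir: str = "/sys/fs/cgroup/cpu/capsule8-sensor") -> Dict[str, int]:
    # Assumed cgroup v1 location for the capsule8-sensor cgroup.
    return parse_cpu_stat(Path(cgroup_dir, "cpu.stat").read_text())
```

A throttled fraction that stays high suggests the CPU limit is tight enough that telemetry coverage is likely being affected.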
Restarts
When the sensor child process exits for cgroup violations or otherwise, the supervisor process will restart it. This event is logged to stderr.
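The restart behavior can be sketched as a loop around the child process. In this Python illustration, supervise is a hypothetical name, and max_restarts exists only so the sketch terminates; the documentation describes the supervisor restarting the sensor whenever it exits.

```python
import subprocess
import sys
import time
from typing import Callable, List

def log_stderr(message: str) -> None:
    print(message, file=sys.stderr)

def supervise(cmd: List[str], max_restarts: int, delay_s: float = 1.0,
              log: Callable[[str], None] = log_stderr) -> int:
    """Run `cmd` and restart it whenever it exits, at most `max_restarts` times.

    Returns the number of restarts performed.
    """
    restarts = 0
    while True:
        # A negative code means the child died from a signal; an OOM kill
        # inside a memory cgroup shows up as -9 (SIGKILL).
        code = subprocess.run(cmd).returncode
        log(f"sensor exited with status {code}")
        if restarts >= max_restarts:
            return restarts
        restarts += 1
        log("restarting sensor")
        time.sleep(delay_s)
```

Logging the exit status before each restart makes it possible to tell a memory-limit kill (signal 9) apart from an ordinary error exit, such as the status 1 the event limiter uses.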
Capabilities
As part of your installation, the sensor should have the following capabilities:
CAP_CHOWN
CAP_DAC_OVERRIDE
CAP_FOWNER
CAP_KILL
CAP_SETGID
CAP_SETUID
CAP_SETPCAP
CAP_IPC_LOCK
CAP_SYS_PTRACE
CAP_SYS_ADMIN
CAP_SYSLOG
These capabilities are necessary because the supervisor process executes the sensor as an unprivileged user. If you are getting “permission denied” errors, you can verify that these capabilities are set with getcap <sensor_binary>. You can set them with setcap cap_chown,cap_dac_override,cap_fowner,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_ipc_lock,cap_sys_ptrace,cap_sys_admin,cap_syslog=+epi <sensor_binary>
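To check the getcap output against the list above, a short script can report which capabilities are missing. This Python sketch uses hypothetical function names; it handles both the older "path = caps+eip" and newer "path caps=eip" output styles, but it would misreport a binary whose getcap output summarizes all capabilities without naming them.

```python
import re
import subprocess
from typing import Set

REQUIRED_CAPS = {
    "cap_chown", "cap_dac_override", "cap_fowner", "cap_kill",
    "cap_setgid", "cap_setuid", "cap_setpcap", "cap_ipc_lock",
    "cap_sys_ptrace", "cap_sys_admin", "cap_syslog",
}

def missing_caps(getcap_line: str) -> Set[str]:
    """Required capabilities absent from one line of getcap output.

    The binary path is dropped first; capability names are then matched
    wherever they appear in the rest of the line.
    """
    _, _, caps = getcap_line.strip().partition(" ")
    return REQUIRED_CAPS - set(re.findall(r"cap_[a-z_]+", caps.lower()))

def check_binary(path: str) -> Set[str]:
    """Run getcap on `path` and report which required capabilities are missing."""
    result = subprocess.run(["getcap", path], capture_output=True, text=True)
    return missing_caps(result.stdout)
```

An empty result from check_binary means every capability in the list is present; anything it returns can be added with the setcap command above.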