Overview
Hadoop is an open-source framework for running applications and storing data in a distributed computing environment. Hadoop HttpFS is a REST API that runs on a node in a Hadoop cluster and allows Hadoop file system operations to be performed through a single access point. Using the Capsule8 product, you can set up investigations to load data into a Hadoop cluster over HttpFS by following the procedure below.
Requirements
- Hadoop Cluster
- Capsule8 Sensor running in your environment
- Kerberos Server (Optional, required for using authentication)
- Kerberos keytab file for the sensor (Optional, required for using authentication)
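Before configuring the sensor, it can help to confirm that the HttpFS service is reachable on its default port (14000). HttpFS exposes the WebHDFS REST API, so a quick curl check is possible; the host name and user.name value below are placeholders for your environment, and this form assumes Kerberos is not yet enabled:

# List the HDFS root directory through HttpFS using simple (pseudo) authentication.
$ curl "http://namenode.example.com:14000/webhdfs/v1/?op=LISTSTATUS&user.name=hdfs"

A JSON FileStatuses response indicates HttpFS is up and accepting requests.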
1. Configuring the Sensor:
Edit the configuration file /etc/capsule8/capsule8-sensor.yaml.
Add a sink for HttpFS, enable the sensor to create directories, and turn on the flight recorder:
cloud_meta: auto
blob_storage_create_buckets_enabled: true
investigations:
  reporting_interval: 30s
  sinks:
    - name: "[namenode hostname/ip]:14000/capsule8-investigations/"
      backend: httpfs
      automated: true
      type: parquet
      partition_format: "hostname_partition={{.Hostname}}/date_partition={{.Time.Format \"2006-01-02\"}}"
      credentials:
        blob_storage_httpfs_user: [hadoop user to write as]
        blob_storage_httpfs_use_ssl: false
flight_recorder:
  enabled: true
  tables:
    - name: "shell_commands"
      rows: 1000
      enabled: true
    - name: "tty_data"
      rows: 1000
      enabled: true
    - name: "connections"
      rows: 2000
      enabled: true
    - name: "sensor_metadata"
      rows: 500
      enabled: true
    - name: "alerts"
      rows: 100
      enabled: true
    - name: "sensors"
      rows: 10
      enabled: true
    - name: "process_events"
      rows: 4000
      enabled: true
    - name: "container_events"
      rows: 300
      enabled: true
Save the modified file above and restart the sensor so it picks up the new configuration.
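If the sensor was installed as a systemd service, the restart might look like the following; the unit name capsule8-sensor is an assumption, so substitute the unit name used by your installation:

# Restart the sensor, then confirm it came back up cleanly.
$ sudo systemctl restart capsule8-sensor
$ sudo systemctl status capsule8-sensor

Once the sensor is running again, check that it was able to write to HttpFS by listing HDFS: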
$ hdfs dfs -ls /capsule8-investigations/
This should list directories for the tables enabled in the config, for example:
drwxr--r--   - root supergroup          0 2020-10-27 18:33 /capsule8-investigations/alerts
drwxr--r--   - root supergroup          0 2020-10-27 18:33 /capsule8-investigations/connections
drwxr--r--   - root supergroup          0 2020-10-27 18:33 /capsule8-investigations/container_events
drwxr--r--   - root supergroup          0 2020-10-27 18:33 /capsule8-investigations/process_events
drwxr--r--   - root supergroup          0 2020-10-27 18:33 /capsule8-investigations/sensor_metadata
drwxr--r--   - root supergroup          0 2020-10-27 18:33 /capsule8-investigations/sensors
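Each table directory is further partitioned according to the partition_format template, so drilling into a single table should show one directory per reporting host and per date. The host name and listing below are illustrative only:

$ hdfs dfs -ls /capsule8-investigations/alerts/
drwxr--r--   - root supergroup          0 2020-10-27 18:33 /capsule8-investigations/alerts/hostname_partition=sensor-host-01
$ hdfs dfs -ls /capsule8-investigations/alerts/hostname_partition=sensor-host-01
drwxr--r--   - root supergroup          0 2020-10-27 18:33 /capsule8-investigations/alerts/hostname_partition=sensor-host-01/date_partition=2020-10-27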
2. Editing the Sensor config:
After confirming that the Capsule8 Sensor is properly configured, raise the reporting interval to a more reasonable value, for example 5m:
cloud_meta: auto
blob_storage_create_buckets_enabled: true
investigations:
  reporting_interval: 5m
  #...
3. Authentication with Kerberos (Optional)
The Capsule8 Sensor can write to Kerberos-protected HttpFS clusters. Four pieces of information are needed in order to authenticate:
blob_storage_httpfs_krb5_conf | The krb5 client config for the relevant Kerberos environment. (Note: currently the only encryption types supported by the client are des3-cbc-sha1-kd and des3-hmac-sha1.)
blob_storage_httpfs_keytab | Path to the client keytab file.
blob_storage_httpfs_principal | The principal in the keytab to use, e.g. "root/webserver-7fc8ddf957-f25w5.default.svc.cluster.local".
blob_storage_httpfs_domain | The domain (realm) for the principal, e.g. EXAMPLE.COM.
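A minimal krb5.conf for the sensor might look like the sketch below. The realm and KDC host are placeholders for your Kerberos environment, and the permitted_enctypes line is just one way to pin the client to the encryption types noted above:

[libdefaults]
    default_realm = EXAMPLE.COM
    # Only these encryption types are currently supported by the sensor's Kerberos client.
    permitted_enctypes = des3-cbc-sha1-kd des3-hmac-sha1

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
    }

With those values in place, the sink configuration from step 1 gains a Kerberos credentials block: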
cloud_meta: auto
blob_storage_create_buckets_enabled: true
investigations:
  reporting_interval: 30s
  sinks:
    - name: "[namenode hostname/ip]:14000/capsule8-investigations/"
      backend: httpfs
      automated: true
      type: parquet
      partition_format: "hostname_partition={{.Hostname}}/date_partition={{.Time.Format \"2006-01-02\"}}"
      credentials:
        blob_storage_httpfs_auth_type: kerberos
        blob_storage_httpfs_use_ssl: false
        blob_storage_httpfs_krb5_conf: /etc/capsule8/krb5.conf
        blob_storage_httpfs_keytab: /etc/capsule8/root.keytab
        blob_storage_httpfs_principal: "root/kerberos-sidecar-7fc8ddf957-f25w5.default.svc.cluster.local"
        blob_storage_httpfs_domain: "EXAMPLE.COM"
flight_recorder:
  enabled: true
  tables:
    - name: "shell_commands"
      rows: 1000
      enabled: true
    - name: "tty_data"
      rows: 1000
      enabled: true
    - name: "connections"
      rows: 2000
      enabled: true
    - name: "sensor_metadata"
      rows: 500
      enabled: true
    - name: "alerts"
      rows: 100
      enabled: true
    - name: "sensors"
      rows: 10
      enabled: true
    - name: "process_events"
      rows: 4000
      enabled: true
    - name: "container_events"
      rows: 300
      enabled: true
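Before restarting the sensor, it can be worth confirming that the keytab and principal work against the KDC, for example with the standard MIT Kerberos client tools (the principal below matches the sample configuration above):

# Obtain a ticket using the keytab, then list it to confirm the principal and realm.
$ kinit -kt /etc/capsule8/root.keytab root/kerberos-sidecar-7fc8ddf957-f25w5.default.svc.cluster.local@EXAMPLE.COM
$ klist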
As in step 1, deploy the config and restart the sensor, then check that the sensor was able to write to HttpFS by listing HDFS:
$ hdfs dfs -ls /capsule8-investigations/
This should list directories for the tables enabled in the config, for example:
drwxr--r--   - root supergroup          0 2020-10-27 18:33 /capsule8-investigations/alerts
drwxr--r--   - root supergroup          0 2020-10-27 18:33 /capsule8-investigations/connections
drwxr--r--   - root supergroup          0 2020-10-27 18:33 /capsule8-investigations/container_events
drwxr--r--   - root supergroup          0 2020-10-27 18:33 /capsule8-investigations/process_events
drwxr--r--   - root supergroup          0 2020-10-27 18:33 /capsule8-investigations/sensor_metadata
drwxr--r--   - root supergroup          0 2020-10-27 18:33 /capsule8-investigations/sensors
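If the directories do not appear, one way to isolate whether Kerberos authentication is the problem is to call the HttpFS REST API directly with SPNEGO, using a ticket obtained from the same keytab. This assumes curl was built with GSS-API/SPNEGO support; the host name is a placeholder:

# --negotiate makes curl authenticate with the Kerberos ticket in the current cache.
$ curl --negotiate -u : "http://namenode.example.com:14000/webhdfs/v1/capsule8-investigations?op=LISTSTATUS"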