Josh Randall

Setting Up High Availability with Multiple Endpoint Log Hybrids

Blog Post created by Josh Randall Employee on Jan 20, 2020

HA is a common need in many enterprise architectures, so NetWitness Endpoint has some built-in capabilities that allow organizations to achieve a HA setup with fairly minimal configuration and implementation needs.

 

An overview of the setup:

  1. Install Primary and Alternate Endpoint Log Hybrids (ELH)
  2. Create a DNS CNAME record pointing to your Primary ELH
  3. Create Endpoint Policies with the CNAME record's alias value

 

The failover procedure:

  1. Change the CNAME record to point to the Alternate ELH

 

And the failback procedure:

  1. Change the CNAME record back to the Primary ELH

 

To start, you'll need to have an ELH already installed and orchestrated within your environment.  We'll assume this is your Primary ELH, where your endpoint agents will ordinarily be checking into, and what you need to have an alternate for in the event of a failure (hardware breaks, datacenter loses power, region-wide catastrophic event...whatever).

 

To install your Alternate ELH (where your endpoint agents should failover to) you'll need to follow the instructions here: https://community.rsa.com/docs/DOC-101660#NetWitne  under "Task 3 - Configuring Multiple Endpoint Log Hybrid".

**Make sure that you follow these instructions exactly...I did not the first time I set this up in my lab, and so of course my Alternate ELH did not function properly in my NetWitness environment...**

 

Once you have your Alternate ELH fully installed and orchestrated, your next step will be to create a DNS CNAME record.  This alias will be the key to entire HA implementation.  You'll want to point the record to your Primary ELH; e.g.: 

 

**Be aware that Windows DNS defaults to a 60 minute TTL...this will directly impact how quickly your endpoint agents will point to the Target Host FQDN in the CNAME record, so if 60 minutes is too long to wait for endpoints to be available during a failover you might want to consider setting the TTL lower...** (props to John Snider for helping me identify this in my lab during my testing)

 

And your last step in the initial setup will be to create Endpoint Policies that use this alias value.  In the NetWitness UI, navigate to Admin/Endpoint Sources/Policies and either modify an existing EDR policy (the Default, for instance) or create a new EDR policy.  The relevant configuration option in the EDR policy setting is the "Endpoint Server" option within the "Endpoint Server Settings" section:

 

When editing this option in the Policy, you'll want to choose your Primary ELH in the "Endpoint Server" dropdown and (the most important part) enter the CNAME record's alias as the "Server Alias":

 

Add and/or modify any additional policy settings as required for your organization and NetWitness environment, and when complete be sure to Publish your changes. (Guide to configuring Endpoint Groups and Policies here: NetWitness Endpoint Configuration Guide for RSA NetWitness Platform 11.x - Table of Contents)

 

You can test your setup and environment by running some nslookup commands from endpoints within your environment to check that your DNS CNAME is working properly and endpoints are resolving the alias to the correct target (Primary ELH at this point), as well as creating an Endpoint EDR Policy to point some active endpoint agents to the Alternate ELH (**this check is rather important, as it will help you confirm that your Alternate ELH is installed, orchestrated, and configured correctly**).

 

Prior to moving on to the next step, ensure that all your agents have received the "Updated" Policy - if any show in the UI with a "Pending" status after you've made these changes, then that means they have not yet updated:

 

Assuming all your tests come back positive and all your agents' policies are showing "Updated", you can now simulate a failure to validate that your setup is, indeed, HA-capable. This can be quite simple, and the failover process similarly straightforward.

 

  1. Shutdown the Primary ELH
  2. Modify the DNS CNAME record to point to the Secondary ELH
    1. This is where the TTL value will become important...the longer the TTL the longer it may take your endpoints to change over to the Alternate ELH
    2. Also...there is no need to change the existing Endpoint Policy, as the Server Alias you already entered will ensure your endpoints
  3. Confirm that you see endpoints talking to the Alternate ELH
    1. Run tcpdump on the Alternate ELH to check for incoming UDP and TCP packets from endpoints
    2. See hosts showing up in the UI
    3. Investigate hosts

 

After Steps 1 and 2 are complete, you can confirm that agents are communicating with the Alternate ELH by running tcpdump on that Alternate ELH to look for the UDP check-ins as well as the TCP tracking/scan data uploads:

 

...as well as confirm in the UI that you can see and investigate hosts:

 

 

Once your validation and testing is complete, you can simply revert the procedure by powering on the Primary ELH, powering off the Alternate ELH, and modifying the CNAME record to point back to the Primary ELH.

 

Of course, during an actual failure event, your only action to ensure HA of NetWitness Endpoint would be to change the CNAME record, so be sure to have a procedure in place for an emergency change control, if necessary in your organization.

Outcomes