SNAT Translation Overflow

Posted by brian | Posted in LTM, Resolved Issues, Tips and Tricks | Posted on 21-07-2009

My first addition to the content on this site covers a recent issue I encountered with SNATs and the classic “on a stick” architecture that F5 recommends.

On-a-stick implementations are something I will cover in another post, but many vendors seem to support this method because it requires very little engineering effort to implement and works well in smaller organizations.

SNATs are used when traffic is sourced from nodes (pool members) and needs to be “translated” to an accessible VIP or core-facing network address that allows bidirectional communication. For some reason the SNAT entries are divided into two tabs in the GUI, and it is very easy to ignore the second tab.

SNAT Translation

Notice that this section is not populated until actual SNAT translations occur. Each translation then inherits its settings from the default F5 configuration, which is “indefinite” (pictured below).

SNAT Indefinite

The challenge with this configuration is that most users implement a single default SNAT for these on-a-stick designs, which restricts you to the ephemeral port range of one IP (a maximum of 64511). If these translations never expire and the connection count is high, you will see behavior similar to that pictured below (the behavior I encountered).

SNAT Translation
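To put a rough number on how quickly a single SNAT address can fill up when entries never expire, here is a minimal Python sketch. It assumes a constant rate of new translations, which real traffic will not follow exactly, and it takes the 64511 figure from the text above at face value. The function name and the rate of 2 per second are hypothetical, chosen only for illustration.

```python
# Assumption: usable source ports per SNAT address, using the figure quoted in the post.
PORTS_PER_SNAT_IP = 64511


def seconds_until_exhaustion(current_entries, new_entries_per_second,
                             ports_per_ip=PORTS_PER_SNAT_IP):
    """Estimate the time until one SNAT address runs out of ports.

    Assumes entries never expire (the "indefinite" setting) and that new
    translations arrive at a constant rate.
    """
    if new_entries_per_second <= 0:
        raise ValueError("new_entries_per_second must be positive")
    remaining = max(ports_per_ip - current_entries, 0)
    return remaining / new_entries_per_second


if __name__ == "__main__":
    # 53754 is the current entry count from the stats output later in this post;
    # the rate of 2 new translations per second is a made-up example value.
    secs = seconds_until_exhaustion(53754, 2.0)
    print(f"roughly {secs / 60:.0f} minutes of headroom left")
```

The point is only that with no expiry, the headroom shrinks in a straight line and never recovers on its own.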

I knew this was not sustainable, so I looked at the preprod environment for this application, which used the “inline” architecture. Its connection count was pretty stable, typical of what you would expect to see:

SNAT Translation

A quick glance at the SNAT translation table revealed that, sure enough, several SNATs were over-utilized and not expiring properly:

[thef5guru@bigip01:Active]  home # b snat translation stats show
SNAT TRANSLATION 1.1.1.1  - stats:
|     (cur, max, limit,  tot) = (53754, 121, 0, 55754)
|     (pkts,bits) in =  (18967, 101.0M), out = (16836, 102.1M)
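If you would rather not eyeball this output on a busy box, a small script can pull out the current count per translation address and flag the ones getting close to the port ceiling. The sketch below is only an illustration: it assumes the output layout matches the capture above exactly, and the 80% threshold and function names are my own hypothetical choices, not anything F5 provides.

```python
import re

# Assumption: port ceiling per SNAT address, using the figure quoted in the post.
PORTS_PER_SNAT_IP = 64511

# Sample text copied from the capture above.
SAMPLE = """SNAT TRANSLATION 1.1.1.1  - stats:
|     (cur, max, limit,  tot) = (53754, 121, 0, 55754)
|     (pkts,bits) in =  (18967, 101.0M), out = (16836, 102.1M)
"""

HEADER = re.compile(r"^SNAT TRANSLATION\s+(\S+)")
COUNTERS = re.compile(
    r"\(cur,\s*max,\s*limit,\s*tot\)\s*=\s*\((\d+),\s*(\d+),\s*(\d+),\s*(\d+)\)"
)


def parse_snat_stats(text):
    """Return {translation_address: current_entry_count} from the stats text."""
    result = {}
    address = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            address = header.group(1)
            continue
        counters = COUNTERS.search(line)
        if counters and address is not None:
            result[address] = int(counters.group(1))  # first field is "cur"
    return result


def over_threshold(stats, threshold=0.8, ports_per_ip=PORTS_PER_SNAT_IP):
    """List the addresses whose current count exceeds threshold * ports_per_ip."""
    return [addr for addr, cur in stats.items() if cur / ports_per_ip > threshold]


if __name__ == "__main__":
    stats = parse_snat_stats(SAMPLE)
    for addr in over_threshold(stats):
        print(f"{addr}: {stats[addr]} of {PORTS_PER_SNAT_IP} ports in use")
```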

That appears to be the issue, and changing the particular SNAT entry (1.1.1.1) to a reasonable timeout seems to be the fix. I decided to go with a 24-hour timeout (specified in seconds: 86,400).

SNAT-5

After making this change I noticed that the existing connections were still active in the translation table. I pondered a reset of TMM, a failover, and other scenarios before deciding that there really is no clean way to clear them without disrupting traffic. The short way to do it is the command “b conn all delete”. The end result is shown below. Notice that the connections are now holding steady; the timeout could probably even be lowered to something more like an hour (3600 seconds).

SNAT-6
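To compare the 24-hour timeout with a one-hour one, a back-of-the-envelope estimate helps. The sketch below uses the simple relation "entries in the table = arrival rate x how long each entry lingers" and assumes the worst case, where every entry sits for the full timeout before expiring. Real connections that close cleanly may leave sooner, so treat this as an upper bound. The function names and the example rate are hypothetical.

```python
# Assumption: port ceiling per SNAT address, using the figure quoted in the post.
PORTS_PER_SNAT_IP = 64511


def steady_state_entries(new_entries_per_second, timeout_seconds):
    """Upper-bound estimate of table entries once expiry balances arrivals."""
    return new_entries_per_second * timeout_seconds


def max_sustainable_rate(timeout_seconds, ports_per_ip=PORTS_PER_SNAT_IP):
    """Highest constant arrival rate one SNAT address can absorb at this timeout."""
    return ports_per_ip / timeout_seconds


if __name__ == "__main__":
    rate = 0.5  # hypothetical: one new translation every two seconds
    for timeout in (86400, 3600):
        print(
            f"timeout {timeout}s: about {steady_state_entries(rate, timeout):.0f} "
            f"entries at {rate}/s, ceiling reached near "
            f"{max_sustainable_rate(timeout):.2f} new entries/s"
        )
```

Under these assumptions, dropping the timeout from 24 hours to one hour raises the rate a single address can sustain by a factor of 24.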

My ultimate recommendation would be to implement a SNAT pool of several IPs so that the available ports substantially exceed the anticipated traffic levels for your organization. In this example that option was not on the table and immediate action was needed, so be sure to analyze your particular problem and make sure the solution is appropriate.
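As a rough way to size such a SNAT pool, you can divide your expected peak of concurrent translations (plus some headroom) by the ports available per address. This is a minimal sketch under the same 64511-ports-per-IP assumption used earlier in the post; the 2x headroom default and the function name are hypothetical, so adjust them to your own environment.

```python
import math

# Assumption: port ceiling per SNAT address, using the figure quoted in the post.
PORTS_PER_SNAT_IP = 64511


def snat_pool_size(peak_concurrent, headroom=2.0, ports_per_ip=PORTS_PER_SNAT_IP):
    """Number of SNAT pool addresses needed to cover peak load with headroom."""
    if peak_concurrent < 0 or headroom <= 0:
        raise ValueError("peak_concurrent must be >= 0 and headroom must be > 0")
    # Always return at least one address, even for an idle environment.
    return max(1, math.ceil(peak_concurrent * headroom / ports_per_ip))


if __name__ == "__main__":
    # 53754 is the current count seen on the overloaded SNAT in this post.
    print(snat_pool_size(53754))
    # Hypothetical larger site with 200000 concurrent translations and 1.5x headroom.
    print(snat_pool_size(200000, headroom=1.5))
```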