LTM Secret Menu

Posted by brian | Posted in LTM, Tips and Tricks | Posted on 13-08-2009

I learned this trick during an issue where the box had rebooted into the wrong image and I did not have the root password or any other login credentials. I needed to reboot the LTM, but I only had serial/console access to it. The answer? Escape-Shift-9. Yep: hold down Escape, hold down Shift, then press 9, and you will get this menu on the serial console:

escape-shift-nine

As you can see, you have a variety of options, but I would steer clear of the halt option... then you’re pretty much stuck! I’m pretty sure this works with most F5 products, although I have not verified it on the FirePass pair that I have.

LTM 9.1.2 Upgrade/Migration to 9.4.6

Posted by brian | Posted in LTM, Tips and Tricks | Posted on 13-08-2009

I recently encountered an environment full of LTMs running older 9.1.2 code and took on the project of upgrading them all. It was decided not to use Enterprise Manager for the upgrades because the automation hadn’t been tested and this was a sensitive set of appliances to handle (change windows, business rules/policies, etc.). Since I ran into many of the obstacles in making the leap from 9.1.2 to 9.4.6, I am going to share my findings so nobody has to fumble through it like I did. First of all, if you are attempting an upgrade, you must have support on the device or pair of devices you are upgrading. It is important to get the serial numbers (from the unit itself, or by running “b platform show” on the command line) and give F5 a call (they request two weeks’ notice) to set up a “Sev 4 General Assistance Proactive Case for a code Migration”, which requires the following info:

MAJOR MAINTENANCE PROCESS

What to do when you are planning a major change in your network

Case Details:

Customer contact information:

Maintenance window day, time, duration and time zone:

Reason for maintenance window:

SERIAL # of all units involved:

Unit function: (Production or test lab)

Remote Access (if possible):

Remote Access IP addresses and Log In information:

So you should have all of that info handy before attempting this. It is for your own benefit, as it ensures that you get quick assistance should I lead you astray. I also highly recommend, if not require, that you have serial/console access to the unit. You can set up a syslog server to watch the progress, but this gives you little control if you are remote. A console session helps ensure your install does not turn into a failure. I took several screenshots along the way of my upgrade so you will know that you are on the right path. With my method this is basically all done via the command line, and it should take about 45 minutes if done with few pauses or breaks.

After opening the proactive case you should acquire the code base we will need for the interim upgrade. This is available on F5’s website, and you’ll want to get the local install package, which should be a large “.im” file. There are several changes when going from the 9.1 branch to the 9.4 branch, and the most important (and required) one is the addition of the “Service check date:” field in the /config/bigip.license file. The upgrade package checks whether the unit is under a valid support contract, and the 9.4.x installer will error out when it looks at a 9.3.1 or older config file.
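As a rough illustration of the check described above, here is a minimal Python sketch that looks for the “Service check date” field in the text of a license file. Only the field name comes from this post; the spacing around the colon, the eight-digit date value, and the sample license lines are assumptions made for the example, and this is not how the F5 installer itself performs the check.

```python
import re

# Field name is from the post. The spacing and the YYYYMMDD value format
# are assumptions for this sketch, not a documented file format.
SERVICE_CHECK_RE = re.compile(
    r"^\s*Service check date\s*:\s*(\d{8})\s*$", re.MULTILINE
)


def service_check_date(license_text):
    """Return the service check date string, or None if the field is absent."""
    match = SERVICE_CHECK_RE.search(license_text)
    return match.group(1) if match else None


def has_service_check_date(license_text):
    """True when the license text carries a service check date field."""
    return service_check_date(license_text) is not None


if __name__ == "__main__":
    # Made-up sample text standing in for the contents of /config/bigip.license.
    old_license = "Registration Key : XXXXX-XXXXX\n"
    new_license = old_license + "Service check date : 20090813\n"
    print(has_service_check_date(old_license))
    print(has_service_check_date(new_license))
```

Running something like this against a copy of the license file before each install is one way to confirm the reactivation actually added the field.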

clear_partitions

Installer Boot Image Configuration Screen Fig1.1

What I have found to work best is to leave the current running version on its own partition; in this case it is on HD1.2. I run the installer program (#im local-install-9.3.1.37.1.im) and go into the boot image configuration screen shown in Fig1.1. I select “discard” for HD1.1 and CF1.1 so that it will purge whatever is on them now and have them fresh and ready to go. I’ve had better luck with this than with choosing to install over them, because in that case the installer will look on that partition for the bigip.license file. With the partition clear, it is much easier to proceed. Selecting both of these will give you a summary screen asking if you are sure. Well, are you? If so, hit yes and it will reboot into an installer mode where it simply clears those partitions, then reboot back to HD1.2, which is what we want in this case since that’s the operational instance.

discard_summary

Discard Summary Fig1.2

To the right (Fig1.2) is the discard summary screen I mentioned. You’ll want to see this before you let it clear the partitions.

Now we are ready to do the actual install. First and foremost, go into the GUI, go to the license section and re-activate your license. If this fails, you cannot proceed. For some it is easier to do this via the command line: use the get_dossier method and license.f5.com to get your license file. Once this is done, run the installer again (#im /var/tmp/local-install-9.3.1.37.1.im) to kick off the 9.3.1 install. Select CF1.1 as the install target, because this is purely a temporary upgrade, an interim step on the way to 9.4.x.

install_9.3.1_cf1-2

9.3.1 Install Summary Fig1.3

Fig1.3 has the summary you should see before you continue. Make SURE that the target is CF1.1 before you proceed. This will reboot the box and take approximately 15 minutes to install, at which point it will by default come up on the new 9.3.1 version.

Now I bet at this point you know what is next. Go ahead and install the 9.4.x code, right? Nope! We still do not have the updated “Service check date:” field because we haven’t activated this license on the 9.3.x code base. That’s the next step: go into the GUI and do it, or use the command line, but make sure you update/reactivate the license before you proceed.

9.4.6 Install Summary Fig1.4

Once you are sure this has been done, go ahead and run the command (#im local-install-9.4.6.401.0.im) and let’s do the 9.4.x code install. We want to select the open partition as the final destination, so HD1.1 will be selected as shown in Fig1.4. Once again, this will trigger a reboot with an install. After this process is over you should see a few additional steps that did not occur in the 9.3.x install, but this is normal. Congrats, if you see the screen in the picture below (Fig1.5) you have successfully migrated to 9.4.x. I always recommend reactivating your license because it helps make sure everything is current. I would also verify/test failover, and be sure to run a “switchboot” and verify this is the default boot partition (hd1.2 in this example).

Fig1.5 Install complete

(I realize 9.4.7 is out at the time of this writing, but 9.4.6 is the tested standard for this environment)
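To recap the ordering in this post, here is a small Python sketch that writes the steps down as data and checks the one rule that tripped me up: every install must come right after a license reactivation. The step labels and the helper name are invented for this sketch; only the sequence itself comes from the walkthrough above.

```python
# Each step is (action, detail). The order mirrors the post; the action
# labels are hypothetical names chosen for this sketch.
UPGRADE_PLAN = [
    ("discard", "HD1.1 and CF1.1"),
    ("reactivate_license", "while running 9.1.2 on HD1.2"),
    ("install", "9.3.1 to CF1.1"),
    ("reactivate_license", "while running 9.3.1 on CF1.1"),
    ("install", "9.4.6 to HD1.1"),
    ("reactivate_license", "while running 9.4.6"),
]


def license_precedes_every_install(plan):
    """Check that each install step directly follows a license reactivation."""
    for index, (action, _detail) in enumerate(plan):
        if action != "install":
            continue
        if index == 0 or plan[index - 1][0] != "reactivate_license":
            return False
    return True


if __name__ == "__main__":
    for number, (action, detail) in enumerate(UPGRADE_PLAN, start=1):
        print(f"{number}. {action}: {detail}")
    print(license_precedes_every_install(UPGRADE_PLAN))
```

A checklist like this is handy to paste into the change ticket so the reactivation steps do not get skipped during the window.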

SNAT Translation Overflow

Posted by brian | Posted in LTM, Resolved Issues, Tips and Tricks | Posted on 21-07-2009

My first addition to the content on this site covers a recent issue I encountered with the use of SNATs and the classic “on a stick” architecture that F5 recommends.

On-a-stick implementations are something I will cover in another post, but it seems many vendors support this method because it requires very little effort or engineering to implement and has great success in smaller organizations.

SNATs are used when traffic is sourced from nodes (pool members) and needs to be “translated” to an accessible VIP or core-facing network address that will allow bi-directional communication. For some reason the SNAT entries are divided into two tabs in the GUI, and it is very easy to ignore the second tab.

SNAT Translation

Notice that this section is not populated until actual SNAT translations occur. Each translation then inherits its settings from the default F5 configuration, which is an “indefinite” timeout (pictured below).

SNAT Indefinite

The challenge with this configuration is that most users implement a single default SNAT for these on-a-stick designs, which restricts you to the maximum ephemeral port range for one IP (a max of 64511 ports). If these translations never expire, then with a high connection count you will see behavior similar to that pictured below (the behavior I encountered).

SNAT Translation

I knew this could not be sustained, so I looked at the preprod environment for this application, which uses the “inline” architecture. Its connection count was pretty stable, typical of what you would expect to see:

SNAT Translation

A quick glance at the SNAT translation table revealed that, sure enough, several SNATs were over-utilized and not expiring properly:

[thef5guru@bigip01:Active]  home # b snat translation stats show
SNAT TRANSLATION 1.1.1.1  - stats:
|     (cur, max, limit,  tot) = (53754, 121, 0, 55754)
|     (pkts,bits) in =  (18967, 101.0M), out = (16836, 102.1M)
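If you have a lot of translations to look through, output like the above can be checked with a short script. The Python sketch below is only a rough example: it assumes the output always has the layout pasted above, compares the current connection count against the 64511 per-IP figure mentioned earlier, and uses an 80% threshold that I picked arbitrarily. The function names are my own.

```python
import re

# Patterns are based only on the sample output pasted in this post.
HEADER_RE = re.compile(r"^SNAT TRANSLATION\s+(\S+)")
COUNTS_RE = re.compile(
    r"\(cur,\s*max,\s*limit,\s*tot\)\s*=\s*\((\d+),\s*(\d+),\s*(\d+),\s*(\d+)\)"
)

PORTS_PER_IP = 64511  # per-IP port figure quoted in the post

SAMPLE = """SNAT TRANSLATION 1.1.1.1  - stats:
|     (cur, max, limit,  tot) = (53754, 121, 0, 55754)
|     (pkts,bits) in =  (18967, 101.0M), out = (16836, 102.1M)
"""


def parse_snat_stats(text):
    """Map each translation address to its current connection count."""
    current = {}
    address = None
    for line in text.splitlines():
        header = HEADER_RE.match(line.strip())
        if header:
            address = header.group(1)
            continue
        counts = COUNTS_RE.search(line)
        if counts and address is not None:
            current[address] = int(counts.group(1))
    return current


def over_utilized(text, threshold=0.8):
    """List addresses whose current count is at or above threshold of the port budget."""
    return [
        address
        for address, cur in parse_snat_stats(text).items()
        if cur / PORTS_PER_IP >= threshold
    ]


if __name__ == "__main__":
    print(parse_snat_stats(SAMPLE))
    print(over_utilized(SAMPLE))
```

With the sample above, 1.1.1.1 is flagged because 53754 current connections is more than 80% of 64511.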

So that appears to be the issue, and a quick change of settings for the particular SNAT entry (1.1.1.1) to a reasonable timeout seems to be the fix. I decided to go with a 24-hour timeout (specified in seconds: 86400).

SNAT-5

Now, after making this change I noticed that the connections were still active in the translation table, so I pondered a reset of TMM, a failover, and other scenarios before I decided that there really is no clean way to change this without disrupting traffic. The short way to do it is via the command “b conn all delete”. The end result is shown below. Notice that the connections are now holding steady, and the timeout could probably even be lowered to something more like an hour (3600 seconds).

SNAT-6

My ultimate recommendation would be to implement a SNAT pool of several IPs so that the available port capacity substantially exceeds the anticipated traffic levels for your organization. In this example that option was not on the table and immediate action was needed, so be sure to analyze your particular problem and make sure the solution is appropriate.
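For a rough idea of how many addresses such a pool would need, here is a back-of-the-envelope Python sketch. It assumes each pool IP contributes the same 64511 ports quoted earlier and that you want some multiple of your peak connection count as headroom; the function name and the default headroom factor of 2 are my own choices, not an F5 sizing rule.

```python
import math

PORTS_PER_IP = 64511  # per-IP port figure quoted in the post


def snat_pool_size(peak_connections, headroom=2.0):
    """Estimate how many SNAT pool addresses cover peak_connections with headroom."""
    needed_ports = peak_connections * headroom
    # Always keep at least one address in the pool.
    return max(1, math.ceil(needed_ports / PORTS_PER_IP))


if __name__ == "__main__":
    # 53754 is the current connection count from the stats output in this post.
    print(snat_pool_size(53754))
    # 200000 is a made-up peak used only to show a larger pool.
    print(snat_pool_size(200000, headroom=1.5))
```

For the 53754 connections seen here, doubling for headroom works out to two addresses, which would already have avoided the overflow.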