Netscaler Surge queue analysis

September 29th, 2009

We’ve been working with one client who uses a Rails application behind Netscalers, and who has been having issues with connections going into the Netscaler surge queue. The surge queue is where the Netscaler puts connections when the destination load balancing VIP does not have a service it can send them to, because all the services bound to the VIP are at their max connection limit.

Unfortunately, despite what you may think about how a surge queue works, it is not a single queue per load balancing vserver in the netscaler - instead, the Netscaler will associate a request with a service, and put it in the service’s surge queue.

So, if you have 2 services associated with a vserver, both with a connection limit of one, and 3 requests come in, each service will process one request, and one service will have the extra connection placed into its surge queue. All well and good, but if the other service completes its request while the one with the request in its surge queue is running a 30-second report, the queued request will have to wait, even though there is an idle service.
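A toy model may make the per-service behaviour clearer. This is a simplified Python sketch under stated assumptions, not Netscaler's actual scheduling. Requests are assumed to be assigned round-robin, and the `Service`, `dispatch` and `complete` names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Service:
    """One backend service with a connection limit and its own surge queue."""
    name: str
    max_clients: int
    active: list = field(default_factory=list)
    surge_queue: deque = field(default_factory=deque)


def dispatch(request, pick):
    """Bind a request to the service chosen by pick(); queue it there if that service is full."""
    svc = pick()
    if len(svc.active) < svc.max_clients:
        svc.active.append(request)
    else:
        svc.surge_queue.append(request)  # per-service queue, not one per vserver
    return svc


def complete(svc, request):
    """Finish a request; only this service's own surge queue is drained."""
    svc.active.remove(request)
    if svc.surge_queue:
        svc.active.append(svc.surge_queue.popleft())


# The scenario above: two services with a connection limit of one, three requests.
a, b = Service("a", 1), Service("b", 1)
rotation = cycle([a, b])  # assumed round-robin assignment, for illustration only
for req in ("r1", "r2", "r3"):
    dispatch(req, lambda: next(rotation))

complete(b, "r2")  # b is now idle...
print(b.active, list(a.surge_queue))  # ...but r3 still waits on a: prints [] ['r3']
```

The queued request moves only when service `a` finishes its own work. That is the situation worth avoiding.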

So having traffic in the surge queue is something to avoid, if possible.

The customer in question is savvy enough to subscribe to LogicMonitor, whose Citrix Netscaler monitoring is very good, so they at least knew they were having surge queue issues.

Their main concern was being assured that production traffic was not going into the surge queue. They did not have the spare thousands to buy a separate Netscaler cluster for staging, so staging systems were running on the same Netscaler cluster as production. Given they were getting alerts for the global surge queue, the concern was whether these were production requests or not.

This customer had worked with LogicMonitor to get access to the next release of LogicMonitor, which tracks requests and surge queue levels down to the service (physical server). However, this did not seem to help. Netscaler reports surge queue levels as a gauge rather than a counter. Even when polling every minute, it is quite possible to read all zeros when there was a spike of 50 between the polls. (This is why counters are the better choice for most datapoints, if you have a choice. In this case, though, Netscaler does not expose a counter.)
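Here is a small simulation of why a gauge polled once a minute can read all zeros. It compares a gauge with a counter of connections ever queued, using made-up numbers (50 connections queued for 20 seconds). It is an illustration only. The counter is hypothetical since, as noted, Netscaler exposes only the gauge here.

```python
POLL_INTERVAL = 60  # seconds between polls, as in the text
DURATION = 180      # three minutes of simulated time

# Hypothetical traffic: 50 connections enter the surge queue at t=70s
# and have all drained by t=90s; otherwise the queue is empty.
enqueued = [0] * DURATION
enqueued[70] = 50
depth = [50 if 70 <= t < 90 else 0 for t in range(DURATION)]

polls = range(0, DURATION, POLL_INTERVAL)  # t = 0, 60, 120

# A gauge reports only the queue depth at the instant it is read.
gauge = [depth[t] for t in polls]

# A counter reports the running total of connections ever queued,
# so the difference between two reads covers everything in between.
counter = [sum(enqueued[: t + 1]) for t in polls]
deltas = [later - earlier for earlier, later in zip(counter, counter[1:])]

print(gauge)   # [0, 0, 0]  -> the spike is invisible
print(deltas)  # [0, 50]    -> the spike shows up in the second interval
```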

So, the customer was seeing global surge queue activity, but the individual services would often not show surge queue entries - so how to determine where the surges were going?

The monitoring was not reliably catching the surges, and running a “stat lb vserver” on the Netscaler is useless for catching transitory spikes.

The only reliable way seems to be log analysis.

So, the above is a long waffling introduction as to why good monitoring is not enough in some cases, and you still have to fall back on good old manual sysadmin skills. (Although you still need good monitoring to alert you to the problem.)

Netscalers log a bunch of information - as has been mentioned before, this can be accessed by nsconmsg.
So, ssh to the netscaler, drop to the shell, and run:

cd /var/tmp #usually this has a lot of space
nsconmsg -K /var/nslog/newnslog -s ConLb=3 -d oldconmsg > NetscalerLog

(You need to use log level 3 to get the surge queue information.)

Now, scp that NetscalerLog file to a linux host somewhere, as the grep on the netscaler is not powerful enough to process it with the useful flags.
To track each entry in the surge queue, and when it occurred, run:
cat NetscalerLog | grep -B2 'SQ([1-9][0-9]*)\|current' | grep -B4 'SQ' | grep 'current\|S(\|SQ'
This will give you a timestamp, then a line for each service and its surge queue, if it has a non-zero surge queue.

current time is Mon Sep 28 22:04:01 2009
S(10.1.1.213:13411:UP) Hits(23587, 0/sec, P[0, 0/sec]) ATr(2:2) Mbps(0.00) BWlmt(0 kbits) RspTime(0.00 ms) Load(0) LConn_Idx: (C:2; V:2,I:1)
Conn: CSvr(8, 0/sec) MCSvr(1) OE(1) E(1) RP(0) SQ(1)
S(10.1.1.212:13390:UP) Hits(23491, 0/sec, P[0, 0/sec]) ATr(2:2) Mbps(0.00) BWlmt(0 kbits) RspTime(0.00 ms) Load(0) LConn_Idx: (C:2; V:2,I:1)
Conn: CSvr(5, 0/sec) MCSvr(1) OE(1) E(1) RP(0) SQ(1)
current time is Mon Sep 28 22:04:57 2009
S(10.1.1.212:13390:UP) Hits(23508, 0/sec, P[0, 0/sec]) ATr(2:2) Mbps(0.00) BWlmt(0 kbits) RspTime(0.00 ms) Load(0) LConn_Idx: (C:2; V:2,I:1)
Conn: CSvr(7, 0/sec) MCSvr(1) OE(1) E(1) RP(0) SQ(1)
current time is Mon Sep 28 22:17:05 2009
S(10.1.1.237:13249:UP) Hits(68292, 0/sec, P[0, 0/sec]) ATr(2:2) Mbps(0.05) BWlmt(0 kbits) RspTime(100.16 ms) Load(0) LConn_Idx: (C:2; V:2,I:1)
Conn: CSvr(1, 0/sec) MCSvr(1) OE(1) E(1) RP(0) SQ(1)

Alternatively, if you want to see the distribution of which servers had how many queued connections in the current logfile:
cat NetscalerLog | grep -B2 'SQ([1-9][0-9]*)'   | grep -v "\-\-"| sed 's/SQ(\(.*\))/\1\n---/' | awk ' {RS="---"; print $1, "Surge queue:", $NF }'  | awk ' { services[$1]+=$NF }; END { for (i in services) { print services[i],i; }}  '
0
3 S(10.1.1.213:13411:UP)
1 S(10.1.1.212:13295:UP)
83 S(10.1.1.237:13249:UP)
0 Surge
72 S(10.1.1.236:13658:UP)
1 Other:
1 S(10.1.1.212:13764:UP)
4 S(10.1.1.212:13390:UP)
2 S(10.1.1.236:13657:UP)

(Yes, it has some garbage in there, and the awk commands could be combined, but it gets the job done, and we didn’t want to spend the client’s time and money perfecting it.)
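If the awk garbage bothers you, the same per-service tally can be sketched in Python. This assumes the layout seen in the sample output above: an `S(...)` service line followed by a `Conn:` line carrying `SQ(n)`. Real logs may contain other line types, which the sketch simply skips. The function names are hypothetical.

```python
import re
import sys
from collections import Counter

# Assumed layout, based only on the sample output above: each sample starts
# with "current time is ...", each service is an "S(ip:port:state) ..." line,
# and that service's surge queue gauge is the SQ(n) field on the "Conn:" line
# that immediately follows it.
TIME_PREFIX = "current time is "
SERVICE_RE = re.compile(r"^(S\([^)]*\))")
SQ_RE = re.compile(r"\bSQ\((\d+)\)")


def surge_events(lines):
    """Yield (timestamp, service, depth) for every non-zero SQ() reading."""
    timestamp, service = None, None
    for raw in lines:
        line = raw.strip()
        svc = SERVICE_RE.match(line)
        if line.startswith(TIME_PREFIX):
            timestamp, service = line[len(TIME_PREFIX):], None
        elif svc:
            service = svc.group(1)
        elif line.startswith("Conn:") and service is not None:
            sq = SQ_RE.search(line)
            if sq and int(sq.group(1)) > 0:
                yield timestamp, service, int(sq.group(1))
            service = None
        else:
            service = None  # any other line (e.g. a vserver) ends the service block


def surge_totals(lines):
    """Sum the sampled SQ() gauge per service, as the awk pipeline does."""
    totals = Counter()
    for _, service, depth in surge_events(lines):
        totals[service] += depth
    return totals


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as log:  # e.g. the NetscalerLog file created above
        for service, total in surge_totals(log).most_common():
            print(total, service)
```

Run it as `python surge_totals.py NetscalerLog` (the script name is arbitrary). It prints one line per service, largest total first. Like the awk version, the totals are sums of sampled gauge readings, not counts of distinct queued connections.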

In this case it is now clear that the affected hosts were staging systems only. The client can rest assured that production was not impacted while their developers figure out why their staging systems are running slowly.

So anyway, hopefully those two command lines will help some people trace down which connections are being placed in the surge queue, and also demonstrate that good monitoring is necessary, but not sufficient.

FTPS (aka “FTP secure”, aka FTP-SSL (and not SFTP)) and apache proxy

August 13th, 2009

So FTPS is encrypted (via SSL) FTP. It’s a definite step up from regular unencrypted FTP. It is not to be confused with SFTP, which IMHO is a misnomer, as it’s not really FTP at all but rather a subsystem of SSH. Below we’re talking about ‘explicit’ FTPS, meaning it uses the normal FTP port (21) and negotiates encryption (‘implicit’ FTPS uses a dedicated port, 990). But anyways, if you’re still here you probably know most of this…

I like to proxy all outgoing connections from any infrastructure I set up. The reasoning is something along the lines of: why should your web server be browsing the web? Basically, servers should never be allowed to initiate connections to the outside world. I’ve had firsthand experience of servers being exploited (an OpenSSL hack comes to mind) where the exploit worked but the server was unable to connect back to the outside world, preventing the bad guy from being…bad. Where a server does need to initiate a connection to the outside world, it should be as restricted as possible, and via proxy. With a proxy you can control the outgoing connections of all your hosts at a single point, as well as monitor all traffic if need be. Recently an app went ballistic in my current infrastructure, making thousands of requests a second to an outside service and effectively DoS’ing it. One change to the proxy server blocked all requests to that specific external host until the engineers could rein in the app. I didn’t need to find the app making the requests…just do the blocking at the proxy. I generally set up a load balanced VIP which has two proxies behind it, for redundancy.

Anyways, I use Apache proxy as it’s easy and free and works great for my simple proxy-ing needs. The issue came up recently that an app needed to make outgoing FTPS connections. I thought this would work without any changes as FTP outgoing proxy was already configured (mod_proxy_ftp) as was mod_proxy_connect, used for proxy-ing outgoing HTTPS requests. The one gotcha that prevented it from working and took a little bit to figure out was the AllowCONNECT directive:

The AllowCONNECT directive specifies a list of port numbers to which the proxy CONNECT method may connect. Today’s browsers use this method when a https connection is requested and proxy tunneling over HTTP is in effect.

By default, only the default https port (443) and the default snews port (563) are enabled. Use the AllowCONNECT directive to override this default and allow connections to the listed ports only.

As FTPS uses SSL and hence CONNECT when making a proxy request, we need to open up port 21. Basically:

AllowCONNECT 443 21

And all is good.
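To check from a client host that the proxy now accepts CONNECT to port 21, you can send a bare CONNECT request and look at the status code. Below is a minimal Python sketch. `proxy_connect_status` is a hypothetical helper, and the host names in the usage note are placeholders. A 2xx reply means the tunnel was opened. When the port is not covered by AllowCONNECT, Apache refuses the CONNECT with an error status instead.

```python
import socket


def proxy_connect_status(proxy_host, proxy_port, target_host, target_port, timeout=10.0):
    """Send an HTTP CONNECT for target_host:target_port and return the proxy's status code."""
    authority = f"{target_host}:{target_port}"
    request = f"CONNECT {authority} HTTP/1.1\r\nHost: {authority}\r\n\r\n".encode("ascii")
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        sock.sendall(request)
        # Only the status line is needed, e.g. "HTTP/1.0 200 Connection Established".
        with sock.makefile("rb") as reply:
            status_line = reply.readline().decode("iso-8859-1")
    parts = status_line.split()
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError(f"unexpected reply from proxy: {status_line!r}")
    return int(parts[1])
```

For example, `proxy_connect_status("proxy.example.internal", 8080, "ftp.example.com", 21)` should return a 2xx status (typically 200) once `AllowCONNECT 443 21` is in place. This only tests that the tunnel opens. It does not speak FTP or TLS.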

jeff

Netscaler Tips, Part 4

April 30th, 2009

Monitoring Netscalers
It is possible to monitor Netscalers yourself, but we strongly recommend LogicMonitor.com for Netscaler monitoring. It has everything you need to monitor on a Netscaler predefined and requires no setup. It automatically finds and monitors all your VIPs, integrated caching, GSLB, policies, etc., and keeps up to date automatically with changes. (If you’ve ever tried to convert VIP names to SNMP OIDs, you’ll appreciate how much time that saves, let alone eliminating the risk of VIPs never being added to monitoring.) Plus you can easily make cool dashboards, and monitor all your other devices too. (NetApp monitoring is also excellent.)

If you are writing your own monitors for Netscalers, once you have figured out which OIDs seem good to monitor, it helps to have some info on what they mean:
CPU goes to 100% during the gzipping of the log file, but this is no cause for concern. The NS process is in control of where the CPU allocates its cycles, and prioritizes traffic management first. Once traffic management has been taken care of, the NS process allows BSD processes to use the remaining cycles. Thus, if there were higher CPU demand from the NS process due to increases in network traffic, gzip would get a smaller percentage of the cycles.
Open Established: established connections between the NetScaler and the servers.
Active Transactions: how many of those connections are being used to handle request/response pairs
Reuse Pool: Open Established minus Active Transactions. In other words, these are connections that have not yet idled out, and are waiting to handle incoming requests.

clientConnRefused - “Client connections added the SurgeQ, and blocked from initiating a server connection to control op/s”
It refers to any time a connection is added to the surge queue. It increments whenever a client connection is temporarily queued because surge protection (SP) kicks in, maxClients is reached, or the client’s connection has to wait for a new server-side TCP connection to be built. It does not indicate timeouts, 5xx responses sent, or any other error condition. Seeing it increment indicates at least a short-term inability of the servers to handle all the connections.
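Because clientConnRefused is a counter, what matters is how much it rose between polls, not its absolute value. Below is a sketch of that delta calculation. It assumes the value is exposed as a 32-bit SNMP counter that wraps at most once per polling interval; check the Netscaler MIB for the actual type. A reboot resets the counter, which this sketch would misread as a wrap.

```python
COUNTER32_MODULUS = 2 ** 32  # assumption: the OID is a 32-bit SNMP Counter32


def increments(samples, modulus=COUNTER32_MODULUS):
    """Increase of a counter between consecutive polls, allowing for one wrap per interval."""
    return [(cur - prev) % modulus for prev, cur in zip(samples, samples[1:])]


def surge_intervals(samples, modulus=COUNTER32_MODULUS):
    """Indexes of polling intervals in which clientConnRefused went up at all."""
    return [i for i, delta in enumerate(increments(samples, modulus)) if delta > 0]


# Hypothetical polls of clientConnRefused: flat, then +5, then a wrap past 2**32 - 1.
polls = [4294967290, 4294967290, 4294967295, 3]
print(increments(polls))       # [0, 5, 4]
print(surge_intervals(polls))  # [1, 2]
```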

The response time of the server is measured for *every* HTTP request.
- The Least Response Time algorithm uses the average response time for the most recent complete 7-second polling interval. This provides some smoothing, but the algorithm does not strive for any greater complexity.
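As an illustration only, the selection rule above can be modelled as follows. The assumptions are that windows are aligned to multiples of 7 seconds and that services with no responses in the last complete window are skipped. The function names are hypothetical, and this is not Netscaler's code.

```python
from statistics import mean

WINDOW = 7.0  # seconds, the polling interval quoted above


def last_complete_window_average(samples, now, window=WINDOW):
    """Mean response time (ms) over the most recent *complete* window.

    samples: (timestamp_seconds, response_ms) pairs for one service.
    Assumption: windows are aligned to multiples of `window` seconds.
    Returns None if the service answered nothing in that window.
    """
    end = (now // window) * window  # start of the window still in progress
    start = end - window
    in_window = [ms for ts, ms in samples if start <= ts < end]
    return mean(in_window) if in_window else None


def pick_least_response_time(services, now):
    """Pick the service whose last complete window had the lowest mean response time."""
    averages = {}
    for name, samples in services.items():
        avg = last_complete_window_average(samples, now)
        if avg is not None:
            averages[name] = avg
    return min(averages, key=averages.get) if averages else None
```

A slow response in the window still in progress has no effect until that window completes. That is where the smoothing comes from.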

GSLB:
GSLB redirects the HTTP request only if the request’s Host header matches the GSLB domain configured on the NetScaler. No Host header, no redirect.

Syslog
Useful to have all netscaler events sent to syslog server.
Edit /nsconfig/syslog.conf to set up remote syslog as normal
*.* @10.1.1.1
However, the default syslog flags don’t work for remote logging.
rc.conf.defaults:syslogd_flags="-b 127.0.0.1 -n"

That sources the packets from the loopback address when sending to a remote syslog server, which doesn’t work very well.
So add to /nsconfig/rc.conf
syslogd_flags="-s -n"

NTP
Is not enabled by default.
Set up /nsconfig/ntp.conf
And add
ntpd_enable="YES"
to /nsconfig/rc.conf

Netscaler implementation tips, Part 3

January 8th, 2008

Security Things
Validate Backend Servers
If you have secure data and are doing SSL to the back end, you should make sure the Netscaler checks the validity of the certificates on services. By default it does not, which means it is basically just doing IP address based authentication.
set ssl service -serverAuth ENABLED
Updating SSL keys:
Make sure you use:
update ssl certkey
to update SSL certs – otherwise you need to unbind, remove the old certkey (as two identical certificates with the same “Subject-Identifier” and “Issuer-Identifier” cannot be loaded in the kernel), add new cert and bind again – this means a few seconds of downtime.
Header Insertions
If doing header insertion (for client IP, etc.), you should drop incoming requests that already carry that header. The Netscaler will just add an additional header if one already exists, which could lead to insecure or indeterminate behaviour in the app if it depends on that header.
add service www1 -http www1 HTTP 80 -gslb NONE -maxClient 125 -maxReq 10000 -cacheable NO -cip ENABLED ClientHost
add policy expression ClientHostHead HTTPHEADER ClientHost EXISTS
add ns filter NoClientHost -reqRule ClientHostHead -reqAction RESET
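The following plain-Python model shows why the filter matters. The header list is modelled as (name, value) pairs, the insertion appends as described above, and a naive application reads the first occurrence. The helper names and addresses are hypothetical, and no Netscaler API is involved.

```python
def insert_client_ip(headers, client_ip, name="ClientHost"):
    """Append the client IP header; as described, existing copies are kept."""
    return headers + [(name, client_ip)]


def filter_spoofed(headers, name="ClientHost"):
    """Mirror the RESET filter: reject (None) a request that already carries the header."""
    # HTTP header names are case-insensitive, so compare them that way.
    if any(h.lower() == name.lower() for h, _ in headers):
        return None
    return headers


def first_value(headers, name):
    """A naive application that trusts the first occurrence of a header."""
    return next((v for h, v in headers if h.lower() == name.lower()), None)


spoofed = [("Host", "www.example.com"), ("ClientHost", "10.0.0.1")]
print(first_value(insert_client_ip(spoofed, "203.0.113.9"), "ClientHost"))  # 10.0.0.1
print(filter_spoofed(spoofed))  # None: the request would be reset instead
```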

Debugging things
What events did the netscaler see? Services passing/failing healthchecks? Very useful.
nsconmsg -K /var/nslog/newnslog -d event
2246 0 'server_NSSVC_HTTP_216.52.45.145:80(test)' UP Thu Jul 26 00:43:00 2007
2255 0 'server_NSSVC_HTTP_216.52.45.174:80(test-vip)' UP Thu Jul 26 00:44:29 2007
2257 58522 'server_NSSVC_HTTP_216.52.45.145:80(test)' Out Of Service Thu Jul 26 00:45:28 2007
2258 0 'server_NSSVC_HTTP_216.52.45.174:80(test-vip)' DOWN Thu Jul 26 00:45:28 2007

Was the netscaler sending traffic to various services?
nsconmsg -K /var/nslog/newnslog -s ConLb=1 -d oldconmsg | grep "time\|IP OF SERVICE or VIP"

See how things are doing:
nsconmsg -d oldconmsg -s FIELD=LEVEL
FIELD is case sensitive.
nsdebug_pe 1 = interface debug
ConDebug Connection info debug. 1= basic, 2= detailed, 3= all sorts of stuff about internal TCP parameters
ConLb 1= Load balancing debug
ConCSW 1=content switching debug
ConSSL 1=ssl Debug
ConCMP 1=compression debug
ConIC 1=integrated caching debug

e.g. Evaluate compression:
nsconmsg -s ConCMP=1 -d oldconmsg
CMPResps:CRes=547 Cin=20304690 Cout=6830730 Cratio=2.97(34%)
Response: Res=17649 Rin=161486642 Rout=148012682 Rratio=1.09(92%)

Compressible traffic is being compressed by 66%; total traffic by only 8%.
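The percentages follow directly from the byte counters. The short Python sketch below parses the two lines above and recomputes the ratios. `parse_counters` and `summarize` are hypothetical helpers. The bracketed percentage is taken to be output bytes as a share of input bytes, which matches the numbers shown.

```python
import re


def parse_counters(line):
    """Collect integer Key=value pairs (e.g. Cin=20304690) from an nsconmsg line."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)(?=\s|$)", line)}


def summarize(bytes_in, bytes_out):
    """Return (compression ratio, % of bytes still sent, % saved)."""
    kept = bytes_out / bytes_in
    return round(bytes_in / bytes_out, 2), round(kept * 100), round((1 - kept) * 100)


compressed = parse_counters("CMPResps:CRes=547 Cin=20304690 Cout=6830730 Cratio=2.97(34%)")
overall = parse_counters("Response: Res=17649 Rin=161486642 Rout=148012682 Rratio=1.09(92%)")

print(summarize(compressed["Cin"], compressed["Cout"]))  # (2.97, 34, 66)
print(summarize(overall["Rin"], overall["Rout"]))        # (1.09, 92, 8)
```

Cin minus Cout and Rin minus Rout are both 13,473,960 bytes. All of the savings come from the compressed responses, which is why the overall figure is so much lower.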

nsconmsg -s ConDebug=1 -d oldconmsg
Displaying debug performance information
Performance Data Record Version 2.0

current time is Thu Jul 19 11:56:23 2007
HTTP: Req(41580876512 1.1(39141733520) 1.0(1733429699)Get(38133042089) Postp(1966228573) Others(1481605850)) Res(41496471614 1.1(40630248963) 1.0(866222651) Pipe(11644297))
HTTP: Req/s(2623 1.1(92%) 1.0(5%) time=1) avgReq/s(0 1.1(0%) 1.0(0%) time=0)
HTTP: Res/s(2602 1.1(95%) 1.0(4%) time=1) avgRes/s(0 1.1(0%) 1.0(0%) time=0)

Note: 5% of requests are HTTP/1.0. Oddly, so are 4% of responses. (Old servers?)

Examine response time (Time to first byte) of services, vservers:
To see live data:
nsconmsg -f "*svr_ttfb*" -d current
To see data in current log file, from start of log file:
nsconmsg -K /var/nslog/newnslog -f "*svr_ttfb*" -d current #historical

Nstcptrace.sh - very handy.
Can also use
/etc/nsapimgr -K nstrace3 -d netraces
to look at trace files saved with nstrace

SuSE 10 enterprise server RUG errors after SP1 upgrade

September 19th, 2007

After upgrading a bunch of SLES 10 servers to SP1, some (but not all) servers started getting 401 Login Failed errors when doing a rug refresh (or any rug action) against Novell’s update servers. On some servers, deleting the update sources in Yast2 and reregistering via the customer center solved the issue, but on others it did not. (The issue only occurred on servers I upgraded to SP1, not where I did an install from SP1 sources.)
On the problematic ones:
Those update sources were listed as ZYPP sources, not NU. (I was letting yast create them via synchronizing with zenworks)
I had to:
remove old services:
rug sd SLES10-SP1-Online
rug sd SLES10-SP1-Updates
Add the new one (with the type specified, or it errored):
rug sa -t nu https://nu.novell.com

Subscribe to the new catalogs:
rug ca

Sub’d? | Name                                                   | Service
-------+--------------------------------------------------------+--------------------------------------------------------
Yes    | SUSE-Linux-Enterprise-Server-i386-10-0-20070605-044231 | SUSE-Linux-Enterprise-Server-i386-10-0-20070605-044231
       | SLES10-SP1-Online                                      | https://nu.novell.com
       | SLES10-SP1-Updates                                     | https://nu.novell.com
       | SLE10-SP1-Debuginfo-Updates                            | https://nu.novell.com

#rug sub SLES10-SP1-Online
Subscribed to ‘SLES10-SP1-Online’

#rug sub SLES10-SP1-Updates
Subscribed to ‘SLES10-SP1-Updates’

rug worked then, but I still had 401 errors in Yast2 (when going to
online update, customer center, or installation sources)

To resolve them, I then had to remove the update sources in Yast2,
re-run the Novell customer center configuration, and then things seemed OK.

Netscaler implementation tips, Part 2

September 7th, 2007

Service Configuration

Timeouts

Services have client and server timeout.
For a service, the client timeout has no effect unless running in proxy mode, and hitting service directly. Normally, vserver client timeout overrides. If client timeout too long, resources get consumed e.g. in Fin_wait_1 (Netscaler sends FIN and client never acks)

If client timeout too short, you can port scan your end users (if using layer 2 or 3 mode)
(where the client sends a FIN, and the server sends more data, say after 8 seconds. The load balancer boxes have already dropped the state table and send packets with the server’s (Not VIP’s) IP address, and this looks like a port scan to the client.)

Server timeout on service – time before the Netscaler will close the connection to the service if it’s idle, e.g. how long an IIS connection will be held open, hoping to be reused. In general, you want this to be less than the server’s (IIS’s or Apache’s) timeout if not using TCP Buffering:
Apache KeepAliveTimeout 15 sec
IIS – 120 secs

For request/response UDP services, set -svrTimeout to 0. This changes the session that is created on receipt of a UDP packet to last only 2 seconds, or until the response packet is received, whichever is sooner.
Otherwise, with the default of 120 seconds, the Netscaler can run out of resources tracking UDP sessions.

Client Keep alive is not necessary for clients to have persistent reused connections UNLESS the server does not support them. Then the netscaler keeps a single client connection open, and will send data from several connections to servers down the same connection to the client. But it should be on for every service just in case.

Always set Max Clients

By setting a ceiling on the number of open TCP connections between the NS and the servers, you can keep TCP connection overhead from becoming an exacerbating issue when servers are under high load. Generally, we’ve observed that web servers have a “sweet spot” — depending a great deal on the platform, size of an average response and the type of content — at which they can deliver the maximum HTTP response rate. Also, if maxClient is already set on the servers (in apache config, etc – no equivalent in IIS), then it would be important to keep the NS from attempting to open more TCP connections than the server will accept. Note: Usually need to set netscaler MaxClients setting for a service slightly lower than apache config, to allow for monitoring connections, etc.
Even if the server can sustain 10K concurrent TCP connections, that may not be the number at which it delivers the highest response rate. Because the NS will multiplex HTTP requests over keep-alive connections, and queue requests when a connection isn’t waiting to be reused, it is safe, even advantageous, to keep the number of TCP connections down. Of course, the best measure of the right value is to do some relatively thorough testing.

Things not to load balance

DNS
- DNS system is designed to follow multiple NS records. For hosts, use nscd and have multiple servers in /etc/resolv.conf.
Inbound SMTP
- Adding another MX record is a much cheaper way than buying a load balancer

ALWAYS use TCP Buffering. (I enable globally.)

Without TCP Buffering enabled, NS initiates a new connection when seeing an ACTIVE server connection getting closed, so it has one ready to use if the same client comes back. Now, if this connection is not used, apache will timeout this connection after ‘keepalive timeout’ and log an error message ‘408 request timeout’. If you want to avoid this error messages, you could set the server-timeout for this service at NS to less than apache timeout, so NS will close this idle connection(and we don’t replace if an idle connection is getting closed).

i.e. you will consume a LOT of resources on a fairly busy server by the netscaler optimizing performance, if you do not have TCP buffering enabled.
Even worse if they are SSL connections, as the Netscaler now opens a new SSL connection for every close, and the server has to do the CPU-intensive SSL handshake.
And they will only be released on netscaler every 2 minutes.

The zombie timer that checks for idle connections is very costly: it has to traverse all the connection structures, which is why it only runs every 2 minutes. With more connections, it becomes a really time-consuming task.
This timer value can be changed with /etc/nsapimgr -ys zombie_timeout=<ticks>.
Currently this value is 12007. For example, to run it every 60 seconds:
/etc/nsapimgr -ys zombie_timeout=6000
nsconmsg -g zombie_timeout -d stats
Displaying current counter value information
Index reltime counter-value symbol-name&device-no
0 0 12007 cfg_zombie_timeout_ticks

“Linux kernel must be loaded before initrd” error with Autoyast in SLES SP1

August 15th, 2007

Something odd that may save some people time to figure out…The same autoyast files that worked fine with SLES 10 suddenly started generating “Linux kernel must be loaded before initrd” grub errors when I was using SLES10 SP1 install sources. Easy enough to fix interactively (insert a line specifying where the kernel is), but why is it happening, and how to automate? (All systems administration really gets to the point where you are just automating things. You don’t want to have to rely on people to DO things, or you will be disappointed.)

Rather than go and pore through the release notes in the hope that I could glean what may have changed, I just did a manual install over the network, and compared the generated reference autoyast file to the one I was serving from my boot server. This is what I found:

In the bootloader section of your autoyast.xml file, rather than

<section>
        <append>resume=/dev/sda1  splash=silent showopts pnpacpi=off</append>
        <initial>1</initial>
        <initrd>/boot/initrd</initrd>
        <kernel>/boot/vmlinuz</kernel>
        <lines_cache_id>1</lines_cache_id>
        <name>SUSE Linux Enterprise Server 10</name>
        <original_name>linux</original_name>
        <root>/dev/sda2</root>
        <type>image</type>
      </section>

just add a line

        <image>/boot/vmlinuz</image>

before the kernel line.

That will get your installation to complete, but not to boot correctly after the install. For that, you also need these lines:

      <activate>true</activate>
      <boot_root>true</boot_root>

in the global section of the bootloader section.
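Since the point was to automate, here is a sketch of a Python script that applies both changes to an existing profile. It assumes the structure shown above, with `<section>` entries and a `<global>` element somewhere under `<bootloader>`. It also assumes the usual SLES autoyast namespaces. The helper names are hypothetical. The script only adds `<activate>`/`<boot_root>` to a `<global>` element that already exists.

```python
import sys
import xml.etree.ElementTree as ET


def local(tag):
    """Tag name without any '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def same_ns(elem, name):
    """Build tag `name` in the same XML namespace as `elem`."""
    return elem.tag[: len(elem.tag) - len(local(elem.tag))] + name


def fix_bootloader(root):
    """Add <image> before <kernel> in each boot section, and activate/boot_root to global.

    Returns True if anything was changed; running it twice is harmless.
    """
    changed = False
    for bootloader in [e for e in root.iter() if local(e.tag) == "bootloader"]:
        for section in [e for e in bootloader.iter() if local(e.tag) == "section"]:
            names = [local(child.tag) for child in section]
            if "kernel" in names and "image" not in names:
                kernel = section[names.index("kernel")]
                image = ET.Element(same_ns(kernel, "image"))
                image.text = kernel.text  # same path as <kernel>, e.g. /boot/vmlinuz
                section.insert(names.index("kernel"), image)
                changed = True
        for glob in [e for e in bootloader.iter() if local(e.tag) == "global"]:
            present = {local(child.tag) for child in glob}
            for name in ("activate", "boot_root"):
                if name not in present:
                    ET.SubElement(glob, same_ns(glob, name)).text = "true"
                    changed = True
    return changed


if __name__ == "__main__" and len(sys.argv) == 3:
    # Assumed SLES 10 autoyast namespaces; registering them avoids ns0: prefixes on output.
    ET.register_namespace("", "http://www.suse.com/1.0/yast2ns")
    ET.register_namespace("config", "http://www.suse.com/1.0/configns")
    tree = ET.parse(sys.argv[1])
    fix_bootloader(tree.getroot())
    tree.write(sys.argv[2], encoding="utf-8", xml_declaration=True)
```

Run it as `python fix_autoyast.py autoyast.xml autoyast-fixed.xml` (the script name is arbitrary). ElementTree drops comments and the DOCTYPE line when it rewrites the file, so diff the result before serving it from your boot server.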

Netscaler implementation tips, Part 1

August 14th, 2007

We’ve been using Netscalers since 1992, and in that time have found many gotchas, bugs and caveats. (That being said, having also used Foundrys, F5, Arrowpoint, cisco 6500 CSM blades, and a variety of other load balancers, Netscalers are still the load balancing system we recommend for most high volume clients.)

Many of the issues have been resolved with new software releases, but these principles and issues below are current as of August 2007.

The design principles are just what we have found to work best with almost all of our clients.

Overall Principle (as in everything – KISS)

Load Balancing Methods

  • Least Connection – use almost always. Counts only connections that have active transactions, not just TCP connections that are in reuse pool. Thus it compensates for differing speed hardware.
  • Least Response time – use with vastly differently performing hardware bound to same vserver. (Will try to keep response time roughly same. Least Connection would keep connections same, so fast machines would do a lot more, but those users that hit a slow machine may have much longer transaction times.)
  • URL Hashing – to split traffic based on URL. E.g. in use for netcaches, so each only has to cache half the possible set of URLs.
  • Token hashing can be used to ensure same clients hitting different services go to same real server. I prefer persistence groups.

Persistence – best to use Cookie insert, no timeout (so creates only session based cookie). Uses no resources on netscaler to track.

Active/Standby systems

Netscaler does allow a backup vserver to be defined on a vserver – this means that if primary vserver is down, content is served from backup vserver. When primary is up, it takes over active role again.

Netscaler does not have a way to switch active/backup roles – i.e. if server B becomes active, keep it active, and make server A the new backup. (Think DB cluster where you want everything to go to same node, or any service that keeps state.)

My workaround: define persistence to be source IP based, netmask of 0.0.0.0, timeout of one day. All traffic will, because of the netmask, go to the same server as the first connection. If that server fails healthchecks, all traffic will go, and stick, to the other server, even when the first server comes alive again.
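The toy model below shows why a 0.0.0.0 netmask makes every client share one persistence entry. The assumptions, which do not describe Netscaler's implementation, are these: the key is the client address ANDed with the netmask, a new entry goes to the first server that is up, and the one-day timeout is ignored.

```python
import ipaddress


class SourceIPPersistence:
    """Toy model of source-IP persistence with a persistence netmask.

    Assumptions for illustration: the persistence key is the client address
    ANDed with the netmask, a new entry goes to the first server that is up
    (standing in for the load balancing method), and an entry is replaced only
    when its server fails healthchecks. The one-day timeout is not modelled.
    """

    def __init__(self, netmask):
        self.mask = int(ipaddress.IPv4Address(netmask))
        self.table = {}

    def choose(self, client_ip, servers_up):
        key = int(ipaddress.IPv4Address(client_ip)) & self.mask
        server = self.table.get(key)
        if server not in servers_up:  # no entry yet, or its server is down
            server = servers_up[0]
            self.table[key] = server
        return server


persistence = SourceIPPersistence("0.0.0.0")
print(persistence.choose("198.51.100.7", ["A", "B"]))  # A: first connection
print(persistence.choose("203.0.113.20", ["B"]))       # B: A failed, all traffic moves
print(persistence.choose("198.51.100.7", ["A", "B"]))  # B: stays put after A recovers
```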

Content Switching

I usually make every web VIP a content switching VIP. Provides easy way to scale out performance without needing developers to change urls.

e.g. if a web site is busy, it’s trivial to split off requests for images to a different server, PHP files to another, and everything else to the regular server. For SSL sites, you can even send images (which are not confidential) to the same server but unencrypted, saving server load.

Conceptually, it’s just binding a cs vserver to lb vservers, in the same way they are bound to services, but with rules to say what goes to what.

e.g.

 

add cs policy gifredirect -url /*.gif
add lb vserver lb-www.site.com-http http 10.1.1.10 80
add lb vserver lb-www.site.com-ssl ssl 10.1.1.10 443
add cs vserver cs-www.site.com-ssl ssl 201.1.1.1 443
bind cs vserver cs-www.site.com-ssl lb-www.site.com-http -policyName gifredirect
bind cs vserver cs-www.site.com-ssl lb-www.site.com-ssl

  • use RFC1918 addresses for lb vips that are only targets of cs vservers. Do not even route that subnet within your IGP.
  • CS vservers NEVER go down. If the lb vservers behind them are down, they say service unavailable. Even though they let you define a backup vserver, it will never be used, and the cs vserver will never be down.

Cleanest Network Design (I.M.O.)

Netscaler on a stick.

Disable L2 mode and L3 Mode. Aggregate interfaces together into one channel. All traffic that the netscaler processes is to/from a VIP or a service.

Not always possible if you have services that need client IP.

  • Avoid USIP if possible. (Must disable surge protection; very little connection reuse, etc).
  • Preferable to use ClientIP header insertion.

My preferred way if necessary to see client IP in IP packet is to enable L3 mode, have another channel interface on the netscaler be in the vlan of the servers, and have the servers use netscaler as default route. Not to use DSR. (DSR gives up all the acceleration features.)

- always set flow control RXTX (don’t trust the negotiation)

- always enable pMTU discovery.

- Mac Based Forwarding Off

Mac Based Forwarding

Will return packets to the mac address they came in on, on a per connection basis (maps the incoming SYN packet’s source mac address.)

With HSRP in place, means all connections existing at time of HSRP failover will break, instead of potentially being able to use TCP retransmit to survive the HSRP failover.

(Packets are sourced with the MAC of the Cisco interface; only ARPs have the HSRP MAC.)

Short interruption only, but w/o MBF, could recover via retransmissions.

Secret ICMP limiting

Applies to any traffic flowing through the netscaler (not just to netscaler IPs)

Can cause issues with monitoring. Personally, I rate limit ICMPs in Catalyst switches on network ingress, which do it in hardware, and disable this rate limit. (Although it’s high enough now not to cause issues.)

From shell:

nsconmsg -g icmp_cur_ratethreshold -d stats

Displaying current counter value information

Index reltime counter-value symbol-name&device-no

0 0 100 icmp_cur_ratethreshold

So this is 100 packets allowed per 10 ms, or 10,000 ICMPs /sec. (Has been going up per release. Started at 200 per sec – which is 100 pings with replies)

Disable – set to 0.

/etc/nsapimgr -ys icmp_rate_threshold=0

High Availability Xen

August 14th, 2007

More clients are having us implement Xen solutions (and for good reason - it’s perfect for development, QA and some staging environments to increase your capabilities without blowing your hardware budget. We’ve created deployments where the developers of our clients can spawn their own Xen images, freshly built using the same processes as production systems (autoyast with different profiles, cfengine, etc), and they can also trigger LVM snapshots of their VMs before they try something odd, so if it blows up they can revert back to the running image as it was before the explosion. I think that’s cool. :-) )

However, one thing I’d never gotten around to resolving fully was how to make a high availability (HA) xen master (dom 0 server). I usually set up most servers with NIC bonding for HA (in some cases we use OSPF running on the hosts advertising a path to a loopback interface, to which the service is bound.) However, out of the box, Xen does not play very nicely with bonded NICs (at least not with SLES 10, which most of our clients use.)

Out of the box, as soon as xend starts, it will assume that your main interface is eth0 (not bond0) and do various things to it which break your connectivity. (If you find yourself in this state, do: /etc/xen/scripts/network-bridge stop; service network restart to get back to your initial network.)

The short answer you may be looking for to make xen work with bonding: edit /etc/xen/xend-config.sxp and change this line:

(network-script network-bridge)

to this:

(network-script 'network-bridge netdev=bond0')

and it will work. You will get errors in your syslog “bond0: received packet with own address as source address”, but these are cosmetic. The rest of this post is about investigating them, but the above is all you need to know.

OK, I thought, I should be able to resolve these messages - networking is my specialty. I was a CCIE many years ago (long enough that I can’t even remember when I stopped bothering to renew it, as I was not seeing the value), and I know even more now. However, you cannot resolve them. As soon as a bonded NIC is part of a bridge interface, even with no other members, even on a non-Xen kernel system without the netback kernel modules, they occur. (This would be because a broadcast or multicast packet goes out NIC1 in the bond, and is flooded to all ports in the same VLAN, including NIC2 in the bond. I’m guessing that the kernel has code to filter/ignore such packets if they come from a bond, but when the bond passes them to the bridge first, the kernel sees the packets coming from a bridge, so doesn’t apply the same logic.)

There are some workarounds, if you like:

  • Don’t run bonding, just set up your NICs as part of the linux bridge, and use STP for fault tolerance. It will resolve those messages, but I don’t like this one much as:
    • it’s only old 802.1d spanning tree, so slow convergence
    • by default the linux bridge will be the root of your STP. Not good.
    • spanning tree should be avoided where possible - it’s just more error prone than layer 3.
  • Don’t use active-backup mode 1 bonding, but instead use 802.3ad link aggregation for bonding. However, as we want High Availability, only use this if you are connecting to switches that support link aggregation across multiple physical units (such as Cisco’s 3750 series.) This is the best solution, except that you are now trusting increased complexity in the switches (virtualized stacking) to do the right thing.

UPDATE: I spoke too soon above - the Dom U images installed on a system with bonding set up as above do not quite work. They work to some hosts (such as the host used to install the OS via network, which is what led me to jump to conclusions above), but with SLES 10 SP1, they suffer from the bug described here
The xen bridge is incorrectly interpreting the broadcast packets coming in through the other bonded NIC as meaning that the MAC address of the Dom U hosts is reachable out that NIC, so it doesn’t pass on the packets destined to dom U.

So the only solutions until Novell patches/updates the kernel are:
- patch the kernel yourself. (But then if you were going to do that, you wouldn’t be running SLES.)
- don’t run bonding, and run a single NIC
- don’t run bonding, and use 802.1d spanning tree with both nics in the bridge
- run 802.3ad link aggregation, with both NICs going to same switch (or switch image).

The latter is the best option at this point.