Service Configuration
Timeouts
Each service has a client timeout and a server timeout.
For a service, the client timeout has no effect unless the NetScaler is running in proxy mode and clients are hitting the service directly. Normally, the vserver client timeout overrides it. If the client timeout is too long, resources get consumed, e.g. by connections stuck in FIN_WAIT_1 (the NetScaler sends a FIN and the client never ACKs it).
If the client timeout is too short, you can end up port scanning your end users (if using layer 2 or layer 3 mode).
This happens when the client sends a FIN and the server sends more data, say 8 seconds later. The load balancer has already dropped the connection from its state table, so the packets go out with the server’s (not the VIP’s) IP address, and this looks like a port scan to the client.
Server timeout on a service – the time before the NetScaler closes the connection to the service if it is idle, e.g. how long a connection to IIS is held open in the hope of being reused. In general, you want this to be less than the server’s own (IIS or Apache) timeout if you are not using TCP Buffering.
Apache KeepAliveTimeout 15 sec
IIS – 120 secs
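The rule of thumb above (service server timeout below the backend's own idle timeout) can be sketched as a small helper. This is only an illustration: the function name and the 5-second safety margin are assumptions, not NetScaler defaults, and the backend values are the ones quoted above.

```python
def pick_server_timeout(backend_keepalive_secs: int, margin_secs: int = 5) -> int:
    """Return a service idle timeout that expires before the backend's own timeout.

    margin_secs is a hypothetical safety margin, not a NetScaler default.
    """
    if backend_keepalive_secs <= 1:
        raise ValueError("backend keep-alive is too short to undercut")
    # Stay strictly below the backend timeout, but never go below 1 second.
    candidate = min(backend_keepalive_secs - margin_secs, backend_keepalive_secs - 1)
    return max(1, candidate)


# Backend idle timeouts as quoted in these notes.
BACKEND_KEEPALIVE_SECS = {"apache": 15, "iis": 120}

timeouts = {name: pick_server_timeout(secs) for name, secs in BACKEND_KEEPALIVE_SECS.items()}
print(timeouts)  # {'apache': 10, 'iis': 115}
```

The resulting number is what would go into the service's server timeout; check the backend's actual configured timeout rather than relying on the figures above.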
For request/response UDP services, set -svrTimeout to 0. This changes the session that is created on receipt of a UDP packet so that it lasts only 2 seconds, or until the response packet is received, whichever is sooner.
Otherwise, with the default of 120 seconds, the NetScaler can run out of resources tracking UDP sessions.
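To see why the 120-second default hurts, here is a rough back-of-the-envelope sketch. It assumes every request creates its own session (e.g. each comes from a distinct source port) and uses a made-up request rate; the steady-state count is simply arrival rate times session lifetime.

```python
def tracked_udp_sessions(requests_per_sec: float, session_lifetime_secs: float) -> float:
    """Steady-state number of sessions: arrival rate times how long each one is kept."""
    return requests_per_sec * session_lifetime_secs


RATE = 5000  # hypothetical request rate, requests per second

# Default behaviour: each session lingers for 120 seconds.
default_sessions = tracked_udp_sessions(RATE, 120)

# With the server timeout set to 0: at most 2 seconds per session (upper bound,
# since the session ends sooner when the response arrives).
short_sessions = tracked_udp_sessions(RATE, 2)

print(default_sessions, short_sessions)  # 600000 10000
```

Under these assumptions the short session lifetime cuts the tracked state by a factor of 60 or more.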
Client Keep-Alive is not necessary for clients to have persistent, reused connections UNLESS the server does not support them. In that case the NetScaler keeps a single client connection open and sends the data from several server connections down that same connection to the client. It should still be turned on for every service, just in case.
Always set Max Clients
By setting a ceiling on the number of open TCP connections between the NS and the servers, you can keep TCP connection overhead from making things worse when servers are under high load. Generally, we’ve observed that web servers have a “sweet spot” at which they deliver the maximum HTTP response rate; where it lies depends a great deal on the platform, the size of an average response and the type of content. Also, if a maximum client count is already set on the servers (in the Apache config, etc.; there is no equivalent in IIS), it is important to keep the NS from attempting to open more TCP connections than the server will accept. Note: you usually need to set the NetScaler Max Clients setting for a service slightly lower than the Apache value, to leave room for monitoring connections, etc.
Even if the server can sustain 10K concurrent TCP connections, that may not be the number at which it delivers the highest response rate. Because the NS multiplexes HTTP requests over keep-alive connections, and queues requests when no connection is waiting to be reused, it is safe, even advantageous, to keep the number of TCP connections down. Of course, the best way to find the right value is to do some relatively thorough testing.
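The "slightly lower than the Apache config" note can be sketched as follows. The helper name, the headroom of 5 connections and the example limit of 256 are all assumptions for illustration; the right numbers come from your own server config and testing.

```python
def netscaler_max_client(server_max_clients: int, reserved_for_monitors: int = 5) -> int:
    """Return a per-service connection ceiling that leaves headroom on the server.

    reserved_for_monitors is a hypothetical allowance for monitor probes and
    other out-of-band connections; tune it to your environment.
    """
    if reserved_for_monitors < 0:
        raise ValueError("reserved_for_monitors must not be negative")
    if server_max_clients <= reserved_for_monitors:
        raise ValueError("server limit is too small to leave any headroom")
    return server_max_clients - reserved_for_monitors


# Example: a server configured to accept at most 256 clients (assumed value).
ceiling = netscaler_max_client(256)
print(ceiling)  # 251
```

This only keeps the NS under the server's hard limit. Finding the "sweet spot" for response rate still requires load testing.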
Things not to load balance
DNS
- The DNS system is designed to follow multiple NS records. For hosts, use nscd and list multiple servers in /etc/resolv.conf.
Inbound SMTP
- Adding another MX record is much cheaper than buying a load balancer.
ALWAYS use TCP Buffering. (I enable it globally.)
Without TCP Buffering enabled, the NS initiates a new connection whenever it sees an ACTIVE server connection being closed, so that it has one ready to use if the same client comes back. If this connection is not used, Apache times it out after its keep-alive timeout and logs a ‘408 request timeout’ error. If you want to avoid these error messages, you can set the server timeout for this service on the NS to less than the Apache timeout, so that the NS closes the idle connection first (an idle connection that is closed is not replaced).
In other words, if you do not have TCP Buffering enabled, the NetScaler’s attempt to optimize performance consumes a LOT of resources on a fairly busy server.
It is even worse if they are SSL connections, as the NetScaler now opens an SSL connection for every one that is closed, and the server has to do the CPU-intensive SSL handshake each time.
And these connections are only released on the NetScaler every 2 minutes.
The zombie timer that checks for idle connections is very costly, because it has to traverse all the connection structures. That’s why this timer runs only every 2 minutes. The more connections there are, the more time-consuming the task becomes.
This timer value can be changed through nsapimgr -ys zombie_timeout=
Currently this value is 12007 ticks (about 2 minutes, which works out to 10 ms per tick). For example, if you want it to run every 60 seconds, the command is
/etc/nsapimgr -ys zombie_timeout=6000
nsconmsg -g zombie_timeout -d stats
Displaying current counter value information
Index   reltime   counter-value   symbol-name&device-no
    0         0           12007   cfg_zombie_timeout_ticks
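The counter is named in ticks, and the 60-second example above maps to 6000, which suggests 10 ms per tick. A minimal conversion sketch under that assumption (the tick length is inferred from these notes, not taken from vendor documentation, and the helper names are hypothetical):

```python
TICK_MS = 10  # assumption: inferred from "60 sec -> zombie_timeout=6000"


def seconds_to_zombie_ticks(seconds: float) -> int:
    """Convert a desired zombie timer interval in seconds to tick units."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    return round(seconds * 1000 / TICK_MS)


def zombie_ticks_to_seconds(ticks: int) -> float:
    """Convert a tick count (as shown by the counter) back to seconds."""
    return ticks * TICK_MS / 1000


print(seconds_to_zombie_ticks(60))     # 6000
print(zombie_ticks_to_seconds(12007))  # 120.07
```

Keep in mind that a shorter interval makes the costly connection traversal run more often, so lower it with care.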