Troubleshooting high load (possibly an attack?)

#1
My server is suddenly under very high load, above 250, while it is usually below 1. A lot of connections are coming in, and the ps command shows defunct lsphp processes:

Code:
root      35253  0.0  0.0  40908  6536 ?        S    20:50   0:02 openlitespeed (lshttpd - main)
root      35254  0.0  0.0  30704  2872 ?        S    20:50   0:00 openlitespeed (lscgid)
apache    35255  3.4  0.4 107556 72268 ?        S    20:50   2:24 openlitespeed (lshttpd - #01)
apache    35256  3.4  0.4 110204 74448 ?        S    20:50   2:24 openlitespeed (lshttpd - #02)
an-user   36978  1.3  0.0      0     0 ?        Zs   21:12   0:39 [lsphp] <defunct>
root      38223  0.0  0.0      0     0 ?        S    21:28   0:00 [kworker/5:2]
root      38304  0.0  0.0      0     0 ?        S    21:30   0:00 [kworker/1:0]
root      38537  0.2  0.0      0     0 ?        S    21:35   0:04 [kworker/2:1]
an-user   38574  0.3  0.0      0     0 ?        Zs   21:36   0:05 [lsphp] <defunct>
an-user   38698  0.6  0.0      0     0 ?        Zs   21:36   0:09 [lsphp] <defunct>
an-user   38702  0.3  0.0      0     0 ?        Zs   21:36   0:04 [lsphp] <defunct>
an-user   38711  0.5  0.0      0     0 ?        Zs   21:36   0:08 [lsphp] <defunct>
an-user   38808  0.4  0.0      0     0 ?        Zs   21:36   0:05 [lsphp] <defunct>
root      39137  0.0  0.0      0     0 ?        S    21:39   0:00 [kworker/4:2]
root      39161  0.0  0.0      0     0 ?        S    21:40   0:00 [kworker/0:0]
an-user   39468  0.3  0.0      0     0 ?        Zs   21:46   0:02 [lsphp] <defunct>
an-user   39495  0.2  0.0      0     0 ?        Zs   21:47   0:02 [lsphp] <defunct>
an-user   39548  0.3  0.0      0     0 ?        Zs   21:48   0:02 [lsphp] <defunct>
an-user   39574  0.1  0.0      0     0 ?        Zs   21:48   0:01 [lsphp] <defunct>
an-user   39575  0.0  0.0      0     0 ?        Zs   21:48   0:00 [lsphp] <defunct>
an-user   39582  0.1  0.0      0     0 ?        Zs   21:48   0:01 [lsphp] <defunct>
root      39714  0.0  0.0      0     0 ?        S    21:49   0:00 [kworker/2:2]
an-user   39754  0.0  0.0      0     0 ?        Zs   21:49   0:00 [lsphp] <defunct>
an-user   39832  0.1  0.0      0     0 ?        Zs   21:49   0:01 [lsphp] <defunct>
an-user   39842  0.6  0.0      0     0 ?        Zs   21:49   0:04 [lsphp] <defunct>
an-user   39863  0.5  0.0      0     0 ?        Zs   21:49   0:03 [lsphp] <defunct>
an-user   39887  0.0  0.0      0     0 ?        Zs   21:49   0:00 [lsphp] <defunct>
root      39960  0.0  0.0      0     0 ?        S    21:50   0:00 [kworker/4:1]
an-user   40102  0.0  0.0      0     0 ?        Zs   21:52   0:00 [lsphp] <defunct>
an-user   40126  0.0  0.0      0     0 ?        Zs   21:52   0:00 [lsphp] <defunct>
an-user   40172  0.7  0.0      0     0 ?        Zs   21:53   0:03 [lsphp] <defunct>
an-user   40179  0.4  0.0      0     0 ?        Zs   21:53   0:01 [lsphp] <defunct>
an-user   40198  0.9  0.0      0     0 ?        Zs   21:53   0:04 [lsphp] <defunct>
an-user   40211  0.8  0.0      0     0 ?        Zs   21:53   0:03 [lsphp] <defunct>
an-user   40212  0.0  0.0      0     0 ?        Zs   21:53   0:00 [lsphp] <defunct>
an-user   40219  0.0  0.0      0     0 ?        Zs   21:53   0:00 [lsphp] <defunct>
an-user   40221  0.0  0.0      0     0 ?        Zs   21:53   0:00 [lsphp] <defunct>
an-user   40222  0.0  0.0      0     0 ?        Zs   21:53   0:00 [lsphp] <defunct>
The log for an-user shows connections from many different IPs.
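
One possible way to dig into the defunct entries, as a sketch only: a zombie process has already exited and holds no memory, so it cannot be killed directly. It stays in the process table until its parent reaps it. Grouping the zombies by parent PID shows which process is failing to clean up. The function name `zombies_by_parent` is made up for this example, and it assumes a procps-style `ps` that accepts `-eo ppid=,stat=,comm=`.

```shell
# Count zombie (state Z) processes per parent PID, busiest parent first.
# Input: lines of "PPID STAT COMMAND", e.g. from: ps -eo ppid=,stat=,comm=
# Output: lines of "COUNT PPID".
zombies_by_parent() {
    awk '$2 ~ /^Z/ { n[$1]++ } END { for (p in n) print n[p], p }' | sort -rn
}

# Usage on a live system:
#   ps -eo ppid=,stat=,comm= | zombies_by_parent
# Then inspect the parent that owns most of them:
#   ps -o pid,ppid,stat,etime,cmd -p <parent pid>
```

If nearly all zombies share one parent, that parent's state (for example whether it is blocked or busy) is probably more informative than the zombies themselves.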

The server has a 6-core CPU and 16GB of RAM. While the high load is happening, the server still has around 9GB of RAM available. The PHP config is:

Code:
maxConns                        220
env                             PHP_LSAPI_CHILDREN=220
env                             LSPHP_ENABLE_USER_INI=on
initTimeout                     3000
memSoftLimit                    2047M
memHardLimit                    2047M
procSoftLimit                   400
procHardLimit                   500
backlog                         100
instances                       1
priority                        0
retryTimeout                    0
persistConn                     1
pcKeepAliveTimeout
respBuffer                      0
How can I find out the reason for this high load?
Thank you.
 
#3
I checked the connections with this command:

netstat -ntu | grep ESTABLISHED | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr

Most IPs had only 1-2 connections each. I then decided to restart the server, and it helped: the load went back to normal. Something was stuck somewhere, and a full reboot fixed it.
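
The command above only counts ESTABLISHED connections, so half-open or closing connections stay invisible, and `cut -d: -f1` truncates IPv6 addresses at their first colon. As a rough sketch, the helper below counts TCP connections per state instead. The name `conn_states` is hypothetical, and it assumes the Linux net-tools `netstat -nt` layout, where the state is the sixth column.

```shell
# Count TCP connections per state (ESTABLISHED, SYN_RECV, TIME_WAIT, ...).
# Input: output of "netstat -nt"; header lines are skipped because they
# do not start with "tcp".
conn_states() {
    awk '$1 ~ /^tcp/ { print $6 }' | sort | uniq -c | sort -rn
}

# Usage on a live system:
#   netstat -nt | conn_states
```

A large SYN_RECV count would hint at a SYN flood, while mostly ESTABLISHED connections spread over many IPs would look more like ordinary (or application-level) traffic.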
 
#4
This problem just happened again. When it happens, the load climbs very fast: one moment it was 1, then 25, then 60-something, 80-something, and over 100. I tried killall lsphp, but it didn't help. Another reboot brought everything back to normal, with a server load of 0.3. If it were an attack, the problem should have continued after the server came back from the reboot. But once again a reboot resolved it, just like last time.

When I killed the lsphp processes, this was recorded in stderr.log:
Code:
[UID:1002][88033] Child process with pid: 88058 was killed by signal: 15, core dumped: no
[UID:1002][88033] Child process with pid: 88059 was killed by signal: 15, core dumped: no
[UID:1002][88033] Child process with pid: 88060 was killed by signal: 15, core dumped: no
[UID:1002][88033] Child process with pid: 88061 was killed by signal: 15, core dumped: no
[UID:1002][88033] Child process with pid: 88062 was killed by signal: 15, core dumped: no
[UID:1002][88033] Child process with pid: 88063 was killed by signal: 15, core dumped: no
[UID:1002][88033] Child process with pid: 88064 was killed by signal: 15, core dumped: no
[UID:1002][88033] Child process with pid: 88067 was killed by signal: 15, core dumped: no
But there were still a lot of other lsphp processes visible in top. There was nothing else abnormal in error_log.
Is there anything else I can check to prevent this from happening again?

Thank you.
 
#6
It doesn't matter whether the limit is 200 or 40. If the server were under attack, the limit would be reached and the load would go up again immediately after restarting. But in this case, nothing happened: the limit is fine and the load is below 1 after rebooting. That points to something else, not the limit or reCAPTCHA. Is this reasoning wrong?

And the thing is, when the load issue happens, the load is so high that I cannot even log in over SSH. Is there any specific place you want me to check after rebooting?
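
Since SSH is unreachable during the spike, one option is to record evidence automatically before things lock up, so there is something to read after the reboot. This is only a sketch: the function names `load_exceeds` and `snapshot`, the `SNAPSHOT_LOG` variable, the default log path, and the threshold of 10 are all assumptions made for this example.

```shell
# Succeeds (exit 0) when load average $1 is greater than threshold $2.
load_exceeds() {
    awk -v load="$1" -v max="$2" 'BEGIN { exit !(load + 0 > max + 0) }'
}

# Append a timestamped snapshot of the busiest processes to a log file.
# The log path is an assumption; override it with SNAPSHOT_LOG.
snapshot() {
    {
        date
        cat /proc/loadavg
        ps -eo pid,ppid,user,stat,pcpu,pmem,etime,args --sort=-pcpu | head -n 40
    } >> "${SNAPSHOT_LOG:-/var/log/load-snapshot.log}"
}

# Run from cron every minute, for example:
#   load=$(cut -d' ' -f1 /proc/loadavg)
#   load_exceeds "$load" 10 && snapshot
```

Because the load reportedly jumps within seconds, a one-minute cron interval may catch only the tail end of the ramp-up; a shorter loop with `sleep` would be the obvious variation.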
 

Cold-Egg

Administrator
#7
I suggested DDoS defence because your first post mentioned a sudden load above 250, against a usual load below 1, and a lot of incoming connections.

If you lower the PHP child process limit, at least that account won't generate such a high load, and you can then check the error log and access log further. Getting the logs is the key to debugging this.
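
As a starting point for the access log check, here is a minimal sketch that summarizes which clients and which URLs dominate the requests. It assumes an Apache combined-style log format, where the client IP is the first field and the request path is the seventh. The function names `top_clients` and `top_urls` are hypothetical, and the log path is left as a placeholder.

```shell
# Show the N most frequent client IPs (default 10) from an access log on stdin.
top_clients() {
    awk '{ print $1 }' | sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# Show the N most frequently requested paths (default 10) from stdin.
top_urls() {
    awk '{ print $7 }' | sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# Usage:
#   top_clients 20 < /path/to/access.log
#   top_urls 20 < /path/to/access.log
```

If a single script accounts for most of the requests during the spike, that script is the first place to look, whether the cause is an attack or a slow backend call.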
 