Linux内核TCP/IP参数分析与调优
如下图展示的是TCP的三个阶段.1,TCP三次握手. 2,TCP数据传输. 3,TCP的四次挥手。
SYN:(同步序列编号,Synchronize Sequence Numbers)该标志仅在三次握手建立的时候有效。表示一个新的TCP连接请求。
ACK:(确认编号,Acknowledgement Number)是对TCP请求的确认标志,同事提示对端系统已经成功连接所有数据。
FIN(结束标志,Finish)用来结束一个TCP会话,但对应端口仍处于开放状态,准备接受新数据。
一下分别解析11个阶段的Server端和Client端的TCP状态。
1)、LISTEN:首先服务端需要打开一个socket进行监听,状态为LISTEN. /* The socket is listening for incoming connections. 侦听来自远方TCP端口的连接请求 */
2)、SYN_SENT:客户端通过应用程序调用connect进行active open.于是客户端tcp发送一个SYN以请求建立一个连接.之后状态置为SYN_SENT. /*The socket isactively attempting toestablish a connection. 在发送连接请求后等待匹配的连接请求 */
3)、SYN_RECV:服务端应发出ACK确认客户端的SYN,同时自己向客户端发送一个SYN. 之后状态置为SYN_RECV /* A connection request has been received fromthenetwork. 在收到和发送一个连接请求后等待对连接请求的确认 */(这一过程很短暂,用netstat很难看到这种状态)
4)、ESTABLISHED: 代表一个打开的连接,双方可以进行或已经在数据交互了。/* The socket has anestablishedconnection. 代表一个打开的连接,数据可以传送给用户 */
5)、FIN_WAIT1:主动关闭(active close)端应用程序调用close,于是其TCP发出FIN请求主动关闭连接,之后进入FIN_WAIT1状态./* The socket is closed, andtheconnection is shutting down. 等待远程TCP的连接中断请求,或先前的连接中断请求的确认 */(FIN_WAIT1只出现在主动关闭的那一端,其实FIN_WAIT_1和FIN_WAIT_2状态的真正含义都是表示等待对方的FIN报文。而这两种状态的区别是:FIN_WAIT_1状态实际上是当SOCKET在ESTABLISHED状态时,它想主动关闭连接,向对方发送了FIN报文,此时该SOCKET即进入到FIN_WAIT_1状态。而当对方回应ACK报文后,则进入到FIN_WAIT_2状态,当然在实际的正常情况下,无论对方何种情况下,都应该马上回应ACK报文,所以FIN_WAIT_1状态一般是比较难见到的,而FIN_WAIT_2状态还有时常常可以用netstat看到。)
6)、CLOSE_WAIT:被动关闭(passive close)端TCP接到FIN后,就发出ACK以回应FIN请求(它的接收也作为文件结束符传递给上层应用程序),并进入CLOSE_WAIT. /* The remote end hasshut down, waitingfor the socket to close. 等待从本地用户发来的连接中断请求 */
7)、FIN_WAIT2:主动关闭端接到ACK后,就进入了FIN-WAIT-2 ./* Connection is closed, and the socket is waiting forashutdown from the remote end. 从远程TCP等待连接中断请求*/
8)、LAST_ACK:被动关闭端一段时间后,接收到文件结束符的应用程序将调用CLOSE关闭连接。这导致它的TCP也发送一个 FIN,等待对方的ACK.就进入了LAST-ACK. /* The remote end has shut down, andthe socket is closed. Waiting foracknowledgement. 等待原来发向远程TCP的连接中断请求的确认 */
9)、TIME_WAIT:在主动关闭端接收到FIN后,TCP就发送ACK包,并进入TIME-WAIT状态。/* The socket iswaiting after close tohandle packets still in the network.等待足够的时间以确保远程TCP接收到连接中断请求的确认 */(主线在主动关闭端,表示收到了对方的FIN报文,并且发送出了ACK报文,等2MSL后即可回到CLOSED可用状态了。)
10)、CLOSING: 比较少见./* Both sockets areshut down but westill don'thave all our data sent. 等待远程TCP对连接中断的确认 */
11)、CLOSED: 被动关闭端在接受到ACK包后,就进入了closed的状态。连接结束./* The socket is notbeing used. 没有任何连接状态 */
TIME_WAIT状态的形成只发生在主动关闭连接的一方。
主动关闭方在接收到被动关闭方的FIN请求后,发送成功给对方一个ACK后,将自己的状态由FIN_WAIT2修改为TIME_WAIT,而必须再等2倍 的MSL(Maximum Segment Lifetime, MSL是一个数据报在internetwork中能存在的时间)时间之后双方才能把状态 都改为CLOSED以关闭连接。目前RHEL里保持TIME_WAIT状态的时间为60秒。
TCP的三次握手状态变化:
1. Client:SYN ->Server
Client发送一个SYN到Server,此时客户端状态变为SYN_SENT.
2. Server: SYN + ACK –>Client
Server接收到SYN包,并发送ACK到Client,此时Server端状态LISTEN-> SYN_RECV
3. Client:ACK -> Server
Client收到Server的SYN和ACK,此时Server端状态:LISTEN ->SYN_RECV -> ESTABLISHED
Client端状态SYN_SENT –>ESTABLISHED
第一次握手过程中涉及到的内核参数:
net.ipv4.tcp_syn_retries=5
· (The maximum number oftimes initial SYNs for an active TCP connection attempt will beretransmitted. This value should not be higherthan 255. The defaultvalue is 5, which corresponds to approximately180seconds.)
第二次握手涉及到的参数:
一、 在这一过程中,内核有一个用来接受client发送的SYN并对SYN进行排队的队列参数,如果队列满了,就不接受新的请求,等待最后发送ack的时候允许多少个等待,前提是有足够内存。此参数是:
net.ipv4.tcp_max_syn_backlog
默认是1024,内存足够大,高并发的服务器建议提高到net.ipv4.tcp_max_syn_backlog = 16384 .
二、 其次是SYN-ACK重传,当Server向Client发送SYN+ACK没有得到相应,Server将重传,控制这个过程的参数是
tcp_synack_retries
· (The maximum number of times a SYN/ACK segment for apassive TCP connection will be retransmitted. Thisnumber should not be higher than 255.)
默认值是5,对应的时间是180秒,建议修改为
tcp_synack_retries = 1
三、 SYN Cookies 是对TCP服务器端的三次握手协议作一些修改,专门用来防范SYN Flood攻击的一种手段。它的原理是,在TCP服务器收到TCP SYN包并返回TCPSYN+ACK包时,不分配一个专门的数据区,而是根据这个SYN包计算出一个cookie值。在收到TCPACK包时,TCP服务器在根据那个cookie值检查这个TCP ACK包的合法性。如果合法,再分配专门的数据区进行处理未来的TCP连接。对应内核参数是:
net.ipv4.tcp_syncookies = {0|1}
· (Enable TCP syncookies. The kernel must be compiled with CONFIG_SYN_COOKIES. Send out syncookies when the syn backlog queue of a socket overflows. The syncookies featureattempts to protect a socket from a SYN flood attack. This should be used as a last resort, if at all. This is a violation of the TCP protocol, andconflicts with other areas of TCP such as TCP extensions. It can cause problems for clients and relays. It is not recommended as a tuning mechanism for heavilyloaded servers to help with overloaded or misconfigured conditions. For recommended alternatives see tcp_max_syn_backlog, tcp_synack_retries, andtcp_abort_on_overflow.)
·
tcp_syncookies 与 tcp_max_syn_backlog一起联合使用,防止SYN Flood攻击。
中间传输数据的过程中涉及到的内核参数:
net.ipv4.tcp_keepalive_intvl=15
net.ipv4.tcp_keepalive_probes=3
net.ipv4.tcp_keepalive_time=120
这三个参数是如果Server端和Client端一直没有数据传输,过了120秒后,第一次探测,间隔15秒后做第二次探测,直到探测3次就放弃连接。
四次挥手的状态变化:
客户端(主动发起关闭):
1.Client : FIN(M) ->Server
Client发送一个FIN给Server,请求关闭,Client由ESTABLISHED -> FIN_WAIT1
2.Server : ACK ->Client
Server收到FIN后发送ACK 确认,Server有ESTABLISHED ->CLOSE_WAIT
Client收到Server的ACK,由FIN_WAIT1->FIN_WAIT2继续等待Server发送数据
3.Server : FIN(N) ->Client
Server端状态变为ESTABLISHED ->CLOSE_WAIT ->LAST_ACK
4.Client : ACK(N+1)->Server
Client收到FIN,状态由ESTABLISHED->FIN_WAIT1->FIN_WAIT2->TIME_WAIT[2MSL超时]->closed
Server端变为ESTABLISHED ->CLOSE_WAIT ->LAST_ACK->CLOSED.
上面涉及到一个名词,2MSL (Maximum Segment Lifetime )
· The TIME_WAIT state isalso called the 2MSL wait state.
· Every implementation mustchoose a value for the maximum segment lifetime (MSL). It is the maximum amount of time any segment can exist in the network before being discarded.
· RFC793 specifies the MSLas 2 minutes. Common implementation values, however, are 30seconds, 1 minute, or 2 minutes. Recall that the limit on lifetime of the IP datagram is based on the number of hops, not a timer.
· Given an MSL for animplementation, the rule is: when TCP performs an active close, and sends the final ACK, that connection must stay in the TIME_WAIT state for twice the MSL.
· This lets TCP resend thefinal ACK in case this ACK is lost (in which case the other endwill time out and retransmit its final FIN).
· An effect of this 2MSLwait is that while the TCP connection is in the 2MSL wait, thesocket pair defining that connection cannot be reused.
· Any delayed segments thatarrive for a connection while it is in the 2MSL wait are discarded. Since the connection defined by the socket pair in the 2MSL wait cannot be reused, when we do establish a valid connection we know that delayed segments from an earlier incarnation of thisconnection cannot be misinterpreted as being part of the newconnection.
· The client, who performsthe active close, enters the 2MSL wait. The server does not. Thismeans if we terminate a client, and restart the client immediately, the new client cannot reuse the same local port number.
· Servers, however, usewell-known ports. If we terminate a server that has a connectionestablished, and immediately try to restart the server, the server cannot assign its well-known port number to its end point.
简单点理解就是,主动发送FIN的那一端最后发送了ack确认给服务器后必然经过的一个时间。TIME_WAIT(也是2MSL)状态的目的是为了防止最后client发出的ack丢失,让server处于LAST_ACK超时重发FIN。配置2MSL时间长短的服务器参数,我们需要的是Time_wait的连接可以重用,并且能迅速关闭。
控制迅速回收和重用的参数是:
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=1
注意如果是LVS-NAT服务器不推荐开启以上参数。
如果发现服务器有大量TIME_WAIT的连接,可降低tcp_fin_timeout参数(默认60),如果有这个问题出现,一般伴随的就是本地端口被占用完毕,还需要扩大端口范围:
net.ipv4.tcp_fin_timeout=20
· How many seconds towait fora final FIN packet before the socket is forcibly closed. This is strictly a violation of the TCP specification, but required to prevent denial-of-service (DoS) attacks. The default value in2.4 kernels is 60, down from 180 in2.2.
·
net.ipv4.ip_local_port_range=1024 65534
以及 TIME_WAIT的最大值:
net.ipv4.tcp_max_tw_buckets=20000
· The maximum number ofsockets in TIME_WAIT state allowed in the system. This limit exists only to prevent simple denial-of-service attacks. The default value of NR_FILE*2 is adjusted depending on the memory in the system. If this number isexceeded, the socket is closed and a warning is printed.
超过这个值的time_wait就被关闭掉了。
TCP缓冲参数
net.ipv4.tcp_mem='873800 8388608 8388608′
定义TCP协议栈使用的内存空间;分别为最小值,默认值和最大值;
· low:当TCP使用了低于该值的内存页面数时,TCP不会考虑释放内存。即低于此值没有内存压力。(理想情况下,这个值应与指定给 tcp_wmem 的第 2 个值相匹配- 这第 2 个值表明,最大页面大小乘以最大并发请求数除以页大小 (131072 * 300 / 4096)。 )
· pressure:当TCP使用了超过该值的内存页面数量时,TCP试图稳定其内存使用,进入pressure模式,当内存消耗低于low值时则退出pressure状态。(理想情况下这个值应该是 TCP 可以使用的总缓冲区大小的最大值 (204800 * 300 / 4096)。 )
· high:允许所有tcpsockets用于排队缓冲数据报的页面量。(如果超过这个值,TCP连接将被拒绝,这就是为什么不要令其过于保守 (512000 * 300 / 4096) 的原因了。在这种情况下,提供的价值很大,它能处理很多连接,是所预期的 2.5 倍;或者使现有连接能够传输 2.5 倍的数据。)
· 一般情况下这些值是在系统启动时根据系统内存数量计算得到的。
net.ipv4.tcp_rmem='4096 87380 8388608′
定义TCP协议栈用于接收缓冲的内存空间;
第一个值为最小值,即便当前主机内存空间吃紧,也得保证tcp协议栈至少有此大小的空间可用;
第二个值为默认值,它会覆盖net.core.rmem_default中为所有协议定义的接收缓冲的大小;
第三值为最大值,即能用于tcp接收缓冲的最大内存空间;
net.ipv4.tcp_wmem='4096 65536 8388608′
定义TCP协议栈用于发送缓冲的内存空间;
其他的一些参数
net.ipv4.tcp_max_orphans=262144
· The maximum number oforphaned (not attached to any user file handle) TCP sockets allowed in the system. When this number is exceeded, theorphaned connection is reset and a warning is printed. This limitexists only to prevent simple denial-of-service attacks. Lowering this limit is not recommended. Network conditionsmight require you to increase the number of orphans allowed, butnote that each orphan can eat up to ~64K of unswappablememory. The default initial value is set equal to thekernel parameter NR_FILE. This initial default is adjusted depending on the memory in the system.
系统所能处理不属于任何进程的TCPsockets最大数量。假如超过这个数量﹐那么不属于任何进程的连接会被立即reset,并同时显示警告信息。之所以要设定这个限制﹐纯粹为了抵御那些简单的 DoS 攻击﹐千万不要依赖这个或是人为的降低这个限制。如果内存大更应该增加这个值。
系统中最多有多少个TCP套接字不被关联到任何一个用户文件句柄上;如果超过这个数字,孤儿连接将即刻被复位并打印出警告信息;
这个限制仅仅是为了防止简单的DoS 攻击,不能过分依靠它或者人为地减小这个值,如果需要修改,在确保有足够内存可用的前提下,应该增大此值;
#这个数值越大越好,越大对于抗攻击能力越强
在之前公司遇到的一次incident,涉及到广告服务器backend服务器的参数,当时遇到网络丢包,tcp table被占满的情况,调整的相应参数(默认是65536):
net.ipv4.ip_conntrack_max= 196608
net.ipv4.netfilter.ip_conntrack_max= 196608
这儿所列参数是老男孩老师生产中常用的参数:
net.ipv4.tcp_syn_retries = 1
net.ipv4.tcp_synack_retries = 1
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl =15
net.ipv4.tcp_retries2 = 5
net.ipv4.tcp_fin_timeout = 2
net.ipv4.tcp_max_tw_buckets = 36000
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_orphans = 32768
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.tcp_wmem = 8192 131072 16777216
net.ipv4.tcp_rmem = 32768 131072 16777216
net.ipv4.tcp_mem = 786432 1048576 1572864
net.ipv4.ip_local_port_range = 1024 65000
net.ipv4.ip_conntrack_max = 65536
net.ipv4.netfilter.ip_conntrack_max=65536
net.ipv4.netfilter.ip_conntrack_tcp_timeout_established=180
net.core.somaxconn = 16384
net.core.netdev_max_backlog = 16384
内核参数的优化还是要看业务的具体应用场景和硬件参数做动态调整,这儿所列只是常用优化参数,根据参数各个定义,理解后,再根据自己生产环境而定。
转载自:https://www.linuxidc.com/Linux/2016-01/127842.htm
声明: 除非转自他站(如有侵权,请联系处理)外,本文采用 BY-NC-SA 协议进行授权 | 智乐兔
转载请注明:转自《Linux内核TCP/IP参数分析与调优》
本文地址:https://www.zhiletu.com/archives-5771.html
关注公众号:
微信赞赏支付宝赞赏