WORKS
My first contribution to OSS: Linux. A cosmetic cleanup of a tiny redundancy (a duplicate initialization) that was introduced in 1996, the year I was born. See also: [Git](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=5000b28b0b1a34144b39376318cafb8c2a0f79fd).

```
commit 5000b28b0b1a34144b39376318cafb8c2a0f79fd
Author: Kuniyuki Iwashima <kuni1840@gmail.com>
Date:   Tue Dec 10 02:41:48 2019 +0000

    tcp: Cleanup duplicate initialization of sk->sk_state.

    When a TCP socket is created, sk->sk_state is initialized twice as
    TCP_CLOSE in sock_init_data() and tcp_init_sock(). The tcp_init_sock()
    is always called after the sock_init_data(), so it is not necessary to
    update sk->sk_state in the tcp_init_sock().

    Before v2.1.8, the code of the two functions was in the inet_create().
    In the patch of v2.1.8, the tcp_v4/v6_init_sock() were added and the
    code of initialization of sk->state was duplicated.

    Signed-off-by: Kuniyuki Iwashima <kuni1840@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 8a39ee794891..09e2cae92956 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -443,8 +443,6 @@ void tcp_init_sock(struct sock *sk)
 	tp->tsoffset = 0;
 	tp->rack.reo_wnd_steps = 1;
 
-	sk->sk_state = TCP_CLOSE;
-
 	sk->sk_write_space = sk_stream_write_space;
 	sock_set_flag(sk, SOCK_USE_WRITE_QUEUE);
 
```
When you bind sockets with the *SO_REUSEPORT* option to the same port (as nginx does, for example), they belong to the same *sock_reuseport* structure, which can hold up to *sock_reuseport.max_socks* sockets. When the group is full and another socket joins, *reuseport_grow()* doubles that limit. *max_socks* was initialized in both *__reuseport_alloc()* and *reuseport_grow()*, and this commit removes the latter. What is the initial value of *max_socks*? Hmm... 128! So *reuseport_grow()* is not called until more than 128 sockets share the port ;-) See also: [Git](https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=cd94ef06392ffd49e0a0e1c28bc5cd44f37f1f6b).

```
commit cd94ef06392ffd49e0a0e1c28bc5cd44f37f1f6b
Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Date:   Sat Jan 25 10:41:02 2020 +0000

    soreuseport: Cleanup duplicate initialization of more_reuse->max_socks.

    reuseport_grow() does not need to initialize the more_reuse->max_socks
    again. It is already initialized in __reuseport_alloc().

    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
index f19f179538b9..91e9f2223c39 100644
--- a/net/core/sock_reuseport.c
+++ b/net/core/sock_reuseport.c
@@ -107,7 +107,6 @@ static struct sock_reuseport *reuseport_grow(struct sock_reuseport *reuse)
 	if (!more_reuse)
 		return NULL;
 
-	more_reuse->max_socks = more_socks_size;
 	more_reuse->num_socks = reuse->num_socks;
 	more_reuse->prog = reuse->prog;
 	more_reuse->reuseport_id = reuse->reuseport_id;
```
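To illustrate the growth scheme described above, here is a minimal user-space sketch in C. The names (*toy_reuseport*, *toy_reuseport_alloc()*, *toy_reuseport_grow()*, *toy_reuseport_add()*, *TOY_INIT_SOCKS*) are hypothetical stand-ins for *struct sock_reuseport*, *__reuseport_alloc()* and *reuseport_grow()*; the real kernel code also deals with RCU, an upper bound on the size, and the attached BPF program, all omitted here. The point it shows is that the allocator already sets *max_socks*, so the grow path does not need to set it again.

```c
#include <stdlib.h>
#include <string.h>

/* Initial capacity, matching the 128 mentioned above. */
#define TOY_INIT_SOCKS 128

/* Hypothetical, simplified stand-in for struct sock_reuseport. */
struct toy_reuseport {
	unsigned int max_socks; /* capacity of socks[] */
	unsigned int num_socks; /* sockets currently in the group */
	void *socks[];          /* flexible array of socket pointers */
};

/* Allocates a group with room for max_socks sockets and records the
 * capacity, so callers never need to set max_socks themselves. */
static struct toy_reuseport *toy_reuseport_alloc(unsigned int max_socks)
{
	struct toy_reuseport *reuse =
		calloc(1, sizeof(*reuse) + max_socks * sizeof(void *));

	if (!reuse)
		return NULL;
	reuse->max_socks = max_socks;
	return reuse;
}

/* Doubles the capacity by allocating a bigger group and copying the
 * sockets over. On failure the old group is left untouched. */
static struct toy_reuseport *toy_reuseport_grow(struct toy_reuseport *reuse)
{
	struct toy_reuseport *more = toy_reuseport_alloc(reuse->max_socks * 2U);

	if (!more)
		return NULL;
	/* No "more->max_socks = ..." here: the allocator already did it. */
	more->num_socks = reuse->num_socks;
	memcpy(more->socks, reuse->socks, reuse->num_socks * sizeof(void *));
	free(reuse);
	return more;
}

/* Adds a socket, growing only when the group is full. Returns the
 * (possibly reallocated) group, or NULL if growing failed, in which
 * case the caller still owns the original group. */
static struct toy_reuseport *toy_reuseport_add(struct toy_reuseport *reuse,
					       void *sk)
{
	if (reuse->num_socks == reuse->max_socks) {
		struct toy_reuseport *more = toy_reuseport_grow(reuse);

		if (!more)
			return NULL;
		reuse = more;
	}
	reuse->socks[reuse->num_socks++] = sk;
	return reuse;
}
```

With this sketch, the first 128 *toy_reuseport_add()* calls fill the initial group, and only the 129th triggers a grow to 256.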
This patch set includes 4 patches:

- [tcp: Remove unnecessary conditions in inet_csk_bind_conflict().](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=16f6c2518f9e0347eb54d368473ebd0904ac4298)
- [tcp: bind(0) remove the SO_REUSEADDR restriction when ephemeral ports are exhausted.](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4b01a9674231a97553a55456d883f584e948a78d)
- [tcp: Forbid to bind more than one sockets haveing SO_REUSEADDR and SO_REUSEPORT per EUID.](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=335759211a327d61244580070d74f55561c35895)
- [selftests: net: Add SO_REUSEADDR test to check if 4-tuples are fully utilized.](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7f204a7de8b08542aca3c1daa96ed20e1177ba87)

Without these patches, binding sockets to ephemeral ports with bind(0) fails once all of the ports are exhausted, even if every socket has SO_REUSEADDR enabled. Yet in that case the sockets could still connect to different remote hosts. I added the net.ipv4.ip_autobind_reuse option and fixed the behaviour so that the whole space of local (addr, port) tuples can be fully utilized.
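The pattern these patches deal with is a socket that enables SO_REUSEADDR and then calls bind() with port 0 so that the kernel picks the local port, typically followed by connect(). Below is a minimal sketch of that pattern using standard POSIX socket calls; the helper name *bind_ephemeral_reuseaddr()* is hypothetical. On a host with free ephemeral ports it simply receives an unused port. Per the description above, reusing an already-bound port only comes into play once the ephemeral ports are exhausted and net.ipv4.ip_autobind_reuse is enabled.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: creates a TCP socket with SO_REUSEADDR, binds it
 * to 127.0.0.1 with port 0 so the kernel chooses an ephemeral port, and
 * returns that port, or -1 on error. On success the fd goes to *fd_out. */
static int bind_ephemeral_reuseaddr(int *fd_out)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int one = 1;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0)
		goto err;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = htons(0); /* port 0: let the kernel pick, i.e. bind(0) */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto err;

	/* Ask the kernel which port it assigned. */
	if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
		goto err;

	*fd_out = fd;
	return ntohs(addr.sin_port);
err:
	close(fd);
	return -1;
}
```

A caller would then connect() the returned fd to a remote host and close() it when done.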
```
commit a594920f8747fa032c784c3660d6cd5a8ab291f8
Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Date:   Sat Jul 11 00:57:59 2020 +0900

    inet: Remove an unnecessary argument of syn_ack_recalc().

    Commit 0c3d79bce48034018e840468ac5a642894a521a3 ("tcp: reduce SYN-ACK
    retrans for TCP_DEFER_ACCEPT") introduces syn_ack_recalc() which decides
    if a minisock is held and a SYN+ACK is retransmitted or not.

    If rskq_defer_accept is not zero in syn_ack_recalc(), max_retries always
    has the same value because max_retries is overwritten by
    rskq_defer_accept in reqsk_timer_handler().

    This commit adds three changes:
    - remove redundant non-zero check for rskq_defer_accept in
      reqsk_timer_handler().
    - remove max_retries from the arguments of syn_ack_recalc() and use
      rskq_defer_accept instead.
    - rename thresh to max_syn_ack_retries for readability.

    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
    Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com>
    CC: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index afaf582a5aa9..22b0e7336360 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -648,20 +648,19 @@ struct dst_entry *inet_csk_route_child_sock(const struct sock *sk,
 EXPORT_SYMBOL_GPL(inet_csk_route_child_sock);
 
 /* Decide when to expire the request and when to resend SYN-ACK */
-static inline void syn_ack_recalc(struct request_sock *req, const int thresh,
-				  const int max_retries,
-				  const u8 rskq_defer_accept,
-				  int *expire, int *resend)
+static void syn_ack_recalc(struct request_sock *req,
+			   const int max_syn_ack_retries,
+			   const u8 rskq_defer_accept,
+			   int *expire, int *resend)
 {
 	if (!rskq_defer_accept) {
-		*expire = req->num_timeout >= thresh;
+		*expire = req->num_timeout >= max_syn_ack_retries;
 		*resend = 1;
 		return;
 	}
-	*expire = req->num_timeout >= thresh &&
-		  (!inet_rsk(req)->acked || req->num_timeout >= max_retries);
-	/*
-	 * Do not resend while waiting for data after ACK,
+	*expire = req->num_timeout >= max_syn_ack_retries &&
+		  (!inet_rsk(req)->acked || req->num_timeout >= rskq_defer_accept);
+	/* Do not resend while waiting for data after ACK,
 	 * start to resend on end of deferring period to give
 	 * last chance for data or ACK to create established socket.
 	 */
@@ -720,15 +719,12 @@ static void reqsk_timer_handler(struct timer_list *t)
 	struct net *net = sock_net(sk_listener);
 	struct inet_connection_sock *icsk = inet_csk(sk_listener);
 	struct request_sock_queue *queue = &icsk->icsk_accept_queue;
-	int qlen, expire = 0, resend = 0;
-	int max_retries, thresh;
-	u8 defer_accept;
+	int max_syn_ack_retries, qlen, expire = 0, resend = 0;
 
 	if (inet_sk_state_load(sk_listener) != TCP_LISTEN)
 		goto drop;
 
-	max_retries = icsk->icsk_syn_retries ? : net->ipv4.sysctl_tcp_synack_retries;
-	thresh = max_retries;
+	max_syn_ack_retries = icsk->icsk_syn_retries ? : net->ipv4.sysctl_tcp_synack_retries;
 	/* Normally all the openreqs are young and become mature
 	 * (i.e. converted to established socket) for first timeout.
 	 * If synack was not acknowledged for 1 second, it means
@@ -750,17 +746,14 @@ static void reqsk_timer_handler(struct timer_list *t)
 	if ((qlen << 1) > max(8U, READ_ONCE(sk_listener->sk_max_ack_backlog))) {
 		int young = reqsk_queue_len_young(queue) << 1;
 
-		while (thresh > 2) {
+		while (max_syn_ack_retries > 2) {
 			if (qlen < young)
 				break;
-			thresh--;
+			max_syn_ack_retries--;
 			young <<= 1;
 		}
 	}
-	defer_accept = READ_ONCE(queue->rskq_defer_accept);
-	if (defer_accept)
-		max_retries = defer_accept;
-	syn_ack_recalc(req, thresh, max_retries, defer_accept,
+	syn_ack_recalc(req, max_syn_ack_retries, READ_ONCE(queue->rskq_defer_accept),
 		       &expire, &resend);
 	req->rsk_ops->syn_ack_timeout(req);
 	if (!expire &&
```
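The decision logic in the diff above can be exercised outside the kernel. The following sketch copies the post-patch *expire* computation of *syn_ack_recalc()* and the retry-reduction loop of *reqsk_timer_handler()* into plain C functions over a hypothetical *toy_req* struct (*toy_syn_ack_recalc()* and *toy_reduce_retries()* are likewise hypothetical names). The *resend* line is not part of the hunk; it is an assumption derived from the comment in the diff (do not resend while an ACKed request waits for data, resume at the end of the deferring period).

```c
#include <stdbool.h>

/* Hypothetical, simplified request: only the fields the decision uses. */
struct toy_req {
	int num_timeout; /* SYN+ACK timeouts so far */
	bool acked;      /* final ACK seen, waiting for data (TCP_DEFER_ACCEPT) */
};

/* Post-patch expire logic from the diff; rskq_defer_accept doubles as the
 * retry limit for ACKed requests, so no separate max_retries is needed. */
static void toy_syn_ack_recalc(const struct toy_req *req,
			       int max_syn_ack_retries,
			       unsigned char rskq_defer_accept,
			       bool *expire, bool *resend)
{
	if (!rskq_defer_accept) {
		*expire = req->num_timeout >= max_syn_ack_retries;
		*resend = true;
		return;
	}
	*expire = req->num_timeout >= max_syn_ack_retries &&
		  (!req->acked || req->num_timeout >= rskq_defer_accept);
	/* Assumption based on the comment in the diff: an ACKed request is
	 * not resent until the last deferring period. */
	*resend = !req->acked || req->num_timeout >= rskq_defer_accept - 1;
}

/* Loop from reqsk_timer_handler(): when the queue holds more than half
 * the backlog (at least 8), lower the retry budget roughly one step per
 * doubling of qlen relative to young_len, but never below 2. */
static int toy_reduce_retries(int max_syn_ack_retries, int qlen,
			      int young_len, int backlog)
{
	int limit = backlog > 8 ? backlog : 8;

	if ((qlen << 1) > limit) {
		int young = young_len << 1;

		while (max_syn_ack_retries > 2) {
			if (qlen < young)
				break;
			max_syn_ack_retries--;
			young <<= 1;
		}
	}
	return max_syn_ack_retries;
}
```

For example, with *rskq_defer_accept* = 3 and a budget of 2 retries, an ACKed request that has timed out twice is kept (not expired) and resent, whereas an un-ACKed one would expire.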
This patch set addresses two issues which happen when both connected and unconnected sockets are in the same UDP reuseport group.

- [udp: Copy has_conns in reuseport_grow().](https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=f2b2c55e512879a05456eaf5de4d1ed2f7757509)
- [udp: Improve load balancing for SO_REUSEPORT.](https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=efc6b6f6c3113e8b203b9debfb72d81e0f3dcace)
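One way to picture the load-balancing problem: a connected UDP socket should only receive packets of its own flow, so when the reuseport hash selects a connected socket, other traffic still needs to land on some unconnected socket in the group. The sketch below is an illustrative assumption, not necessarily what the linked commit does: it probes forward from the hashed index to the next unconnected socket and returns NULL if every socket is connected. *toy_sock*, *toy_scale()* and *toy_select_by_hash()* are hypothetical; *toy_scale()* uses the same multiply-and-shift as the kernel's *reciprocal_scale()*.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified socket in a reuseport group. */
struct toy_sock {
	bool connected; /* stands in for a connected UDP socket */
	int id;
};

/* Maps a 32-bit hash onto [0, n) with a multiply and shift,
 * like the kernel's reciprocal_scale(). */
static unsigned int toy_scale(uint32_t hash, unsigned int n)
{
	return (unsigned int)(((uint64_t)hash * n) >> 32);
}

/* Picks a socket by hash, skipping connected sockets by probing forward
 * (wrapping around). Returns NULL if all sockets are connected. */
static struct toy_sock *toy_select_by_hash(struct toy_sock **socks,
					   unsigned int n, uint32_t hash)
{
	unsigned int i, start;

	if (n == 0)
		return NULL;
	i = start = toy_scale(hash, n);
	while (socks[i]->connected) {
		if (++i >= n)
			i = 0;
		if (i == start)
			return NULL;
	}
	return socks[i];
}
```

Probing keeps the choice deterministic for a given hash, so one flow keeps hitting the same unconnected socket as long as the group does not change.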