[U-Boot] [PATCH] net: More BOOTP retry timeout improvements

From: Thierry Reding treding@nvidia.com
It's not unusual for DHCP servers to take a couple hundred milliseconds to respond to DHCP discover messages. One possible reason for the delay can be that the server checks (typically using an ARP request) that the IP it's about to hand out isn't in use yet. To make matters worse, some servers may also queue up requests and process them sequentially, which can cause excessively long delays if clients retry too fast.
Commit f59be6e850b3 ("net: BOOTP retry timeout improvements") shortened the retry timeouts significantly, but the BOOTP/DHCP implementation in U-Boot doesn't handle that well because it will ignore incoming replies to earlier requests. In one particular setup this increases the time it takes to obtain a DHCP lease from 630 ms to 8313 ms.
This commit attempts to fix this in two ways. First it increases the initial retry timeout from 10 ms to 250 ms to give DHCP servers some more time to respond. At the same time a cache of outstanding DHCP request IDs is kept so that the implementation will know to continue transactions even after a retransmission of the DISCOVER message. The maximum retry timeout is also increased from 1 second to 2 seconds. An ID cache of size 4 will keep DHCP requests around for 8 seconds (once the maximum retry timeout has been reached) before dropping them. This should give servers plenty of time to respond. If it ever turns out that this isn't enough, the size of the cache can easily be increased.
With this commit the DHCP lease on the above-mentioned setup still takes longer (1230 ms) than originally, but that's an acceptable compromise to improve DHCP lease acquisition time for a broader range of setups.
To make it easier to benchmark DHCP in the future, this commit also adds the time it took to obtain a lease to the final "DHCP client bound to address x.x.x.x" message.
Signed-off-by: Thierry Reding treding@nvidia.com --- net/bootp.c | 44 ++++++++++++++++++++++++++++++++++++-------- 1 file changed, 36 insertions(+), 8 deletions(-)
diff --git a/net/bootp.c b/net/bootp.c index a4f6db570c6c..f6dd4c2848a2 100644 --- a/net/bootp.c +++ b/net/bootp.c @@ -47,7 +47,8 @@ #define CONFIG_DHCP_MIN_EXT_LEN 64 #endif
-ulong BootpID; +ulong bootp_ids[4]; +unsigned int bootp_num_ids; int BootpTry; ulong bootp_start; ulong bootp_timeout; @@ -77,6 +78,30 @@ static char *dhcpmsg2str(int type) #endif #endif
+static void bootp_add_id(ulong id) +{ + if (bootp_num_ids >= ARRAY_SIZE(bootp_ids)) { + size_t size = sizeof(bootp_ids) - sizeof(id); + + memmove(bootp_ids, &bootp_ids[1], size); + bootp_ids[bootp_num_ids - 1] = id; + } else { + bootp_ids[bootp_num_ids] = id; + bootp_num_ids++; + } +} + +static bool bootp_match_id(ulong id) +{ + unsigned int i; + + for (i = 0; i < bootp_num_ids; i++) + if (bootp_ids[i] == id) + return true; + + return false; +} + static int BootpCheckPkt(uchar *pkt, unsigned dest, unsigned src, unsigned len) { struct Bootp_t *bp = (struct Bootp_t *) pkt; @@ -96,7 +121,7 @@ static int BootpCheckPkt(uchar *pkt, unsigned dest, unsigned src, unsigned len) retval = -4; else if (bp->bp_hlen != HWL_ETHER) retval = -5; - else if (NetReadLong((ulong *)&bp->bp_id) != BootpID) + else if (!bootp_match_id(NetReadLong((ulong *)&bp->bp_id))) retval = -6;
debug("Filtering pkt = %d\n", retval); @@ -351,8 +376,8 @@ BootpTimeout(void) #endif } else { bootp_timeout *= 2; - if (bootp_timeout > 1000) - bootp_timeout = 1000; + if (bootp_timeout > 2000) + bootp_timeout = 2000; NetSetTimeout(bootp_timeout, BootpTimeout); BootpRequest(); } @@ -616,9 +641,10 @@ static int BootpExtended(u8 *e)
void BootpReset(void) { + bootp_num_ids = 0; BootpTry = 0; bootp_start = get_timer(0); - bootp_timeout = 10; + bootp_timeout = 250; }
void @@ -631,6 +657,7 @@ BootpRequest(void) #ifdef CONFIG_BOOTP_RANDOM_DELAY ulong rand_ms; #endif + ulong BootpID;
bootstage_mark_name(BOOTSTAGE_ID_BOOTP_START, "bootp_start"); #if defined(CONFIG_CMD_DHCP) @@ -699,7 +726,8 @@ BootpRequest(void) | ((ulong)NetOurEther[4] << 8) | (ulong)NetOurEther[5]; BootpID += get_timer(0); - BootpID = htonl(BootpID); + BootpID = htonl(BootpID); + bootp_add_id(BootpID); NetCopyLong(&bp->bp_id, &BootpID);
/* @@ -960,8 +988,8 @@ DhcpHandler(uchar *pkt, unsigned dest, IPaddr_t sip, unsigned src, /* Store net params from reply */ BootpCopyNetParams(bp); dhcp_state = BOUND; - printf("DHCP client bound to address %pI4\n", - &NetOurIP); + printf("DHCP client bound to address %pI4 (%lu ms)\n", + &NetOurIP, get_timer(bootp_start)); bootstage_mark_name(BOOTSTAGE_ID_BOOTP_STOP, "bootp_stop");

On 08/18/2014 12:45 AM, Thierry Reding wrote:
From: Thierry Reding treding@nvidia.com
It's not unusual for DHCP servers to take a couple hundred milliseconds to respond to DHCP discover messages. One possible reason for the delay can be that the server checks (typically using an ARP request) that the IP it's about to hand out isn't in use yet. To make matters worse, some servers may also queue up requests and process them sequentially, which can cause excessively long delays if clients retry too fast.
Commit f59be6e850b3 ("net: BOOTP retry timeout improvements") shortened the retry timeouts significantly, but the BOOTP/DHCP implementation in U-Boot doesn't handle that well because it will ignore incoming replies to earlier requests. In one particular setup this increases the time it takes to obtain a DHCP lease from 630 ms to 8313 ms.
This commit attempts to fix this in two ways. First it increases the initial retry timeout from 10 ms to 250 ms to give DHCP servers some more time to respond. At the same time a cache of outstanding DHCP request IDs is kept so that the implementation will know to continue transactions even after a retransmission of the DISCOVER message. The maximum retry timeout is also increased from 1 second to 2 seconds. An ID cache of size 4 will keep DHCP requests around for 8 seconds (once the maximum retry timeout has been reached) before dropping them. This should give servers plenty of time to respond. If it ever turns out that this isn't enough, the size of the cache can easily be increased.
With this commit the DHCP lease on the above-mentioned setup still takes longer (1230 ms) than originally, but that's an acceptable compromise to improve DHCP lease acquisition time for a broader range of setups.
To make it easier to benchmark DHCP in the future, this commit also adds the time it took to obtain a lease to the final "DHCP client bound to address x.x.x.x" message.
Tested-by: Stephen Warren swarren@nvidia.com
So long as it's considered safe to accept DHCP responses to older requests, this approach seems fine. Having seen this patch now (I hadn't when I responded to the other thread), I guess I don't have a strong an opinion re: reverting my original patch vs. taking this; net maintainers, feel free to decide:-)

Hi Thierry,
On Mon, Aug 18, 2014 at 1:45 AM, Thierry Reding thierry.reding@gmail.com wrote:
From: Thierry Reding treding@nvidia.com
It's not unusual for DHCP servers to take a couple hundred milliseconds to respond to DHCP discover messages. One possible reason for the delay can be that the server checks (typically using an ARP request) that the IP it's about to hand out isn't in use yet. To make matters worse, some servers may also queue up requests and process them sequentially, which can cause excessively long delays if clients retry too fast.
Commit f59be6e850b3 ("net: BOOTP retry timeout improvements") shortened the retry timeouts significantly, but the BOOTP/DHCP implementation in U-Boot doesn't handle that well because it will ignore incoming replies to earlier requests. In one particular setup this increases the time it takes to obtain a DHCP lease from 630 ms to 8313 ms.
This commit attempts to fix this in two ways. First it increases the initial retry timeout from 10 ms to 250 ms to give DHCP servers some more time to respond. At the same time a cache of outstanding DHCP request IDs is kept so that the implementation will know to continue transactions even after a retransmission of the DISCOVER message. The maximum retry timeout is also increased from 1 second to 2 seconds. An ID cache of size 4 will keep DHCP requests around for 8 seconds (once the maximum retry timeout has been reached) before dropping them. This should give servers plenty of time to respond. If it ever turns out that this isn't enough, the size of the cache can easily be increased.
With this commit the DHCP lease on the above-mentioned setup still takes longer (1230 ms) than originally, but that's an acceptable compromise to improve DHCP lease acquisition time for a broader range of setups.
To make it easier to benchmark DHCP in the future, this commit also adds the time it took to obtain a lease to the final "DHCP client bound to address x.x.x.x" message.
Signed-off-by: Thierry Reding treding@nvidia.com
net/bootp.c | 44 ++++++++++++++++++++++++++++++++++++-------- 1 file changed, 36 insertions(+), 8 deletions(-)
diff --git a/net/bootp.c b/net/bootp.c index a4f6db570c6c..f6dd4c2848a2 100644 --- a/net/bootp.c +++ b/net/bootp.c @@ -47,7 +47,8 @@ #define CONFIG_DHCP_MIN_EXT_LEN 64 #endif
-ulong BootpID; +ulong bootp_ids[4];
I would prefer you make this a #define config parameter with a default value (4 seems fine) and add it to the documentation. This will make it easier for a person to adjust for their environment.
+unsigned int bootp_num_ids; int BootpTry; ulong bootp_start; ulong bootp_timeout; @@ -77,6 +78,30 @@ static char *dhcpmsg2str(int type) #endif #endif
+static void bootp_add_id(ulong id) +{
if (bootp_num_ids >= ARRAY_SIZE(bootp_ids)) {
size_t size = sizeof(bootp_ids) - sizeof(id);
memmove(bootp_ids, &bootp_ids[1], size);
bootp_ids[bootp_num_ids - 1] = id;
} else {
bootp_ids[bootp_num_ids] = id;
bootp_num_ids++;
}
+}
+static bool bootp_match_id(ulong id) +{
unsigned int i;
for (i = 0; i < bootp_num_ids; i++)
if (bootp_ids[i] == id)
return true;
return false;
+}
static int BootpCheckPkt(uchar *pkt, unsigned dest, unsigned src,
unsigned len)
{ struct Bootp_t *bp = (struct Bootp_t *) pkt; @@ -96,7 +121,7 @@ static int BootpCheckPkt(uchar *pkt, unsigned dest,
unsigned src, unsigned len)
retval = -4; else if (bp->bp_hlen != HWL_ETHER) retval = -5;
else if (NetReadLong((ulong *)&bp->bp_id) != BootpID)
else if (!bootp_match_id(NetReadLong((ulong *)&bp->bp_id))) retval = -6; debug("Filtering pkt = %d\n", retval);
@@ -351,8 +376,8 @@ BootpTimeout(void) #endif } else { bootp_timeout *= 2;
if (bootp_timeout > 1000)
bootp_timeout = 1000;
if (bootp_timeout > 2000)
bootp_timeout = 2000; NetSetTimeout(bootp_timeout, BootpTimeout); BootpRequest(); }
@@ -616,9 +641,10 @@ static int BootpExtended(u8 *e)
void BootpReset(void) {
bootp_num_ids = 0; BootpTry = 0; bootp_start = get_timer(0);
bootp_timeout = 10;
bootp_timeout = 250;
What is the impact on the time it takes for you to get an address assigned in your environment with the rest of this patch, but without this change of the initial timeout? It seems like it shouldn't impact you negatively but will still help Stephen's original case.
}
void @@ -631,6 +657,7 @@ BootpRequest(void) #ifdef CONFIG_BOOTP_RANDOM_DELAY ulong rand_ms; #endif
ulong BootpID; bootstage_mark_name(BOOTSTAGE_ID_BOOTP_START, "bootp_start");
#if defined(CONFIG_CMD_DHCP) @@ -699,7 +726,8 @@ BootpRequest(void) | ((ulong)NetOurEther[4] << 8) | (ulong)NetOurEther[5]; BootpID += get_timer(0);
BootpID = htonl(BootpID);
BootpID = htonl(BootpID);
bootp_add_id(BootpID); NetCopyLong(&bp->bp_id, &BootpID); /*
@@ -960,8 +988,8 @@ DhcpHandler(uchar *pkt, unsigned dest, IPaddr_t sip,
unsigned src,
/* Store net params from reply */ BootpCopyNetParams(bp); dhcp_state = BOUND;
printf("DHCP client bound to address %pI4\n",
&NetOurIP);
printf("DHCP client bound to address %pI4 (%lu
ms)\n",
&NetOurIP, get_timer(bootp_start)); bootstage_mark_name(BOOTSTAGE_ID_BOOTP_STOP, "bootp_stop");
-- 2.0.4
Thanks, -Joe

On Mon, Aug 18, 2014 at 01:01:17PM -0500, Joe Hershberger wrote:
Hi Thierry,
On Mon, Aug 18, 2014 at 1:45 AM, Thierry Reding thierry.reding@gmail.com wrote:
From: Thierry Reding treding@nvidia.com
It's not unusual for DHCP servers to take a couple hundred milliseconds to respond to DHCP discover messages. One possible reason for the delay can be that the server checks (typically using an ARP request) that the IP it's about to hand out isn't in use yet. To make matters worse, some servers may also queue up requests and process them sequentially, which can cause excessively long delays if clients retry too fast.
Commit f59be6e850b3 ("net: BOOTP retry timeout improvements") shortened the retry timeouts significantly, but the BOOTP/DHCP implementation in U-Boot doesn't handle that well because it will ignore incoming replies to earlier requests. In one particular setup this increases the time it takes to obtain a DHCP lease from 630 ms to 8313 ms.
This commit attempts to fix this in two ways. First it increases the initial retry timeout from 10 ms to 250 ms to give DHCP servers some more time to respond. At the same time a cache of outstanding DHCP request IDs is kept so that the implementation will know to continue transactions even after a retransmission of the DISCOVER message. The maximum retry timeout is also increased from 1 second to 2 seconds. An ID cache of size 4 will keep DHCP requests around for 8 seconds (once the maximum retry timeout has been reached) before dropping them. This should give servers plenty of time to respond. If it ever turns out that this isn't enough, the size of the cache can easily be increased.
With this commit the DHCP lease on the above-mentioned setup still takes longer (1230 ms) than originally, but that's an acceptable compromise to improve DHCP lease acquisition time for a broader range of setups.
To make it easier to benchmark DHCP in the future, this commit also adds the time it took to obtain a lease to the final "DHCP client bound to address x.x.x.x" message.
Signed-off-by: Thierry Reding treding@nvidia.com
net/bootp.c | 44 ++++++++++++++++++++++++++++++++++++-------- 1 file changed, 36 insertions(+), 8 deletions(-)
diff --git a/net/bootp.c b/net/bootp.c index a4f6db570c6c..f6dd4c2848a2 100644 --- a/net/bootp.c +++ b/net/bootp.c @@ -47,7 +47,8 @@ #define CONFIG_DHCP_MIN_EXT_LEN 64 #endif
-ulong BootpID; +ulong bootp_ids[4];
I would prefer you make this a #define config parameter with a default value (4 seems fine) and add it to the documentation. This will make it easier for a person to adjust for their environment.
Will do.
@@ -616,9 +641,10 @@ static int BootpExtended(u8 *e)
void BootpReset(void) {
bootp_num_ids = 0; BootpTry = 0; bootp_start = get_timer(0);
bootp_timeout = 10;
bootp_timeout = 250;
What is the impact on the time it takes for you to get an address assigned in your environment with the rest of this patch, but without this change of the initial timeout? It seems like it shouldn't impact you negatively but will still help Stephen's original case.
Leaving the initial timeout at 10 ms increases the time to get a lease from 1230 ms to 5350 ms. I suspect the reason for that is that the DHCP server will queue an ARP ping for each request and process them sequentially. Each of those takes about 600 ms and I see U-Boot sending out a total of 9 broadcasts. So that's 5400 ms delay only due to the requests that haven't been answered. The 8th broadcast is probably the one that U-Boot receives a response for, hence explaining the 5350 ms.
Even the 250 ms timeout will make things a lot better for Stephen's use case. Given the earlier discussion it seems like the DHCP server in his network doesn't check for existing users of an IP address using ARP pings and replies fairly quickly, so I suspect the whole process to take around 300 ms for him. That's still a lot better than the 1230 ms that it takes on my setup (which used to be somewhere around 630 ms before Stephen's original patch).
Thierry

On 08/19/2014 02:06 AM, Thierry Reding wrote:
On Mon, Aug 18, 2014 at 01:01:17PM -0500, Joe Hershberger wrote:
Hi Thierry,
On Mon, Aug 18, 2014 at 1:45 AM, Thierry Reding thierry.reding@gmail.com wrote:
From: Thierry Reding treding@nvidia.com
It's not unusual for DHCP servers to take a couple hundred milliseconds to respond to DHCP discover messages. One possible reason for the delay can be that the server checks (typically using an ARP request) that the IP it's about to hand out isn't in use yet. To make matters worse, some servers may also queue up requests and process them sequentially, which can cause excessively long delays if clients retry too fast.
Commit f59be6e850b3 ("net: BOOTP retry timeout improvements") shortened the retry timeouts significantly, but the BOOTP/DHCP implementation in U-Boot doesn't handle that well because it will ignore incoming replies to earlier requests. In one particular setup this increases the time it takes to obtain a DHCP lease from 630 ms to 8313 ms.
This commit attempts to fix this in two ways. First it increases the initial retry timeout from 10 ms to 250 ms to give DHCP servers some more time to respond. At the same time a cache of outstanding DHCP request IDs is kept so that the implementation will know to continue transactions even after a retransmission of the DISCOVER message. The maximum retry timeout is also increased from 1 second to 2 seconds. An ID cache of size 4 will keep DHCP requests around for 8 seconds (once the maximum retry timeout has been reached) before dropping them. This should give servers plenty of time to respond. If it ever turns out that this isn't enough, the size of the cache can easily be increased.
With this commit the DHCP lease on the above-mentioned setup still takes longer (1230 ms) than originally, but that's an acceptable compromise to improve DHCP lease acquisition time for a broader range of setups.
To make it easier to benchmark DHCP in the future, this commit also adds the time it took to obtain a lease to the final "DHCP client bound to address x.x.x.x" message.
void BootpReset(void) {
bootp_num_ids = 0; BootpTry = 0; bootp_start = get_timer(0);
bootp_timeout = 10;
bootp_timeout = 250;
What is the impact on the time it takes for you to get an address assigned in your environment with the rest of this patch, but without this change of the initial timeout? It seems like it shouldn't impact you negatively but will still help Stephen's original case.
Leaving the initial timeout at 10 ms increases the time to get a lease from 1230 ms to 5350 ms. I suspect the reason for that is that the DHCP server will queue an ARP ping for each request and process them sequentially. Each of those takes about 600 ms and I see U-Boot sending out a total of 9 broadcasts. So that's 5400 ms delay only due to the requests that haven't been answered. The 8th broadcast is probably the one that U-Boot receives a response for, hence explaining the 5350 ms.
Even the 250 ms timeout will make things a lot better for Stephen's use case. Given the earlier discussion it seems like the DHCP server in his network doesn't check for existing users of an IP address using ARP pings and replies fairly quickly, so I suspect the whole process to take around 300 ms for him. That's still a lot better than the 1230 ms that it takes on my setup (which used to be somewhere around 630 ms before Stephen's original patch).
I'm pretty sure I have seen ARP pings, although I guess the timeout on my DHCP server must be much more reasonable, since the whole DHCP process completes for me on the first request U-Boot sends.

On Tue, Aug 19, 2014 at 09:59:00AM -0600, Stephen Warren wrote:
On 08/19/2014 02:06 AM, Thierry Reding wrote:
On Mon, Aug 18, 2014 at 01:01:17PM -0500, Joe Hershberger wrote:
Hi Thierry,
On Mon, Aug 18, 2014 at 1:45 AM, Thierry Reding thierry.reding@gmail.com wrote:
From: Thierry Reding treding@nvidia.com
It's not unusual for DHCP servers to take a couple hundred milliseconds to respond to DHCP discover messages. One possible reason for the delay can be that the server checks (typically using an ARP request) that the IP it's about to hand out isn't in use yet. To make matters worse, some servers may also queue up requests and process them sequentially, which can cause excessively long delays if clients retry too fast.
Commit f59be6e850b3 ("net: BOOTP retry timeout improvements") shortened the retry timeouts significantly, but the BOOTP/DHCP implementation in U-Boot doesn't handle that well because it will ignore incoming replies to earlier requests. In one particular setup this increases the time it takes to obtain a DHCP lease from 630 ms to 8313 ms.
This commit attempts to fix this in two ways. First it increases the initial retry timeout from 10 ms to 250 ms to give DHCP servers some more time to respond. At the same time a cache of outstanding DHCP request IDs is kept so that the implementation will know to continue transactions even after a retransmission of the DISCOVER message. The maximum retry timeout is also increased from 1 second to 2 seconds. An ID cache of size 4 will keep DHCP requests around for 8 seconds (once the maximum retry timeout has been reached) before dropping them. This should give servers plenty of time to respond. If it ever turns out that this isn't enough, the size of the cache can easily be increased.
With this commit the DHCP lease on the above-mentioned setup still takes longer (1230 ms) than originally, but that's an acceptable compromise to improve DHCP lease acquisition time for a broader range of setups.
To make it easier to benchmark DHCP in the future, this commit also adds the time it took to obtain a lease to the final "DHCP client bound to address x.x.x.x" message.
void BootpReset(void) {
bootp_num_ids = 0; BootpTry = 0; bootp_start = get_timer(0);
bootp_timeout = 10;
bootp_timeout = 250;
What is the impact on the time it takes for you to get an address assigned in your environment with the rest of this patch, but without this change of the initial timeout? It seems like it shouldn't impact you negatively but will still help Stephen's original case.
Leaving the initial timeout at 10 ms increases the time to get a lease from 1230 ms to 5350 ms. I suspect the reason for that is that the DHCP server will queue an ARP ping for each request and process them sequentially. Each of those takes about 600 ms and I see U-Boot sending out a total of 9 broadcasts. So that's 5400 ms delay only due to the requests that haven't been answered. The 8th broadcast is probably the one that U-Boot receives a response for, hence explaining the 5350 ms.
Even the 250 ms timeout will make things a lot better for Stephen's use case. Given the earlier discussion it seems like the DHCP server in his network doesn't check for existing users of an IP address using ARP pings and replies fairly quickly, so I suspect the whole process to take around 300 ms for him. That's still a lot better than the 1230 ms that it takes on my setup (which used to be somewhere around 630 ms before Stephen's original patch).
I'm pretty sure I have seen ARP pings, although I guess the timeout on my DHCP server must be much more reasonable, since the whole DHCP process completes for me on the first request U-Boot sends.
Are you sure they are ARP pings from the server? I've seen DHCP clients using ARP pings to check their IP address before using it. Something on the order of 20 or 40 ms for an ARP ping timeout would be surprising.
Thierry
participants (3)
-
Joe Hershberger
-
Stephen Warren
-
Thierry Reding