[U-Boot] [PATCH] w1: fix occasional enumeration failure

Sometimes enumeration fails (about 1 in 50 times on my custom board).
The underlying reason is probably electrical but Linux does not have the problem.
Comparing the Linux / u-boot implementations shows that Linux retries the error case whereas u-boot aborts early.
Removing the early abort in u-boot fixes the problem.
Signed-off-by: Martin Fuzzey martin.fuzzey@flowbird.group --- drivers/w1/w1-uclass.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/drivers/w1/w1-uclass.c b/drivers/w1/w1-uclass.c index 5544b19..ad86bf6 100644 --- a/drivers/w1/w1-uclass.c +++ b/drivers/w1/w1-uclass.c @@ -84,10 +84,6 @@ static int w1_enumerate(struct udevice *bus) rn |= (tmp64 << i); }
- /* last device or error, aborting here */ - if ((triplet_ret & 0x03) == 0x03) - last_device = true; - if ((triplet_ret & 0x03) != 0x03) { if (desc_bit == last_zero || last_zero < 0) { last_device = 1;

On 23.11.2018 11:53, Martin Fuzzey wrote:
Sometimes enumeration fails (about 1 in 50 times on my custom board).
The underlying reason is probably electrical but Linux does not have the problem.
Comparing the Linux / u-boot implementations shows that Linux retries the error case whereas u-boot aborts early.
Removing the early abort in u-boot fixes the problem.
Signed-off-by: Martin Fuzzey martin.fuzzey@flowbird.group
drivers/w1/w1-uclass.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/drivers/w1/w1-uclass.c b/drivers/w1/w1-uclass.c index 5544b19..ad86bf6 100644 --- a/drivers/w1/w1-uclass.c +++ b/drivers/w1/w1-uclass.c @@ -84,10 +84,6 @@ static int w1_enumerate(struct udevice *bus) rn |= (tmp64 << i); }
/* last device or error, aborting here */
if ((triplet_ret & 0x03) == 0x03)
last_device = true;
Hello Martin,
Your fix will basically make U-boot try again the same ID ? What happens if we keep getting errors at this triplet, we will be stuck in a loop ? When are we going to abort searching this ID if we keep getting errors ?
Thanks, Eugen
if ((triplet_ret & 0x03) != 0x03) { if (desc_bit == last_zero || last_zero < 0) { last_device = 1;

Hi Eugen,
On 27/11/2018 10:47, Eugen.Hristev@microchip.com wrote:
On 23.11.2018 11:53, Martin Fuzzey wrote:
Sometimes enumeration fails (about 1 in 50 times on my custom board).
The underlying reason is probably electrical but Linux does not have the problem.
Comparing the Linux / u-boot implementations shows that Linux retries the error case whereas u-boot aborts early.
Removing the early abort in u-boot fixes the problem.
Signed-off-by: Martin Fuzzey martin.fuzzey@flowbird.group
drivers/w1/w1-uclass.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/drivers/w1/w1-uclass.c b/drivers/w1/w1-uclass.c index 5544b19..ad86bf6 100644 --- a/drivers/w1/w1-uclass.c +++ b/drivers/w1/w1-uclass.c @@ -84,10 +84,6 @@ static int w1_enumerate(struct udevice *bus) rn |= (tmp64 << i); }
/* last device or error, aborting here */
if ((triplet_ret & 0x03) == 0x03)
last_device = true;
Hello Martin,
Your fix will basically make U-boot try again the same ID ? What happens if we keep getting errors at this triplet, we will be stuck in a loop ? When are we going to abort searching this ID if we keep getting errors ?
Yes it will retry again the same 64 bit sequence after a bus reset.
So theoretically yes we could get stuck in a loop if the same error keeps occuring at the same point.
However I'm not really convinced that it will occur in real life except for completely broken W1 devices or contrived scenarios (though you could argue that this would allow a malicious W1 device to do a DOS attack on u-boot...)
The bus reset (which actually cuts the power to the bus and hence hard resets all the devices on it) should ensure that any transient weirdness due to bus noise or software bugs on the W1 devices themselves won't cause us to loop and if the bus itself breaks we will exit at the beginning of the loop on the error return from ops->bus_reset() due to non detection of the presence pulse.
That said It would be easy to add a max retry counter to ensure that the pathological case cannot occur. But Linux doesn't do that and I'm not aware of any infinite loop cases there.
If you think we need the retry limit though I'll send a V2 with that.
Regards,
Martin

On 27.11.2018 12:58, Martin Fuzzey wrote:
Hi Eugen,
On 27/11/2018 10:47, Eugen.Hristev@microchip.com wrote:
On 23.11.2018 11:53, Martin Fuzzey wrote:
Sometimes enumeration fails (about 1 in 50 times on my custom board).
The underlying reason is probably electrical but Linux does not have the problem.
Comparing the Linux / u-boot implementations shows that Linux retries the error case whereas u-boot aborts early.
Removing the early abort in u-boot fixes the problem.
Signed-off-by: Martin Fuzzey martin.fuzzey@flowbird.group
drivers/w1/w1-uclass.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/drivers/w1/w1-uclass.c b/drivers/w1/w1-uclass.c index 5544b19..ad86bf6 100644 --- a/drivers/w1/w1-uclass.c +++ b/drivers/w1/w1-uclass.c @@ -84,10 +84,6 @@ static int w1_enumerate(struct udevice *bus) rn |= (tmp64 << i); } - /* last device or error, aborting here */ - if ((triplet_ret & 0x03) == 0x03) - last_device = true;
Hello Martin,
Your fix will basically make U-boot try again the same ID ? What happens if we keep getting errors at this triplet, we will be stuck in a loop ? When are we going to abort searching this ID if we keep getting errors ?
Yes it will retry again the same 64 bit sequence after a bus reset.
So theoretically yes we could get stuck in a loop if the same error keeps occuring at the same point.
However I'm not really convinced that it will occur in real life except for completely broken W1 devices or contrived scenarios (though you could argue that this would allow a malicious W1 device to do a DOS attack on u-boot...)
The bus reset (which actually cuts the power to the bus and hence hard resets all the devices on it) should ensure that any transient weirdness due to bus noise or software bugs on the W1 devices themselves won't cause us to loop and if the bus itself breaks we will exit at the beginning of the loop on the error return from ops->bus_reset() due to non detection of the presence pulse.
About reset cutting the power, it depends on how it's implemented, on my boards it's a bit-banged GPIO so I do not believe this happens.
That said It would be easy to add a max retry counter to ensure that the pathological case cannot occur. But Linux doesn't do that and I'm not aware of any infinite loop cases there.
If you think we need the retry limit though I'll send a V2 with that.
Your points are valid, but I think it would be good to avoid the situation all-together with a retry limit...
Anyway it's just my opinion and I let Tom or Maxime have a last word if they feel different.
Thanks for your patch,
Eugen
Regards,
Martin

On Fri, Nov 23, 2018 at 10:53:06AM +0100, Martin Fuzzey wrote:
Sometimes enumeration fails (about 1 in 50 times on my custom board).
The underlying reason is probably electrical but Linux does not have the problem.
Comparing the Linux / u-boot implementations shows that Linux retries the error case whereas u-boot aborts early.
Removing the early abort in u-boot fixes the problem.
Signed-off-by: Martin Fuzzey martin.fuzzey@flowbird.group
I saw the overall discussion and yes, OK, lets try this method.
Applied to u-boot/master, thanks!
participants (3)
-
Eugen.Hristev@microchip.com
-
Martin Fuzzey
-
Tom Rini