[U-Boot] RFC - How to speed up multiplexed input between serial and network?

Hi
We are trying to use U-Boot so that it can be controlled remotely over netconsole and locally over the serial terminal. This mostly works, but we see latency problems on the serial terminal: the serial driver is polled too slowly to catch all characters. As a result you cannot, for example, copy/paste; most of the characters are lost.
We analyzed the code and tried to speed it up, but did not achieve the required improvement. The tests were done with an MPC852 @ 66 MHz and an MPC8247.
In the file common/console.c we added hooks to measure the execution time of tstc(). The measured times are:
  serial driver    3 microseconds
  nc              15 milliseconds
The result is that the serial interface is polled only every 15 milliseconds. On the serial interface with a line rate of 115200 we receive approx. 10'000 characters every second. This is one character every 100 microseconds.
The serial driver has one buffer descriptor with space for one character. This means the serial port must be polled at least once every 100 microseconds.
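As a rough cross-check of the 100 microsecond figure, the character time can be derived from the line rate. The sketch below assumes 8N1 framing (10 bits on the wire per character), which the post does not state; the function names are invented for illustration.

```c
#include <stdint.h>

/* Bits on the wire per character, ASSUMING 8N1 framing
 * (1 start + 8 data + 1 stop). The framing is not given in the post. */
#define BITS_PER_CHAR 10u

/* Characters per second at the given line rate. */
static uint32_t chars_per_second(uint32_t baud)
{
	return baud / BITS_PER_CHAR;
}

/* Time one character occupies on the wire, in nanoseconds (rounded down). */
static uint32_t char_time_ns(uint32_t baud)
{
	return (uint32_t)(((uint64_t)BITS_PER_CHAR * 1000000000u) / baud);
}
```

Under that assumption, 115200 baud gives 11520 characters per second and about 86.8 microseconds per character, so the rounded 100 microseconds used in this thread is slightly optimistic.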
The HW-FIFO for a MPC852T is 2 bytes.
There are 2 possibilities to solve the problem:
-----------------------------------------------
a) make the netconsole faster
b) make serial more "robust" and allow more latency
The better solution is of course to make the netconsole faster. But can we reach 100 microseconds? We can reduce the time (as already done, e.g. by accelerating the readout of env variables). To accelerate by a factor of 150 we need major changes, e.g. reading the env only if it changed, which requires a mechanism to detect this.
On the other hand, we can enhance the serial driver to "absorb" e.g. one line, which allows you to copy/paste. This is not a big code change, but it needs more DP-RAM.
The copy/paste test shows the following result:
  copy          paste
  0123456789 -> 0        -> first character
a) So I tried to make the netconsole faster with the optimisation of tstc()
---------------------------------------------------------------------------
It is possible to call getenv() only if the env has changed. I added a "transactionId" which is incremented after every write to the env, so a user of the env can check whether it changed and re-read only then. This reduced the tstc() of nc to 60 microseconds, so the serial port is now polled every 70 microseconds. In principle this should be fast enough for copy/paste:
  copy          paste
  0123456789 -> 013679   -> 50%
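The "transactionId" idea described above could look roughly like the following. This is a minimal sketch, not the code from the actual change: the single-variable environment, all function names, and the counter names are hypothetical, and the real U-Boot environment code is organised differently.

```c
#include <string.h>

/* Hypothetical in-RAM environment holding one variable. */
static char env_ipaddr[16] = "192.168.0.10";
static unsigned int env_transaction_id;	/* incremented on every env write */
static unsigned int env_reads;		/* counts the expensive lookups */

/* Writer side: every change bumps the transaction id. */
static void env_set_ipaddr(const char *val)
{
	strncpy(env_ipaddr, val, sizeof(env_ipaddr) - 1);
	env_ipaddr[sizeof(env_ipaddr) - 1] = '\0';
	env_transaction_id++;
}

/* Stands in for the expensive getenv() call. */
static const char *env_get_ipaddr(void)
{
	env_reads++;
	return env_ipaddr;
}

/* Consumer side, e.g. the netconsole poll path: keeps a cached copy. */
static char nc_ipaddr[16];
static unsigned int nc_seen_id = ~0u;	/* differs from 0, forces first read */

static const char *nc_current_ipaddr(void)
{
	/* Re-read only when the environment was written since the last read */
	if (nc_seen_id != env_transaction_id) {
		strncpy(nc_ipaddr, env_get_ipaddr(), sizeof(nc_ipaddr) - 1);
		nc_ipaddr[sizeof(nc_ipaddr) - 1] = '\0';
		nc_seen_id = env_transaction_id;
	}
	return nc_ipaddr;
}
```

Repeated calls to nc_current_ipaddr() then cost one integer comparison until the next env write, which is the effect the measurement above relies on.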
Why are we receiving only half of the characters? This is because processing a character takes time. If we check how often getc() is called during copy/paste, it is every 180 microseconds. getc() itself does not need much time, but the received character is sent over nc before we get the next char. I think we cannot avoid this.
I do not see how we can reduce this time even further.
The measurement was also done without nc. There, getc() is called every 80-90 microseconds. So we see that there is little headroom for additional processing!
b) Make the serial driver more "robust" to absorb bursts
--------------------------------------------------------
I think it would make sense to be able to absorb the burst of one line, e.g. 128 characters.
This can be done in 2 ways:
b1) use more buffer descriptors with one character each
b2) use the SMC feature that allows a multi-character buffer
b1) driver with multiple buffer descriptors
-------------------------------------------
This option is quite simple to implement, but needs more resources. I have already sent this. The required dual-port memory is high: 128 BDs * 8 bytes plus 128 bytes for the characters = 1152 bytes more. (I also implemented this driver.)
b2) driver with multi-character buffer
--------------------------------------
I have implemented this driver for the MPC852T (SMC) and attached a patch. The additional DP-RAM used is the size of the buffer (e.g. 128 bytes) plus 4 bytes for an index to the next character to read. A define can be used to specify the size of the buffer. If it is undefined, the size is 1.
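To compare the DP-RAM cost of the two variants above, the arithmetic can be written down directly. This is only a sketch of the figures quoted in the text; the function names are hypothetical and the 8-byte descriptor size is taken from the b1) description.

```c
#include <stddef.h>

#define BD_SIZE 8u	/* bytes per buffer descriptor, as stated for b1) */

/* b1: one buffer descriptor plus one data byte per buffered character */
static size_t dpram_extra_b1(size_t nchars)
{
	return nchars * BD_SIZE + nchars;
}

/* b2: one shared receive buffer plus a 4-byte read index */
static size_t dpram_extra_b2(size_t nchars)
{
	return nchars + 4u;
}
```

For 128 characters this gives 1152 bytes for b1 against 132 bytes for b2.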
Conclusion:
-----------
I do not see a good chance of reducing the processing time in the netconsole below 100 microseconds.
I expect copy/paste to work for a line (128 characters).
So I propose to enhance the serial driver.
Best regards, Stefan Bigler

Hi
Here is the patch for the MPC8xx (SMC) driver. If the idea is accepted, the driver for MPC82xx will be enhanced as well.
Best regards, Stefan Bigler
diff --git a/cpu/mpc8xx/serial.c b/cpu/mpc8xx/serial.c
index ad02299..20440cd 100644
--- a/cpu/mpc8xx/serial.c
+++ b/cpu/mpc8xx/serial.c
@@ -108,17 +108,30 @@ static void smc_setbrg (void)
 	serial_setdivisor(cp);
 }
 
+
+typedef volatile struct SerialBuffer {
+    cbd_t rxbd; /* Rx BD */
+    cbd_t txbd; /* Tx BD */
+#ifdef CONFIG_SMC_RXBUFLEN
+    uint rxCharIndex; /* index for next character to read */
+    volatile uchar rxbuf[CONFIG_SMC_RXBUFLEN]; /* rx buffers */
+#else
+    volatile uchar rxbuf[1]; /* rx buffers */
+#endif
+    volatile uchar txbuf; /* tx buffers */
+} SerialBuffer;
+
 static int smc_init (void)
 {
 	volatile immap_t *im = (immap_t *)CFG_IMMR;
 	volatile smc_t *sp;
 	volatile smc_uart_t *up;
-	volatile cbd_t *tbdf, *rbdf;
 	volatile cpm8xx_t *cp = &(im->im_cpm);
 #if (!defined(CONFIG_8xx_CONS_SMC1)) && (defined(CONFIG_MPC823) || defined(CONFIG_MPC850))
 	volatile iop8xx_t *ip = (iop8xx_t *)&(im->im_ioport);
 #endif
 	uint dpaddr;
+    SerialBuffer* rtx;
 
 	/* initialize pointers to SMC */
 
@@ -194,23 +207,26 @@ static int smc_init (void)
 	 */
 
 #ifdef CFG_ALLOC_DPRAM
-	dpaddr = dpram_alloc_align (sizeof(cbd_t)*2 + 2, 8) ;
+    /* allocate
+     * the size of struct SerialBuffer with bd rx/tx, buffer rx/tx and rx index
+     */
+    dpaddr = dpram_alloc_align ((sizeof(SerialBuffer)), 8);
 #else
 	dpaddr = CPM_SERIAL_BASE ;
 #endif
 
+    rtx = (SerialBuffer*)&cp->cp_dpmem[dpaddr];
 	/* Allocate space for two buffer descriptors in the DP ram.
 	 * For now, this address seems OK, but it may have to
 	 * change with newer versions of the firmware.
 	 * damm: allocating space after the two buffers for rx/tx data
 	 */
 
-	rbdf = (cbd_t *)&cp->cp_dpmem[dpaddr];
-	rbdf->cbd_bufaddr = (uint) (rbdf+2);
-	rbdf->cbd_sc = 0;
-	tbdf = rbdf + 1;
-	tbdf->cbd_bufaddr = ((uint) (rbdf+2)) + 1;
-	tbdf->cbd_sc = 0;
+    rtx->rxbd.cbd_bufaddr = (uint) &rtx->rxbuf;
+    rtx->rxbd.cbd_sc = 0;
+
+    rtx->txbd.cbd_bufaddr = (uint) &rtx->txbuf;
+    rtx->txbd.cbd_sc = 0;
 
 	/* Set up the uart parameters in the parameter ram.
 	 */
@@ -256,13 +272,21 @@ static int smc_init (void)
 
 	/* Make the first buffer the only buffer.
 	 */
-	tbdf->cbd_sc |= BD_SC_WRAP;
-	rbdf->cbd_sc |= BD_SC_EMPTY | BD_SC_WRAP;
+    rtx->txbd.cbd_sc |= BD_SC_WRAP;
+    rtx->rxbd.cbd_sc |= BD_SC_EMPTY | BD_SC_WRAP;
 
+#ifdef CONFIG_SMC_RXBUFLEN
+    /* multi-character receive.
+     */
+    up->smc_mrblr = CONFIG_SMC_RXBUFLEN;
+    up->smc_maxidl = 10;
+    rtx->rxCharIndex = 0;
+#else
 	/* Single character receive.
 	 */
 	up->smc_mrblr = 1;
 	up->smc_maxidl = 0;
+#endif
 
 	/* Initialize Tx/Rx parameters.
 	 */
@@ -285,11 +309,16 @@ static int smc_init (void)
 static void
 smc_putc(const char c)
 {
-	volatile cbd_t *tbdf;
-	volatile char *buf;
 	volatile smc_uart_t *up;
 	volatile immap_t *im = (immap_t *)CFG_IMMR;
 	volatile cpm8xx_t *cpmp = &(im->im_cpm);
+    SerialBuffer *rtx;
+
+    up = (smc_uart_t *)&cpmp->cp_dparam[PROFF_SMC];
+#ifdef CFG_SMC_UCODE_PATCH
+    up = (smc_uart_t *)&cpmp->cp_dpmem[up->smc_rpbase];
+#endif
+    rtx = (SerialBuffer *)&cpmp->cp_dpmem[up->smc_rbase];
 
 #ifdef CONFIG_MODEM_SUPPORT
 	if (gd->be_quiet)
@@ -299,24 +328,12 @@ smc_putc(const char c)
 	if (c == '\n')
 		smc_putc ('\r');
 
-	up = (smc_uart_t *)&cpmp->cp_dparam[PROFF_SMC];
-#ifdef CFG_SMC_UCODE_PATCH
-	up = (smc_uart_t *) &cpmp->cp_dpmem[up->smc_rpbase];
-#endif
-
-	tbdf = (cbd_t *)&cpmp->cp_dpmem[up->smc_tbase];
-
-	/* Wait for last character to go.
-	 */
-
-	buf = (char *)tbdf->cbd_bufaddr;
-
-	*buf = c;
-	tbdf->cbd_datlen = 1;
-	tbdf->cbd_sc |= BD_SC_READY;
+    rtx->txbuf = c;
+    rtx->txbd.cbd_datlen = 1;
+    rtx->txbd.cbd_sc |= BD_SC_READY;
 	__asm__("eieio");
 
-	while (tbdf->cbd_sc & BD_SC_READY) {
+    while (rtx->txbd.cbd_sc & BD_SC_READY) {
 		WATCHDOG_RESET ();
 		__asm__("eieio");
 	}
@@ -333,29 +350,39 @@ smc_puts (const char *s)
 static int
 smc_getc(void)
 {
-	volatile cbd_t *rbdf;
-	volatile unsigned char *buf;
 	volatile smc_uart_t *up;
 	volatile immap_t *im = (immap_t *)CFG_IMMR;
 	volatile cpm8xx_t *cpmp = &(im->im_cpm);
-	unsigned char c;
+    SerialBuffer *rtx;
 
 	up = (smc_uart_t *)&cpmp->cp_dparam[PROFF_SMC];
 #ifdef CFG_SMC_UCODE_PATCH
 	up = (smc_uart_t *) &cpmp->cp_dpmem[up->smc_rpbase];
 #endif
 
-	rbdf = (cbd_t *)&cpmp->cp_dpmem[up->smc_rbase];
+    rtx = (SerialBuffer *)&cpmp->cp_dpmem[up->smc_rbase];
+    unsigned char c;
 
 	/* Wait for character to show up.
 	 */
-	buf = (unsigned char *)rbdf->cbd_bufaddr;
-
-	while (rbdf->cbd_sc & BD_SC_EMPTY)
+    while (rtx->rxbd.cbd_sc & BD_SC_EMPTY)
 		WATCHDOG_RESET ();
 
-	c = *buf;
-	rbdf->cbd_sc |= BD_SC_EMPTY;
+#ifdef CONFIG_SMC_RXBUFLEN
+    /* the characters are read one by one, use the rxCharIndex to know the next char to deliver */
+    c = *(unsigned char *) (rtx->rxbd.cbd_bufaddr+rtx->rxCharIndex);
+    rtx->rxCharIndex++;
+
+    /* check if all char are readout, then make prepare for next receive */
+    if (rtx->rxCharIndex >= rtx->rxbd.cbd_datlen)
+    {
+        rtx->rxCharIndex = 0;
+        rtx->rxbd.cbd_sc |= BD_SC_EMPTY;
+    }
+#else
+    c = *(unsigned char *) (rtx->rxbd.cbd_bufaddr);
+    rtx->rxbd.cbd_sc |= BD_SC_EMPTY;
+#endif
 
 	return(c);
 }
@@ -363,19 +390,19 @@ smc_getc(void)
 static int
 smc_tstc(void)
 {
-	volatile cbd_t *rbdf;
 	volatile smc_uart_t *up;
 	volatile immap_t *im = (immap_t *)CFG_IMMR;
 	volatile cpm8xx_t *cpmp = &(im->im_cpm);
+    SerialBuffer *rtx;
 
 	up = (smc_uart_t *)&cpmp->cp_dparam[PROFF_SMC];
 #ifdef CFG_SMC_UCODE_PATCH
 	up = (smc_uart_t *) &cpmp->cp_dpmem[up->smc_rpbase];
 #endif
 
-	rbdf = (cbd_t *)&cpmp->cp_dpmem[up->smc_rbase];
+    rtx = (SerialBuffer *)&cpmp->cp_dpmem[up->smc_rbase];
 
-	return(!(rbdf->cbd_sc & BD_SC_EMPTY));
+    return(!(rtx->rxbd.cbd_sc & BD_SC_EMPTY));
 }
 
 struct serial_device serial_smc_device =
-----Original Message-----
From: u-boot-bounces@lists.denx.de [mailto:u-boot-bounces@lists.denx.de] On Behalf Of Bigler, Stefan
Sent: Wednesday, October 29, 2008 9:27 AM
To: u-boot@lists.denx.de
Subject: [U-Boot] RFC - How to speed up multiplexed input between serial and network?
U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot

Dear "Bigler, Stefan",
In message D839955AA28B9A42A61B9181506E27C4012E1907@SRVCHBER1212.ch.keymile.net you wrote:
Here is patch for the driver for MPC8xx (SMC). If the idea is accepted the then the driver for MPC82xx will be enhanced as well.
Your patch was line wrapped by your mailer and thus corrupted.
Also, it has many coding style issues (indentation not by TAB, MixedCaseVariableNamesWhichAreNotAllowedInUBootBecauseTheyAreHardToRead, too many empty lines, incorrect multi-line comment format, too long lines, incorrect brace style, trailing white space, etc.).
But please see my previous message - I don't think it is necessary to fix and resubmit the patch.
Best regards,
Wolfgang Denk

Dear "Bigler, Stefan",
In message D839955AA28B9A42A61B9181506E27C4012E18DB@SRVCHBER1212.ch.keymile.net you wrote:
In the file common/console.c we added hooks to measure the time for tstc() execution. The measured time are: serial-driver 3 Microseconds nc 15 Milliseconds
Let's start by asking ourselves why there is such a big difference.
The serial driver just checks the status bits in some hardware registers. This is pretty fast.
The nc driver however actually runs a NetLoop() call, i.e. it performs some active polling. This takes much more time.
There are 2 possibilities to solve the problem:
a) make the netconsole faster b) make serial more "robust" and allow more latency
There may be other options as well, like making the multiplexing a little more intelligent. See below for an idea or two.
The better solution is of course to make the netconsole faster. But can we reach 100 Microseconds?
Probably not. But do we really have to?
We can reduce it (as already done e.g. accelerate the readout of env-variables). To accelerate by factor 150 we need to do major changes e.g. read-out the env if changed so we need a mechanism to see this.
Indeed, environment variable handling could be accelerated a lot, for example by using a hash table for in-RAM storage instead of the linear search list we use now.
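To illustrate the hash table idea, here is a minimal chained hash table for name/value pairs. It is a sketch only: the structure, the bucket count, and all function names are hypothetical and do not reflect how U-Boot stores its environment.

```c
#include <stddef.h>
#include <string.h>

#define ENV_BUCKETS 16u		/* illustrative table size */

struct env_entry {
	const char *name;
	const char *value;
	struct env_entry *next;	/* chain for names hashing to the same bucket */
};

static struct env_entry *env_table[ENV_BUCKETS];

/* djb2-style string hash, reduced to a bucket index */
static unsigned int env_hash(const char *s)
{
	unsigned int h = 5381;

	while (*s)
		h = h * 33u + (unsigned char)*s++;
	return h % ENV_BUCKETS;
}

/* Insert a caller-owned entry at the head of its bucket chain */
static void env_insert(struct env_entry *e)
{
	unsigned int b = env_hash(e->name);

	e->next = env_table[b];
	env_table[b] = e;
}

/* Look up a variable; only one bucket chain is walked */
static const char *env_lookup(const char *name)
{
	struct env_entry *e;

	for (e = env_table[env_hash(name)]; e != NULL; e = e->next)
		if (strcmp(e->name, name) == 0)
			return e->value;
	return NULL;
}
```

A lookup then compares against the few names in one bucket instead of scanning every variable, which is where the speed-up over a linear list would come from.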
On the other hand we can enhance the serial driver to "absorb" e.g. one line that allows you to copy/paste. This is not a big code change but it needs more dp-ram.
And it needs to be tested on many systems.
I really hesitate to add more complexity into the serial driver. It is part of the very basic design ideas in U-Boot to have a serial console very, very soon in the initialization sequence. This is a very important feature during board bringup and U-Boot porting, and I will not give this up easily.
Adding more complexity here is probably not a good idea unless there is really no other way around it.
a) So I tried to make the netconsole faster with the optimisation of tstc()
There is the possibility to do the getenv() only if the env is changed. I added a "transactionId" what is incremented after every write to env. So the user of env can check if the env changed and only read if changed. This reduced the tstc() of nc to 60 Microseconds. So the polling of serial is done every 70 Microseconds.
Ah, but that's excellent - you asked for 100 us above, so we're already much faster than needed now.
In principle this should be fast enough to be able to copy paste copy paste 0123456789 -> 013679 -> 50%
Why are we receiving only half of the character? This due to the fact that processing a character needs time. If we check how often we call getc() while copy/paste, this is every 180 Microsecond. The method getc() do not need lot of time, but the received character is sent over nc before we get the next char. I think we cannot avoid this.
Well, one very simple way to avoid it is to run the serial port at a lower baud rate. If you have a slow processor, you will be facing certain limits. The more features you add (like I/O multiplexing with a network driver) the more restrictive these limits will be.
I do not see how we can reduce this time even further.
Then it's probably time to lean back and think about alternative approaches to implement the features you are looking for.
The measurement is also done without nc. There the getc() is called every 80-90 Microseconds. So we see that is little headroom to do additional processing!
I don't want to doubt your measurements, but they don't match my experience. You say you test on an 8xx at 66 MHz - I'm pretty sure that the 8xx can be used at 115200 bps reliably even when running at only 33 MHz.
Unfortunately I cannot test this at the moment, but I will run such a test as soon as possible.
b) Make the serial driver more "robust" to absorb bursts
I think it would make sense to be able to absorb the burst of one line e.g. 128 character.
Who says this is sufficient?
I don't consider this a real solution - you just push the limits a bit, so that it works in a few test cases now, but it will still fail in the same way as soon as somebody uses slightly longer lines.
This can be done in 2 way: b1) use more buffer descriptor with one character b2) use the feature of smc to allow multi-character buffer
I really do not want to add such complexity to the serial driver, especially since the current implementation matches what Linux uses for early console, too.
Conclusion:
I do not see a good chance to be able to reduce the processing time in the netconsole below 100 Microseconds.
I expect copy/paste to work for a line (128 characters).
So I propose to enhance the serial driver.
I really do not like this approach.
But - do we really need such a "fix"? Let's step back a bit.
As I understand it, the problem results from the fact that you are trying to always alternate calls to polling the serial and the network console. This makes sense in the idle state, when we are waiting for input from any of the possible input devices. But does it still make sense to interrupt the serial code with network polling while a (high-speed) data transfer is going on on the serial line?
I don't think so.
From what I've gathered from the existing design, you don't really care about guaranteeing any deterministic behaviour in case both input channels transfer data at the same time. If such a simultaneous transaction happens, your input data stream (for both channels) is likely to get corrupted (without any error indication).
I interpret the fact that your code does not care about this as an indication that such situations are very untypical for your mode of usage - but then why do we need to provide code that focuses on such a case?
So my question is: does it really make sense to continue polling the network console while a serial data transfer is in progress? [*]
I do not think so.
[*] Of course the same is true for the other direction - but there the situation is much easier because (a) the serial driver is very fast and (b) the nc code already handles multi-byte data packets.
My suggestion is to make the multiplexing more intelligent instead of making the serial driver more complex. The nice thing with this is that you probably still get the same results (actually even better ones, as the artificial 128 byte line length limit can be avoided), and the changes are only in the new code, i.e. users who do not need such I/O multiplexing will not be affected.
I think it should be fairly simple to implement something similar to the VTIME feature for non-canonical reads in the Unix serial drivers (see "man tcsetattr"):
- In idle mode, all configured input devices are polled in a round-robin manner (as it is done now).
- As soon as a character is received on the serial line, a timestamp is taken. As you calculated, one character at 115 kbps takes about 100 us on the wire. Within a window of (for example) 500 us (or about 5 character times), polling of all other I/O ports will now be skipped.
This should give you raw serial driver performance while a serial data transfer is running, while keeping functionality for all other use cases.
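The two rules above could be sketched as follows. This is a minimal illustration, not existing U-Boot code: the device structure, the state structure, the stub devices, and the microsecond time source passed in by the caller are all assumptions. Unlike a strict round-robin, this sketch always checks the serial port first.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SERIAL_HOLD_US 500u	/* window from the text: about 5 character times */

/* Hypothetical input device; U-Boot's real device structure differs. */
struct input_dev {
	bool (*tstc)(void);	/* returns true if a character is waiting */
};

struct mux_state {
	uint32_t last_serial_us;	/* timestamp of the last serial character */
	bool serial_active;		/* set once a serial character was seen */
};

/*
 * devs[0] is the serial port, devs[1..] are slower devices such as nc.
 * now_us is a free-running microsecond counter supplied by the caller.
 * Returns the index of a device with pending input, or -1 if there is none.
 */
static int mux_poll(struct mux_state *st, const struct input_dev *devs,
		    size_t ndevs, uint32_t now_us)
{
	size_t i;

	if (devs[0].tstc()) {
		st->last_serial_us = now_us;
		st->serial_active = true;
		return 0;
	}

	/* Inside the hold window: do not poll any other device */
	if (st->serial_active &&
	    (uint32_t)(now_us - st->last_serial_us) < SERIAL_HOLD_US)
		return -1;

	/* Idle mode: poll the remaining devices in order */
	st->serial_active = false;
	for (i = 1; i < ndevs; i++)
		if (devs[i].tstc())
			return (int)i;
	return -1;
}

/* Stub devices for demonstration only */
static bool serial_pending, nc_pending;
static unsigned int nc_polls;

static bool stub_serial_tstc(void) { return serial_pending; }
static bool stub_nc_tstc(void) { nc_polls++; return nc_pending; }
```

With the stubs, a poll 100 us after a serial character returns -1 without touching the nc stub, and nc is polled again once 500 us have passed without serial input.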
What do you think?
Best regards,
Wolfgang Denk

Dear Mr. Denk
Thank you for the detailed answer.
Dear "Bigler, Stefan",
In message D839955AA28B9A42A61B9181506E27C4012E18DB@SRVCHBER1212.ch.keymile.net you wrote:
In the file common/console.c we added hooks to measure the time for tstc() execution. The measured time are: serial-driver 3 Microseconds nc 15 Milliseconds
Let's start by asking ourselves why there is such a big difference.
The serial driver just checks the status bits in some hardware registers. This is pretty fast.
The nc driver however actually runs a NetLoop() call, i.e. it performs some active polling. This takes much more time.
This is true, and all configuration values, e.g. ipaddr, are re-read again and again.
This should give you raw serial driver performance while a serial data transfer is running, while keeping functionality for all other use cases.
What do you think?
First we need to have a good and accepted solution to reduce the time in NetLoop, e.g. read the env only when it has changed. Then the polling is no longer on the critical path. The main problem from my point of view is the echo of the received data to serial and also to nc. This is now done immediately, character by character, and this takes time (more than we have).
Am I right in saying that between one call of getc() and the next we have 100 microseconds to do all the required processing, otherwise we lose data?
Best regards,
Wolfgang Denk
-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Best regards, Stefan Bigler

Dear "Bigler, Stefan",
In message D839955AA28B9A42A61B9181506E27C4012E192F@SRVCHBER1212.ch.keymile.net you wrote:
This should give you raw serial driver performance while a serial data transfer is running, while keeping functionality for all other use cases.
What do you think?
First we need to have a good and accepted solution to reduce the time in NetLoop e.g. read only the env when changed. Then the polling is not anymore critical path.
Hm... sorry, but I disagree. With my suggestion above, the time spent in NetLoop() does not matter any more at all. So no optimizations there will be needed to get your code working.
Optimizing NetLoop() is a complex thing with global impact that will require a lot of testing. There is little chance to see this in mainline soon - at least not in the upcoming 2008.12 release.
My suggestion however results in small code, and additionally this code affects only users of the new console multiplexing feature, but nobody else.
Such a modification could go into mainline much faster.
But I agree that it is a worthwhile goal to optimize NetLoop() anyway.
The main problem from my point of view is the echo of the received data to serial and also to nc. This is done now immediately, character by character and this takes time (more than we have).
Sorry. I don't get it. It seems you bring up a new topic here.
Less than 6 hours before this you wrote: "The polling of the serial driver is too slow to get all characters. ... we added hooks to measure the time for tstc() execution. The measured time are: ... nc 15 Milliseconds".
My interpretation was (and is) that it's the *input* processing which is your major concern. And I showed a way to solve this problem (at least I think that my suggestion will solve it).
Now you bring up a new topic - the time needed to output the characters. Maybe we should try to solve problems sequentially - if we throw all the issues we see into one big pot we might not be able to swallow this.
BTW: did you measure any times for the character output?
Am I right when I say that between one read of a character with getc() and the next call of getc() we have 100 Microseconds to do all the required processing, otherwise we lose data?
On average, yes. The time for a single character might be longer (up to close to 200 us) assuming we are fast enough then to catch the third char. All this assuming a console baudrate of 115 kbps.
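As a rough cross-check of these figures, the following C sketch computes the time one character occupies the line and the resulting polling budget. It assumes 8N1 framing (10 bits per character), which the thread does not state, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bits on the wire per character: 1 start + 8 data + 1 stop (8N1 is an assumption). */
#define BITS_PER_CHAR 10u

/* Time one character occupies the serial line, in nanoseconds. */
static uint32_t char_time_ns(uint32_t baud)
{
    assert(baud > 0);
    return (uint32_t)((uint64_t)BITS_PER_CHAR * 1000000000ull / baud);
}

/*
 * Approximate longest gap between two polls that loses nothing, given how
 * many characters the receiver can hold before it overruns.
 */
static uint32_t max_poll_gap_ns(uint32_t baud, uint32_t rx_slots)
{
    return char_time_ns(baud) * rx_slots;
}
```

Under these assumptions, 115200 baud gives about 86.8 us per character, so the 100 us used in this thread is a rounded figure, and two receive slots give a budget of roughly 174 us.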
BTW - reducing the console baud rate would be a trivial way to avoid most of these issues ;-)
Best regards,
Wolfgang Denk

Dear "Denk Wolfgang",
In message D839955AA28B9A42A61B9181506E27C4012E192F@SRVCHBER1212.ch.keymile.net you wrote:
This should give you raw serial driver performance while a serial data transfer is running, while keeping functionality for all other use cases.
What do you think?
First we need to have a good and accepted solution to reduce the time in NetLoop, e.g. read the env only when it has changed. Then the polling is no longer on the critical path.
Hm... sorry, but I disagree. With my suggestion above, the time spent in NetLoop() does not matter any more at all. So no optimizations there will be needed to get your code working.
If you know how to implement behaviour like VTIME I'm fine with it, but I don't understand how it can work. Is it correct to say: to check whether data has been received on our nc, we have to run NetLoop()? If yes, one run costs me 15 Milliseconds, so 150 characters are potentially lost on the serial. Of course, when I'm on the serial I stay longer on the serial and read more.
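The estimate of 150 characters can be reproduced with a small C sketch. It uses the rounded 100 us per character from this thread; the function names and the `rx_slots` parameter (how many characters the receiver can hold) are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Characters that arrive on the serial line while one blocking call runs. */
static uint32_t chars_arriving(uint32_t busy_us, uint32_t char_time_us)
{
    assert(char_time_us > 0);
    return busy_us / char_time_us;
}

/* Characters lost: everything that arrives beyond what the receiver can hold. */
static uint32_t chars_lost(uint32_t busy_us, uint32_t char_time_us,
                           uint32_t rx_slots)
{
    uint32_t arrived = chars_arriving(busy_us, char_time_us);

    return arrived > rx_slots ? arrived - rx_slots : 0;
}
```

With a 15 ms call and 100 us per character, `chars_arriving(15000, 100)` yields 150; with a 60 us call nothing is lost as long as the receiver holds at least one character.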
The main problem from my point of view is the echo of the received data to serial and also to nc. This is done now immediately, character by character and this takes time (more than we have).
Sorry. I don't get it. It seems you bring up a new topic here.
Less than 6 hours before this you wrote: "The polling of the serial driver is too slow to get all characters. ... we added hooks to measure the time for tstc() execution. The measured time are: ... nc 15 Milliseconds".
My interpretation was (and is) that it's the *input* processing which is your major concern. And I showed a way to solve this problem (at least I think that my suggestion will solve it).
Now you bring up a new topic - the time needed to output the characters. Maybe we should try and solve problems sequentially - if we throw all issues we see into one big pot we might not be able to swallow this.
Sorry I did not tell you the full story (I also do not understand all of it).
BTW: did you measure any times for the character output?
What I know is how the time spent in the nc functions changes when getenv() is called only after the env has changed:

function            before            after
nc tstc()           15 Milliseconds   60 Microseconds
nc getc()           5 Microseconds    5 Microseconds
nc send_packet()    90 Microseconds   90 Microseconds

For receiving, the "real job" is done in tstc(); getc() only takes the data from the input_buffer. Sending does not run NetLoop() in "steady state". This explains why only tstc() gets faster.
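The idea of reading the env only when it has changed can be sketched in C as below. This is not the actual patch and not U-Boot's env code: the storage, the variable name `ncip` as used here, and all function names are hypothetical stand-ins, and `env_reads` exists only to make the saved lookups visible.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the environment, not U-Boot's real env code. */
static char env_ncip[32] = "192.168.1.1";
static unsigned env_generation;  /* bumped on every write ("transactionId") */
static unsigned env_reads;       /* counts expensive lookups, illustration only */

static void env_set_ncip(const char *val)
{
    assert(val != NULL);
    strncpy(env_ncip, val, sizeof(env_ncip) - 1);
    env_ncip[sizeof(env_ncip) - 1] = '\0';
    env_generation++;
}

/* Stands in for the slow getenv() call. */
static const char *env_get_ncip(void)
{
    env_reads++;
    return env_ncip;
}

/* Cached copy used by the polling path. */
static char cached_ncip[32];
static unsigned cached_generation = ~0u;  /* differs from 0, forces first read */

static const char *nc_current_ip(void)
{
    if (cached_generation != env_generation) {
        strncpy(cached_ncip, env_get_ncip(), sizeof(cached_ncip) - 1);
        cached_ncip[sizeof(cached_ncip) - 1] = '\0';
        cached_generation = env_generation;
    }
    return cached_ncip;
}
```

In this sketch, repeated calls to `nc_current_ip()` cost one integer comparison until the next write to the env bumps the counter.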
BTW - reducing the console baud rate would be a trivial way to avoid most of these issues ;-)
Reducing the baud rate helps. Here are the measurements (pasting a 200 character line):
with 57600: 6% of the characters are lost
with 38400: 0% of the characters are lost --> this would work
Am I right when I say that between one read of a character with getc() and the next call of getc() we have 100 Microseconds to do all the required processing, otherwise we lose data?
On average, yes. The time for a single character might be longer (up to close to 200 us) assuming we are fast enough then to catch the third char. All this assuming a console baudrate of 115 kbps.
I agree with this if we assume that one character is held in the buffer/bd and 2 can be held in the HW-FIFO. If this were the case, I should always receive the first 3 characters and only then see losses. But this is not the case: we already lose the second. Do you have an explanation for this?
Best regards, Stefan Bigler

On Wed, 29 Oct 2008 13:14:52 +0100 Wolfgang Denk wd@denx.de wrote:
[big snip details of analysis]
My suggestion is to make the multiplexing more intelligent instead of making the serial driver more complex. The nice thing with this is that you probably still get the same results (actually even better ones as the artificial 128 byte line length limit can be avoided), and the changes are only in the new code, i.e. users who do not need such I/O multiplexing will not be affected.
I think it should be fairly simple to implement something similar to the VTIME feature for non-canonical reads in the Unix serial drivers (see "man tcsetattr"):
In idle mode, all configured input devices are polled in a round-robin manner (as it is done now).
As soon as a character is received on the serial line, a timestamp is taken. As you calculated, one character at 115 kbps takes about 100 us on the wire. Within a window of (for example) 500 us (or about 5 character times) now polling of all other I/O ports will be skipped.
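A minimal C sketch of such a VTIME-like window could look as follows. It is only an illustration of the scheme described above, not code from U-Boot: `struct mux_state`, `mux_tstc()` and the fake device hooks are hypothetical, and the caller is assumed to supply a free-running microsecond counter as `now_us`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SERIAL_HOLD_US 500u  /* window from the text: about 5 character times */

struct mux_state {
    bool     serial_active;   /* a serial character was seen recently */
    uint32_t last_serial_us;  /* timestamp of the most recent serial character */
};

typedef bool (*tstc_fn)(void);

/* Fake device hooks for demonstration only. */
static bool fake_serial_ready;
static unsigned other_polls;
static bool fake_serial_tstc(void) { return fake_serial_ready; }
static bool fake_nc_tstc(void) { other_polls++; return false; }

/* Return true if any input device has data pending. */
static bool mux_tstc(struct mux_state *st, uint32_t now_us,
                     tstc_fn serial_tstc, tstc_fn other_tstc)
{
    assert(st && serial_tstc && other_tstc);

    if (serial_tstc()) {
        st->serial_active = true;
        st->last_serial_us = now_us;
        return true;
    }

    /* Unsigned subtraction keeps working across counter wrap-around. */
    if (st->serial_active &&
        (uint32_t)(now_us - st->last_serial_us) < SERIAL_HOLD_US)
        return false;  /* inside the window: skip the slow devices */

    st->serial_active = false;
    return other_tstc();  /* idle again: poll the other inputs as before */
}
```

While characters keep arriving, only the cheap serial check runs; the other devices are polled again once the line has been quiet for the whole window.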
I took a quick look at this idea, but I didn't try to implement all the fancy timestamp stuff, etc.
Basically, I kept the pointer to the last device which had input and checked it first in tstc().
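The change described here, remembering the last device that had input and checking it first, might be sketched in C like this. The actual modification was not posted, so `struct in_dev`, `poll_inputs()` and the demo hooks are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input device descriptor. */
struct in_dev {
    const char *name;
    int (*tstc)(void);  /* non-zero if a character is waiting */
};

static struct in_dev *last_active;  /* last device that delivered input */

/* Demo hooks for illustration only. */
static int serial_has_data;
static int nc_polls;
static int demo_serial_tstc(void) { return serial_has_data; }
static int demo_nc_tstc(void) { nc_polls++; return 0; }

/* Return the device with pending input, or NULL if none has any. */
static struct in_dev *poll_inputs(struct in_dev *devs, size_t n)
{
    assert(devs != NULL || n == 0);

    /* Check the most recently active device before all others. */
    if (last_active && last_active->tstc())
        return last_active;

    for (size_t i = 0; i < n; i++) {
        if (&devs[i] != last_active && devs[i].tstc()) {
            last_active = &devs[i];
            return last_active;
        }
    }
    return NULL;
}
```

As long as the remembered device keeps delivering characters, the other devices are not polled at all; once it runs dry, polling falls back to the normal round over all inputs.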
My testing was done on a sequoia at a baudrate of 115200. The sequoia is a fast board. Testing was done with combinations of stdin and stdout devices (serial and nc).
I observed no performance improvements.
I then looked more closely at the results of a rather simple case - stdin=serial and stdout=serial,nc. In this case the change mentioned above would have no effect since there is only one stdin device.
Doing a paste of an 80 character line resulted in 90% loss of input. With stdin=stdout=serial cut&paste worked with no character loss. The obvious conclusion is that the _output_ to nc was so slow that it caused the character loss.
Thus, efforts to optimize the input at high baudrates in the multiplexing code itself will not help due to the slow output.
The suggestion to lower the baudrate seems like the most intelligent solution.
--- Gary Jennejohn ********************************************************************* DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: +49-8142-66989-0 Fax: +49-8142-66989-80 Email: office@denx.de *********************************************************************