[U-Boot] Cavium/Marvell Octeon Support

Hi all,
I have been tasked with porting our Octeon U-Boot to the latest U-Boot and merging it upstream. This will involve a very significant amount of code that generally will not be compatible with other MIPS processors due to our needs and requirements. For example, start.S will need to be completely different from what is present today: our existing start.S is 3577 lines of code in order to deal with things like RAS, exceptions, virtual memory and more. We need to use virtual memory since U-Boot can be loaded at any 4MB boundary in memory, not just 0xbfc00000. A number of drivers will need to be updated in order to properly map pointers to physical addresses. This is needed anyway, since I see numerous drivers (XHCI, for one) that assume that a pointer is a DMA address. For MIPS this is never the case.
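To illustrate why a pointer is not a DMA address on MIPS, here is a minimal sketch of the standard 32-bit KSEG0/KSEG1 mapping, modelled in Python purely for illustration (the real code would be C or assembly). The function names are hypothetical and not taken from U-Boot or the Octeon SDK, and the Octeon port described here uses TLB-mapped virtual memory, which this sketch does not cover.

```python
# Standard MIPS32 unmapped kernel segments. Both windows alias the same
# low 512MB of physical memory.
KSEG0_BASE = 0x80000000  # cached, unmapped
KSEG1_BASE = 0xA0000000  # uncached, unmapped
KSEG2_BASE = 0xC0000000  # TLB-mapped from here upwards
FOUR_MB = 4 * 1024 * 1024


def kseg_to_phys(vaddr: int) -> int:
    """Translate a KSEG0/KSEG1 virtual address to its physical address."""
    if not KSEG0_BASE <= vaddr < KSEG2_BASE:
        raise ValueError(f"{vaddr:#x} is not in an unmapped kernel segment")
    # Dropping the top three address bits yields the physical address.
    return vaddr & 0x1FFFFFFF


def is_4mb_aligned(addr: int) -> bool:
    """True if addr lies on a 4MB boundary."""
    return addr % FOUR_MB == 0


if __name__ == "__main__":
    reset_vector = 0xBFC00000  # the default boot address mentioned above
    print(hex(kseg_to_phys(reset_vector)))  # 0x1fc00000
```

A driver that hands 0xbfc00000 to a DMA engine would be passing the CPU's view of the address, while the device needs 0x1fc00000 (or whatever the bus mapping dictates).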
The new Octeon U-Boot will be native 64-bit, whereas the earlier one was 32-bit and used the N32 ABI so that 64-bit addresses could be accessed. We had to jump through some hoops to make a 32-bit U-Boot fully support 64-bit hardware.
I think we can shrink the code by removing support for starting "simple executive" tasks. Simple executive tasks are bare metal applications that can run on dedicated cores beside Linux (or without Linux). I will also not be porting any support for anything older than Octeon3.
We also make heavy use of our SDK in order to perform hardware initialization and networking. In our old U-Boot, we have almost 900K lines of code. I can cut out much of this but much will remain.
We have also added extensive infrastructure for handling SFP and QSFP cables, as well as very extensive PHY support for PHYs from Aquantia/Marvell, Vitesse/Microsemi, Inphi/Cortina and an Avago gearbox. Our customer wants us to port all of this to the new U-Boot and upstream it. I'm worried about the sheer amount of code since it is absolutely massive. Some of these PHY drivers are extremely complex and need to tie into the SFP management. We also need to use a background polling thread while at the command prompt. A fair bit of our PHY code is not in the normal PHY drivers because it did not fit the model. Some of these PHY drivers need to interact with the SFP support code to handle hot-plug events and reconfigure themselves based on the cable type. The existing SFP code handles everything from SFP to SFP28 as well as QSFP and 100G QSFP (never tested).
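As a rough picture of what polling for hot-plug events involves, the following Python sketch remembers the last cable type seen in each slot and fires a callback when it changes. Every name here (SfpPoller, read_slot, on_change, the cable-type strings) is hypothetical and invented for illustration; it is not the Octeon implementation, and real module detection would read GPIOs and the module EEPROM over I2C.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical types: none of these names come from the Octeon code base.
ReadSlot = Callable[[str], Optional[str]]   # slot name -> cable type, None if empty
OnChange = Callable[[str, Optional[str]], None]


class SfpPoller:
    """Remembers the last cable type seen per slot and reports changes."""

    def __init__(self, read_slot: ReadSlot, on_change: OnChange) -> None:
        self._read_slot = read_slot
        self._on_change = on_change
        # Slots not yet seen are treated as empty (None).
        self._last: Dict[str, Optional[str]] = {}

    def poll(self, slots: Iterable[str]) -> None:
        """Run one polling pass; meant to be called repeatedly while idle."""
        for slot in slots:
            current = self._read_slot(slot)
            if current != self._last.get(slot):
                self._last[slot] = current
                self._on_change(slot, current)


if __name__ == "__main__":
    state = {"sfp0": None}
    poller = SfpPoller(lambda s: state[s],
                       lambda s, c: print(f"{s}: now {c or 'empty'}"))
    poller.poll(state)          # nothing plugged in, no callback
    state["sfp0"] = "10G-SR"    # simulate inserting a module
    poller.poll(state)          # callback fires once for sfp0
```

In a bootloader without threads, a pass like poll() would typically be driven from the idle loop at the prompt; the callback is where a MAC or PHY driver would reconfigure itself for the new cable type.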
In the old U-Boot the PHY support had to be significantly enhanced due to requirements for hot-plugging and how some of the PHYs are configured. It gets quite complicated with phys like the Inphi where one phy can handle either four ports (XFI/SGMII) or a single 4-lane port (XLAUI). It gets even worse since in some boards we use reclocking chips and there is one chip that handles the receive path of a QSFP and another that handles the transmit path. Further complicating things, with a QSFP it can be treated either as XLAUI or as four XFI ports, so you can have four ports spread across two chips, with each port using different slices of each chip. In the case of the Inphi/Cortina chip, a single device can handle one or four ports based on the configuration and it is configured by "slice" which is basically an offset into the MDIO register space. We had to jump through hoops in order to have this stuff work in a sane way in the device tree. We added entries for SFP and QSFP slots in the device tree which point to the MACs, GPIOs and I2C bus because pointing them to the phys just got too insane. This will need to be ported to the new U-Boot. It should not break the existing support since most of it was implemented outside of the core PHY handling code. In the port, it would be far better if this could be integrated in. The SFP management code is architecture agnostic as is all of the PHY support. The callbacks for the SFP support are used by the MAC which then notifies the PHY since the MAC often needs to reconfigure itself. It can handle some crazy configurations.
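The "slice" idea can be pictured with a small Python model. This is only a sketch under stated assumptions: the stride value, the function names, and the assumption that an XLAUI port spans all four slices of one device are mine, not taken from the Inphi/Cortina documentation or the Octeon drivers.

```python
from typing import List, Tuple

NUM_SLICES = 4
# Hypothetical stride; the text only says a slice is an offset into the
# MDIO register space, not what that offset is.
SLICE_STRIDE = 0x1000


def slice_reg(slice_idx: int, reg: int) -> int:
    """Return the MDIO register address of `reg` within a given slice."""
    if not 0 <= slice_idx < NUM_SLICES:
        raise ValueError(f"slice {slice_idx} out of range")
    return slice_idx * SLICE_STRIDE + reg


def port_layout(mode: str) -> List[Tuple[int, ...]]:
    """Return one tuple of slice indices per logical port for a given mode."""
    if mode == "XLAUI":
        # Assumption: a single 4-lane port uses all four slices.
        return [(0, 1, 2, 3)]
    if mode in ("XFI", "SGMII"):
        # Four independent single-lane ports, one slice each.
        return [(i,) for i in range(NUM_SLICES)]
    raise ValueError(f"unknown mode {mode!r}")


if __name__ == "__main__":
    for mode in ("XLAUI", "XFI"):
        print(mode, port_layout(mode))
```

With separate receive-path and transmit-path chips, each logical port would carry two such slice references, one per chip, which is part of why describing this in the device tree gets complicated.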
While I see some PHY drivers that we also support, e.g. Cortina, our drivers tend to have a lot more functionality. For example, all of our PHY drivers for devices that use firmware provide commands for upgrading the firmware, as well as things like cable testing and other features.
Our bootloader needs to be able to be booted from a variety of sources, including SPI, eMMC, NOR flash and booting over the PCI bus from a host system. This is one reason we use virtual memory. The other reason is that it eliminates the need to perform relocation. Our start.S code handles all of these different cases as well as exception handling.
I will also say up front that the memory initialization code is a mess and quite large (it was written by a hardware engineer who never heard of functions).
One thing is that this will break MIPS unless it is refactored the way ARM is, for example by separating armv7 and armv8. This way we could have arch/mips/cpu/octeon. I did this with the old bootloader to separate our stuff. I'm open to suggestions on the naming. I don't see how we can share much of the code with the other MIPS CPUs.
All in all, I think the final port will add between 500K and 1M lines of code for the Octeon CPU. It is much more extensive than what is required for OcteonTX since in the latter case most of the hardware initialization is done by earlier stage bootloaders and the ATF handles things like SFP port management and many of the networking operations.
I'm not sure how well I'll be able to upstream all of this code at this point since I was just handed this task. We already have at least 1M lines of code added to the old U-Boot which is based off of 2013.08 with a lot of backports.

Hi Aaron,
On Wed, Oct 23, 2019 at 4:50 PM Aaron Williams <awilliams@marvell.com> wrote:
Hi all,
I have been tasked with porting our Octeon U-Boot to the latest U-Boot and merging it upstream. This will involve a very significant amount of code that generally will not be compatible with other MIPS processors due to our needs and requirements. For example, the start.S will need to be completely different than what is present. For example, our existing start.S is 3577 lines of code in order to deal with things like RAS, exceptions, virtual memory and more. We need to use virtual memory since U-Boot can be loaded at any 4MB boundary in memory, not just 0xbfc00000. A number of drivers will need to be updated in order to properly map pointers to physical addresses. This is needed anyway, since I see numerous drivers that assume that a pointer is a DMA address. For MIPS this is never the case (I'm looking at XHCI).
I don't have any specific reply to your questions. But as a Cavium/Marvell customer I'm really happy to see this finally happening. I've got a few boards using the Octeon3 SoC that I'd be able to upstream once this code lands.

Hi Aaron,
On 23.10.19 at 05:50, Aaron Williams wrote:
Hi all,
I have been tasked with porting our Octeon U-Boot to the latest U-Boot and merging it upstream. This will involve a very significant amount of code that generally will not be compatible with other MIPS processors due to our needs and requirements. For example, the start.S will need to be completely different than what is present. For example, our existing start.S is 3577 lines of code in order to deal with things like RAS, exceptions, virtual memory and more. We need to use virtual memory since U-Boot can be loaded at any 4MB boundary in memory, not just 0xbfc00000. A number of drivers will need to be updated in order to properly map pointers to physical addresses. This is needed anyway, since I see numerous drivers that assume that a pointer is a DMA address. For MIPS this is never the case (I'm looking at XHCI).
Good to see some progress in mainline Octeon support. Could you briefly describe the differences and commonalities in booting an Octeon CPU compared to other "generic" MIPS cores? Or could you point me to a public Git tree? It can't be that different because the Linux kernel is also able to share most of the code ;)
In principle you could compile your own start.S in your mach-octeon directory, but you should try to use the generic start.S, which is already customisable and extensible. If needed, we could add more extension points to it. Booting from any custom memory address is already supported and very common for other MIPS-based SoCs. Exception support is also already there.
The new Octeon U-Boot will be native 64-bit instead of how the earlier one was 32-bit using the N32 ABI (so 64-bit addresses could be accessed). We had to jump through some hoops to make a 32-bit U-Boot fully support 64-bit hardware.
We have 64-bit support for MIPS. I even sync'ed the asm/io stuff from Linux in the past (which includes support for Octeon) so that you would be able to use the standard IO primitives and ioremap stuff and hook in your platform-specific memory mappings.
I think we can shrink the code by removing support for starting "simple executive" tasks. Simple executive tasks are bare metal applications that can run on dedicated cores beside Linux (or without Linux). I will also not be porting any support for anything older than Octeon3.
We also make heavy use of our SDK in order to perform hardware initialization and networking. In our old U-Boot, we have almost 900K lines of code. I can cut out much of this but much will remain.
We also have added extensive infrastructure for handling SFP and QSFP cables as well as very extensive phy support for phys from Aquantia/Marvell, Vitesse/Microsemi, Inphi/Cortina and an Avago gearbox. Our customer wants us to port all of this to the new U-Boot and upstream it. I'm worried about the sheer amount of code since it is absolutely massive.
Maybe you should cut down your customer's expectations a bit. According to sloccount we currently have 1.6M SLOC for the whole of U-Boot. I guess Tom or Wolfgang wouldn't agree with adding another 900k only for one CPU. Actually what should be upstream is the basic CPU, driver and board support to be able to boot a mainline kernel. Everything else like custom bare metal applications or the SFP/PHY handling stuff mentioned below could also be maintained in a downstream tree. Maybe Wolfgang is willing to host one on gitlab.denx.de.
Some of these phy drivers are extremely complex and need to tie into the SFP management. We also need to use a background polling thread while at the command prompt. A fair bit of our phy code is not in the normal phy drivers because it did not fit the model. Some of these phy drivers need to interact with the SFP support code in order to handle hot plug events in order to reconfigure themselves based on the cable type. The existing SFP code handles everything from SFP to SFP28 as well as QSFP and 100G QSFP (never tested).
In the old U-Boot the PHY support had to be significantly enhanced due to requirements for hot-plugging and how some of the PHYs are configured. It gets quite complicated with phys like the Inphi where one phy can handle either four ports (XFI/SGMII) or a single 4-lane port (XLAUI). It gets even worse since in some boards we use reclocking chips and there is one chip that handles the receive path of a QSFP and another that handles the transmit path. Further complicating things, with a QSFP it can be treated either as XLAUI or as four XFI ports, so you can have four ports spread across two chips, with each port using different slices of each chip. In the case of the Inphi/Cortina chip, a single device can handle one or four ports based on the configuration and it is configured by "slice" which is basically an offset into the MDIO register space. We had to jump through hoops in order to have this stuff work in a sane way in the device tree. We added entries for SFP and QSFP slots in the device tree which point to the MACs, GPIOs and I2C bus because pointing them to the phys just got too insane. This will need to be ported to the new U-Boot. It should not break the existing support since most of it was implemented outside of the core PHY handling code. In the port, it would be far better if this could be integrated in. The SFP management code is architecture agnostic as is all of the PHY support. The callbacks for the SFP support are used by the MAC which then notifies the PHY since the MAC often needs to reconfigure itself. It can handle some crazy configurations.
While I see some phy drivers that we also support, i.e. Cortina, our drivers tend to have a lot more functionality. For example, all of our phy drivers that support firmware support commands for upgrading the firmware as well as things like cable testing and other features.
PHY drivers and Ethernet drivers should really be reduced to the functionality required to enable basic networking like ping, DHCP and TFTP. U-Boot is still "just" a bootloader and not a system management tool ;) You should do that stuff either in Linux or in a downstream fork.
Our bootloader needs to be able to be booted from a variety of sources, including SPI, eMMC, NOR flash and booting over the PCI bus from a host system. This is one reason we use virtual memory. The other reason is that it eliminates the need to perform relocation. Our start.S code handles all of these different cases as well as exception handling.
This is already supported for MIPS. You should try to use the generic SPL framework for that. Whether you like the relocation or not, it's one of the basic design principles of U-Boot. I guess it likely won't be accepted if you circumvent this. In fact by now we're sharing the same technology as Linux to have relocatable binaries without using gcc's -fPIC or -mabicalls to reduce the binary footprint. You can configure gd->ram_top to any address of your liking as reference address for the relocation.
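To make the role of gd->ram_top concrete, here is a simplified Python model of how a relocation target can be derived from it. It is a sketch only: real U-Boot reserves several other regions below ram_top before placing the image, the function names are hypothetical, and the 4KB alignment and example addresses are assumptions chosen for illustration.

```python
from typing import Tuple

PAGE_SIZE = 4096  # assumed alignment for the relocated image


def align_down(addr: int, align: int) -> int:
    """Round addr down to a multiple of align (align must be a power of two)."""
    return addr & ~(align - 1)


def relocation_target(ram_top: int, image_size: int, link_addr: int,
                      align: int = PAGE_SIZE) -> Tuple[int, int]:
    """Return (relocation address, relocation offset).

    The image is placed as high as possible below ram_top; the offset is
    what gets added to every link-time address when fixing up the binary.
    """
    reloc_addr = align_down(ram_top - image_size, align)
    reloc_off = reloc_addr - link_addr
    return reloc_addr, reloc_off


if __name__ == "__main__":
    # Made-up example values, not taken from any real board.
    addr, off = relocation_target(ram_top=0x90000000,
                                  image_size=0x80000,
                                  link_addr=0x80200000)
    print(hex(addr), hex(off))  # 0x8ff80000 0xfd80000
```

Because the offset is computed at run time from ram_top, the same binary can be started from different load addresses and still end up at a predictable place at the top of RAM.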
I will also say up front that the memory initialization code is a mess and quite large (it was written by a hardware engineer who never heard of functions).
One thing is that this will break mips unless it is refactored like ARM is, for example, separating armv7 and armv8. This way we could have arch/mips/cpu/octeon. I did this with the old bootloader to separate our stuff. I'm open to suggestions as for the naming. I don't see how we can share much of the code with the other MIPS CPUs.
We have the same mach directory handling as in Linux MIPS. So you could easily add all your platform-specific code (except drivers) to arch/mips/mach-octeon or (-cavium). Inside that directory you can have an include directory for your custom header files; you can even override the generic files from arch/mips/include like in Linux. arch/mips/cpu and arch/mips/lib should only contain generic code. As already mentioned, you could provide your own start.S inside arch/mips/mach-octeon, but if possible you should try to reuse or extend the generic variant.

On Fri, Oct 25, 2019 at 05:13:57PM +0200, Daniel Schwierzeck wrote:
[...]
Daniel makes a lot of good points and I defer to him on general MIPS questions. What I do want to add is that it's a good idea to start by focusing on the minimum needed to boot Linux. Aim for a medium-term goal of having enough upstream that all of the other things that can live downstream, as Daniel suggests, can be applied in your internal tree, and then work over time to minimize that delta, either by re-evaluating use cases or by submitting more code upstream.

On Saturday, October 26, 2019 3:15:36 PM PDT Tom Rini wrote:
On Fri, Oct 25, 2019 at 05:13:57PM +0200, Daniel Schwierzeck wrote:
Hi Aaron,
On 23.10.19 at 05:50, Aaron Williams wrote:
Hi all,
I have been tasked with porting our Octeon U-Boot to the latest U-Boot and merging it upstream. This will involve a very significant amount of code that generally will not be compatible with other MIPS processors due to our needs and requirements. For example, the start.S will need to be completely different than what is present. For example, our existing start.S is 3577 lines of code in order to deal with things like RAS, exceptions, virtual memory and more. We need to use virtual memory since U-Boot can be loaded at any 4MB boundary in memory, not just 0xbfc00000. A number of drivers will need to be updated in order to properly map pointers to physical addresses. This is needed anyway, since I see numerous drivers that assume that a pointer is a DMA address. For MIPS this is never the case (I'm looking at XHCI).
Good to see some progress in mainline Octeon support. Could you briefly describe the differences and commonalities in booting an Octeon CPU compared to other "generic" MIPS cores? Or could you point me to a public Git tree? It can't be that different because Linux kernel is also able to share most of the code ;)
In principle you could compile an own start.S in your mach-octeon directory, but you should try to use the generic start.S which is already customisable and extensible. If needed, we could add more extension points to it. Booting from any custom memory address is already supported and very common for other MIPS based SoC's. Exception support is also already there.
The new Octeon U-Boot will be native 64-bit instead of how the earlier one was 32-bit using the N32 ABI (so 64-bit addresses could be accessed). We had to jump through some hoops to make a 32-bit U-Boot fully support 64-bit hardware.
We have 64-bit support for MIPS. I even sync'ed the asm/io stuff from Linux in the past (which includes support for Octeon) so that you would be able to use the standard IO primitives and ioremap stuff and hook in your platform-specific memory mappings.
I think we can shrink the code by removing support for starting "simple executive" tasks. Simple executive tasks are bare metal applications that can run on dedicated cores beside Linux (or without Linux). I will also not be porting any support for anything older than Octeon3.
We also make heavy use of our SDK in order to perform hardware initialization and networking. In our old U-Boot, we have almost 900K lines of code. I can cut out much of this but much will remain.
We also have added extensive infrastructure for handling SFP and QSFP cables as well as very extensive phy support for phys from Aquantia/Marvell, Vitesse/Microsemi, Inphi/Cortina and an Avago gearbox. Our customer wants us to port all of this to the new U-Boot and upstream it. I'm worried about the sheer amount of code since it is absolutely massive.
Maybe you should cut down your customer's expectations a bit. According to sloccount we currently have 1.6M SLOC for the whole U-Boot. I guess Tom or Wolfgang wouldn't agree with adding another 900k only for one CPU. Actually what should be upstream is the basic CPU, driver and board support to be able to boot a mainline kernel. Everything else like custom bare metal applications or the SFP/PHY handling stuff mentioned below could also be maintained in a downstream tree. Maybe Wolfgang is willing to host one on gitlab.denx.de.
Some of these phy drivers are extremely complex and need to tie into the SFP management. We also need to use a background polling thread while at the command prompt. A fair bit of our phy code is not in the normal phy drivers because it did not fit the model. Some of these phy drivers need to interact with the SFP support code in order to handle hot plug events in order to reconfigure themselves based on the cable type. The existing SFP code handles everything from SFP to SFP28 as well as QSFP and 100G QSFP (never tested).
In the old U-Boot the PHY support had to be significantly enhanced due to requirements for hot-plugging and how some of the PHYs are configured. It gets quite complicated with phys like the Inphi where one phy can handle either four ports (XFI/SGMII) or a single 4-lane port (XLAUI). It gets even worse since in some boards we use reclocking chips and there is one chip that handles the receive path of a QSFP and another that handles the transmit path. Further complicating things, with a QSFP it can be treated either as XLAUI or as four XFI ports, so you can have four ports spread across two chips, with each port using different slices of each chip. In the case of the Inphi/Cortina chip, a single device can handle one or four ports based on the configuration and it is configured by "slice" which is basically an offset into the MDIO register space. We had to jump through hoops in order to have this stuff work in a sane way in the device tree. We added entries for SFP and QSFP slots in the device tree which point to the MACs, GPIOs and I2C bus because pointing them to the phys just got too insane. This will need to be ported to the new U-Boot. It should not break the existing support since most of it was implemented outside of the core PHY handling code. In the port, it would be far better if this could be integrated in. The SFP management code is architecture agnostic as is all of the PHY support. The callbacks for the SFP support are used by the MAC which then notifies the PHY since the MAC often needs to reconfigure itself. It can handle some crazy configurations.
While I see some phy drivers that we also support, i.e. Cortina, our drivers tend to have a lot more functionality. For example, all of our phy drivers that support firmware support commands for upgrading the firmware as well as things like cable testing and other features.
PHY drivers and ethernet drivers should be really reduced to the required functionality to enable basic networking like Ping, DHCP, TFTP. U-Boot is still "just" a bootloader and not a system management tool ;) You should do that stuff either in Linux or in a downstream fork.
Our bootloader needs to be able to be booted from a variety of sources, including SPI, eMMC, NOR flash and booting over the PCI bus from a host system. This is one reason we use virtual memory. The other reason is that it eliminates the need to perform relocation. Our start.S code handles all of these different cases as well as exception handling.
This is already supported for MIPS. You should try to use the generic SPL framework for that. Whether you like the relocation or not, it's one of the basic design principles of U-Boot. I guess it likely won't be accepted if you circumvent this. In fact by now we're sharing the same technology as Linux to have relocatable binaries without using gcc's -fPIC or -mabicalls to reduce the binary footprint. You can configure gd->ram_top to any address of your liking as reference address for the relocation.
I will also say up front that the memory initialization code is a mess and quite large (it was written by a hardware engineer who never heard of functions).
One thing is that this will break mips unless it is refactored like ARM is, for example, separating armv7 and armv8. This way we could have arch/mips/cpu/octeon. I did this with the old bootloader to separate our stuff. I'm open to suggestions as for the naming. I don't see how we can share much of the code with the other MIPS CPUs.
We have the same mach directory handling as in Linux MIPS. So you could easily add all your platform specific code (except drivers) to arch/mips/mach-octeon or (-cavium). Inside that directory you can have an include directory for your custom header files, you can even override the generic files from arch/mips/include like in Linux. arch/mips/cpu and arch/mips/lib should only contain generic code. As already mentioned you could provide your own start.S inside arch/mips/mach-octeon but if possible you should try to reuse or extend the generic variant.
All in all, I think the final port will add between 500K-1M lines of code for the Octeon CPU. It is much more extensive than what is required for OcteonTX since in the latter case most of the hardware initialization is done by earlier stage bootloaders and the ATF handles things like SFP port management and many of the networking operations.
I'm not sure how well I'll be able to upstream all of this code at this point since I was just handed this task. We already have at least 1M lines of code added to the old U-Boot which is based off of 2013.08 with a lot of backports.
Daniel makes a lot of good points and I defer to him on general MIPS questions. What I do want to add is that it's a good idea to start by focusing on the minimum needed to boot Linux, and to aim for a medium-term goal of having enough upstream that all of the other things that can live downstream, as Daniel suggests, can be applied in your internal tree. Over time you can then work to minimize that delta, either by re-evaluating use-cases or by submitting more code upstream.
This is my goal. Unfortunately, getting to this point requires that most of the stuff works. I'll start on the "simpler" boards like the one the customer requires we first support, though there's not much simple about it. It requires the full networking support, SFP management and one of the more complex phys (and a custom one at that). Booting Linux also requires a lot of stuff to work, including our custom command for booting Linux and all the code to bring cores out of reset and initialize them, at least for the current Linux kernel. Hopefully we can move away from this but we will still need to support the current stuff. I think much of our existing code can be used and cleaned up. We had to jump through some hoops because our current U-Boot is 32-bit but we're dealing with a 64-bit environment, so going native 64-bit allows some code to be cleaned up and simplified. Even though the current U-Boot is 32-bit, it can still natively perform 64-bit addressing using the N32 ABI.
The required networking and initialization code alone is massive, and that's just for ping, dhcp and tftp! The Linux code is much smaller because U-Boot needs to do all the low-level hardware initialization first. Fortunately I've generally been fairly strict at following the U-Boot coding standard (such as it was) and tried to keep the code fairly modular. I can move a few drivers out of the arch section and into the driver section. It's also generally well commented (which leads to some of the size).
I'll basically strip out all the support for earlier Octeon devices, which will help some. Unfortunately, most of the current code is for Octeon3.
My goal is to re-use as much existing U-Boot code as possible and make the smallest impact on it as I can. There are a handful of changes I will need to make to the U-Boot core code, but most of these are generally quite minor.
--Aaron

Hi Daniel,
On Friday, October 25, 2019 8:13:57 AM PDT Daniel Schwierzeck wrote:
Hi Aaron,
Am 23.10.19 um 05:50 schrieb Aaron Williams:
Hi all,
I have been tasked with porting our Octeon U-Boot to the latest U-Boot and merging it upstream. This will involve a very significant amount of code that generally will not be compatible with other MIPS processors due to our needs and requirements. For example, the start.S will need to be completely different than what is present. For example, our existing start.S is 3577 lines of code in order to deal with things like RAS, exceptions, virtual memory and more. We need to use virtual memory since U-Boot can be loaded at any 4MB boundary in memory, not just 0xbfc00000. A number of drivers will need to be updated in order to properly map pointers to physical addresses. This is needed anyway, since I see numerous drivers that assume that a pointer is a DMA address. For MIPS this is never the case (I'm looking at XHCI).
Good to see some progress in mainline Octeon support. Could you briefly describe the differences and commonalities in booting an Octeon CPU compared to other "generic" MIPS cores? Or could you point me to a public Git tree? It can't be that different because Linux kernel is also able to share most of the code ;)
Actually the low level code is significantly different. First of all, we need the U-Boot bootloader to be able to boot from different memory locations. Because of this, we use mapped memory for U-Boot. A side effect of this is that it eliminates the need for relocation when it is shifted to the top of memory. All we need to do is just set a couple of TLB entries.
The assembly code is significantly different and is far more extensive.
Additionally, the way Octeon Linux is booted is different.
The generic start.S is not usable in our case.
We have a significant amount of code for dealing with the cache and for things like copying U-Boot from flash into the L2 cache. We also have to deal with taking other cores out of reset in our start.S. Our exception handler has also been extended to handle multiple cores.
Some other things we have included are a native API that allows Simple Executive applications to make calls into U-Boot for such things as environment variable access as well as access to block devices and filesystems.
We used to have our Octeon SDK available for download but it seems this has been taken down :( I'm trying to find out how I can make it available but I'm getting pushback in sharing our GPLed U-Boot even though it is GPL.
In principle you could compile your own start.S in your mach-octeon directory, but you should try to use the generic start.S which is already customisable and extensible. If needed, we could add more extension points to it. Booting from any custom memory address is already supported and very common for other MIPS based SoCs. Exception support is also already there.
The bootloader needs to be able to start from multiple memory locations without recompiling. Our existing bootloader can run from any 4MB boundary without recompiling or relocation. It can start out of flash (from any sector boundary, not just 0) or L2 cache. Starting from the L2 cache is supported by the eMMC, SPI and PCI target bootloaders. Additionally the same bootloader can be started from RAM such as when the failsafe bootloader starts the main bootloader. In most cases, the failsafe is the same full-featured bootloader since it fits entirely within the L2 cache. Our only bootloader requirement is that it fits in the L2 cache (except when booting from flash, though fitting in the L2 cache is still preferred for speed) and that it remain under 4 MiB in size.
I believe our exception handling is more extensive than the standard U-Boot exception handler. It includes the stack output as well as numerous COP0 registers and decoding the cause of the exception. The exception handler is also independent of a working C environment. We also need to handle exceptions occurring on multiple cores as they're brought out of reset and not all cases are exceptions. Cores are first powered on and kept in a halted state, then later when we start the Linux kernel or simple executive applications, the exception handler is updated (via a bootbus moveable memory region) and an NMI is generated for the cores where they will begin executing code out of start.S before moving to the code that sets up the environment for booting Linux and/or simple executive applications. In the latter case, TLB entries are programmed in for each core.
The new Octeon U-Boot will be native 64-bit instead of how the earlier one was 32-bit using the N32 ABI (so 64-bit addresses could be accessed). We had to jump through some hoops to make a 32-bit U-Boot fully support 64-bit hardware.
We have 64 bit support for MIPS. I even sync'ed the asm/io stuff from Linux in the past (which includes support for Octeon) so that you would be able to use the standard IO primitives and ioremap stuff and hook in your platform-specific memory mappings.
That is good to know. What I have run into is the fact that many drivers do not support I/O remapping. I.e. XHCI assumes that a pointer is a DMA address. Also, does the 64-bit support handle multiple cores in U-Boot?
I agree about using the standard ioremap stuff. I'm only pointing out that there are places where it is missing in the common U-Boot code. Where it is present, there won't be any issues since traditionally I used those methods to call our platform specific remapping. I will look to see what is present and if it will work or not.
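To illustrate why a CPU pointer cannot simply be handed to a device as a DMA address on MIPS, here is a minimal sketch using the classic 32-bit unmapped segments (KSEG0/KSEG1). It is not Octeon's 64-bit mapping, which this thread does not spell out, and the helper names are made up for illustration. In U-Boot a driver would normally go through the architecture's translation helpers (for example virt_to_phys()) rather than open-coding the mask.

```c
#include <stdbool.h>
#include <stdint.h>

#define KSEG0_BASE     0x80000000u /* cached, unmapped segment */
#define KSEG1_BASE     0xa0000000u /* uncached, unmapped segment */
#define KSEG2_BASE     0xc0000000u /* TLB-mapped region starts here */
#define KSEG_PHYS_MASK 0x1fffffffu /* low 29 bits carry the physical address */

/* True if vaddr lies in KSEG0 or KSEG1, where translation is a fixed mask. */
static bool is_unmapped_kseg(uint32_t vaddr)
{
	return vaddr >= KSEG0_BASE && vaddr < KSEG2_BASE;
}

/* Physical address behind a KSEG0/KSEG1 pointer (hypothetical helper name). */
static uint32_t kseg_to_phys(uint32_t vaddr)
{
	return vaddr & KSEG_PHYS_MASK;
}
```

The same physical word is visible at two different virtual addresses (cached and uncached), and neither equals the address a bus master would use, so a driver that writes the raw pointer value into a descriptor ring is wrong even in this simplest case.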
I think we can shrink the code by removing support for starting "simple executive" tasks. Simple executive tasks are bare metal applications that can run on dedicated cores beside Linux (or without Linux). I will also not be porting any support for anything older than Octeon3.
We also make heavy use of our SDK in order to perform hardware initialization and networking. In our old U-Boot, we have almost 900K lines of code. I can cut out much of this but much will remain.
We also have added extensive infrastructure for handling SFP and QSFP cables as well as very extensive phy support for phys from Aquantia/Marvell, Vitesse/Microsemi, Inphi/Cortina and an Avago gearbox. Our customer wants us to port all of this to the new U-Boot and upstream it. I'm worried about the sheer amount of code since it is absolutely massive.
Maybe you should cut down your customer's expectations a bit. According to sloccount we currently have 1.6M SLOC for the whole U-Boot. I guess Tom or Wolfgang wouldn't agree with adding another 900k only for one CPU. Actually what should be upstream is the basic CPU, driver and board support to be able to boot a mainline kernel. Everything else like custom bare metal applications or the SFP/PHY handling stuff mentioned below could also be maintained in a downstream tree. Maybe Wolfgang is willing to host one on gitlab.denx.de.
I will try and cut it down. Much of the code is register definitions. The register definition files are auto-generated and tend to be huge. They're fully commented and include both big and little endian bitfields. In this case I can do what I did for OcteonTX and modify the scripts that generate these headers to strip out the little-endian bitfields and the comments. There is a huge amount of code for configuring our QLM hardware interfaces. We also have a lot of code for SFP/QSFP ports.
There are some other huge files that can also be eliminated by dropping support for Octeon II and earlier. The error handling files are massive for those chips.
Much of the rest can be shrunk somewhat, but a lot of that code is still required.
There is a huge amount of code for dealing with our quad-lane modules (QLMs). The QLMs can be configured to run in a variety of modes, from PCIe, SGMII, SATA, XLAUI, XFI, Interlaken, SVRIO, QSGMII, XAUI, RXAUI and more. There is a lot of tuning and configuration code needed in order to handle different clocks, equalization, gain, AGC and a whole host of other serdes issues.
The MAC code is also quite large and complex since there are many coprocessors that must be configured. These chips are designed as network processors. While it makes their networking quite powerful and fast, it also means that a lot of programming is needed before they will work. There are input parser engines, buffer management engines, queueing engines, output engines and more that must be fully configured before any packets can be sent or received.
There is a fair bit of code used to bring additional cores out of reset. In our biggest configuration, there can be two Octeon CN78XX chips connected in tandem where each chip has 48 cores. In this case there is a lot of tuning that needs to happen with the lanes connecting the two chips before this configuration works reliably. There is a tuning process that is required to run on both sides (and the second chip runs a small binary image as well to perform its half of the tuning).
I do not know if this will change or not but the way the Linux kernel is booted on Octeon is not compatible with the standard boot commands. Part of this is due to the fact that Linux can be run in parallel with Simple Executive applications. It's even possible to run two copies of Linux simultaneously on different cores. To go along with this, there is also a mechanism with named memory blocks that is used. When bringing cores out of reset for SE applications, the TLB entries need to be configured. There also is a fair bit of code dealing with core masks when choosing which cores are used for what.
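As a rough illustration of the core mask bookkeeping mentioned above, the sketch below checks that the cores handed to Linux and to Simple Executive applications do not overlap and all exist. The function names are hypothetical, and a single 64-bit mask is an assumption that covers one 48-core chip, not the two-chip configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit n set means core n. One 64-bit word is assumed to be enough. */
static bool coremask_split_ok(uint64_t available, uint64_t linux_mask,
			      uint64_t se_mask)
{
	/* A core cannot run Linux and an SE application at the same time. */
	if (linux_mask & se_mask)
		return false;
	/* Every requested core must actually be present. */
	return ((linux_mask | se_mask) & ~available) == 0;
}

/* Number of cores selected by a mask. */
static unsigned int coremask_count(uint64_t mask)
{
	unsigned int n = 0;

	while (mask) {
		mask &= mask - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}
```

For example, with 48 cores available, giving cores 0-15 to Linux and cores 16-23 to an SE application passes the check, while asking for core 48 does not.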
We also have a named memory block feature which is used by Linux and simple executive applications where blocks of memory can be carved up. U-Boot needs to tie into this.
There are also numerous other I/O interfaces that we need to initialize. Unfortunately we also have some errata we need to work around, and a few are non-trivial.
The DRAM initialization code is also massive. It handles DDR3 and DDR4 for both registered and unregistered memory with ECC.
In many cases, the reason for the size of the code is due to the complexity of the SoC and the platforms built around it. You can think of CN78XX as being more like an enterprise-class server than a simple embedded device. The CN73XX is not too far behind the CN78XX. The only reason our Octeon TX2 U-Boot is so much smaller is that most of the early initialization takes place before U-Boot is started and the fact that a lot of the networking support (such as SFP management and PHY support) is handled by ATF as well as on-chip management cores. This is necessary because Linux does not have any SFP management support nor can it handle the complex topologies we're frequently running into today. The requirements of Red Hat also preclude any additional software being installed in order for the networking support to run.
One thing I may need to re-introduce to U-Boot is the temperature sensor support for devices like this, since thermal monitoring is important.
Some boards require a background task to perform periodic monitoring for certain events, including the board that needs to be upstreamed. I haven't checked if anything is available now, but what I did in the past was hook into the input function and while waiting for input it calls a user-defined polling function.
If interrupts are supported it makes the polling job easier.
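The input hook described above could look roughly like the sketch below: while the console has no character pending, a registered callback runs. All names here are hypothetical and this is not the code from the old tree. The demo_* functions are a fake console included only so the sketch can be exercised.

```c
#include <stddef.h>

typedef void (*bg_poll_fn)(void *ctx);

static bg_poll_fn bg_poll;
static void *bg_poll_ctx;

/* Register the board's background task (hypothetical API). */
static void register_bg_poll(bg_poll_fn fn, void *ctx)
{
	bg_poll = fn;
	bg_poll_ctx = ctx;
}

/* Wait for a character, running the background task while idle. */
static int getc_with_poll(int (*tstc)(void), int (*getc_fn)(void))
{
	while (!tstc()) {
		if (bg_poll)
			bg_poll(bg_poll_ctx);
	}
	return getc_fn();
}

/* Demo console: reports "no input" three times, then yields 'x'. */
static int demo_calls;
static int demo_tstc(void) { return ++demo_calls > 3; }
static int demo_getc(void) { return 'x'; }

/* Demo background task: counts how often it was invoked. */
static void demo_poll(void *ctx) { ++*(int *)ctx; }
```

With the demo console, the callback runs once per failed input check before the character is returned. A real polling function would have to be short and non-blocking, otherwise the prompt becomes sluggish.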
Some of these phy drivers are extremely complex and need to tie into the SFP management. We also need to use a background polling thread while at the command prompt. A fair bit of our phy code is not in the normal phy drivers because it did not fit the model. Some of these phy drivers need to interact with the SFP support code in order to handle hot plug events in order to reconfigure themselves based on the cable type. The existing SFP code handles everything from SFP to SFP28 as well as QSFP and 100G QSFP (never tested).
In the old U-Boot the PHY support had to be significantly enhanced due to requirements for hot-plugging and how some of the PHYs are configured. It gets quite complicated with phys like the Inphi where one phy can handle either four ports (XFI/SGMII) or a single 4-lane port (XLAUI). It gets even worse since in some boards we use reclocking chips and there is one chip that handles the receive path of a QSFP and another that handles the transmit path. Further complicating things, with a QSFP it can be treated either as XLAUI or as four XFI ports, so you can have four ports spread across two chips, with each port using different slices of each chip. In the case of the Inphi/Cortina chip, a single device can handle one or four ports based on the configuration and it is configured by "slice" which is basically an offset into the MDIO register space. We had to jump through hoops in order to have this stuff work in a sane way in the device tree. We added entries for SFP and QSFP slots in the device tree which point to the MACs, GPIOs and I2C bus because pointing them to the phys just got too insane. This will need to be ported to the new U-Boot. It should not break the existing support since most of it was implemented outside of the core PHY handling code. In the port, it would be far better if this could be integrated in. The SFP management code is architecture agnostic as is all of the PHY support. The callbacks for the SFP support are used by the MAC which then notifies the PHY since the MAC often needs to reconfigure itself. It can handle some crazy configurations.
While I see some phy drivers that we also support, i.e. Cortina, our drivers tend to have a lot more functionality. For example, all of our phy drivers that support firmware support commands for upgrading the firmware as well as things like cable testing and other features.
PHY drivers and ethernet drivers should be really reduced to the required functionality to enable basic networking like Ping, DHCP, TFTP. U-Boot is still "just" a bootloader and not a system management tool ;) You should do that stuff either in Linux or in a downstream fork.
This is the case for the most part. Unfortunately, many of these drivers require a lot of code and some require frequent monitoring to make adjustments. The SFP support is required to monitor what cable type is plugged in and to reprogram the phy as needed based on the type of cable. The 10G and 25G phys need different settings for optical/active vs passive copper vs SFP connectors. In addition, some require different settings based on the cable length and in some cases exceptions are needed for certain modules (there are a series of Avago SFP to Gigabit modules that require autonegotiation to be disabled in 1000Base-X mode). In at least one case there needs to be frequent polling to make adjustments (25G) as the equalization settings can change based on temperature. The SFP management code identifies the type of cable connected and its parameters so that the phy driver can adjust the appropriate settings. The SFP management code is generic and not tied to any one type of phy or MAC or brand of module. It also monitors all of the GPIO pins and will make callbacks when needed. Many phys lack the support for doing this themselves. Phys I have worked with that need this support include Cortina/Inphi and several Microsemi/Vitesse devices.
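The GPIO monitoring and callback mechanism can be pictured with a small sketch like the one below. The structure and function names are invented for illustration and are not taken from the existing SFP code. The only hardware fact assumed is that the SFP MOD_ABS pin reads high when no module is inserted.

```c
#include <stdbool.h>
#include <stddef.h>

struct sfp_slot {
	bool present;                 /* last known module state */
	void (*on_change)(struct sfp_slot *slot, bool present);
	void *priv;                   /* owner data, e.g. the MAC instance */
};

/*
 * Poll one slot. mod_abs is the sampled MOD_ABS pin (true = no module).
 * Returns true and notifies the owner when the state changed.
 */
static bool sfp_slot_poll(struct sfp_slot *slot, bool mod_abs)
{
	bool present = !mod_abs;

	if (present == slot->present)
		return false;

	slot->present = present;
	if (slot->on_change)
		slot->on_change(slot, present);
	return true;
}

/* Demo owner callback: just counts plug/unplug events. */
static void demo_mac_cb(struct sfp_slot *slot, bool present)
{
	(void)present;
	++*(int *)slot->priv;
}
```

In a fuller version the callback is where the MAC would reread the module EEPROM, reconfigure itself and then pass the cable type on to the phy driver.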
The Inphi devices will typically handle four XFI lanes with four bidirectional slices with each slice given a different register range. Further complicating matters is that a QSFP port can either be four XFI interfaces or a single XLAUI interface. We have code to update the firmware for the Inphi chips, but this is small compared to the rest of the initialization code. These chips require that equalization and gain be configured on each slice based on the board and cable characteristics as well as LED configuration.
With the Microsemi reclocking chips, each chip has four unidirectional lanes. For a QSFP port, two chips are required with one chip configured for ingress and the other for egress. This can support either XLAUI or four XFI interfaces. When it is configured for XFI there are four XFI interfaces, with four MACs shared across the two chips and each MAC going to one lane on each chip.
Also making things fun is that Inphi and the reclocking chips do not conform to the Clause 45 standard at all. In the case of Inphi, the ID registers are 0.0 and 0.1 instead of 1.2 and 1.3 as they are in Clause 45.
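One way a generic driver could cope with the non-standard ID location is to describe it as data, as in the sketch below. The names are hypothetical, and which of the two Inphi registers holds the upper half of the ID is an assumption. The demo read function and its values are made up so the sketch can be run without hardware.

```c
#include <stdint.h>

/* Where a device keeps its 32-bit ID: MMD number plus two register offsets. */
struct phy_id_loc {
	uint8_t devad;
	uint16_t reg_hi;
	uint16_t reg_lo;
};

/* Clause 45 PMA/PMD identifier: registers 1.2 and 1.3. */
static const struct phy_id_loc c45_std_id = { 1, 2, 3 };
/* Inphi per the text: registers 0.0 and 0.1 (high/low order assumed). */
static const struct phy_id_loc inphi_id = { 0, 0, 1 };

typedef uint16_t (*mdio_read_fn)(uint8_t devad, uint16_t reg);

/* Combine the two 16-bit ID registers into one 32-bit value. */
static uint32_t read_phy_id(mdio_read_fn rd, const struct phy_id_loc *loc)
{
	uint32_t hi = rd(loc->devad, loc->reg_hi);
	uint32_t lo = rd(loc->devad, loc->reg_lo);

	return (hi << 16) | lo;
}

/* Demo bus: answers only at 0.0 and 0.1 with made-up values. */
static uint16_t demo_mdio_read(uint8_t devad, uint16_t reg)
{
	if (devad == 0 && reg == 0)
		return 0x1234;
	if (devad == 0 && reg == 1)
		return 0x5678;
	return 0xffff; /* nothing responds elsewhere */
}
```

Probing the demo bus at the standard location returns all ones, which is why a stock Clause 45 probe would not find such a device.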
The MAC drivers are also non-trivial. The Octeon chips are designed as network processors with a lot of hardware offloading and coprocessors. Bringing up a "simple Ethernet" interface is anything but simple. There are numerous offload engines that must be configured before it will work. While we do have one "simple" interface that can be configured, it often isn't because it's usually only good for a management port and many boards do not have this and the customers desire to be able to use any port.
Just configuring the interface between the MAC and PHY is also non-trivial. The Octeon (and later CPUs) have what are called "QLMs" or quad lane modules. These QLMs contain programmable serdes which can be configured for PCIe, SATA, XFI, XAUI, RXAUI, SGMII, 1000Base-X, XLAUI and a whole host of other interface types with a lot of tuning for things like equalization and clocks. The amount of QLM initialization code is quite large but necessary. There are a lot of clock and analog tuning parameters and sequences that must be run.
Sadly all of this is needed just for basic ping and DHCP. This isn't like a simple e1000 NIC or the NICs common with most SoCs.
Think of scaling from a Raspberry Pi to a dual-CPU Xeon enterprise-class server with 96 cores and 256GiB of RAM with 10, 25 and 40GbE ports but without a BMC or MCU to handle low-level board changes while also having many enterprise-class requirements for RAS, etc. That is why our code is so large and complex. There are a lot of hardware engines for offloading a lot of tasks since the chips are often used in security appliances. There are engines for ZIP compression, hardware regex engines, packet ordering engines, packet parsing engines, buffer management engines, RAID engines and a whole host of others. Many are not used in U-Boot, but a fair number are required for basic packet I/O.
For example, one of the boxes contains a CN78XX with 8 10G ports (where either can also be configured in XLAUI, 4 to 1, using a QSFP to SFP+ splitter cable). It has 128GiB of registered DDR4 DIMMs, 4 SATA drives, redundant power supplies and a whole host of other things including multiple temperature monitors. This uses an Inphi/Cortina phy chip that requires full SFP management support. With Inphi phys, the phy cannot drive LEDs based on traffic since it has no concept of packets, especially in XLAUI mode since each lane is independent of the others.
Another board, one I specifically have been told to upstream, is a NIC that contains a CN73XX and two 10G/25G ports that go through a complex gearbox chip. Since there is no hardware support for LEDs in the Octeon SoC to indicate link and packet I/O, this must be done in software (including in U-Boot, a customer requirement) and SFP port management is also a must. The phy is not at all a traditional phy. It uses i2c instead of MDIO and requires frequent monitoring of the link parameters (it's an older custom gearbox chip, there are newer and better chips that don't require this now). I have a hook while U-Boot is sitting at the prompt which allows background tasks to run while it waits for input.
I have several other NICs to support that use a Microsemi reclocking chip that has four unidirectional lanes per chip. The chip has zero intelligence and is shared between ports (and on some devices, multiple chips are shared between ports). Everything must be tuned based on the SFP/QSFP module type and cable length. LEDs also must be software driven. (The software driving of LEDs is eliminated in OcteonTX2). These chips have no way to drive the LEDs themselves to indicate packet I/O or link status.
There are also other boards that use the Microsemi reclocking chips. They were chosen in part due to the power budget and these chips are very low power (and inexpensive).
In all of these phy cases, all of the parameters are maintained in the device tree so the drivers are generic. Unfortunately these drivers also require SFP and QSFP management support.
I figure if there are several boards I need to upstream, it's not much more effort to port all of the boards to the new U-Boot. I've worked hard to minimize the board-specific code and make as much of it generic and based on the device tree as possible.
Someday I would love for SFP/QSFP infrastructure to get into Linux. Some NIC cards do it in their drivers, but I'd like to see generic infrastructure (like my U-Boot support). This might make it harder for some drivers to only support certain brands of modules too :) The generic code I wrote works with most modules except Intel (because they have bad checksums, but counterfeit Intel modules work fine!). It still can be expanded at some point since there is no support for module diagnostics other than identifying if it is present. Pretty much all it does is monitor the GPIO pins and parse and decode the EEPROM. The SFP code is generic enough such that any phy driver that needs it can easily hook into it.
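For readers wondering what a "bad checksum" means here: the SFP base ID block carries a check code in byte 63 that is the low 8 bits of the sum of bytes 0 through 62 (this is from the SFF-8472 layout, not from the code discussed in this thread). A minimal check, with hypothetical function names, could look like this:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SFP_CC_BASE_OFFSET 63 /* check code for bytes 0..62 */

/* Low 8 bits of the sum of buf[first..last], inclusive. */
static uint8_t sfp_sum(const uint8_t *buf, size_t first, size_t last)
{
	uint8_t sum = 0;
	size_t i;

	for (i = first; i <= last; i++)
		sum = (uint8_t)(sum + buf[i]);
	return sum;
}

/* True if the base ID block (first 64 EEPROM bytes) has a valid check code. */
static bool sfp_cc_base_ok(const uint8_t eeprom[64])
{
	return sfp_sum(eeprom, 0, SFP_CC_BASE_OFFSET - 1) ==
	       eeprom[SFP_CC_BASE_OFFSET];
}
```

Whether to reject a module whose check code fails, or only to warn about it, is a policy decision the sketch leaves to the caller.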
Our bootloader needs to be able to be booted from a variety of sources, including SPI, eMMC, NOR flash and booting over the PCI bus from a host system. This is one reason we use virtual memory. The other reason is that it eliminates the need to perform relocation. Our start.S code handles all of these different cases as well as exception handling.
This is already supported for MIPS. You should try to use the generic SPL framework for that. Whether you like the relocation or not, it's one of the basic design principles of U-Boot. I guess it likely won't be accepted if you circumvent this. In fact by now we're sharing the same technology as Linux to have relocatable binaries without using gcc's -fPIC or -mabicalls to reduce the binary footprint. You can configure gd->ram_top to any address of your liking as reference address for the relocation.
I will look into this. One other complication is the fact that we require both a failsafe as well as a default bootloader. With the older U-Boot we got around all of this by just using TLB entries to map U-Boot to always run in the same virtual address regardless of the physical address. It eliminated any need for -fPIC and helped keep the binary small. For our older bootloader, it always executes at 0xC0000000 regardless of where it sits in physical memory. Using virtual memory also helps keep U-Boot simple and small.
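The fixed-virtual-address scheme can be summarized in a few lines. The sketch below only models the address arithmetic implied by the description (a 4 MiB window at 0xC0000000 backed by whichever 4 MiB-aligned physical block the image was loaded to). It does not program any TLB entries, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define UBOOT_VIRT_BASE 0xC0000000u /* fixed virtual address from the text */
#define UBOOT_WINDOW    (4u << 20)  /* 4 MiB load granularity and size limit */

/* True if phys_base is a legal load address (4 MiB aligned). */
static bool load_addr_ok(uint64_t phys_base)
{
	return (phys_base & (UBOOT_WINDOW - 1)) == 0;
}

/* Translate a virtual address inside the mapped image to physical. */
static bool image_virt_to_phys(uint64_t phys_base, uint32_t vaddr,
			       uint64_t *paddr)
{
	if (!load_addr_ok(phys_base))
		return false;
	if (vaddr < UBOOT_VIRT_BASE || vaddr - UBOOT_VIRT_BASE >= UBOOT_WINDOW)
		return false;

	*paddr = phys_base + (vaddr - UBOOT_VIRT_BASE);
	return true;
}
```

Because only the physical base changes between boot sources, the image itself never needs to be relinked or relocated. The cost is that any code handing addresses to hardware must translate them first.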
I will also say up front that the memory initialization code is a mess and quite large (it was written by a hardware engineer who never heard of functions).
One thing is that this will break mips unless it is refactored like ARM is, for example, separating armv7 and armv8. This way we could have arch/mips/cpu/octeon. I did this with the old bootloader to separate our stuff. I'm open to suggestions as for the naming. I don't see how we can share much of the code with the other MIPS CPUs.
We have the same mach directory handling as in Linux MIPS. So you could easily add all your platform specific code (except drivers) to arch/mips/mach-octeon or (-cavium). Inside that directory you can have an include directory for your custom header files, you can even override the generic files from arch/mips/include like in Linux. arch/mips/cpu and arch/mips/lib should only contain generic code. As already mentioned you could provide your own start.S inside arch/mips/mach-octeon but if possible you should try to reuse or extend the generic variant.
We can't use the existing start.S. We have a lot of requirements that are not supported there as well as a fair bit of code dedicated to dealing with the cache and TLBs and bringing additional cores out of reset. We make use of a boot bus movable region in order to do this and handle other cases like NMIs and the watchdog. Our start.S currently sits at around 3800 lines of code. Some is common but most is not.
Our start.S is designed to be able to boot both a failsafe and non-failsafe image and supports adjusting the flash mapping in order to start from an offset other than zero in the flash. There is also a fair bit of code for copying the image out of flash into the L2 cache for a significant speedup for DRAM initialization. I'm trying to get permission to share our existing code but I'm getting push-back (even though it's GPL!?!). How they want me to upstream it without sharing the code is beyond me.
While U-Boot has an exception handler, I believe ours is more comprehensive. It is written entirely in assembler and is not dependent on a working C runtime environment. It also dumps more information than just the registers such as the stack and a number of other exception registers and does some exception decoding. It's quite a bit better than the ARMv8 exception handler IMHO.
Putting this under mach-octeon will make it much easier. I'll try and re-use where I can.
All in all, I think the final port will add between 500K-1M lines of code for the Octeon CPU. It is much more extensive than what is required for OcteonTX since in the latter case most of the hardware initialization is done by earlier stage bootloaders and the ATF handles things like SFP port management and many of the networking operations.
I'm not sure how well I'll be able to upstream all of this code at this point since I was just handed this task. We already have at least 1M lines of code added to the old U-Boot which is based off of 2013.08 with a lot of backports.
I'm trying to get our existing code made available someplace online. I'm getting pushback even though U-Boot is GPL and the license on our SDK is BSD-like (i.e. do whatever you want but don't hold us responsible). It looks like it used to be available but was taken down. I don't understand lawyers. All of the code I wrote is GPL. There is some U-Boot specific code in our SDK, but none was copied from U-Boot. There also is some duplication of functionality between U-Boot and our SDK that I'll try and eliminate.
I have implemented just about every feature in U-Boot I could with our Octeon SoC. That's another reason it's so large. Some customer always comes back and says they want feature X to work. Fortunately, the changes to the U-Boot supplied code are generally minimal, despite it being so large.
I likely will need to add some more hooks to board_f.c and board_r.c. I have run into many cases where we need a specific order of initialization that does not match the normal U-Boot order. Perhaps make init_sequence_f and init_sequence_r weak so that they can be overridden if needed by a specific board or architecture. While much of the current init order works, we need some things initialized as quickly as possible and others initialized later. For example, the first thing we call is an early_errate_workaround function in the init sequence before anything else is called.
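As a rough illustration of the ordering problem, the sketch below runs a NULL-terminated list of init functions and lets a platform supply its own list with an errata workaround placed first. This is not the actual board_f.c code; `run_init_list`, `octeon_init_sequence_f` and the three step functions are hypothetical stand-ins.

```c
#include <stddef.h>

typedef int (*init_fn)(void);

/* Run a NULL-terminated list in order; stop at the first failing step. */
int run_init_list(const init_fn *list)
{
    for (; *list != NULL; list++) {
        int ret = (*list)();
        if (ret != 0)
            return ret;
    }
    return 0;
}

/* Bookkeeping so the call order can be observed in this example. */
int call_order[4];
int call_count;

/* Stand-in for the early errata workaround mentioned above. */
int early_errata_workaround(void) { call_order[call_count++] = 1; return 0; }
/* Stand-ins for generic steps that would normally come from common code. */
int generic_timer_init(void)      { call_order[call_count++] = 2; return 0; }
int generic_console_init(void)    { call_order[call_count++] = 3; return 0; }

/* Hypothetical platform-specific sequence: errata handling comes first. */
const init_fn octeon_init_sequence_f[] = {
    early_errata_workaround,
    generic_timer_init,
    generic_console_init,
    NULL,
};
```

If the generic sequence were declared weak, a board or architecture could provide a table like `octeon_init_sequence_f` in its place without touching the common list.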
Regards,
-Aaron

Dear Aaron,
In message 4176494.JIoP81OjG2@flash you wrote:
Actually the low level code is significantly different. First of all, we need the U-Boot bootloader to be able to boot from different memory locations. Because of this, we use mapped memory for U-Boot. A side effect of this is that it eliminates the need for relocation when it is shifted to the top of memory. All we need to do is just set a couple of TLB entries.
The assembly code is significantly different and is far more extensive.
Additionally, the way Octeon Linux is booted is different.
The generic start.S is not usable in our case.
Please excuse my ignorance - I have never touched a Cavium system yet (at least not knowingly), and never looked into any of that code. So for me it would be really helpful if you would not only describe what you have, or what you need, or that things are different or cannot be used, but actually explain _why_ this is the case, and why you cannot use the existing structure of U-Boot mainline code.
I know that it is always difficult to upstream code that has been developed out of tree and without synchronizing the design with the mainline maintainers, but as long as you don't explain why it was mandatory to do things differently, it is impossible to understand if this is the only sane way things can be implemented, or if you just don't want to change the code that has grown over the years in an uncontrolled way to avoid the effort of cleaning it up.
We have a significant amount of code for dealing with the cache and for things like copying U-Boot from flash into the L2 cache. We also have to deal with taking other cores out of reset in our start.S. Our exception handler has also been extended to handle multiple cores.
We should be able to understand why you need this. There might be areas where your code overlaps with things that are already available in U-Boot mainline, and if there are good reasons to duplicate such areas, you should explain them.
Daniel already pointed out that doubling the code size of U-Boot by adding just a single new CPU simply makes no sense. I don't know what you are using U-Boot for, but we should keep in mind that it's a boot loader, whose main purpose should be to execute as fast as possible just to be replaced by an operating system.
I have to admit that I have problems understanding why someone would need hot plug support for hardware in U-Boot.
It would be best to restrict initial upstreaming to a minimal sub-set that gives maintainers even a chance to review it.
Best regards,
Wolfgang Denk

Hi Aaron,
On 27.10.19 at 03:34, Aaron Williams wrote:
Hi Daniel,
On Friday, October 25, 2019 8:13:57 AM PDT Daniel Schwierzeck wrote:
Hi Aaron,
On 23.10.19 at 05:50, Aaron Williams wrote:
Hi all,
I have been tasked with porting our Octeon U-Boot to the latest U-Boot and merging it upstream. This will involve a very significant amount of code that generally will not be compatible with other MIPS processors due to our needs and requirements. For example, the start.S will need to be completely different than what is present. For example, our existing start.S is 3577 lines of code in order to deal with things like RAS, exceptions, virtual memory and more. We need to use virtual memory since U-Boot can be loaded at any 4MB boundary in memory, not just 0xbfc00000. A number of drivers will need to be updated in order to properly map pointers to physical addresses. This is needed anyway, since I see numerous drivers that assume that a pointer is a DMA address. For MIPS this is never the case (I'm looking at XHCI).
Good to see some progress in mainline Octeon support. Could you briefly describe the differences and commonalities in booting an Octeon CPU compared to other "generic" MIPS cores? Or could you point me to a public Git tree? It can't be that different because Linux kernel is also able to share most of the code ;)
Actually the low level code is significantly different. First of all, we need the U-Boot bootloader to be able to boot from different memory locations. Because of this, we use mapped memory for U-Boot. A side effect of this is that it eliminates the need for relocation when it is shifted to the top of memory. All we need to do is just set a couple of TLB entries.
Understood, but U-Boot still relocates itself from its initial entry address to its destination address based on gd->ram_top. Maybe this is inefficient nowadays with the various SPL/TPL boot methods, because U-Boot proper is already loaded to an executable memory location by SPL, but you have to deal with that design initially. Feel free to suggest or submit a patch for the generic board init code to make the relocation configurable.
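A small sketch of the arithmetic behind relocation to the top of memory may help here. It only assumes that the image is placed just below a reference address and rounded down to some alignment; the helper names and the alignment parameter are hypothetical, and the real generic code reserves several other regions as well.

```c
#include <stdint.h>

/* Round addr down to a power-of-two alignment. */
uint64_t align_down(uint64_t addr, uint64_t align)
{
    return addr & ~(align - 1);
}

/*
 * Hypothetical helper: choose a destination just below ram_top for an
 * image of image_size bytes. The alignment is an assumption of this sketch.
 */
uint64_t pick_reloc_addr(uint64_t ram_top, uint64_t image_size, uint64_t align)
{
    return align_down(ram_top - image_size, align);
}

/* Offset to add to every link-time address after the copy. */
int64_t reloc_offset(uint64_t link_addr, uint64_t reloc_addr)
{
    return (int64_t)(reloc_addr - link_addr);
}
```

With a fixed virtual mapping, as in the Octeon approach, the offset would stay zero and only the physical backing would move.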
The assembly code is significantly different and is far more extensive.
Additionally, the way Octeon Linux is booted is different.
The generic start.S is not usable in our case.
We have a significant amount of code for dealing with the cache and for things like copying U-Boot from flash into the L2 cache. We also have to deal with taking other cores out of reset in our start.S. Our exception handler has also been extended to handle multiple cores.
It's hard to discuss this without example code, but I still think the basic principles of cache and exception handling can't be that different from generic MIPS cores. Locking cache lines and loading code into them could be useful for other MIPS platforms and should be added as a generic feature. BTW the exception handler code is a port of the Linux one; I only skipped the stack trace output because of the complicated stack unwinding code. I think the current dump of general, CP0 and EPC registers is more than sufficient for a bootloader. It has already helped me multiple times to quickly locate code locations with e.g. null pointer dereferencing.
Some other things we have included are a native API that allows Simple Executive applications to make calls into U-Boot for such things as environment variable access as well as access to block devices and filesystems.
This is one of the parts that shouldn't be needed for basic upstream support. If your API is a parallel and independent implementation of the API that U-Boot already has for standalone applications, then I'm afraid this won't be accepted and should be kept in a downstream fork.
We used to have our Octeon SDK available for download but it seems this has been taken down :( I'm trying to find out how I can make it available but I'm getting pushback in sharing our GPLed U-Boot even though it is GPL.
In principle you could compile your own start.S in your mach-octeon directory, but you should try to use the generic start.S, which is already customisable and extensible. If needed, we could add more extension points to it. Booting from any custom memory address is already supported and very common for other MIPS based SoCs. Exception support is also already there.
The bootloader needs to be able to start from multiple memory locations without recompiling. Our existing bootloader can run from any 4MB boundary without recompiling or relocation. It can start out of flash (from any sector boundary, not just 0) or L2 cache. Starting from L2 cache is supported by eMMC, SPI and PCI target bootloaders. Additionally the same bootloader can be started from RAM such as when the failsafe bootloader starts the main bootloader. In most cases, the failsafe is the same full-featured bootloader since it fits entirely within the L2 cache. Our only bootloader requirement is that it fits in the L2 cache (except when booting from Flash, though this is preferred for speed) and that it remain under 4 MiB in size.
I believe our exception handling is more extensive than the standard U-Boot exception handler. It includes the stack output as well as numerous COP0 registers and decoding the cause of the exception. The exception handler is also independent of a working C environment. We also need to handle exceptions occurring on multiple cores as they're brought out of reset and not all cases are exceptions.
As I wrote above, the current exception handling is already sufficient in almost all cases to quickly locate code bugs and doesn't need much code. Adding stack trace output would require adding a lot more code. But if you are only missing some registers or want to dump the stack itself, feel free to extend the current code.
Cores are first powered on and kept in a halted state. Later, when we start the Linux kernel or simple executive applications, the exception handler is updated (via a bootbus movable memory region) and an NMI is generated for those cores, which then begin executing code out of start.S before moving to the code that sets up the environment for booting Linux and/or simple executive applications. In the latter case, TLB entries are programmed in for each core.
The new Octeon U-Boot will be native 64-bit instead of how the earlier one was 32-bit using the N32 ABI (so 64-bit addresses could be accessed). We had to jump through some hoops to make a 32-bit U-Boot fully support 64-bit hardware.
We have 64 bit support for MIPS. I even sync'ed the asm/io stuff from Linux in the past (which includes support for Octeon) so that you would be able to use the standard IO primitives and ioremap stuff and hook in your platform-specific memory mappings.
That is good to know. What I have run into is the fact that many drivers do not support I/O remapping. For example, XHCI assumes that a pointer is a DMA address. Also, does the 64-bit support handle multiple cores in U-Boot?
we already have stuff like dev_remap_addr(struct udevice* dev) as part of the driver model API to map your physical addresses from device tree to virtual addresses. This is used in all drivers compatible with MIPS. That function is backed by the MIPS specific ioremap_nocache() function (also ported from Linux) so that you can hook in platform specific mapping code. If you want to use existing drivers which don't do remapping yet, you have to patch them. But this should be simple, we recently did that on Broadcom or Mediatek platforms, which are sharing drivers between their MIPS and ARM CPUs.
For XHCI you probably only need to patch the xhci_readl() and xhci_writel() functions and establish the memory mappings in your platform specific glue code. But USB support shouldn't be your first priority ;)
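To show why a CPU pointer cannot simply be handed to a device on MIPS, here is a minimal sketch using the classic 32-bit KSEG0/KSEG1 segments, where the virtual and physical addresses differ by a fixed mask. It is an illustration only: a 64-bit Octeon would use its own platform mapping, and `fake_dma_regs` and `program_ring` are hypothetical names, not part of the XHCI driver.

```c
#include <stdint.h>

#define KSEG1_BASE     0xA0000000u  /* uncached, unmapped segment */
#define KSEG_PHYS_MASK 0x1FFFFFFFu  /* low 512MB of physical space */

/* Convert a KSEG0/KSEG1 virtual address to the physical address behind it. */
uint32_t kseg_virt_to_phys(uint32_t virt)
{
    return virt & KSEG_PHYS_MASK;
}

/* Convert a physical address to an uncached CPU pointer value in KSEG1. */
uint32_t phys_to_kseg1(uint32_t phys)
{
    return (phys & KSEG_PHYS_MASK) | KSEG1_BASE;
}

/* Hypothetical device register block holding a DMA ring base address. */
struct fake_dma_regs {
    uint32_t ring_base;
};

/* The device must be given the physical address, never the CPU pointer. */
void program_ring(struct fake_dma_regs *regs, uint32_t ring_virt)
{
    regs->ring_base = kseg_virt_to_phys(ring_virt);
}
```

A driver that writes the raw pointer value into `ring_base` happens to work on platforms where virtual and physical addresses are identical, which is how the assumption goes unnoticed.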
I agree about using the standard ioremap stuff. I'm only pointing out that there are places where it is missing in the common U-Boot code. Where it is present, there won't be any issues since traditionally I used those methods to call our platform specific remapping. I will look to see what is present and if it will work or not.
yes, those places need some patching anyway. There is already an ongoing task to address this:
https://gitlab.denx.de/u-boot/custodians/u-boot-mips/issues/15
I think we can shrink the code by removing support for starting "simple executive" tasks. Simple executive tasks are bare metal applications that can run on dedicated cores beside Linux (or without Linux). I will also not be porting any support for anything older than Octeon3.
We also make heavy use of our SDK in order to perform hardware initialization and networking. In our old U-Boot, we have almost 900K lines of code. I can cut out much of this but much will remain.
We also have added extensive infrastructure for handling SFP and QSFP cables as well as very extensive phy support for phys from Aquantia/Marvell, Vitesse/Microsemi, Inphi/Cortina and an Avago gearbox. Our customer wants us to port all of this to the new U-Boot and upstream it. I'm worried about the sheer amount of code since it is absolutely massive.
Maybe you should cut down your customer's expectations a bit. According to sloccount we currently have 1.6M SLOC for the whole of U-Boot. I guess Tom or Wolfgang wouldn't agree with adding another 900k only for one CPU. Actually what should be upstream is the basic CPU, driver and board support to be able to boot a mainline kernel. Everything else like custom bare metal applications or the SFP/PHY handling stuff mentioned below could also be maintained in a downstream tree. Maybe Wolfgang is willing to host one on gitlab.denx.de.
I will try and cut it down. Much of the code is register definitions. The register definition files are auto-generated and tend to be huge. They're fully commented and include both big and little endian bitfields. In this case I can do like I did for OcteonTX and modify the scripts that generate these headers to strip out the little-endian definitions and the comments. There is a huge amount of code for configuring our QLM hardware interfaces. We also have a lot of code for SFP/QSFP ports.
There are some other huge files that can also be eliminated by dropping support for Octeon II and earlier. The error handling files are massive for those chips.
Much of the rest can be shrunk somewhat, but a lot of that code is still required.
There is a huge amount of code for dealing with our quad-lane modules (QLMs). The QLMs can be configured to run in a variety of modes, from PCIe, SGMII, SATA, XLAUI, XFI, Interlaken, SVRIO, QSGMII, XAUI, RXAUI and more. There is a lot of tuning and configuration code needed in order to handle different clocks, equalization, gain, AGC and a whole host of other serdes issues.
The MAC code is also quite large and complex since there are many coprocessors that must be configured. These chips are designed as network processors. While it makes their networking quite powerful and fast, it also means that a lot of programming is needed before they will work. There are input parser engines, buffer management engines, queueing engines, output engines and more that must be fully configured before any packets can be sent or received.
What I meant was that your customer shouldn't expect to get their custom code merged upstream as it is with only some cleanups. Of course a user/customer can decide to use U-Boot as a system management and hardware initialisation tool, but that doesn't correspond with U-Boot's design. I think most people would agree that a proper OS like Linux should be doing the heavy network initialisation and hardware-offloading stuff as well as booting all remaining CPU cores. U-Boot's responsibility should only be to boot that OS on the first CPU ;)
There is a fair bit of code used to bring additional cores out of reset. In our biggest configuration, there can be two Octeon CN78XX chips connected in tandem where each chip has 48 cores. In this case there is a lot of tuning that needs to happen with the lanes connecting the two chips before this configuration works reliably. There is a tuning process that is required to run on both sides (and the second chip runs a small binary image as well to perform its half of the tuning).
I do not know if this will change or not but the way the Linux kernel is booted on Octeon is not compatible with the standard boot commands. Part of this is due to the fact that Linux can be run in parallel with Simple Executive applications. It's even possible to run two copies of Linux simultaneously on different cores. To go along with this, there is also a mechanism with named memory blocks that is used. When bringing cores out of reset for SE applications, the TLB entries need to be configured. There also is a fair bit of code dealing with core masks when choosing which cores are used for what.
We also have a named memory block feature which is used by Linux and simple executive applications where blocks of memory can be carved up. U-Boot needs to tie into this.
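For anyone unfamiliar with the concept, the following is a minimal sketch of what a named memory block table could look like: a region is carved into blocks that can later be looked up by name. The layout, limits and function names are assumptions for illustration and do not reflect the actual SDK data structures.

```c
#include <stdint.h>
#include <string.h>

#define NAMED_BLOCK_MAX      8
#define NAMED_BLOCK_NAME_LEN 32

struct named_block {
    char name[NAMED_BLOCK_NAME_LEN];
    uint64_t base;
    uint64_t size;
};

/* Hypothetical pool: a simple bump allocator plus a name table. */
struct named_block_pool {
    uint64_t next_free;
    uint64_t end;
    struct named_block blocks[NAMED_BLOCK_MAX];
    int count;
};

void named_pool_init(struct named_block_pool *p, uint64_t base, uint64_t size)
{
    memset(p, 0, sizeof(*p));
    p->next_free = base;
    p->end = base + size;
}

/* Look a block up by name; returns NULL if it does not exist. */
const struct named_block *named_block_find(const struct named_block_pool *p,
                                           const char *name)
{
    for (int i = 0; i < p->count; i++)
        if (strcmp(p->blocks[i].name, name) == 0)
            return &p->blocks[i];
    return NULL;
}

/* Carve a new block; fails on duplicate name, full table or lack of space. */
const struct named_block *named_block_alloc(struct named_block_pool *p,
                                            const char *name, uint64_t size)
{
    if (size == 0 || p->count == NAMED_BLOCK_MAX ||
        strlen(name) >= NAMED_BLOCK_NAME_LEN ||
        size > p->end - p->next_free ||
        named_block_find(p, name) != NULL)
        return NULL;

    struct named_block *b = &p->blocks[p->count++];
    strcpy(b->name, name);
    b->base = p->next_free;
    b->size = size;
    p->next_free += size;
    return b;
}
```

A bootloader, Linux and a bare-metal application sharing such a table can then agree on who owns which range by name instead of by hard-coded address.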
There are also numerous other I/O interfaces that we need to initialize. Unfortunately, we also have some errata we need to work around, and a few are non-trivial.
The DRAM initialization code is also massive. It handles DDR3 and DDR4 for both registered and unregistered memory with ECC.
In many cases, the reason for the size of the code is due to the complexity of the SoC and the platforms built around it. You can think of CN78XX as being more like an enterprise-class server than a simple embedded device. The CN73XX is not too far behind the CN78XX. The only reason our Octeon TX2 U-Boot is so much smaller is that most of the early initialization takes place before U-Boot is started and the fact that a lot of the networking support (such as SFP management and PHY support) is handled by ATF as well as on-chip management cores. This is necessary because Linux does not have any SFP management support
Last year the PHY framework was reworked into the phylink framework, which supports hot-plugging and dynamic linking of PHY drivers with MAC drivers, especially to support SFP modules. An SFP module driver is there as well. There was a talk at ELCE 2018 about this:
https://events19.linuxfoundation.org/wp-content/uploads/2017/12/chevallier-t...
nor can it handle the complex topologies we're frequently running into today. The requirements of Red Hat also preclude any additional software being installed in order for the networking support to run.
One thing I may need to re-introduce to U-Boot is the temperature sensor support for devices like this, since thermal monitoring is important.
this should be easy as U-Boot already has a thermal uclass within the driver model.
Some boards require a background task to perform periodic monitoring for certain events, including the board that needs to be upstreamed. I haven't checked if anything is available now, but what I did in the past was hook into the input function and while waiting for input it calls a user-defined polling function.
If interrupts are supported it makes the polling job easier.
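The input-loop hook described above can be sketched roughly as follows: while the console has no input, a user-defined polling function is called on every iteration. All names here are hypothetical, and the `max_spins` bound exists only so the example terminates; it is not a claim about how U-Boot's console code is structured.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*poll_fn)(void *ctx);
typedef bool (*input_ready_fn)(void *ctx);

/* Hypothetical hook a board would register for background work. */
struct idle_hook {
    poll_fn poll;
    void *ctx;
};

/*
 * Wait until input is ready, calling the hook once per idle iteration.
 * Returns the number of idle iterations that were spent.
 */
unsigned wait_for_input(input_ready_fn ready, void *ready_ctx,
                        const struct idle_hook *hook, unsigned max_spins)
{
    unsigned spins = 0;

    while (!ready(ready_ctx) && spins < max_spins) {
        if (hook != NULL && hook->poll != NULL)
            hook->poll(hook->ctx);
        spins++;
    }
    return spins;
}

/* Demo input source: reports "ready" once the countdown reaches zero. */
bool demo_input_ready(void *ctx)
{
    unsigned *remaining = ctx;

    if (*remaining == 0)
        return true;
    (*remaining)--;
    return false;
}

/* Demo background task: just counts how often it was polled. */
void demo_poll(void *ctx)
{
    unsigned *count = ctx;

    (*count)++;
}
```

In a real board the poll function would do things like check SFP presence pins or re-tune a link, which is why it has to stay short and non-blocking.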
Some of these phy drivers are extremely complex and need to tie into the SFP management. We also need to use a background polling thread while at the command prompt. A fair bit of our phy code is not in the normal phy drivers because it did not fit the model. Some of these phy drivers need to interact with the SFP support code in order to handle hot plug events in order to reconfigure themselves based on the cable type. The existing SFP code handles everything from SFP to SFP28 as well as QSFP and 100G QSFP (never tested).
In the old U-Boot the PHY support had to be significantly enhanced due to requirements for hot-plugging and how some of the PHYs are configured. It gets quite complicated with phys like the Inphi where one phy can handle either four ports (XFI/SGMII) or a single 4-lane port (XLAUI). It gets even worse since in some boards we use reclocking chips and there is one chip that handles the receive path of a QSFP and another that handles the transmit path. Further complicating things, with a QSFP it can be treated either as XLAUI or as four XFI ports, so you can have four ports spread across two chips, with each port using different slices of each chip. In the case of the Inphi/Cortina chip, a single device can handle one or four ports based on the configuration and it is configured by "slice" which is basically an offset into the MDIO register space. We had to jump through hoops in order to have this stuff work in a sane way in the device tree. We added entries for SFP and QSFP slots in the device tree which point to the MACs, GPIOs and I2C bus because pointing them to the phys just got too insane. This will need to be ported to the new U-Boot. It should not break the existing support since most of it was implemented outside of the core PHY handling code. In the port, it would be far better if this could be integrated in. The SFP management code is architecture agnostic as is all of the PHY support. The callbacks for the SFP support are used by the MAC which then notifies the PHY since the MAC often needs to reconfigure itself. It can handle some crazy configurations.
While I see some phy drivers that we also support, i.e. Cortina, our drivers tend to have a lot more functionality. For example, all of our phy drivers that support firmware support commands for upgrading the firmware as well as things like cable testing and other features.
PHY drivers and ethernet drivers should really be reduced to the functionality required to enable basic networking like Ping, DHCP, TFTP. U-Boot is still "just" a bootloader and not a system management tool ;) You should do that stuff either in Linux or in a downstream fork.
This is the case for the most part. Unfortunately, many of these drivers require a lot of code and some require frequent monitoring to make adjustments. The SFP support is required to monitor what cable type is plugged in and to reprogram the phy as needed based on the type of cable. The 10G and 25G phys need different settings for optical/active vs passive copper vs SFP connectors. In addition, some require different settings based on the cable length and in some cases exceptions are needed for certain modules (there are a series of Avago SFP to Gigabit modules that require autonegotiation to be disabled in 1000Base-X mode). In at least one case there needs to be frequent polling to make adjustments (25G) as the equalization settings can change based on temperature. The SFP management code identifies the type of cable connected and its parameters so that the phy driver can adjust the appropriate settings. The SFP management code is generic and not tied to any one type of phy or MAC or brand of module. It also monitors all of the GPIO pins and will make callbacks when needed. Many phys lack the support for doing this themselves. Phys I have worked with that need this support include Cortina/Inphi and several Microsemi/Vitesse devices.
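A minimal sketch of that callback flow, assuming a generic SFP slot object that notifies a registered driver only when the detected cable changes. The cable classes follow the text; the tuning fields and their values are placeholders invented for the example, not real settings for any phy.

```c
#include <stddef.h>
#include <stdint.h>

enum cable_type {
    CABLE_UNKNOWN,
    CABLE_OPTICAL,
    CABLE_ACTIVE_COPPER,
    CABLE_PASSIVE_COPPER,
};

struct sfp_info {
    enum cable_type type;
    uint8_t length_m;   /* cable length in metres, 0 if not applicable */
};

/* Hypothetical tuning knobs; real phys have many more. */
struct phy_tuning {
    uint8_t tx_preemphasis;
    uint8_t rx_eq;
};

typedef void (*sfp_change_cb)(const struct sfp_info *info, void *ctx);

struct sfp_slot {
    struct sfp_info last;
    sfp_change_cb on_change;
    void *ctx;
};

/* Placeholder policy: passive copper needs more drive on longer cables. */
struct phy_tuning select_tuning(const struct sfp_info *info)
{
    struct phy_tuning t = { 0, 0 };

    switch (info->type) {
    case CABLE_OPTICAL:
    case CABLE_ACTIVE_COPPER:
        t.tx_preemphasis = 1;
        t.rx_eq = 1;
        break;
    case CABLE_PASSIVE_COPPER:
        t.tx_preemphasis = info->length_m > 3 ? 4 : 2;
        t.rx_eq = info->length_m > 3 ? 3 : 2;
        break;
    default:
        break;
    }
    return t;
}

/* Called from the polling path; fires the callback only on a change. */
void sfp_slot_update(struct sfp_slot *slot, const struct sfp_info *now)
{
    if (now->type == slot->last.type && now->length_m == slot->last.length_m)
        return;
    slot->last = *now;
    if (slot->on_change != NULL)
        slot->on_change(now, slot->ctx);
}

/* Demo driver state that reconfigures itself from the callback. */
struct demo_phy {
    struct phy_tuning tuning;
    unsigned reconfig_count;
};

void demo_phy_on_sfp_change(const struct sfp_info *info, void *ctx)
{
    struct demo_phy *phy = ctx;

    phy->tuning = select_tuning(info);
    phy->reconfig_count++;
}
```

Keeping the slot logic separate from `select_tuning` is what lets the same SFP code serve different phys, which matches the "generic and not tied to any one type of phy" description.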
The Inphi devices will typically handle four XFI lanes with four bi-directional slices with each slice given a different register range. Further complicating matters is that a QSFP port can either be four XFI interfaces or a single XLAUI interface. We have code to update the firmware for the Inphi chips, but this is small compared to the rest of the initialization code. These chips require that equalization and gain be configured on each slice based on the board and cable characteristics as well as LED configuration.
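The per-slice register ranges can be pictured as a simple base-plus-offset calculation. In the sketch below the slice count of four comes from the text, while the stride of 0x1000 and the function name are assumptions chosen for illustration; the real register layout would come from the datasheet.

```c
#include <stdint.h>

#define SLICE_COUNT  4u
#define SLICE_STRIDE 0x1000u    /* hypothetical size of one slice's range */

/*
 * Compute the register address for a given slice.
 * Returns 0 on success, -1 if the slice or register is out of range.
 */
int slice_reg_addr(unsigned slice, uint16_t reg, uint32_t *addr)
{
    if (slice >= SLICE_COUNT || reg >= SLICE_STRIDE)
        return -1;
    *addr = slice * SLICE_STRIDE + reg;
    return 0;
}
```

A port in four-lane mode would touch all four slices of a chip, whereas in XFI mode each port would use a single slice index.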
With the Microsemi reclocking chips, each chip has four unidirectional lanes. For a QSFP port, two chips are required with one chip configured for ingress and the other for egress. This can support either XLAUI or four XFI interfaces. When it is configured for XFI there are four XFI interfaces, since now four MACs are shared with two chips with each MAC going to one lane on each chip.
Also making things fun is that Inphi and the reclocking chips do not conform to the clause 45 standard at all. In the case of Inphi, the ID registers are 0.0 and 0.1 instead of 1.2 and 1.3 as they are in Clause 45.
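One way to cope with the non-standard ID location is to make it a parameter of the probe code. The sketch below assumes an MDIO read callback and a small fake bus for demonstration; the names are hypothetical and the ID values in the usage are made up, only the register locations (1.2/1.3 versus 0.0/0.1) come from the text.

```c
#include <stdint.h>

/* Hypothetical Clause 45 style read: device address plus register. */
typedef uint16_t (*mdio_c45_read_fn)(void *bus, unsigned devad, unsigned reg);

/* Where the two 16-bit halves of the ID live. */
struct id_location {
    unsigned devad;
    unsigned reg_hi;
    unsigned reg_lo;
};

const struct id_location C45_STANDARD_ID = { 1, 2, 3 };  /* 1.2 and 1.3 */
const struct id_location ID_AT_0_0       = { 0, 0, 1 };  /* 0.0 and 0.1 */

/* Combine the two ID registers into one 32-bit identifier. */
uint32_t read_phy_id(mdio_c45_read_fn rd, void *bus,
                     const struct id_location *loc)
{
    uint32_t hi = rd(bus, loc->devad, loc->reg_hi);
    uint32_t lo = rd(bus, loc->devad, loc->reg_lo);

    return (hi << 16) | lo;
}

/* Fake bus with two devices of four registers each, for demonstration. */
struct demo_bus {
    uint16_t regs[2][4];
};

uint16_t demo_bus_read(void *bus, unsigned devad, unsigned reg)
{
    const struct demo_bus *b = bus;

    if (devad >= 2 || reg >= 4)
        return 0xFFFF;
    return b->regs[devad][reg];
}
```

A generic probe that only looks at 1.2 and 1.3 would read zeros from such a device, which is why the location has to come from the device tree or the driver.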
The MAC drivers are also non-trivial. The Octeon chips are designed as network processors with a lot of hardware offloading and coprocessors. Bringing up a "simple Ethernet" interface is anything but simple. There are numerous offload engines that must be configured before it will work. While we do have one "simple" interface that can be configured, it often isn't because it's usually only good for a management port and many boards do not have this and the customers desire to be able to use any port.
Just configuring the interface between the MAC and PHY is also non-trivial. The Octeon (and later CPUs) have what are called "QLMs" or quad lane modules. These QLMs contain programmable serdes which can be configured for PCIe, SATA, XFI, XAUI, RXAUI, SGMII, 1000Base-X, XLAUI and a whole host of other interface types with a lot of tuning for things like equalization and clocks. The amount of QLM initialization code is quite large but necessary. There are a lot of clock and analog tuning parameters and sequences that must be run.
Sadly all of this is needed just for basic ping and DHCP. This isn't like a simple e1000 NIC or the NICs common with most SoCs.
As already stated, this heavy networking stuff should be the task of an OS. I understand why you chose another way, because Linux only recently got real support for SFP and more hardware-offloading capabilities, but maybe you should take the chance to update your system design and submit missing functionality to Linux rather than adding a lot of network management stuff to U-Boot.
Think of scaling from a Raspberry Pi to a dual-CPU Xeon enterprise-class server with 96 cores and 256GiB of RAM with 10, 25 and 40GbE ports but without a BMC or MCU to handle low-level board changes while also having many enterprise-class requirements for RAS, etc. That is why our code is so large and complex. There are a lot of hardware engines for offloading a lot of tasks since the chips are often used in security appliances. There are engines for ZIP compression, hardware regex engines, packet ordering engines, packet parsing engines, buffer management engines, RAID engines and a whole host of others. Many are not used in U-Boot, but a fair number are required for basic packet I/O.
For example, one of the boxes contains a CN78XX with 8 10G ports (where either can also be configured in XLAUI using a 4-to-1 QSFP to SFP+ splitter cable). It has 128GiB of registered DDR4 DIMMs, 4 SATA drives, redundant power supplies and a whole host of other things including multiple temperature monitors. This uses an Inphi/Cortina phy chip that requires full SFP management support. With Inphi phys, the phy cannot drive LEDs based on traffic since it has no concept of packets, especially in XLAUI mode since each lane is independent of the others.
Another board, one I specifically have been told to upstream is a NIC that contains a CN73XX and two 10G/25G ports that go through a complex gearbox chip. Since there is no hardware support for LEDs in the Octeon SoC to indicate link and packet I/O this must be done in software (including U-Boot, customer requirement) and SFP port management is also a must. The phy is not at all a traditional phy. It uses i2c instead of MDIO and requires frequent monitoring of the link parameters (it's an older custom gearbox chip, there are newer and better chips that don't require this now). I have a hook while U-Boot is sitting at the prompt which allows for background tasks to operate while it's sitting.
I have several other NICs to support that use a Microsemi reclocking chip that has four unidirectional lanes per chip. The chip has zero intelligence and is shared between ports (and on some devices, multiple chips are shared between ports). Everything must be tuned based on the SFP/QSFP module type and cable length. LEDs also must be software driven. (The software driving of LEDs is eliminated in OcteonTX2). These chips have no way to drive the LEDs themselves to indicate packet I/O or link status.
There are also other boards that use the Microsemi reclocking chips. They were chosen in part due to the power budget and these chips are very low power (and inexpensive).
In all of these phy cases, all of the parameters are maintained in the device tree so the drivers are generic. Unfortunately these drivers also require SFP and QSFP management support.
I figure if there are several boards I need to upstream, it's not much more effort to port all of the boards to the new U-Boot. I've worked hard to minimize the board-specific code and make as much of it generic and based on the device tree as possible.
Someday I would love for SFP/QSFP infrastructure to get into Linux. Some NIC cards do it in their drivers, but I'd like to see generic infrastructure (like my U-Boot support). This might make it harder for some drivers to only support certain brands of modules too :) The generic code I wrote works with most modules except Intel (because they have bad checksums, but counterfeit Intel modules work fine!). It still can be expanded at some point since there is no support for module diagnostics other than identifying if it is present. Pretty much all it does is monitor the GPIO pins and parse and decode the EEPROM. The SFP code is generic enough such that any phy driver that needs it can easily hook into it.
as already noted this is already in Linux:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/driv...
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/driv...
Our bootloader needs to be able to be booted from a variety of sources, including SPI, eMMC, NOR flash and booting over the PCI bus from a host system. This is one reason we use virtual memory. The other reason is that it eliminates the need to perform relocation. Our start.S code handles all of these different cases as well as exception handling.
This is already supported for MIPS. You should try to use the generic SPL framework for that. Whether you like the relocation or not, it's one of the basic design principles of U-Boot. I guess it likely won't be accepted if you circumvent this. In fact by now we're sharing the same technology as Linux to have relocatable binaries without using gcc's -fPIC or -mabicalls to reduce the binary footprint. You can configure gd->ram_top to any address of your liking as reference address for the relocation.
I will look into this. One other complication is the fact that we require both a failsafe as well as a default bootloader. With the older U-Boot we got around all of this by just using TLB entries to map U-Boot to always run in the same virtual address regardless of the physical address. It eliminated any need for -fPIC and helped keep the binary small. For our older bootloader, it always executes at 0xC0000000 regardless of where it sits in physical memory. Using virtual memory also helps keep U-Boot simple and small.
I will also say up front that the memory initialization code is a mess and quite large (it was written by a hardware engineer who never heard of functions).
One thing is that this will break mips unless it is refactored like ARM is, for example, separating armv7 and armv8. This way we could have arch/mips/cpu/octeon. I did this with the old bootloader to separate our stuff. I'm open to suggestions as for the naming. I don't see how we can share much of the code with the other MIPS CPUs.
We have the same mach directory handling as in Linux MIPS. So you could easily add all your platform specific code (except drivers) to arch/mips/mach-octeon or (-cavium). Inside that directory you can have an include directory for your custom header files, you can even override the generic files from arch/mips/include like in Linux. arch/mips/cpu and arch/mips/lib should only contain generic code. As already mentioned you could provide your own start.S inside arch/mips/mach-octeon but if possible you should try to reuse or extend the generic variant.
We can't use the existing start.S. We have a lot of requirements that are not supported there as well as a fair bit of code dedicated to dealing with the cache and TLBs and bringing additional cores out of reset. We make use of a boot bus movable region in order to do this and handle other cases like NMIs and the watchdog. Our start.S currently sits at around 3800 lines of code. Some is common but most is not.
Our start.S is designed to be able to boot both a failsafe and non-failsafe image and supports adjusting the flash mapping in order to start from an offset other than zero in the flash. There is also a fair bit of code for copying the image out of flash into the L2 cache for a significant speedup for DRAM initialization. I'm trying to get permission to share our existing code but I'm getting push-back (even though it's GPL!?!). How they want me to upstream it without sharing the code is beyond me.
While U-Boot has an exception handler, I believe ours is more comprehensive. It is written entirely in assembler and is not dependent on a working C runtime environment. It also dumps more information than just the registers such as the stack and a number of other exception registers and does some exception decoding. It's quite a bit better than the ARMv8 exception handler IMHO.
Putting this under mach-octeon will make it much easier. I'll try and re-use where I can.
All in all, I think the final port will add between 500K-1M lines of code for the Octeon CPU. It is much more extensive than what is required for OcteonTX since in the latter case most of the hardware initialization is done by earlier stage bootloaders and the ATF handles things like SFP port management and many of the networking operations.
I'm not sure how well I'll be able to upstream all of this code at this point since I was just handed this task. We already have at least 1M lines of code added to the old U-Boot which is based off of 2013.08 with a lot of backports.
I'm trying to get our existing code made available someplace online. I'm getting pushback even though U-Boot is GPL and the license on our SDK is BSD-like (i.e. do whatever you want but don't hold us responsible). It looks like it used to be available but was taken down. I don't understand lawyers. All of the code I wrote is GPL. There is some U-Boot specific code in our SDK, but none was copied from U-Boot. There also is some duplication of functionality between U-Boot and our SDK that I'll try and eliminate.
I have implemented just about every feature in U-Boot I could with our Octeon SoC. That's another reason it's so large. Some customer always comes back and says they want feature X to work. Fortunately, the changes to the U-Boot supplied code are generally minimal, despite it being so large.
I likely will need to add some more hooks to board_f.c and board_r.c. I have run into many cases where we need a specific order of initialization that does not match the normal U-Boot order. Perhaps make init_sequence_f and init_sequence_r weak so that they can be overridden if needed by a specific board or architecture. While much of the current init order works, we need some things initialized as quickly as possible and others initialized later. For example, the first thing we call is an early_errate_workaround function in the init sequence before anything else is called.
I guess overriding the complete generic board init code is not acceptable. It was once hard work to unify this. A hook like early_errate_workaround() sounds reasonable but could also be called from start.S before handing over to board_init_f(). But everything else should fit into the existing init hooks. There are quite a lot.
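A minimal standalone sketch of the idea under discussion: an init list whose first entry is an early hook, run in order until one entry fails. It only mimics the general shape of a NULL-terminated list of init functions; every name below is an illustrative placeholder and not the actual board_f.c code.

```c
#include <stddef.h>

typedef int (*init_fn_t)(void);

/* Records the order in which the demo init steps ran. */
static int hyp_order[8];
static int hyp_count;

static int hyp_early_workaround(void) { hyp_order[hyp_count++] = 1; return 0; }
static int hyp_clock_init(void)       { hyp_order[hyp_count++] = 2; return 0; }
static int hyp_console_init(void)     { hyp_order[hyp_count++] = 3; return 0; }

/* The early hook is simply the first entry; the list ends with NULL. */
static const init_fn_t hyp_init_sequence[] = {
	hyp_early_workaround,
	hyp_clock_init,
	hyp_console_init,
	NULL,
};

/* Run each entry in order and stop at the first non-zero return value. */
static int hyp_run_init_list(const init_fn_t *list)
{
	for (; *list; list++) {
		int ret = (*list)();

		if (ret)
			return ret;
	}
	return 0;
}
```

With this shape, adding one early entry keeps the rest of the sequence shared, which is the trade-off between a single hook and overriding the whole list.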

Dear Daniel & Aaron,
In message 7fdf93f6-412e-5fcf-da5e-17665daadb30@gmail.com you wrote:
Some other things we have included are a native API that allows Simple Executive applications to make calls into U-Boot for such things as environment variable access as well as access to block devices and filesystems.
This is one of the parts that shouldn't be needed for basic upstream support. If your API is a parallel and independent implementation of the API that U-Boot already has for standalone applications, then I'm afraid this won't be accepted and should be kept in a downstream fork.
The big question here is what these are intended for.
If they are indeed thought of as standalone applications, especially containing code that shall not be disclosed under GPL, then there is a licensing issue - the pretty hard restrictions of the API for standalone applications are intentional, and attempts to work around them are license violations.
But if it's just normal GPL code that is somehow dependent on U-Boot services, then why is it not linked against U-Boot?
Or this might be something like dynamically loadable modules - well, then a close look is needed because such an approach has to be generic enough (and probably borrow much from Linux).
Best regards,
Wolfgang Denk

Hi Daniel,
On Wednesday, October 30, 2019 9:20:31 AM PDT Daniel Schwierzeck wrote:
Hi Aaron,
On 27.10.19 at 03:34, Aaron Williams wrote:
Hi Daniel,
On Friday, October 25, 2019 8:13:57 AM PDT Daniel Schwierzeck wrote:
Hi Aaron,
On 23.10.19 at 05:50, Aaron Williams wrote:
Hi all,
I have been tasked with porting our Octeon U-Boot to the latest U-Boot and merging it upstream. This will involve a very significant amount of code that generally will not be compatible with other MIPS processors due to our needs and requirements. For example, the start.S will need to be completely different than what is present. For example, our existing start.S is 3577 lines of code in order to deal with things like RAS, exceptions, virtual memory and more. We need to use virtual memory since U-Boot can be loaded at any 4MB boundary in memory, not just 0xbfc00000. A number of drivers will need to be updated in order to properly map pointers to physical addresses. This is needed anyway, since I see numerous drivers that assume that a pointer is a DMA address. For MIPS this is never the case (I'm looking at XHCI).
Good to see some progress in mainline Octeon support. Could you briefly describe the differences and commonalities in booting an Octeon CPU compared to other "generic" MIPS cores? Or could you point me to a public Git tree? It can't be that different because Linux kernel is also able to share most of the code ;)
Actually the low level code is significantly different. First of all, we need the U-Boot bootloader to be able to boot from different memory locations. Because of this, we use mapped memory for U-Boot. A side effect of this is that it eliminates the need for relocation when it is shifted to the top of memory. All we need to do is just set a couple of TLB entries.
Understood, but U-Boot still relocates itself from its initial entry memory address to its destination memory address based on gd->ram_top. Maybe this is ineffective nowadays with various SPL/TPL boot methods because U-Boot proper is already loaded to an executable memory location by SPL, but you have to initially deal with that design. Feel free to suggest/submit a patch for the generic board init code to make the relocation configurable.
We do this relocation as well, however the way we do it is by changing a couple of TLB entries. This lets U-Boot begin execution from any memory location, be it flash, L2 cache or RAM. It also lets us statically link U-Boot to run at a fixed address, in our case 0xC0000000. The relocation happens transparently in the start.S code. This also makes our bootloader smaller. None of the U-Boot code is affected since on MIPS pointers cannot be used for DMA anyway. The functions that map pointers to DMA addresses work as they should. The only issues I have found are drivers that don't use this and would break on MIPS anyway. We have a SPL loader for our CN7XXX series since the L2 cache is too small to otherwise fit the entire bootloader. Even this is a challenge to make fit since the code to initialize DDR4 memory is very large so every bit of space savings helps.
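The address arithmetic behind this kind of TLB-based "relocation" can be sketched as follows. This is a host-side illustration only, assuming a single window that maps the fixed link address 0xC0000000 onto a 4 MiB aligned physical base; the helper names are hypothetical and no real TLB programming is shown.

```c
#include <stdbool.h>
#include <stdint.h>

#define UBOOT_LINK_ADDR  0xC0000000u          /* fixed virtual address from the thread */
#define UBOOT_MAP_SIZE   (4u * 1024u * 1024u) /* 4 MiB alignment and size limit */

/* A load address is usable when it is 4 MiB aligned and the image fits. */
static bool hyp_load_ok(uint64_t phys_base, uint32_t image_size)
{
	return (phys_base % UBOOT_MAP_SIZE) == 0 && image_size < UBOOT_MAP_SIZE;
}

/* Translate a virtual address inside the mapped window to a physical one. */
static bool hyp_virt_to_phys(uint32_t va, uint64_t phys_base, uint64_t *pa)
{
	if (va < UBOOT_LINK_ADDR || va - UBOOT_LINK_ADDR >= UBOOT_MAP_SIZE)
		return false;
	*pa = phys_base + (va - UBOOT_LINK_ADDR);
	return true;
}
```

Because only phys_base changes between flash, L2 cache and RAM, the linked image itself never needs to be patched.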
As far as U-Boot is concerned, we just treat it as if relocation is disabled since with virtual memory it isn't needed. I even got it working with the API for running standalone apps without requiring any changes to the existing code other than to add the MIPS specific changes for our environment.
This might be something to consider in the future on some platforms where "relocation" could be performed by just adjusting the TLB or page tables. MIPS makes this particularly easy.
I have attached a copy of our existing start.S code. It needs a bit of work for the new U-Boot since currently locking the cache and allocating GD on the stack are done in board_init_f(). The changes are fairly easy to make. I also need to strip out the code for CN6XXX and earlier.
The assembly code is significantly different and is far more extensive.
Additionally, the way Octeon Linux is booted is different.
The generic start.S is not usable in our case.
We have a significant amount of code for dealing with the cache and for things like copying U-Boot from flash into the L2 cache. We also have to deal with taking other cores out of reset in our start.S. Our exception handler has also been extended to handle multiple cores.
it's hard to discuss this without example code but I still think the basic principles of cache and exception handling can't be that different from generic MIPS cores. Locking cache lines and loading code to it could be useful for other MIPS platforms and should be added as generic feature. BTW the exception handler code is a port of the Linux one, I only skipped the stack trace output because of the complicated stack unwinding code. I think the current dump of general and CP0 and EPC registers is more than feasible for a bootloader. It already helped me multiple times to quickly locate code locations with e.g. null pointer dereferencing.
I have attached our start.S code which includes this. In addition, our version also dumps out the stack. NULL pointers aren't the easiest to catch since typically 0 is a valid memory location. I suppose I could just add a TLB entry to mark the first 4K memory as invalid.
Some other things we have included are a native API that allows Simple Executive applications to make calls into U-Boot for such things as environment variable access as well as access to block devices and filesystems.
This is one of the parts that shouldn't be needed for basic upstream support. If your API is a parallel and independent implementation of the API that U-Boot already has for standalone applications, then I'm afraid this won't be accepted and should be kept in a downstream fork.
That's fine. The code is actually quite small. It has some custom APIs unique to our needs. We have need to call into the phy code from these applications. I don't know if this could work with the general API or not. One reason we did this is because we wanted all addresses passed to U-Boot to be physical addresses. We need to context switch since these applications have their own memory mapping (hence the requirement for physical addresses). We save the TLB mapping of the application and set up the U-Boot TLBs then restore that afterwards. For pointers we just use XKPHYS addresses. With the API, though, I set it up so that applications are linked at another virtual address which can access the U-Boot virtual address directly. I think I used 0xd0000000 for those. This didn't require any changes to the API other than the assembly code and linker scripts.
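For readers unfamiliar with XKPHYS, a minimal sketch of how a physical address maps into that MIPS64 segment: bits 63:62 are binary 10, bits 61:59 carry the cache coherency attribute (CCA), and the low 59 bits hold the physical address. Which CCA value the Octeon code uses is not stated in the thread, so it is left as a parameter; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define XKPHYS_BASE      0x8000000000000000ull  /* bits 63:62 = 0b10 */
#define XKPHYS_CCA_SHIFT 59
#define XKPHYS_PA_MASK   ((1ull << XKPHYS_CCA_SHIFT) - 1)

/* Build an XKPHYS virtual address from a physical address and a CCA. */
static uint64_t hyp_phys_to_xkphys(uint64_t pa, unsigned cca)
{
	return XKPHYS_BASE |
	       ((uint64_t)(cca & 7u) << XKPHYS_CCA_SHIFT) |
	       (pa & XKPHYS_PA_MASK);
}

/* True when the address lies in the XKPHYS segment. */
static bool hyp_is_xkphys(uint64_t va)
{
	return (va >> 62) == 2;
}

/* Recover the physical address from an XKPHYS virtual address. */
static uint64_t hyp_xkphys_to_phys(uint64_t va)
{
	return va & XKPHYS_PA_MASK;
}
```

Since the translation is pure bit manipulation, such pointers stay valid across the TLB context switch described above.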
We used to have our Octeon SDK available for download but it seems this has been taken down :( I'm trying to find out how I can make it available but I'm getting pushback on sharing our U-Boot even though it is GPL.
In principle you could compile an own start.S in your mach-octeon directory, but you should try to use the generic start.S which is already customisable and extensible. If needed, we could add more extension points to it. Booting from any custom memory address is already supported and very common for other MIPS based SoC's. Exception support is also already there.
The bootloader needs to be able to start from multiple memory locations without recompiling. Our existing bootloader can run from any 4MB boundary without recompiling or relocation. It can start out of flash (from any sector boundary, not just 0) or L2 cache. Starting from L2 cache is supported by eMMC, SPI and PCI target bootloaders. Additionally the same bootloader can be started from RAM such as when the failsafe bootloader starts the main bootloader. In most cases, the failsafe is the same full-featured bootloader since it fits entirely within the L2 cache. Our only bootloader requirement is that it fits in the L2 cache (except when booting from Flash, though this is preferred for speed) and that it remain under 4 MiB in size.
I believe our exception handling is more extensive than the standard U-Boot exception handler. It includes the stack output as well as numerous COP0 registers and decoding the cause of the exception. The exception handler is also independent of a working C environment. We also need to handle exceptions occurring on multiple cores as they're brought out of reset and not all cases are exceptions.
as I wrote above, the current exception handling is already feasible in almost all cases to quickly locate code bugs and doesn't need much code. Adding stack trace output would require adding a lot more code. But if you are only missing some registers or want to dump the stack itself, feel free to extend the current code.
That's fine. The only other thing we do is we carve out a bit of the L1 cache for a temporary stack. That way the exception handler has zero dependency on memory. Currently it's all in assembly language as well.
Cores are first powered on and kept in a halted state, then
We do more than that. We need to take the cores out of the halted state and do some more processing before starting applications. I hope to provide some examples later.
later when we start the Linux kernel or simple executive applications, the exception handler is updated (via a bootbus moveable memory region) and an NMI is generated for the cores where they will begin executing code out of start.S before moving to the code that sets up the environment for booting Linux and/or simple executive applications. In the latter case, TLB entries are programmed in for each core.
The new Octeon U-Boot will be native 64-bit instead of how the earlier one was 32-bit using the N32 ABI (so 64-bit addresses could be accessed). We had to jump through some hoops to make a 32-bit U-Boot fully support 64-bit hardware.
We have 64 bit support for MIPS. I even sync'ed the asm/io stuff from Linux in the past (which includes support for Octeon) so that you would be able to use the standard IO primitives and ioremap stuff and hook in your platform-specific memory mappings.
That is good to know. What I have run into is the fact that many drivers do not support I/O remapping. I.e. XHCI assumes that a pointer is a DMA address. Also, does the 64-bit support handle multiple cores in U-Boot?
we already have stuff like dev_remap_addr(struct udevice* dev) as part of the driver model API to map your physical addresses from device tree to virtual addresses. This is used in all drivers compatible with MIPS. That function is backed by the MIPS specific ioremap_nocache() function (also ported from Linux) so that you can hook in platform specific mapping code. If you want to use existing drivers which don't do remapping yet, you have to patch them. But this should be simple, we recently did that on Broadcom or Mediatek platforms, which are sharing drivers between their MIPS and ARM CPUs.
That's what we take advantage of :) This allows the drivers to work fine when virtual memory is used.
For XHCI you probably only need to patch the xhci_readl() and xhci_writel() functions and establish the memory mappings in your platform specific glue code. But USB support shouldn't be your first priority ;)
The readl and writel are used for accessing the registers. Those aren't the problem. The problem comes when setting up the descriptors in memory. The descriptors need to use the memory mapping. That's the part that's missing. It's not difficult to fix. I think I also found a few endian issues as well since we run in big endian mode.
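The two issues named here (bus addresses in descriptors, and byte order on a big-endian CPU) can be illustrated with a simplified, TRB-like 16-byte descriptor. This is a sketch only: the descriptor layout is reduced to a 64-bit buffer pointer plus a 32-bit length, and the translation hook and its offsets are invented for the example rather than taken from the XHCI driver.

```c
#include <stdint.h>

/* Hypothetical platform hook: translate a CPU virtual address to a bus address. */
typedef uint64_t (*virt_to_bus_fn)(uintptr_t va);

/* Store values in little-endian byte order, independent of host endianness. */
static void put_le32(uint8_t *dst, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		dst[i] = (uint8_t)(v >> (8 * i));
}

static void put_le64(uint8_t *dst, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		dst[i] = (uint8_t)(v >> (8 * i));
}

/* Fill the buffer-pointer and length fields of the simplified descriptor. */
static void hyp_fill_desc(uint8_t desc[16], uintptr_t buf_va, uint32_t len,
			  virt_to_bus_fn to_bus)
{
	put_le64(&desc[0], to_bus(buf_va)); /* bus address, never the raw pointer */
	put_le32(&desc[8], len);
	put_le32(&desc[12], 0);
}

/* Demo translation with made-up window bases, used only for illustration. */
static uint64_t hyp_demo_to_bus(uintptr_t va)
{
	return (uint64_t)va - 0xC0000000u + 0x10000000u;
}
```

A driver that stores the raw pointer, or stores it in host byte order, would happen to work on a little-endian identity-mapped system and fail in exactly the configuration described above.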
I agree about using the standard ioremap stuff. I'm only pointing out that there are places where it is missing in the common U-Boot code. Where it is present, there won't be any issues since traditionally I used those methods to call our platform specific remapping. I will look to see what is present and if it will work or not.
yes, those places need some patching anyway. There is already an ongoing task to address this:
https://gitlab.denx.de/u-boot/custodians/u-boot-mips/issues/15
I think I can help there. I've already spent a fair bit of time on this with XHCI which I backported. I still have a major common XHCI issue to fix when short packets are received. The U-Boot code does not handle this case properly. It's easy to reproduce the case. Use a USB to Ethernet adapter and have the receive buffer cross a 64K boundary and bad things will happen.
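The thread does not describe the fix, so the following only illustrates the boundary arithmetic involved, assuming the xHCI rule that a buffer referenced by a single TRB must not span a 64 KiB boundary. The helper names are hypothetical.

```c
#include <stdint.h>

#define HYP_TRB_BOUNDARY 0x10000ull /* 64 KiB */

/* Length of the first chunk at addr that stays within one 64 KiB region. */
static uint32_t hyp_chunk_len(uint64_t addr, uint32_t remaining)
{
	uint64_t room = HYP_TRB_BOUNDARY - (addr & (HYP_TRB_BOUNDARY - 1));

	return remaining < room ? remaining : (uint32_t)room;
}

/* Number of chunks needed to cover [addr, addr + len). */
static unsigned hyp_count_chunks(uint64_t addr, uint32_t len)
{
	unsigned n = 0;

	while (len) {
		uint32_t c = hyp_chunk_len(addr, len);

		addr += c;
		len -= c;
		n++;
	}
	return n;
}
```

A receive buffer that starts 16 bytes below a 64 KiB boundary therefore needs two chunks even for a small packet, which is the situation the reproduction recipe above creates.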
I think we can shrink the code by removing support for starting "simple executive" tasks. Simple executive tasks are bare metal applications that can run on dedicated cores beside Linux (or without Linux). I will also not be porting any support for anything older than Octeon3.
We also make heavy use of our SDK in order to perform hardware initialization and networking. In our old U-Boot, we have almost 900K lines of code. I can cut out much of this but much will remain.
We also have added extensive infrastructure for handling SFP and QSFP cables as well as very extensive phy support for phys from Aquantia/Marvell, Vitesse/Microsemi, Inphi/Cortina and an Avago gearbox. Our customer wants us to port all of this to the new U-Boot and upstream it. I'm worried about the sheer amount of code since it is absolutely massive.
Maybe you should cut down your customer's expectations a bit. According to sloccount we currently have 1.6M SLOC for the whole U-Boot. I guess Tom or Wolfgang wouldn't agree with adding another 900k only for one CPU. Actually what should be upstream is the basic CPU, driver and board support to be able to boot a mainline kernel. Everything else like custom bare metal applications or the SFP/PHY handling stuff mentioned below could also be maintained in a downstream tree. Maybe Wolfgang is willing to host one on gitlab.denx.de.
I will try and cut it down. Much of the code is register definitions. The register definition files are auto-generated and tend to be huge. They're fully commented and include both big and little endian bitfields. In this case I can do like I did for OcteonTX and modify the scripts that generate these headers to strip out the little-endian and comments. There is a huge amount of code for configuring our QLM hardware interfaces. We also have a lot of code for SFP/QSFP ports.
There are some other huge files that can also be eliminated by dropping support for Octeon II and earlier. The error handling files are massive for those chips.
Much of the rest can be shrunk somewhat, but a lot of that code is still required.
There is a huge amount of code for dealing with our quad-lane modules (QLMs). The QLMs can be configured to run in a variety of modes, from PCIe, SGMII, SATA, XLAUI, XFI, Interlaken, SVRIO, QSGMII, XAUI, RXAUI and more. There is a lot of tuning and configuration code needed in order to handle different clocks, equalization, gain, AGC and a whole host of other serdes issues.
The MAC code is also quite large and complex since there are many coprocessors that must be configured. These chips are designed as network processors. While it makes their networking quite powerful and fast, it also means that a lot of programming is needed before they will work. There are input parser engines, buffer management engines, queueing engines, output engines and more that must be fully configured before any packets can be sent or received.
what I meant was that your customer shouldn't expect to get his custom code merged upstream as it is with only some cleanups. Of course a user/customer can decide to use U-Boot as a system management and hardware initialisation tool but that doesn't correspond with U-Boot's design. I think most people would agree that a proper OS like Linux should be doing the heavy network initialisation and hardware-offloading stuff as well as booting all remaining CPU cores. U-Boot's responsibility should only be to boot that OS on the first CPU ;)
There is a fair bit of code used to bring additional cores out of reset. In our biggest configuration, there can be two Octeon CN78XX chips connected in tandem where each chip has 48 cores. In this case there is a lot of tuning that needs to happen with the lanes connecting the two chips before this configuration works reliably. There is a tuning process that is required to run on both sides (and the second chip runs a small binary image as well to perform its half of the tuning).
I do not know if this will change or not but the way the Linux kernel is booted on Octeon is not compatible with the standard boot commands. Part of this is due to the fact that Linux can be run in parallel with Simple Executive applications. It's even possible to run two copies of Linux simultaneously on different cores. To go along with this, there is also a mechanism with named memory blocks that is used. When bringing cores out of reset for SE applications, the TLB entries need to be configured. There also is a fair bit of code dealing with core masks when choosing which cores are used for what.
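A small sketch of the kind of core-mask bookkeeping this implies, assuming one bit per core in a 64-bit mask (enough for the 48 cores of a single CN78XX). The helper names are hypothetical and the real code is not shown in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask with bits [first, first + count) set; expects first < 64, count >= 1. */
static uint64_t hyp_core_range(unsigned first, unsigned count)
{
	uint64_t m = (count >= 64) ? ~0ull : ((1ull << count) - 1);

	return m << first;
}

/* Two images may only be started together if they share no core. */
static bool hyp_masks_disjoint(uint64_t a, uint64_t b)
{
	return (a & b) == 0;
}

/* Number of cores selected by a mask. */
static unsigned hyp_core_count(uint64_t mask)
{
	unsigned n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}
```

For example, Linux could be given hyp_core_range(0, 4) and a Simple Executive application hyp_core_range(4, 2), with hyp_masks_disjoint() checked before either is started.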
We also have a named memory block feature which is used by Linux and simple executive applications where blocks of memory can be carved up. U-Boot needs to tie into this.
There are also numerous other I/O interfaces that we need to initialize. Unfortunately we also have some errata we need to work around and a few are non-trivial.
The DRAM initialization code is also massive. It handles DDR3 and DDR4 for both registered and unregistered memory with ECC.
In many cases, the reason for the size of the code is due to the complexity of the SoC and the platforms built around it. You can think of CN78XX as being more like an enterprise-class server than a simple embedded device. The CN73XX is not too far behind the CN78XX. The only reason our Octeon TX2 U-Boot is so much smaller is that most of the early initialization takes place before U-Boot is started and the fact that a lot of the networking support (such as SFP management and PHY support) is handled by ATF as well as on-chip management cores. This is necessary because Linux does not have any SFP management support
last year the PHY framework was reworked into a phylink framework which supports hot-plugging and dynamic linking of PHY drivers with MAC drivers, especially to support SFP modules. An SFP module driver is there as well. There was a talk at ELCE 2018 about this:
I will look at this. The code I wrote can handle some really crazy configurations. I may want to modify some of the drivers we have to be "virtual MACs" such as Inphi. Also of note is that not all phys use MDIO. Two of the ones I work with use i2c and there has been talk of using other methods of communicating with the phy.
https://events19.linuxfoundation.org/wp-content/uploads/2017/12/chevallier-tenart-from-the-ethernet-mac-to-the-link-partner.pdf
nor can it handle the complex topologies we're frequently running into today. The requirements of Red Hat also preclude any additional software being installed in order for the networking support to run.
One thing I may need to re-introduce to U-Boot is the temperature sensor support for devices like this, since thermal monitoring is important.
this should be easy as U-Boot already has a thermal uclass within the driver model.
I just noticed that. It looked like for a while it was removed. :)
Some boards require a background task to perform periodic monitoring for certain events, including the board that needs to be upstreamed. I haven't checked if anything is available now, but what I did in the past was hook into the input function and while waiting for input it calls a user-defined polling function.
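The approach described above, calling a user-defined polling function while waiting for console input, can be sketched like this. It is a host-side illustration with hypothetical names; the demo helpers stand in for a real console and a real board monitor.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*poll_fn_t)(void);

/* Optional board hook invoked while the console is idle. */
static poll_fn_t hyp_poll_hook;

static void hyp_register_poll(poll_fn_t fn)
{
	hyp_poll_hook = fn;
}

/* Spin until input is ready, giving the hook a turn on every iteration. */
static unsigned hyp_wait_for_input(bool (*input_ready)(void))
{
	unsigned spins = 0;

	while (!input_ready()) {
		if (hyp_poll_hook)
			hyp_poll_hook();
		spins++;
	}
	return spins;
}

/* Demo stand-ins: input becomes ready on the fourth check. */
static int hyp_demo_ticks;
static int hyp_demo_polls;

static bool hyp_demo_ready(void) { return ++hyp_demo_ticks > 3; }
static void hyp_demo_poll(void)  { hyp_demo_polls++; }
```

The hook runs only while the prompt is idle, so it suits periodic checks such as SFP presence, but it gets no time while a command is executing.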
If interrupts are supported it makes the polling job easier.
Some of these phy drivers are extremely complex and need to tie into the SFP management. We also need to use a background polling thread while at the command prompt. A fair bit of our phy code is not in the normal phy drivers because it did not fit the model. Some of these phy drivers need to interact with the SFP support code in order to handle hot plug events in order to reconfigure themselves based on the cable type. The existing SFP code handles everything from SFP to SFP28 as well as QSFP and 100G QSFP (never tested).
In the old U-Boot the PHY support had to be significantly enhanced due to requirements for hot-plugging and how some of the PHYs are configured. It gets quite complicated with phys like the Inphi where one phy can handle either four ports (XFI/SGMII) or a single 4-lane port (XLAUI). It gets even worse since in some boards we use reclocking chips and there is one chip that handles the receive path of a QSFP and another that handles the transmit path. Further complicating things, with a QSFP it can be treated either as XLAUI or as four XFI ports, so you can have four ports spread across two chips, with each port using different slices of each chip. In the case of the Inphi/Cortina chip, a single device can handle one or four ports based on the configuration and it is configured by "slice" which is basically an offset into the MDIO register space. We had to jump through hoops in order to have this stuff work in a sane way in the device tree. We added entries for SFP and QSFP slots in the device tree which point to the MACs, GPIOs and I2C bus because pointing them to the phys just got too insane. This will need to be ported to the new U-Boot. It should not break the existing support since most of it was implemented outside of the core PHY handling code. In the port, it would be far better if this could be integrated in. The SFP management code is architecture agnostic as is all of the PHY support. The callbacks for the SFP support are used by the MAC which then notifies the PHY since the MAC often needs to reconfigure itself. It can handle some crazy configurations.
While I see some phy drivers that we also support, i.e. Cortina, our drivers tend to have a lot more functionality. For example, all of our phy drivers that support firmware support commands for upgrading the firmware as well as things like cable testing and other features.
PHY drivers and ethernet drivers should be really reduced to the required functionality to enable basic networking like Ping, DHCP, TFTP. U-Boot is still "just" a bootloader and not a system management tool ;) You should do that stuff either in Linux or in a downstream fork.
This is the case for the most part. Unfortunately, many of these drivers require a lot of code and some require frequent monitoring to make adjustments. The SFP support is required to monitor what cable type is plugged in and to reprogram the phy as needed based on the type of cable. The 10G and 25G phys need different settings for optical/active vs passive copper vs SFP connectors. In addition, some require different settings based on the cable length and in some cases exceptions are needed for certain modules (there are a series of Avago SFP to Gigabit modules that require autonegotiation to be disabled in 1000Base-X mode). In at least one case there needs to be frequent polling to make adjustments (25G) as the equalization settings can change based on temperature. The SFP management code identifies the type of cable connected and its parameters so that the phy driver can adjust the appropriate settings. The SFP management code is generic and not tied to any one type of phy or MAC or brand of module. It also monitors all of the GPIO pins and will make callbacks when needed. Many phys lack the support for doing this themselves. Phys I have worked with that need this support include Cortina/Inphi and several Microsemi/Vitesse devices.
The Inphi devices will typically handle four XFI lanes with four bidirectional slices with each slice given a different register range. Further complicating matters is that a QSFP port can either be four XFI interfaces or a single XLAUI interface. We have code to update the firmware for the Inphi chips, but this is small compared to the rest of the initialization code. These chips require that equalization and gain be configured on each slice based on the board and cable characteristics as well as LED configuration.
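A minimal sketch of slice-based addressing as described above, where each of four slices occupies its own register range and a QSFP port uses either all slices (XLAUI) or one slice per port (XFI). The per-slice stride is device specific and not given in the thread, so it is a parameter here; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define HYP_NUM_SLICES 4

enum hyp_qsfp_mode { HYP_MODE_XLAUI, HYP_MODE_XFI };

/* Register address of 'reg' within 'slice', given a per-slice stride. */
static uint32_t hyp_slice_reg(unsigned slice, uint32_t stride, uint32_t reg)
{
	assert(slice < HYP_NUM_SLICES && reg < stride);
	return slice * stride + reg;
}

/* Bitmask of slices that serve 'port' in the given mode. */
static unsigned hyp_port_slices(enum hyp_qsfp_mode mode, unsigned port)
{
	if (mode == HYP_MODE_XLAUI)
		return port == 0 ? 0xFu : 0u; /* one 4-lane port uses all slices */
	return port < HYP_NUM_SLICES ? 1u << port : 0u; /* one slice per XFI port */
}
```

Per-slice settings such as equalization and gain would then be written once for every bit set in the mask returned by hyp_port_slices().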
With the Microsemi reclocking chips, each chip has four unidirectional lanes. For a QSFP port, two chips are required with one chip configured for ingress and the other for egress. This can support either XLAUI or four XFI interfaces. When it is configured for XFI there are four XFI interfaces, since now four MACs are shared with two chips with each MAC going to one lane on each chip.
Also making things fun is that Inphi and the reclocking chips do not conform to the Clause 45 standard at all. In the case of Inphi, the ID registers are 0.0 and 0.1 instead of 1.2 and 1.3 as they are in Clause 45.
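One way to cope with this difference is to describe the ID register location per device instead of hard-coding 1.2 and 1.3. The sketch below assumes the 32-bit ID is formed from two 16-bit registers with the first one in the upper half; that ordering for the Inphi case is an assumption, and the read callback and all names are hypothetical.

```c
#include <stdint.h>

/* Where a device keeps its two 16-bit ID registers (devad.reg notation). */
struct hyp_id_loc {
	uint8_t  devad;
	uint16_t reg_hi;
	uint16_t reg_lo;
};

static const struct hyp_id_loc hyp_c45_std = { 1, 2, 3 }; /* 1.2 and 1.3 */
static const struct hyp_id_loc hyp_inphi   = { 0, 0, 1 }; /* 0.0 and 0.1 */

/* Hypothetical bus accessor: read one 16-bit register at devad.reg. */
typedef uint16_t (*hyp_c45_read_fn)(uint8_t devad, uint16_t reg);

/* Combine the two ID registers into a single 32-bit identifier. */
static uint32_t hyp_read_phy_id(const struct hyp_id_loc *loc, hyp_c45_read_fn rd)
{
	uint32_t hi = rd(loc->devad, loc->reg_hi);
	uint32_t lo = rd(loc->devad, loc->reg_lo);

	return (hi << 16) | lo;
}

/* Demo accessor that echoes the address back, for illustration only. */
static uint16_t hyp_demo_read(uint8_t devad, uint16_t reg)
{
	return (uint16_t)((devad << 8) | (reg & 0xFF));
}
```

A probe routine could try the standard location first and fall back to a device-specific table entry when the result is all zeros or all ones.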
The MAC drivers are also non-trivial. The Octeon chips are designed as network processors with a lot of hardware offloading and coprocessors. Bringing up a "simple Ethernet" interface is anything but simple. There are numerous offload engines that must be configured before it will work. While we do have one "simple" interface that can be configured, it often isn't because it's usually only good for a management port and many boards do not have this and the customers desire to be able to use any port.
Just configuring the interface between the MAC and PHY is also non-trivial. The Octeon (and later CPUs) have what are called "QLMs" or quad lane modules. These QLMs contain programmable serdes which can be configured for PCIe, SATA, XFI, XAUI, RXAUI, SGMII, 1000Base-X, XLAUI and a whole host of other interface types with a lot of tuning for things like equalization and clocks. The amount of QLM initialization code is quite large but necessary. There are a lot of clock and analog tuning parameters and sequences that must be run.
Sadly all of this is needed just for basic ping and DHCP. This isn't like a simple e1000 NIC or the NICs common with most SoCs.
As already stated, this heavy networking stuff should be the task of an OS. I understand why you chose another way because Linux only recently got real support for SFP or more hardware-offloading capabilities but maybe you should take the chance and update your system design and submit missing functionality to Linux rather than adding a lot of network management stuff to U-Boot.
Unfortunately, without the support in U-Boot, networking just won't work at all. The U-Boot drivers do not use any of the heavy lifting features. Unfortunately there is still a lot of code that needs to execute just for ping.
Think of scaling from a Raspberry Pi to a dual-CPU XEON enterprise-class server with 96 cores and 256GiB of RAM with 10, 25 and 40GbE ports but without a BCM or MCU to handle low-level board changes while also having many enterprise-class requirements for RAS, etc. That is why our code is so large and complex. There are a lot of hardware engines for offloading a lot of tasks since the chips are often used in security appliances. There are engines for ZIP compression, hardware regex engines, packet ordering engines, packet parsing engines, buffer management engines, RAID engines and a whole host of others. Many are not used in U-Boot, but a fair number are required for basic packet I/O.
For example, one of the boxes contains a CN78XX with 8 10G ports (where either can also be configured in XLAUI using 4 to 1 using a QSFP to SFP+ splitter cable). It has 128GiB of registered DDR4 DIMMs, 4 SATA drives, redundant power supplies and a whole host of other things including multiple temperature monitors. This uses an Inphi/Cortina phy chip that requires full SFP management support. With Inphi phys, the phy cannot drive LEDs based on traffic since it has no concept of packets, especially in XLAUI mode since each lane is independent of the others.
Another board, one I specifically have been told to upstream, is a NIC that contains a CN73XX and two 10G/25G ports that go through a complex gearbox chip. Since there is no hardware support for LEDs in the Octeon SoC to indicate link and packet I/O, this must be done in software (including U-Boot, customer requirement) and SFP port management is also a must. The phy is not at all a traditional phy. It uses i2c instead of MDIO and requires frequent monitoring of the link parameters (it's an older custom gearbox chip, there are newer and better chips that don't require this now). I have a hook while U-Boot is sitting at the prompt which allows for background tasks to operate while it's sitting.
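As a rough illustration of the kind of prompt-time hook described above, here is a minimal sketch of a polling registry that an idle loop could call while waiting for console input. None of these names (poll_register, poll_run, struct poll_task) come from U-Boot or the Octeon code. They are hypothetical, and the millisecond timestamp is assumed to be supplied by the caller.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*poll_fn)(void *ctx);

struct poll_task {
	poll_fn fn;		/* callback, e.g. link or LED monitoring */
	void *ctx;		/* opaque pointer handed back to fn */
	uint32_t period_ms;	/* how often the task wants to run */
	uint32_t next_due_ms;	/* time of the next run */
};

#define MAX_POLL_TASKS 8

static struct poll_task poll_tasks[MAX_POLL_TASKS];
static size_t num_poll_tasks;

/* Register a periodic task; returns 0 on success, -1 if the table is full. */
int poll_register(poll_fn fn, void *ctx, uint32_t period_ms, uint32_t now_ms)
{
	struct poll_task *t;

	if (num_poll_tasks >= MAX_POLL_TASKS)
		return -1;
	t = &poll_tasks[num_poll_tasks++];
	t->fn = fn;
	t->ctx = ctx;
	t->period_ms = period_ms;
	t->next_due_ms = now_ms + period_ms;
	return 0;
}

/* Run every task that is due; meant to be called from the idle loop. */
void poll_run(uint32_t now_ms)
{
	for (size_t i = 0; i < num_poll_tasks; i++) {
		struct poll_task *t = &poll_tasks[i];

		/* Signed difference keeps this correct across timer wraparound. */
		if ((int32_t)(now_ms - t->next_due_ms) >= 0) {
			t->fn(t->ctx);
			t->next_due_ms = now_ms + t->period_ms;
		}
	}
}

/* Example task for trying the sketch: counts how often it was called. */
static void count_calls(void *ctx)
{
	(*(int *)ctx)++;
}
```

The tasks run strictly one at a time from the caller's context, so this stays within a single-tasking model and needs no interrupts or locking.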
I have several other NICs to support that use a Microsemi reclocking chip that has four unidirectional lanes per chip. The chip has zero intelligence and is shared between ports (and on some devices, multiple chips are shared between ports). Everything must be tuned based on the SFP/QSFP module type and cable length. LEDs also must be software driven. (The software driving of LEDs is eliminated in OcteonTX2). These chips have no way to drive the LEDs themselves to indicate packet I/O or link status.
There are also other boards that use the Microsemi reclocking chips. They were chosen in part due to the power budget and these chips are very low power (and inexpensive).
In all of these phy cases, all of the parameters are maintained in the device tree so the drivers are generic. Unfortunately these drivers also require SFP and QSFP management support.
I figure if there are several boards I need to upstream, it's not much more effort to port all of the boards to the new U-Boot. I've worked hard to minimize the board-specific code and make as much of it generic and based on the device tree as possible.
Someday I would love for SFP/QSFP infrastructure to get into Linux. Some NIC cards do it in their drivers, but I'd like to see generic infrastructure (like my U-Boot support). This might make it harder for some drivers to only support certain brands of modules too :) The generic code I wrote works with most modules except Intel (because they have bad checksums, but counterfeit Intel modules work fine!). It still can be expanded at some point since there is no support for module diagnostics other than identifying if it is present. Pretty much all it does is monitor the GPIO pins and parse and decode the EEPROM. The SFP code is generic enough such that any phy driver that needs it can easily hook into it.
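For readers unfamiliar with what parsing and decoding the EEPROM involves, here is a minimal sketch based on the SFF-8472 base ID layout: byte 0 is the identifier (0x03 for SFP/SFP+), byte 8 carries the passive/active cable bits, byte 18 is the copper length in metres, and byte 63 (CC_BASE) is the low 8 bits of the sum of bytes 0 to 62. The struct and function names are hypothetical and this is not the code discussed above. It only reports the checksum state and leaves the policy (such as rejecting modules with bad checksums) to the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum sfp_cable_type {
	SFP_CABLE_OPTICAL_OR_OTHER,
	SFP_CABLE_PASSIVE_COPPER,
	SFP_CABLE_ACTIVE,
};

struct sfp_info {
	bool checksum_ok;		/* CC_BASE matched */
	enum sfp_cable_type cable;
	unsigned int copper_length_m;	/* byte 18, only meaningful for copper */
};

/* CC_BASE: low 8 bits of the sum of bytes 0..62, stored in byte 63. */
static bool sfp_base_checksum_ok(const uint8_t *eeprom)
{
	uint8_t sum = 0;

	for (size_t i = 0; i < 63; i++)
		sum += eeprom[i];
	return sum == eeprom[63];
}

/* Decode the first 64 EEPROM bytes; returns -1 if this is not an SFP/SFP+. */
static int sfp_parse(const uint8_t *eeprom, struct sfp_info *out)
{
	if (eeprom[0] != 0x03)	/* identifier: 0x03 = SFP or SFP+ */
		return -1;

	out->checksum_ok = sfp_base_checksum_ok(eeprom);

	if (eeprom[8] & 0x04)		/* bit 2: passive cable */
		out->cable = SFP_CABLE_PASSIVE_COPPER;
	else if (eeprom[8] & 0x08)	/* bit 3: active cable */
		out->cable = SFP_CABLE_ACTIVE;
	else
		out->cable = SFP_CABLE_OPTICAL_OR_OTHER;

	out->copper_length_m = eeprom[18];
	return 0;
}
```

A phy driver hooked into such a layer would pick its equalization settings from the cable type and length fields rather than reading the EEPROM itself.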
as already noted this is already in Linux:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/phy/phylink.c
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/phy/sfp.c
Unfortunately, for high speed interfaces (which our customers use in U-Boot for tftpboot), a fair bit needs to be implemented just to work. The way the code is architected, there isn't much impact on the existing U-Boot code unless it needs to take advantage of it.
Our bootloader needs to be able to be booted from a variety of sources, including SPI, eMMC, NOR flash and booting over the PCI bus from a host system. This is one reason we use virtual memory. The other reason is that it eliminates the need to perform relocation. Our start.S code handles all of these different cases as well as exception handling.
This is already supported for MIPS. You should try to use the generic SPL framework for that. Whether you like the relocation or not, it's one of the basic design principles of U-Boot. I guess it likely won't be accepted if you circumvent this. In fact by now we're sharing the same technology as Linux to have relocatable binaries without using gcc's -fPIC or -mabicalls to reduce the binary footprint. You can configure gd->ram_top to any address of your liking as reference address for the relocation.
I will look into this. One other complication is the fact that we require both a failsafe as well as a default bootloader. With the older U-Boot we got around all of this by just using TLB entries to map U-Boot to always run in the same virtual address regardless of the physical address. It eliminated any need for -fPIC and helped keep the binary small. For our older bootloader, it always executes at 0xC0000000 regardless of where it sits in physical memory. Using virtual memory also helps keep U-Boot simple and small.
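To illustrate the arithmetic behind the fixed virtual address scheme, here is a minimal sketch that translates an address in the 0xC0000000 window to a physical address given a 4MB-aligned load base. It only models the translation a TLB entry would perform and programs no hardware. The names (struct fixed_map, fixed_map_init, fixed_map_virt_to_phys) are hypothetical.

```c
#include <stdint.h>

#define UBOOT_VIRT_BASE	0xC0000000u	/* fixed link address from the text */
#define LOAD_ALIGN	(4u << 20)	/* image may sit on any 4MB boundary */

struct fixed_map {
	uint64_t phys_base;	/* where the image actually sits */
	uint32_t size;		/* size of the mapped window in bytes */
};

/* Record the mapping; returns -1 if phys_base is not 4MB aligned. */
static int fixed_map_init(struct fixed_map *m, uint64_t phys_base,
			  uint32_t size)
{
	if (phys_base % LOAD_ALIGN)
		return -1;
	m->phys_base = phys_base;
	m->size = size;
	return 0;
}

/* Translate a virtual address inside the window; -1 if it is outside. */
static int fixed_map_virt_to_phys(const struct fixed_map *m, uint32_t virt,
				  uint64_t *phys)
{
	if (virt < UBOOT_VIRT_BASE || virt - UBOOT_VIRT_BASE >= m->size)
		return -1;
	*phys = m->phys_base + (virt - UBOOT_VIRT_BASE);
	return 0;
}
```

This is also the translation a driver would need before handing a buffer address to a DMA engine, since a pointer in the mapped window is not a bus address.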
I will also say up front that the memory initialization code is a mess and quite large (it was written by a hardware engineer who never heard of functions).
One thing is that this will break mips unless it is refactored like ARM is, for example, separating armv7 and armv8. This way we could have arch/mips/cpu/octeon. I did this with the old bootloader to separate our stuff. I'm open to suggestions as for the naming. I don't see how we can share much of the code with the other MIPS CPUs.
We have the same mach directory handling as in Linux MIPS. So you could easily add all your platform specific code (except drivers) to arch/mips/mach-octeon or (-cavium). Inside that directory you can have an include directory for your custom header files, you can even override the generic files from arch/mips/include like in Linux. arch/mips/cpu and arch/mips/lib should only contain generic code. As already mentioned you could provide your own start.S inside arch/mips/mach-octeon but if possible you should try to reuse or extend the generic variant.
We can't use the existing start.S. We have a lot of requirements that are not supported there as well as a fair bit of code dedicated to dealing with the cache and TLBs and bringing additional cores out of reset. We make use of a boot bus movable region in order to do this and handle other cases like NMIs and the watchdog. Our start.S currently sits at around 3800 lines of code. Some is common but most is not.
Our start.S is designed to be able to boot both a failsafe and non-failsafe image and supports adjusting the flash mapping in order to start from an offset other than zero in the flash. There is also a fair bit of code for copying the image out of flash into the L2 cache for a significant speedup for DRAM initialization. I'm trying to get permission to share our existing code but I'm getting push-back (even though it's GPL!?!). How they want me to upstream it without sharing the code is beyond me.
While U-Boot has an exception handler, I believe ours is more comprehensive. It is written entirely in assembler and is not dependent on a working C runtime environment. It also dumps more information than just the registers such as the stack and a number of other exception registers and does some exception decoding. It's quite a bit better than the ARMv8 exception handler IMHO.
Putting this under mach-octeon will make it much easier. I'll try and re-use where I can.
All in all, I think the final port will add between 500K-1M lines of code for the Octeon CPU. It is much more extensive than what is required for OcteonTX since in the latter case most of the hardware initialization is done by earlier stage bootloaders and the ATF handles things like SFP port management and many of the networking operations.
I'm not sure how well I'll be able to upstream all of this code at this point since I was just handed this task. We already have at least 1M lines of code added to the old U-Boot which is based off of 2013.08 with a lot of backports.
I'm trying to get our existing code made available someplace online. I'm getting pushback even though U-Boot is GPL and the license on our SDK is BSD-like (i.e. do whatever you want but don't hold us responsible). It looks like it used to be available but was taken down. I don't understand lawyers. All of the code I wrote is GPL. There is some U-Boot specific code in our SDK, but none was copied from U-Boot. There also is some duplication of functionality between U-Boot and our SDK that I'll try and eliminate.
I have implemented just about every feature in U-Boot I could with our Octeon SoC. That's another reason it's so large. Some customer always comes back and says they want feature X to work. Fortunately, the changes to the U-Boot supplied code are generally minimal, despite it being so large.
I likely will need to add some more hooks to board_f.c and board_r.c. I have run into many cases where we need a specific order of initialization that does not match the normal U-Boot order. Perhaps make init_sequence_f and init_sequence_r weak so that they can be overridden if needed by a specific board or architecture. While much of the current init order works, we need some things initialized as quickly as possible and others initialized later. For example, the first thing we call is an early_errate_workaround function in the init sequence before anything else is called.
I guess overriding the complete generic board init code is not acceptable. It was once hard work to unify this. A hook like early_errate_workaround() sounds reasonable but could also be called from start.S before handing over to board_init_f(). But everything else should fit into the existing init hooks. There are quite a lot.
I agree. I did some more research and noticed that it's not uncommon to have other functions called before board_init_f by the start code. I also noticed that there appear to be quite a few places where custom board_init_f functions are defined. I will try and avoid this. Back when I did this port in 2012 things were a lot more limited.
Would marking a few functions as weak be acceptable? This would help keep #ifdefs to a minimum. I have found that doing this as well as adding hooks in some key places can really minimize the use of #ifdefs and keep the code cleaner. In our common board code I did this a lot. That way there is nothing specific to any single board in there and any board can override whatever functionality it needs to do. Our existing U-Boot supports 83 boards, though many of these will go away (and some are no longer tested).
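For anyone who has not used the pattern, here is a minimal sketch of weak default hooks in an init sequence, written with the GCC/Clang attribute directly (U-Boot code would normally spell this __weak). The hook names and the sequence are hypothetical and are not the actual board_f.c contents. A board or SoC overrides a hook simply by providing a normal, non-weak definition with the same name.

```c
#include <stddef.h>

typedef int (*init_fn)(void);

/* Weak defaults: do nothing unless a board or SoC supplies its own version. */
__attribute__((weak)) int early_soc_workaround(void)
{
	return 0;
}

__attribute__((weak)) int board_late_fixup(void)
{
	return 0;
}

static int init_step_count;

/* Stand-in for a common, non-overridable init step. */
static int common_setup(void)
{
	init_step_count++;
	return 0;
}

/* The sequence stays generic; only the hooks differ per board. */
static const init_fn init_sequence[] = {
	early_soc_workaround,
	common_setup,
	board_late_fixup,
	NULL,
};

/* Run each step in order and stop at the first failure. */
int run_init_sequence(void)
{
	for (const init_fn *fn = init_sequence; *fn; fn++) {
		int ret = (*fn)();

		if (ret)
			return ret;
	}
	return 0;
}
```

The common code contains no #ifdef for any particular board, which is the point being argued above.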
-Aaron

Dear Aaron,
In message 1932577.QJWW3v3lL8@flash you wrote:
We do this relocation as well, however the way we do it is by changing a couple of TLB entries. This lets U-Boot begin execution from any memory location, be it flash, L2 cache or RAM. It also lets us statically link U-Boot to run at a fixed address, in our case 0xC0000000. The relocation happens
It seems you have missed the primary purpose of relocation. The interesting thing is not the start address, but the end address of U-Boot in memory, as we always try to place the U-Boot code and data at the very end of the available memory (and yes, this includes systems which can come with different memory sizes). Additionally, we want to be able to reserve additional memory at the end of RAM, above U-Boot, so it can even be kept across warm boots. Features like protected RAM (PRAM), shared log buffers, shared video memory etc. come to mind here.
This might be something to consider in the future on some platforms where "relocation" could be performed by just adjusting the TLB or page tables. MIPS makes this particularly easy.
This cannot be done, not without castrating U-Boot from a number of features that require allocation at the end of the available RAM, see above.
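The placement described above (reserved memory at the very top of RAM, with U-Boot directly below it) can be sketched as simple address arithmetic. This is an illustration under assumptions, not the actual board_f.c code: the function and struct names are hypothetical, only a single PRAM-style reservation is modelled, and 4KiB alignment is assumed.

```c
#include <stdint.h>

/* Round addr down to a power-of-two alignment. */
static uint64_t align_down(uint64_t addr, uint64_t align)
{
	return addr & ~(align - 1);
}

struct reloc_layout {
	uint64_t pram_base;	/* start of the region kept across warm boots */
	uint64_t uboot_base;	/* where the U-Boot image is relocated to */
};

/*
 * Work downwards from the top of RAM: first the reserved region,
 * then the U-Boot image, aligned down to a 4KiB boundary.
 */
static struct reloc_layout plan_relocation(uint64_t ram_base,
					   uint64_t ram_size,
					   uint64_t pram_size,
					   uint64_t image_size)
{
	struct reloc_layout layout;
	uint64_t top = ram_base + ram_size;

	layout.pram_base = top - pram_size;
	layout.uboot_base = align_down(layout.pram_base - image_size, 4096);
	return layout;
}
```

With the same image and reservation sizes, a board with 1GiB of RAM and one with 2GiB get different results, which is why the final address cannot be fixed at link time in terms of physical memory.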
That's fine. The code is actually quite small. It has some custom APIs unique to our needs. We have need to call into the phy code from these applications. I don't know if this could work with the general API or not. One reason we did
What exactly do you need this for? Why don't you just link your code with the rest of U-Boot?
Best regards,
Wolfgang Denk

On Thursday, October 31, 2019 3:36:10 AM PDT Wolfgang Denk wrote:
Dear Aaron,
In message 1932577.QJWW3v3lL8@flash you wrote:
We do this relocation as well, however the way we do it is by changing a couple of TLB entries. This lets U-Boot begin execution from any memory location, be it flash, L2 cache or RAM. It also lets us statically link U-Boot to run at a fixed address, in our case 0xC0000000. The relocation happens
It seems you have missed the primary purpose of relocation. The interesting thing is not the start address, but the end address of U-Boot in memory, as we always try to place the U-Boot code and data at the very end of the available memory (and yes, this includes systems which can come with different memory sizes). Additionally, we want to be able to reserve additional memory at the end of RAM, above U-Boot, so it can even be kept across warm boots. Features like protected RAM (PRAM), shared log buffers, shared video memory etc. come to mind here.
This is exactly what we do. We use a high virtual address and always move it to the end of physical memory.
This might be something to consider in the future on some platforms where "relocation" could be performed by just adjusting the TLB or page tables. MIPS makes this particularly easy.
This cannot be done, not without castrating U-Boot from a number of features that require allocation at the end of the available RAM, see above.
That's fine. The code is actually quite small. It has some custom APIs unique to our needs. We have need to call into the phy code from these applications. I don't know if this could work with the general API or not. One reason we did
What exactly do you need this for? Why don't you just link your code with the rest of U-Boot?
We need it to obtain and modify the phy parameters. This is a custom 25G gearbox that needs a lot of hand holding. This may end up being a low priority (not the gearbox, but the API). It's only a few hundred lines of code (the API).
Best regards,
Wolfgang Denk
-Aaron

Dear Aaron,
In message 2710076.TiSPtmOvtb@flash you wrote:
What exactly do you need this for? Why don't you just link your code with the rest of U-Boot?
We need it to obtain and modify the phy parameters. This is a custom 25G gearbox that needs a lot of hand holding. This may end up being a low priority (not the gearbox, but the API). It's only a few hundred lines of code (the API).
Again you don't answer my question. Why do you need a special new API for such code? Why do you not just link that code with the rest of U-Boot?
It has been mentioned before, but just to be sure: this code which uses your new API is licensed under a GPLv2 conforming license?
Best regards,
Wolfgang Denk

On Mon, Nov 04, 2019 at 04:44:18PM +0100, Wolfgang Denk wrote:
Dear Aaron,
In message 2710076.TiSPtmOvtb@flash you wrote:
What exactly do you need this for? Why don't you just link your code with the rest of U-Boot?
We need it to obtain and modify the phy parameters. This is a custom 25G gearbox that needs a lot of hand holding. This may end up being a low priority (not the gearbox, but the API). It's only a few hundred lines of code (the API).
Again you don't answer my question. Why do you need a special new API for such code? Why do you not just link that code with the rest of U-Boot?
It has been mentioned before, but just to be sure: this code which uses your new API is licensed under a GPLv2 conforming license?
And, to be blunt, if it is not, handling your non-GPLv2 applications via an EFI binary is the way forward, not extending the U-Boot binary ABI, in my opinion.

On Monday, November 4, 2019 8:23:08 AM PST Tom Rini wrote:
On Mon, Nov 04, 2019 at 04:44:18PM +0100, Wolfgang Denk wrote:
Dear Aaron,
In message 2710076.TiSPtmOvtb@flash you wrote:
What exactly do you need this for? Why don't you just link your code with the rest of U-Boot?
We need it to obtain and modify the phy parameters. This is a custom 25G gearbox that needs a lot of hand holding. This may end up being a low priority (not the gearbox, but the API). It's only a few hundred lines of code (the API).
Again you don't answer my question. Why do you need a special new API for such code? Why do you not just link that code with the rest of U-Boot?
It has been mentioned before, but just to be sure: this code which uses your new API is licensed under a GPLv2 conforming license?
And, to be blunt, if it is not, handling your non-GPLv2 applications via an EFI binary is the way forward, not extending the U-Boot binary ABI, in my opinion.
To be blunt, the current U-Boot EFI driver does not provide the required functionality. It would need to be extended in order to work. In addition, spinlocks would be required in order to handle the case of reentrancy. Also, how does the EFI loader deal with loading multiple applications across multiple cores? The block support is the least important part of it. There are several other services not related to block devices or network calls.
-Aaron

Dear Aaron,
In message 1838672.aZrPjDvGh8@flash you wrote:
To be blunt, the current U-Boot EFI driver does not provide the required functionality. It would need to be extended in order to work. In addition, spinlocks would be required in order to handle the case of reentrancy. Also, how does the EFI loader deal with loading multiple applications across multiple cores? The block support is the least important part of it. There are several other services not related to block devices or network calls.
Maybe you are just trying to squeeze too much of operating system functionality into a mere boot loader?
Using tools for purposes they have not been designed for has never been a good idea...
Best regards,
Wolfgang Denk

Hi Wolfgang,
On Tuesday, November 5, 2019 12:37:26 AM PST Wolfgang Denk wrote:
Dear Aaron,
In message 1838672.aZrPjDvGh8@flash you wrote:
To be blunt, the current U-Boot EFI driver does not provide the required functionality. It would need to be extended in order to work. In addition, spinlocks would be required in order to handle the case of reentrancy. Also, how does the EFI loader deal with loading multiple applications across multiple cores? The block support is the least important part of it. There are several other services not related to block devices or network calls.
Maybe you are just trying to squeeze too much of operating system functionality into a mere boot loader?
Using tools for purposes they have not been designed for has never been a good idea...
Best regards,
Wolfgang Denk
With the complexity of U-Boot, it certainly exceeds a number of operating systems I've used :)
U-Boot OS might be fun for people writing applications where they want bare metal (i.e. hard real-time), though that's already provided with the API and examples.
Our API is very much at arm's length. It consists of a descriptor placed into a named block of memory that has the physical address of a single entry point, version information and a magic number, similar to EFI. There has to be some way to hand the CPU over to U-Boot, after all. That single entry point is basically a syscall. It saves the context of the caller and performs a TLB context switch and sets up a new stack for U-Boot and the TLB mapping (we run U-Boot at 0xFFFFFFFFC0000000). There is also a spinlock so that no other core may enter U-Boot until the current request finishes. The C code then interprets the opcode and copies any data (using physical addresses) into buffers used by U-Boot, then when done it copies the data back to the application's pointers (which are physical addresses). U-Boot code other than the API never sees outside pointers and all data is copied to a local buffer. It's not fast but it's been very reliable. The external program doesn't need to know anything other than to pass some parameters and call the address to hand the CPU context over to U-Boot. Neither side knows anything about the other. You can't get much more arm's length than that except perhaps requiring U-Boot to use an interrupt. They are, by just about any definition, completely separate binaries. I'm no lawyer, but reading the GPL FAQ I think we fall well within the arm's length separation.
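To make the shape of this interface easier to picture, here is a minimal sketch of the pieces described: a descriptor with a magic number, version and entry address, a single spinlock, and an opcode dispatcher that only ever works on a local copy of the caller's data. This is not the actual Octeon code. All names, the magic value and the two opcodes are hypothetical, ordinary C pointers stand in for physical addresses, and the context save, TLB switch and stack setup done in assembly are omitted entirely.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define API_MAGIC	0x55424150u	/* hypothetical magic value */
#define API_VERSION	1u
#define NUM_PHY_PARAMS	4u

enum api_opcode {
	API_OP_GET_PARAM = 1,
	API_OP_SET_PARAM = 2,
};

/* What would be published in the named memory block. */
struct api_descriptor {
	uint32_t magic;
	uint32_t version;
	uint64_t entry_phys;	/* physical address of the single entry point */
};

struct api_request {
	uint32_t opcode;
	uint32_t index;
	uint32_t value;
};

static atomic_flag api_lock = ATOMIC_FLAG_INIT;
static uint32_t phy_params[NUM_PHY_PARAMS];	/* stand-in for phy/QLM state */

/* Caller-side sanity check before using the entry point. */
static int api_descriptor_valid(const struct api_descriptor *d)
{
	return d->magic == API_MAGIC && d->version == API_VERSION &&
	       d->entry_phys != 0;
}

/* Single entry point: lock, copy in, dispatch on opcode, copy out, unlock. */
int api_entry(struct api_request *caller_req)
{
	struct api_request local;
	int ret = 0;

	/* Only one core may be inside at a time. */
	while (atomic_flag_test_and_set_explicit(&api_lock,
						 memory_order_acquire))
		;

	memcpy(&local, caller_req, sizeof(local));	/* copy in */

	if (local.index >= NUM_PHY_PARAMS) {
		ret = -1;
	} else {
		switch (local.opcode) {
		case API_OP_GET_PARAM:
			local.value = phy_params[local.index];
			break;
		case API_OP_SET_PARAM:
			phy_params[local.index] = local.value;
			break;
		default:
			ret = -2;	/* unknown opcode */
			break;
		}
	}

	memcpy(caller_req, &local, sizeof(local));	/* copy out */
	atomic_flag_clear_explicit(&api_lock, memory_order_release);
	return ret;
}
```

The internal code only ever touches the local copy, which mirrors the statement above that code other than the API never sees outside pointers.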
At least on MIPS, U-Boot doesn't seem to care which core it's running on as long as only one core is executing at a time. It's proven to be quite reliable. It's not meant to be a heavy-duty OS and by design it limits how much I/O can be performed. It's only meant to load and save configuration and a few other operations. Even functions like getc/putc are not supported (since the native application can do that). The main functions used are for changing the phy parameters and the MAC quad-lane-module parameters like amplitude and equalization which goes along with the phy code.
It also provides some very basic file I/O and block I/O and environment variable support like EFI. EFI would be nice to use, but it would require the proper lock support and a few other things to work in a multi-core environment.
It could be converted over to EFI, though EFI would need to be expanded in order to provide the spinlocks and a few other minor changes for the SoC. EFI would also need to be expanded to allow for platform-specific calls to be supported related to the phy and QLM.
Ideally we won't need this at all with some of the work we're doing on the Linux kernel.
Regards,
Aaron

Dear Aaron,
In message 2609392.0ByMiX4J6F@flash you wrote:
U-Boot OS might be fun for people writing applications where they want bare metal (i.e. hard real-time), though that's already provided with the API and examples.
Urgh... no!!! U-Boot is definitely *not* suitable for any kind of real-time tasks. By design it implements strict single-tasking with usually polling hardware access only. No multi-tasking, no interrupts, no locking, no timers, nothing...
You can't get much more arm's length than that except perhaps requiring U-Boot to use an interrupt. They are, by just about any definition, completely separate binaries. I'm no lawyer, but reading the GPL FAQ I think we fall well within the arm's length separation.
Definitely not. You could not implement any of this without heavily relying on and deriving from internal interfaces of U-Boot which are not exported for non-GPL use.
Best regards,
Wolfgang Denk

________________________________ From: Wolfgang Denk wd@denx.de Sent: Tuesday, November 5, 2019 3:36 AM To: Aaron Williams awilliams@marvell.com Cc: Tom Rini trini@konsulko.com; Daniel Schwierzeck daniel.schwierzeck@gmail.com; u-boot@lists.denx.de u-boot@lists.denx.de Subject: Re: [EXT] Re: Cavium/Marvell Octeon Support
Hi Wolfgang,
I apologize in advance for the lack of email formatting (blame our IT department for forcing Linux users to use the broken Outhouse web client).
Dear Aaron,
In message 2609392.0ByMiX4J6F@flash you wrote:
U-Boot OS might be fun for people writing applications where they want bare metal (i.e. hard real-time), though that's already provided with the API and examples.
Urgh... no!!! U-Boot is definitely *not* suitable for any kind of real-time tasks. By design it implements strict single-tasking with usually polling hardware access only. No multi-tasking, no interrupts, no locking, no timers, nothing...
And I wouldn't ask U-Boot to do this. We don't do any multi-tasking with U-Boot with the exception of SoC specific code that deals with starting simple executive applications. Our API uses a single giant spinlock to prevent there being any multi-tasking within U-Boot.
Now there is other SoC specific code that does use locks and does support multiple cores simultaneously running code. This is needed when we start these Simple Executive applications. The code allows for multiple applications as well as the Linux kernel to be started simultaneously from within U-Boot. The code is executed by all cores in use and does things like set up memory and TLB mapping for the simple executive applications for each core. None of this code would be exposed outside of our SoC code and there is zero interaction with any of U-Boot's code. Each simple executive application has a core mask of cores assigned to it. Obviously in order to be able to do this there is locking within the SoC specific code. It does not involve any code outside of the SoC in order to do this.
You can't get much more arm's length than that except perhaps requiring U-Boot to use an interrupt. They are, by just about any definition, completely separate binaries. I'm no lawyer, but reading the GPL FAQ I think we fall well within the arm's length separation.
Definitely not. You could not implement any of this without heavily relying on and deriving from internal interfaces of U-Boot which are not exported for non-GPL use.
See https://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.en.html#GPLInProprieta...
This behaves exactly in the manner that is permitted by the GPL. They are completely separate programs.
Best regards,
Wolfgang Denk
Regards,
Aaron Williams
-- DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de The IQ of the group is the lowest IQ of a member of the group divided by the number of people in the group.

Dear Aaron,
In message BYAPR18MB24402A81E226896D208669F5B17E0@BYAPR18MB2440.namprd18.prod.outlook.com you wrote:
Definitely not. You could not implement any of this without heavily relying on and deriving from internal interfaces of U-Boot which are not exported for non-GPL use.
See https://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.en.html#GPLInProprietarySystem
This behaves exactly in the manner that is permitted by the GPL. They are completely separate programs.
Are they?
You wrote:
"There is no linking. Only a call table descriptor is published in a named block of memory."
I can only interpret from that that there is a call table, where your applications call into interfaces that have not been exported for non-GPL use. This is not what I call "completely separate".
Best regards,
Wolfgang Denk

Hi Wolfgang,
On Wednesday, November 6, 2019 7:06:17 AM PST Wolfgang Denk wrote:
Dear Aaron,
In message BYAPR18MB24402A81E226896D208669F5B17E0@BYAPR18MB2440.namprd18.prod.outlook.com you wrote:
Definitely not. You could not implement any of this without heavily relying on and deriving from internal interfaces of U-Boot which are not exported for non-GPL use.
See https://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.en.html#GPLInProprietarySystem
This behaves exactly in the manner that is permitted by the GPL. They are completely separate programs.
Are they?
You wrote:
"There is no linking. Only a call table descriptor is published in a named block of memory."
I can only interpret from that that there is a call table, where your applications call into interfaces that have not been exported for non-GPL use. This is not what I call "completely separate".
Best regards,
Wolfgang Denk
Calling directly into U-Boot would be bad. We don't do that. It wouldn't work anyway on our 32-bit bootloader due to the required TLB mapping.
There is no call table. There is a single XKPhys address that points to some assembly code that saves the state of the calling application and sets up the memory mapping and stack for U-Boot (we map it to 0xFFFFFFFFC0000000), then looks at an opcode that's passed and parameters. From there it performs one of several functions based on the opcode. On the way out the reverse is done, the state is restored and the TLB restored before returning to the outside application. The calling application has its own virtual memory map, so that has to be saved and restored on entry by the assembly code as well.
Since U-Boot uses a TLB for mapping, it's just not possible for an outside application to call into U-Boot using a function table, so everything must go through the one assembly function. The old U-Boot code was written before EFI support was added. It looks like I'll be removing it anyway, though. We have never exported any U-Boot functions save for the assembly code and the API functionality. The API functionality was not usable by our applications since our applications were typically 64-bit whereas our old U-Boot was 32-bit running in mapped memory (0xFFFFFFFFC0000000/0xC0000000) and physically located at the top of physical memory.
-Aaron

On Wed, Nov 06, 2019 at 10:18:45PM +0000, Aaron Williams wrote:
Hi Wolfgang,
On Wednesday, November 6, 2019 7:06:17 AM PST Wolfgang Denk wrote:
Dear Aaron,
In message BYAPR18MB24402A81E226896D208669F5B17E0@BYAPR18MB2440.namprd18.prod.outlook.com you wrote:
Definitely not. You could not implement any of this without heavily relying on and deriving from internal interfaces of U-Boot which are not exported for non-GPL use.
See https://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.en.html#GPLInProprietarySystem
This behaves exactly in the manner that is permitted by the GPL. They are completely separate programs.
Are they?
You wrote:
"There is no linking. Only a call table descriptor is published in a named block of memory."
I can only interpret from that that there is a call table, where your applications call into interfaces that have not been exported for non-GPL use. This is not what I call "completely separate".
Best regards,
Wolfgang Denk
Calling directly into U-Boot would be bad. We don't do that. It wouldn't work anyway on our 32-bit bootloader due to the required TLB mapping.
There is no call table. There is a single XKPhys address that points to some assembly code that saves the state of the calling application and sets up the memory mapping and stack for U-Boot (we map it to 0xFFFFFFFFC0000000), then looks at the opcode and parameters that were passed. From there it performs one of several functions based on the opcode. On the way out the reverse is done: the state and the TLB are restored before returning to the outside application. The calling application has its own virtual memory map, so that has to be saved on entry and restored on exit by the assembly code as well.
Since U-Boot uses a TLB for mapping, it's just not possible for an outside application to call into U-Boot using a function table, so everything must go through the one assembly function. The old U-Boot code was written before EFI support was added. It looks like I'll be removing it anyway, though. We have never exported any U-Boot functions save for the assembly code and the API functionality. The API functionality was not usable by our applications since our applications were typically 64-bit whereas our old U-Boot was 32-bit running in mapped memory (0xFFFFFFFFC0000000/0xC0000000) and physically located at the top of physical memory.
Alright, so I think here's the important thing to look at moving forward. In mainline U-Boot, the options for communication between closed source components and U-Boot itself (where GPLv2 is the minimum license) are either the defined ABI or making use of the EFI ABI. We do not want to add or support a 3rd method. Thanks!

On Tue, Nov 05, 2019 at 02:08:54AM +0000, Aaron Williams wrote:
On Monday, November 4, 2019 8:23:08 AM PST Tom Rini wrote:
On Mon, Nov 04, 2019 at 04:44:18PM +0100, Wolfgang Denk wrote:
Dear Aaron,
In message 2710076.TiSPtmOvtb@flash you wrote:
What exactly do you need this for? Why don't you just link your code with the rest of U-Boot?
We need it to obtain and modify the phy parameters. This is a custom 25G gearbox that needs a lot of hand holding. This may end up being a low priority (not the gearbox, but the API). It's only a few hundred lines of code (the API).
Again you don't answer my question. Why do you need a special new API for such code? Why do you not just link that code with the rest of U-Boot?
It has been mentioned before, but just to be sure: this code which uses your new API is licensed under a GPLv2 conforming license?
And, to be blunt, if it is not, handling your non-GPLv2 applications via an EFI binary is the way forward, not extending the U-Boot binary ABI, in my opinion.
To be blunt, the current U-Boot EFI driver does not provide the required functionality. It would need to be extended in order to work. In addition, spinlocks would be required in order to handle the case of reentrancy. Also, how does the EFI loader deal with loading multiple applications across multiple cores? The block support is the least important part of it. There are several other services not related to block devices or network calls.
If there are parts of the EFI specification that we do not implement, but could implement, it would be a much appreciated contribution to the code. If once you're up in the EFI world there are things you cannot do that you need to do, that should be taken up with the UEFI consortium.

Hi Wolfgang,
On Monday, November 4, 2019 7:44:18 AM PST Wolfgang Denk wrote:
Dear Aaron,
In message 2710076.TiSPtmOvtb@flash you wrote:
What exactly do you need this for? Why don't you just link your code with the rest of U-Boot?
We need it to obtain and modify the phy parameters. This is a custom 25G gearbox that needs a lot of hand holding. This may end up being a low priority (not the gearbox, but the API). It's only a few hundred lines of code (the API).
Again you don't answer my question. Why do you need a special new API for such code? Why do you not just link that code with the rest of U-Boot?
The code in question that is calling the API is not GPL and hence cannot be linked with U-Boot though the phy code is GPL. The applications that are calling also have their own virtual memory configuration and there can be multiple applications running on multiple cores that can make simultaneous calls. Because of the way the phy must be maintained with a lot of state information, the code controlling it cannot be spread between the separate independent applications which run on their own dedicated cores and address spaces. The API I wrote takes care of the required context switching and provides the services for these applications, such as control of the phy, access to devices like eMMC, tuning our QLM interfaces (this code is required for U-Boot networking anyway), etc. There is no linking. Only a call table descriptor is published in a named block of memory. The API also provides the necessary spinlocks and switch stacks. The code in question adds around 36K in total, so it is fairly small. The main differences are the addition of a number of calls that are unique to our needs in addition to the method of calling since a context switch is required in addition to the spinlocks.
The phy in question also does not fit in the normal phy framework. It doesn't even communicate with SMI. It is a complex gearbox where there needs to be interaction between applications and the gearbox where some code runs on the phy itself but a lot needs to be external.
The API also provides a number of other services such as access to and saving environment variables as well as access to block devices and filesystems. It is centralized in U-Boot because 1) the functionality is already available in U-Boot which is in memory anyway and 2) it's centralized and accessible by all applications so it can safely provide services to multiple applications simultaneously.
These applications are primarily bare-metal applications.
It may be that this functionality isn't needed. I will try and remove it if I can.
It has been mentioned before, but just to be sure: this code which uses your new API is licensed under a GPLv2 conforming license?
There should be no need. None of the code is linked against U-Boot, either at compile time or at runtime. The application doesn't even know where it is located except by looking for a named block of memory.
This is another thing we make use of in Octeon. There is a concept of named blocks in memory. These named blocks are used by U-Boot, simple executive applications and the Linux kernel. This allows physical memory to be partitioned between Linux and Simple Executive applications as well as providing some blocks that are used by some hardware blocks. I believe this support is already in the upstream Linux kernel for Octeon.
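To make the named-block idea concrete, here is a minimal sketch in C of a lookup by name. The structure layout, the field names and `named_block_find()` are invented for illustration and are not the actual Octeon bootmem definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NAMED_BLOCK_NAME_LEN	32	/* assumed length, for illustration */

/* Hypothetical descriptor for one named region of physical memory. */
struct named_block {
	uint64_t base;				/* physical base address */
	uint64_t size;				/* size in bytes, 0 = unused slot */
	char name[NAMED_BLOCK_NAME_LEN];
};

/* Return the descriptor called 'name', or NULL if there is none. */
const struct named_block *named_block_find(const struct named_block *tbl,
					   size_t count, const char *name)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (tbl[i].size == 0)
			continue;	/* skip unused slots */
		if (strncmp(tbl[i].name, name, NAMED_BLOCK_NAME_LEN) == 0)
			return &tbl[i];
	}
	return NULL;
}
```

An application that knows only the name of a block can find its physical base and size this way, without being linked against whichever program created the block.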
Best regards,
Wolfgang Denk
Regards,
Aaron

Dear Aaron,
In message 5376617.97hUrJXovB@flash you wrote:
Again you don't answer my question. Why do you need a special new API for such code? Why do you not just link that code with the rest of U-Boot?
The code in question that is calling the API is not GPL and hence cannot be linked with U-Boot though the phy code is GPL.
Ouch. I was afraid to hear that.
Please be aware that your newly created API does NOT implement a GPL license exception. The only interface that allows for non-GPL code to be run under control of U-Boot is the standalone program interface, which is intentionally very restricted.
In other words: what you are doing here is a clear (and intentional, which makes it even worse) GPL license violation.
It has been mentioned before, but just to be sure: this code which uses your new API is licensed under a GPLv2 conforming license?
There should be no need. None of the code is linked against U-Boot, either at compile time or at runtime. The application doesn't even know where it is located except by looking for a named block of memory.
It does not have to be linked. You access internal interfaces of U-Boot that have not been exported for non-GPL use, so your code still has to be licensed under GPLv2 or a compatible license.
Best regards,
Wolfgang Denk

On Tue, Nov 05, 2019 at 09:33:35AM +0100, Wolfgang Denk wrote:
Dear Aaron,
In message 5376617.97hUrJXovB@flash you wrote:
Again you don't answer my question. Why do you need a special new API for such code? Why do you not just link that code with the rest of U-Boot?
The code in question that is calling the API is not GPL and hence cannot be linked with U-Boot though the phy code is GPL.
Ouch. I was afraid to hear that.
Please be aware that your newly created API does NOT implement a GPL license exception. The only interface that allows for non-GPL code to be run under control of U-Boot is the standalone program interface, which is intentionally very restricted.
In other words: what you are doing here is a clear (and intentional, which makes it even worse) GPL license violation.
It has been mentioned before, but just to be sure: this code which uses your new API is licensed under a GPLv2 conforming license?
There should be no need. None of the code is linked against U-Boot, either at compile time or at runtime. The application doesn't even know where it is located except by looking for a named block of memory.
It does not have to be linked. You access internal interfaces of U-Boot that have not been exported for non-GPL use, so your code still has to be licensed under GPLv2 or a compatible license.
I'm just following up to say that I agree with Wolfgang here.

On Wed, Oct 23, 2019 at 03:50:00AM +0000, Aaron Williams wrote:
Hi all,
I have been tasked with porting our Octeon U-Boot to the latest U-Boot and merging it upstream.
[snip]
I want to jump back up to the top of this thread. And first I want to say that I am glad that there is official desire to upstream support. This is good. My concern is that the plan seems to be, at a very high level, "get everything we have for every feature upstream". But as has been said elsewhere this would roughly double the total LOC for the project, and it's not like we're a new project with a small handful of things :) It's impossible for the community to review that much code in any meaningful way over anything less than a period of several years. I know you've said that to support various customer use cases you need all sorts of other things, and while I'm certain that's true, I believe the plan needs to be to step back and pick the smallest possible testable unit, and upstream it. And add to it, small pieces at a time. Thanks!

On Wednesday, October 30, 2019 3:05:25 PM PDT Tom Rini wrote:
On Wed, Oct 23, 2019 at 03:50:00AM +0000, Aaron Williams wrote:
Hi all,
I have been tasked with porting our Octeon U-Boot to the latest U-Boot and merging it upstream.
[snip]
I want to jump back up to the top of this thread. And first I want to say that I am glad that there is official desire to upstream support. This is good. My concern is that the plan seems to be, at a very high level, "get everything we have for every feature upstream". But as has been said elsewhere this would roughly double the total LOC for the project, and it's not like we're a new project with a small handful of things :) It's impossible for the community to review that much code in any meaningful way over anything less than a period of several years. I know you've said that to support various customer use cases you need all sorts of other things, and while I'm certain that's true, I believe the plan needs to be to step back and pick the smallest possible testable unit, and upstream it. And add to it, small pieces at a time. Thanks!
It might be easier if I were a maintainer for our SOC to limit the needed review of a fair bit of the code. I have already found I can cut out a large chunk of our code by removing support for our older models. Much of the code has been very well tested, for example our serdes initialization and DRAM initialization code. That's not to say I can't do some cleanup.
The changes to U-Boot itself should be relatively small as long as I can keep much of our code under arch/mips/arch-octeon and arch/mips/cpu/octeon much like how our ARM code is. For anything that is applicable to other architectures I will place it in the appropriate locations.
Our ARM code is quite a bit simpler than MIPS because on ARM most of the heavy lifting is done by our "BDK" bootloader as well as ATF. On ARM, U-Boot doesn't need to deal with SFPs, serdes initialization, DRAM initialization, hot plug or a myriad of other issues.
I noticed the same with X86. If it weren't for these other layers the X86 code would also be quite large.
I have already identified quite a few very large files that will be removed, such as the error handling code for Octeon2 and earlier CPUs.
Additionally I have identified a number of register definition files I can get rid of, including one that's 2.3MB in size! These files tend to be huge because they contain definitions for every single chip and revision as well as big and little endian definitions. On top of that, there are a huge number of comments. Each field contains all of the text that is in our hardware reference manuals. There is no dearth of comments. I should be able to cut the size of the remaining files to 1/4th their current size or even smaller.
Some files will still remain quite large, however, such as our serdes initialization and DRAM initialization code (which I plan to re-architect because the original author didn't believe in functions due to stack limitations, though it is well commented). If you ever want to learn all the gory details of DDR4 link training and finding trends and so forth, it's all in there. The current memory initialization code is over 1MB in size. I plan to cut this down and break it up in a clean manner. The initialization code has grown in complexity and size over the years as various instabilities have been identified and fixed. The DRAM initialization code for our OcteonTX2 CPU is almost as large, though this code has been cleaned up and re-written. There really is no way to avoid this. The OcteonTX and OcteonTX2 DDR initialization code is similar to that for Octeon. In the case of U-Boot on our ARM SoCs, though, initialization is done before U-Boot is loaded.
I'll move the init code to drivers/ram/marvell/octeon. It will be about twice as large as the AXP driver (which only handles DDR3). The serdes init code I figure could go under drivers/soc/marvell/octeon.
I noticed that there are several directories under drivers for memory. There's drivers/ram, drivers/memory and drivers/ddr. These should be consolidated. I think some code might be able to be common, such as the SPD decoding code. It's even possible that some algorithms might be able to be made common such as deskew training and read/write leveling.
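As an example of the kind of SPD decoding that could be shared between memory drivers, here is a minimal sketch in C. It relies only on the JEDEC SPD convention that byte 2 identifies the DRAM device type (0x0B for DDR3, 0x0C for DDR4); the enum and function names are hypothetical and do not come from any existing U-Boot driver.

```c
#include <stdint.h>

/* Hypothetical result type for the sketch. */
enum dram_type {
	DRAM_TYPE_UNKNOWN = 0,
	DRAM_TYPE_DDR3,
	DRAM_TYPE_DDR4
};

/* JEDEC SPD byte 2 ("key byte") identifies the DRAM device type. */
#define SPD_BYTE_DRAM_TYPE	2
#define SPD_DRAM_TYPE_DDR3	0x0B
#define SPD_DRAM_TYPE_DDR4	0x0C

/* Classify a raw SPD image; 'len' is the number of valid bytes in 'spd'. */
enum dram_type spd_dram_type(const uint8_t *spd, unsigned int len)
{
	if (!spd || len <= SPD_BYTE_DRAM_TYPE)
		return DRAM_TYPE_UNKNOWN;

	switch (spd[SPD_BYTE_DRAM_TYPE]) {
	case SPD_DRAM_TYPE_DDR3:
		return DRAM_TYPE_DDR3;
	case SPD_DRAM_TYPE_DDR4:
		return DRAM_TYPE_DDR4;
	default:
		return DRAM_TYPE_UNKNOWN;
	}
}
```

A helper like this has no SoC-specific dependencies, which is what would make it a candidate for common code.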
In terms of Octeon specific features, there really aren't too many of those but most of the ones we have are essential in the bootloader. There's no avoiding the Serdes and low-level network initialization. The serdes init code works across all networking interface types (SGMII, 1000Base-X, XAUI, RXAUI, XFI, XLAUI, 25G (XLAUI), SATA, PCIe, SRIO) plus all the variants (e.g. KR). It also configures all the clocks and equalization. It's not like a simple gigabit NIC nor is it offloaded to some other layer. Some of this code will come later, for example support for NUMA with CN78XX (96 cores, 256GiB of RAM).
Currently we are using 39MB under arch/mips. I think I can easily cut this down to 15MB or smaller, especially by moving some code here to the appropriate driver directories (e.g. DRAM, PCIe, watchdog).
It will still be a large SoC, though.
-Aaron

Dear Aaron,
In message 1889679.7FQr5zsBR1@flash you wrote:
Currently we are using 39MB under arch/mips. I think I can easily cut this down to 15MB or smaller, especially by moving some code here to the appropriate driver directories (e.g. DRAM, PCIe, watchdog).
It will still be a large SoC, though.
Have you already looked at formal requirements, like coding style etc.? Did you ever run your additions through checkpatch.pl, for example?
Best regards,
Wolfgang Denk

Hi Wolfgang,
On Thursday, October 31, 2019 3:40:27 AM PDT Wolfgang Denk wrote:
Dear Aaron,
In message 1889679.7FQr5zsBR1@flash you wrote:
Currently we are using 39MB under arch/mips. I think I can easily cut this down to 15MB or smaller, especially by moving some code here to the appropriate driver directories (e.g. DRAM, PCIe, watchdog).
It will still be a large SoC, though.
Have you already looked at formal requirements, like coding style etc.? Did you ever run your additions through checkpatch.pl, for example?
We did follow the formal coding style. Everything will go through checkpatch. My biggest complaint about it is the 80 columns for debug and other print statements.
Best regards,
Wolfgang Denk
-Aaron

On Thu, Oct 31, 2019 at 06:01:34PM +0000, Aaron Williams wrote:
Hi Wolfgang,
On Thursday, October 31, 2019 3:40:27 AM PDT Wolfgang Denk wrote:
Dear Aaron,
In message 1889679.7FQr5zsBR1@flash you wrote:
Currently we are using 39MB under arch/mips. I think I can easily cut this down to 15MB or smaller, especially by moving some code here to the appropriate driver directories (e.g. DRAM, PCIe, watchdog).
It will still be a large SoC, though.
Have you already looked at formal requirements, like coding style etc.? Did you ever run your additions through checkpatch.pl, for example?
We did follow the formal coding style. Everything will go through checkpatch. My biggest complaint about it is the 80 columns for debug and other print statements.
checkpatch doesn't complain about those when they use standard logging functions, however.

Hi Wolfgang,
On Monday, November 4, 2019 9:22:16 AM PST Tom Rini wrote:
On Thu, Oct 31, 2019 at 06:01:34PM +0000, Aaron Williams wrote:
Hi Wolfgang,
On Thursday, October 31, 2019 3:40:27 AM PDT Wolfgang Denk wrote:
Dear Aaron,
In message 1889679.7FQr5zsBR1@flash you wrote:
Currently we are using 39MB under arch/mips. I think I can easily cut this down to 15MB or smaller, especially by moving some code here to the appropriate driver directories (e.g. DRAM, PCIe, watchdog).
It will still be a large SoC, though.
Have you already looked at formal requirements, like coding style etc.? Did you ever run your additions through checkpatch.pl, for example?
We did follow the formal coding style. Everything will go through checkpatch. My biggest complaint about it is the 80 columns for debug and other print statements.
checkpatch doesn't complain about those when they use standard logging functions, however.
It complains plenty about printf(), debug() and a number of other standard U-Boot logging calls.
Regards,
Aaron

On Tue, Nov 05, 2019 at 02:13:13AM +0000, Aaron Williams wrote:
Hi Wolfgang,
On Monday, November 4, 2019 9:22:16 AM PST Tom Rini wrote:
On Thu, Oct 31, 2019 at 06:01:34PM +0000, Aaron Williams wrote:
Hi Wolfgang,
On Thursday, October 31, 2019 3:40:27 AM PDT Wolfgang Denk wrote:
Dear Aaron,
In message 1889679.7FQr5zsBR1@flash you wrote:
Currently we are using 39MB under arch/mips. I think I can easily cut this down to 15MB or smaller, especially by moving some code here to the appropriate driver directories (e.g. DRAM, PCIe, watchdog).
It will still be a large SoC, though.
Have you already looked at formal requirements, like coding style etc.? Did you ever run your additions through checkpatch.pl, for example?
We did follow the formal coding style. Everything will go through checkpatch. My biggest complaint about it is the 80 columns for debug and other print statements.
checkpatch doesn't complain about those when they use standard logging functions, however.
It complains plenty about printf(), debug() and a number of other standard U-Boot logging calls.
Yes, but not about pr_debug, etc, which are what really should be used. Thanks!
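A sketch of what is being discussed, written in plain C so that it compiles outside U-Boot. The `pr_debug()` below is a local stand-in defined only for this example (in U-Boot it would come from the project's own headers), and the function and message text are made up. The format string is kept on one line, even though it runs past 80 columns, rather than being split to fit.

```c
#include <stdio.h>

/* Local stand-in for illustration only; not U-Boot's definition. */
#define pr_debug(...)	fprintf(stderr, __VA_ARGS__)

/* Hypothetical helper: report a link event, return the number of bytes written. */
int report_link(int port, const char *mode, int lane)
{
	/* The format string stays whole so the message can be found with grep. */
	return pr_debug("port %d: link up in %s mode on lane %d after serdes equalization retry\n", port, mode, lane);
}
```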

On Wed, Oct 30, 2019 at 11:36:19PM +0000, Aaron Williams wrote:
On Wednesday, October 30, 2019 3:05:25 PM PDT Tom Rini wrote:
On Wed, Oct 23, 2019 at 03:50:00AM +0000, Aaron Williams wrote:
Hi all,
I have been tasked with porting our Octeon U-Boot to the latest U-Boot and merging it upstream.
[snip]
I want to jump back up to the top of this thread. And first I want to say that I am glad that there is official desire to upstream support. This is good. My concern is that the plan seems to be, at a very high level, "get everything we have for every feature upstream". But as has been said elsewhere this would roughly double the total LOC for the project, and it's not like we're a new project with a small handful of things :) It's impossible for the community to review that much code in any meaningful way over anything less than a period of several years. I know you've said that to support various customer use cases you need all sorts of other things, and while I'm certain that's true, I believe the plan needs to be to step back and pick the smallest possible testable unit, and upstream it. And add to it, small pieces at a time. Thanks!
It might be easier if I were a maintainer for our SOC to limit the needed review of a fair bit of the code. I have already found I can cut out a large chunk of our code by removing support for our older models. Much of the code has been very well tested, for example our serdes initialization and DRAM initialization code. That's not to say I can't do some cleanup.
Don't worry, I totally expect you to become the maintainer for your SoC, that's key to making sure the SoC-specific stuff is done right :) But, you're not the first big SoC to migrate from an internal fork to mainline and have a seemingly impossible number of LOC to deal with. It's why I'm saying you need to start with something absolutely as small as possible, and move forward.
The changes to U-Boot itself should be relatively small as long as I can keep much of our code under arch/mips/arch-octeon and arch/mips/cpu/octeon much like how our ARM code is. For anything that is applicable to other architectures I will place it in the appropriate locations.
Our ARM code is quite a bit simpler than MIPS because on ARM most of the heavy lifting is done by our "BDK" bootloader as well as ATF. On ARM, U-Boot doesn't need to deal with SFPs, serdes initialization, DRAM initialization, hot plug or a myriad of other issues.
I noticed the same with X86. If it weren't for these other layers the X86 code would also be quite large.
I have already identified quite a few very large files that will be removed, such as the error handling code for Octeon2 and earlier CPUs.
Additionally I have identified a number of register definition files I can get rid of, including one that's 2.3MB in size! These files tend to be huge because they contain definitions for every single chip and revision as well as big and little endian definitions. On top of that, there are a huge number of comments. Each field contains all of the text that is in our hardware reference manuals. There is no dearth of comments. I should be able to cut the size of the remaining files to 1/4th their current size or even smaller.
Some files will still remain quite large, however, such as our serdes initialization and DRAM initialization code (which I plan to re-architect because the original author didn't believe in functions due to stack limitations, though it is well commented). If you ever want to learn all the gory details of DDR4 link training and finding trends and so forth, it's all in there. The current memory initialization code is over 1MB in size. I plan to cut this down and break it up in a clean manner. The initialization code has grown in complexity and size over the years as various instabilities have been identified and fixed. The DRAM initialization code for our OcteonTX2 CPU is almost as large, though this code has been cleaned up and re-written. There really is no way to avoid this. The OcteonTX and OcteonTX2 DDR initialization code is similar to that for Octeon. In the case of U-Boot on our ARM SoCs, though, initialization is done before U-Boot is loaded.
I'll move the init code to drivers/ram/marvell/octeon. It will be about twice as large as the AXP driver (which only handles DDR3). The serdes init code I figure could go under drivers/soc/marvell/octeon.
I noticed that there are several directories under drivers for memory. There's drivers/ram, drivers/memory and drivers/ddr. These should be consolidated. I think some code might be able to be common, such as the SPD decoding code. It's even possible that some algorithms might be able to be made common such as deskew training and read/write leveling.
In terms of Octeon specific features, there really aren't too many of those but most of the ones we have are essential in the bootloader. There's no avoiding the Serdes and low-level network initialization. The serdes init code works across all networking interface types (SGMII, 1000Base-X, XAUI, RXAUI, XFI, XLAUI, 25G (XLAUI), SATA, PCIe, SRIO) plus all the variants (e.g. KR). It also configures all the clocks and equalization. It's not like a simple gigabit NIC nor is it offloaded to some other layer. Some of this code will come later, for example support for NUMA with CN78XX (96 cores, 256GiB of RAM).
Currently we are using 39MB under arch/mips. I think I can easily cut this down to 15MB or smaller, especially by moving some code here to the appropriate driver directories (e.g. DRAM, PCIe, watchdog).
It will still be a large SoC, though.
Most modern SoCs are pretty large. Take this one step at a time, evaluating and re-architecting code along the way, and we'll get there. You're probably going to run into a lot of code that needs to be adapted to new frameworks, too. What I strongly encourage from the example of previous SoCs that started out this way is to think of your internal tree as a reference only. Sure, you'll want to grab as much of the complex init sequence code as you can when moving things over, but it shouldn't be thought of as "move board X/Y/Z over" but "start adding board X with minimal peripherals" and add on top.

On Thursday, October 31, 2019 6:26:51 AM PDT Tom Rini wrote:
On Wed, Oct 30, 2019 at 11:36:19PM +0000, Aaron Williams wrote:
On Wednesday, October 30, 2019 3:05:25 PM PDT Tom Rini wrote:
On Wed, Oct 23, 2019 at 03:50:00AM +0000, Aaron Williams wrote:
Hi all,
I have been tasked with porting our Octeon U-Boot to the latest U-Boot and merging it upstream.
[snip]
I want to jump back up to the top of this thread. And first I want to say that I am glad that there is official desire to upstream support. This is good. My concern is that the plan seems to be, at a very high level, "get everything we have for every feature upstream". But as has been said elsewhere this would roughly double the total LOC for the project, and it's not like we're a new project with a small handful of things :) It's impossible for the community to review that much code in any meaningful way over anything less than a period of several years. I know you've said that to support various customer use cases you need all sorts of other things, and while I'm certain that's true, I believe the plan needs to be to step back and pick the smallest possible testable unit, and upstream it. And add to it, small pieces at a time. Thanks!
It might be easier if I were a maintainer for our SOC to limit the needed review of a fair bit of the code. I have already found I can cut out a large chunk of our code by removing support for our older models. Much of the code has been very well tested, for example our serdes initialization and DRAM initialization code. That's not to say I can't do some cleanup.
Don't worry, I totally expect you to become the maintainer for your SoC, that's key to making sure the SoC-specific stuff is done right :) But, you're not the first big SoC to migrate from an internal fork to mainline and have a seemingly impossible number of LOC to deal with. It's why I'm saying you need to start with something absolutely as small as possible, and move forward.
The changes to U-Boot itself should be relatively small as long as I can keep much of our code under arch/mips/arch-octeon and arch/mips/cpu/octeon much like how our ARM code is. For anything that is applicable to other architectures I will place it in the appropriate locations.
Our ARM code is quite a bit simpler than MIPS because on ARM most of the heavy lifting is done by our "BDK" bootloader as well as ATF. On ARM, U-Boot doesn't need to deal with SFPs, serdes initialization, DRAM initialization, hot plug or a myriad of other issues.
I noticed the same with X86. If it weren't for these other layers the X86 code would also be quite large.
I have already identified quite a few very large files that will be removed, such as the error handling code for Octeon2 and earlier CPUs.
Additionally I have identified a number of register definition files I can get rid of, including one that's 2.3MB in size! These files tend to be huge because they contain definitions for every single chip and revision as well as big and little endian definitions. On top of that, there are a huge number of comments. Each field contains all of the text that is in our hardware reference manuals. There is no dearth of comments. I should be able to cut the size of the remaining files to 1/4th their current size or even smaller.
Some files will still remain quite large, however, such as our serdes initialization and DRAM initialization code (which I plan to re-architect because the original author didn't believe in functions due to stack limitations, though it is well commented). If you ever want to learn all the gory details of DDR4 link training and finding trends and so forth, it's all in there. The current memory initialization code is over 1MB in size. I plan to cut this down and break it up in a clean manner. The initialization code has grown in complexity and size over the years as various instabilities have been identified and fixed. The DRAM initialization code for our OcteonTX2 CPU is almost as large, though this code has been cleaned up and re-written. There really is no way to avoid this. The OcteonTX and OcteonTX2 DDR initialization code is similar to that for Octeon. In the case of U-Boot on our ARM SoCs, though, initialization is done before U-Boot is loaded.
I'll move the init code to drivers/ram/marvell/octeon. It will be about twice as large as the AXP driver (which only handles DDR3). The serdes init code I figure could go under drivers/soc/marvell/octeon.
I noticed that there are several directories under drivers for memory. There's drivers/ram, drivers/memory and drivers/ddr. These should be consolidated. I think some code might be able to be common, such as the SPD decoding code. It's even possible that some algorithms might be able to be made common such as deskew training and read/write leveling.
In terms of Octeon specific features, there really aren't too many of those but most of the ones we have are essential in the bootloader. There's no avoiding the Serdes and low-level network initialization. The serdes init code works across all networking interface types (SGMII, 1000Base-X, XAUI, RXAUI, XFI, XLAUI, 25G (XLAUI), SATA, PCIe, SRIO) plus all the variants (e.g. KR). It also configures all the clocks and equalization. It's not like a simple gigabit NIC nor is it offloaded to some other layer. Some of this code will come later, for example support for NUMA with CN78XX (96 cores, 256GiB of RAM).
Currently we are using 39MB under arch/mips. I think I can easily cut this down to 15MB or smaller, especially by moving some code here to the appropriate driver directories (e.g. DRAM, PCIe, watchdog).
It will still be a large SoC, though.
Most modern SoCs are pretty large. Take this one step at a time, evaluating and re-architecting code along the way, and we'll get there. You're probably going to run into a lot of code that needs to be adapted to new frameworks, too. What I strongly encourage from the example of previous SoCs that started out this way is to think of your internal tree as a reference only. Sure, you'll want to grab as much of the complex init sequence code as you can when moving things over, but it shouldn't be thought of as "move board X/Y/Z over" but "start adding board X with minimal peripherals" and add on top.
This is the goal. It should be easier to develop the first port without networking support, since the image can be booted over PCIe, though networking support will be key because the customer disables this access. We plan to adapt to the new model. I've been working with it for some time with our OcteonTX line, which was just upstreamed.
-Aaron
Participants (5): Aaron Williams, Chris Packham, Daniel Schwierzeck, Tom Rini, Wolfgang Denk