[PATCH 0/6] binman: Support compressing files in parallel

When building an image containing large files it makes sense to compress those files in parallel. This series adds support for this, as well as a few other performance-related tweaks. It also includes support for timing the different types of operation (e.g. compression), for debugging purposes.
Simon Glass (6):
  binman: Put compressed data into separate files
  binman: Support multithreading for building images
  binman: Split node-reading out from constructor in files
  binman: Use bytearray instead of string
  patman: Use bytearray instead of string
  binman: Add basic support for debugging performance
 tools/binman/binman.rst          | 18 ++++++
 tools/binman/cmdline.py          |  4 ++
 tools/binman/control.py          |  7 +++
 tools/binman/etype/blob.py       |  5 ++
 tools/binman/etype/collection.py |  2 +-
 tools/binman/etype/files.py      |  3 +
 tools/binman/etype/section.py    | 40 ++++++++++++--
 tools/binman/ftest.py            | 41 ++++++++++++--
 tools/binman/image.py            |  3 +
 tools/binman/state.py            | 95 ++++++++++++++++++++++++++++++++
 tools/patman/cros_subprocess.py  |  6 +-
 tools/patman/tools.py            |  9 ++-
 12 files changed, 218 insertions(+), 15 deletions(-)

At present compression uses the same temporary file for all invocations. With multithreading this causes the data to become corrupted. Use a different filename each time.
Signed-off-by: Simon Glass <sjg@chromium.org>
---
 tools/patman/tools.py | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/tools/patman/tools.py b/tools/patman/tools.py
index ec95a543bd9..877e37cd8da 100644
--- a/tools/patman/tools.py
+++ b/tools/patman/tools.py
@@ -466,6 +466,9 @@ def Compress(indata, algo, with_header=True):
     This requires 'lz4' and 'lzma_alone' tools. It also requires an output
     directory to be previously set up, by calling PrepareOutputDir().
 
+    Care is taken to use unique temporary files so that this function can be
+    called from multiple threads.
+
     Args:
         indata: Input data to compress
         algo: Algorithm to use ('none', 'gzip', 'lz4' or 'lzma')
@@ -475,14 +478,16 @@ def Compress(indata, algo, with_header=True):
     """
     if algo == 'none':
         return indata
-    fname = GetOutputFilename('%s.comp.tmp' % algo)
+    fname = tempfile.NamedTemporaryFile(prefix='%s.comp.tmp' % algo,
+                                        dir=outdir).name
     WriteFile(fname, indata)
     if algo == 'lz4':
         data = Run('lz4', '--no-frame-crc', '-B4', '-5', '-c', fname,
                    binary=True)
     # cbfstool uses a very old version of lzma
     elif algo == 'lzma':
-        outfname = GetOutputFilename('%s.comp.otmp' % algo)
+        outfname = tempfile.NamedTemporaryFile(prefix='%s.comp.otmp' % algo,
+                                               dir=outdir).name
         Run('lzma_alone', 'e', fname, outfname, '-lc1', '-lp0', '-pb0', '-d8')
         data = ReadFile(outfname)
     elif algo == 'gzip':

Applied to u-boot-dm, thanks!

Some images may take a while to build, e.g. if they are large and use slow compression. Support compiling sections in parallel to speed things up.
Signed-off-by: Simon Glass <sjg@chromium.org>
---
 tools/binman/binman.rst       | 18 ++++++++++++++++++
 tools/binman/cmdline.py       |  4 ++++
 tools/binman/control.py       |  4 ++++
 tools/binman/etype/section.py | 36 ++++++++++++++++++++++++++++++++---
 tools/binman/ftest.py         | 33 ++++++++++++++++++++++++++++----
 tools/binman/image.py         |  3 +++
 tools/binman/state.py         | 23 ++++++++++++++++++++++
 7 files changed, 114 insertions(+), 7 deletions(-)
diff --git a/tools/binman/binman.rst b/tools/binman/binman.rst
index bc635aa00a5..09e7b571982 100644
--- a/tools/binman/binman.rst
+++ b/tools/binman/binman.rst
@@ -1142,6 +1142,22 @@ adds a -v<level> option to the call to binman::
 
    make BINMAN_VERBOSE=5
 
+Building sections in parallel
+-----------------------------
+
+By default binman uses multithreading to speed up compilation of large images.
+This works at a section level, with one thread for each entry in the section.
+This can speed things up if the entries are large and use compression.
+
+The number of threads is set by the '-T' flag, which defaults to a suitable
+value for your machine. This depends on the Python version, e.g. on v3.8 it
+uses 12 threads on an 8-core machine. See ConcurrentFutures_ for more details.
+
+The special value -T0 selects single-threaded mode, useful for debugging during
+development, since dealing with exceptions and problems in threads is more
+difficult. This avoids any use of ThreadPoolExecutor.
+
+
 History / Credits
 -----------------
 
@@ -1190,3 +1206,5 @@ Some ideas:
 
 --
 Simon Glass <sjg@chromium.org>
 7/7/2016
+
+.. _ConcurrentFutures: https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures...
diff --git a/tools/binman/cmdline.py b/tools/binman/cmdline.py
index 95f9ba27fbd..d6156df408b 100644
--- a/tools/binman/cmdline.py
+++ b/tools/binman/cmdline.py
@@ -32,6 +32,10 @@ controlled by a description in the board device tree.'''
         default=False, help='Display the README file')
     parser.add_argument('--toolpath', type=str, action='append',
         help='Add a path to the directories containing tools')
+    parser.add_argument('-T', '--threads', type=int,
+        default=None, help='Number of threads to use (0=single-thread)')
+    parser.add_argument('--test-section-timeout', action='store_true',
+        help='Use a zero timeout for section multi-threading (for testing)')
     parser.add_argument('-v', '--verbosity', default=1, type=int,
         help='Control verbosity: 0=silent, 1=warnings, 2=notices, '
         '3=info, 4=detail, 5=debug')
diff --git a/tools/binman/control.py b/tools/binman/control.py
index f57e34daaaa..b2113b6e64f 100644
--- a/tools/binman/control.py
+++ b/tools/binman/control.py
@@ -628,9 +628,13 @@ def Binman(args):
             tools.PrepareOutputDir(args.outdir, args.preserve)
             tools.SetToolPaths(args.toolpath)
             state.SetEntryArgs(args.entry_arg)
+            state.SetThreads(args.threads)
 
             images = PrepareImagesAndDtbs(dtb_fname, args.image,
                                           args.update_fdt, use_expanded)
+            if args.test_section_timeout:
+                # Set the first image to timeout, used in testThreadTimeout()
+                images[list(images.keys())[0]].test_section_timeout = True
             missing = False
             for image in images.values():
                 missing |= ProcessImage(image, args.update_fdt, args.map,
diff --git a/tools/binman/etype/section.py b/tools/binman/etype/section.py
index c3bac026c14..92d3f3add4c 100644
--- a/tools/binman/etype/section.py
+++ b/tools/binman/etype/section.py
@@ -9,10 +9,12 @@ images to be created.
 """
 
 from collections import OrderedDict
+import concurrent.futures
 import re
 import sys
 
 from binman.entry import Entry
+from binman import state
 from dtoc import fdt_util
 from patman import tools
 from patman import tout
@@ -525,15 +527,43 @@ class Entry_section(Entry):
     def GetEntryContents(self):
         """Call ObtainContents() for each entry in the section
         """
+        def _CheckDone(entry):
+            if not entry.ObtainContents():
+                next_todo.append(entry)
+            return entry
+
         todo = self._entries.values()
         for passnum in range(3):
+            threads = state.GetThreads()
             next_todo = []
-            for entry in todo:
-                if not entry.ObtainContents():
-                    next_todo.append(entry)
+
+            if threads == 0:
+                for entry in todo:
+                    _CheckDone(entry)
+            else:
+                with concurrent.futures.ThreadPoolExecutor(
+                        max_workers=threads) as executor:
+                    future_to_data = {
+                        entry: executor.submit(_CheckDone, entry)
+                        for entry in todo}
+                    timeout = 60
+                    if self.GetImage().test_section_timeout:
+                        timeout = 0
+                    done, not_done = concurrent.futures.wait(
+                        future_to_data.values(), timeout=timeout)
+                    # Make sure we check the result, so any exceptions are
+                    # generated. Check the results in entry order, since tests
+                    # may expect earlier entries to fail first.
+                    for entry in todo:
+                        job = future_to_data[entry]
+                        job.result()
+                    if not_done:
+                        self.Raise('Timed out obtaining contents')
+
             todo = next_todo
             if not todo:
                 break
+
         if todo:
             self.Raise('Internal error: Could not complete processing of contents: remaining %s' % todo)
diff --git a/tools/binman/ftest.py b/tools/binman/ftest.py
index 14daa69a69b..ceaa6a8737c 100644
--- a/tools/binman/ftest.py
+++ b/tools/binman/ftest.py
@@ -315,7 +315,8 @@ class TestFunctional(unittest.TestCase):
     def _DoTestFile(self, fname, debug=False, map=False, update_dtb=False,
                     entry_args=None, images=None, use_real_dtb=False,
                     use_expanded=False, verbosity=None, allow_missing=False,
-                    extra_indirs=None):
+                    extra_indirs=None, threads=None,
+                    test_section_timeout=False):
         """Run binman with a given test file
 
         Args:
@@ -338,6 +339,8 @@ class TestFunctional(unittest.TestCase):
             allow_missing: Set the '--allow-missing' flag so that missing
                 external binaries just produce a warning instead of an error
             extra_indirs: Extra input directories to add using -I
+            threads: Number of threads to use (None for default, 0 for
+                single-threaded)
         """
         args = []
         if debug:
@@ -349,6 +352,10 @@ class TestFunctional(unittest.TestCase):
         if self.toolpath:
             for path in self.toolpath:
                 args += ['--toolpath', path]
+        if threads is not None:
+            args.append('-T%d' % threads)
+        if test_section_timeout:
+            args.append('--test-section-timeout')
         args += ['build', '-p', '-I', self._indir, '-d', self.TestFile(fname)]
         if map:
             args.append('-m')
@@ -381,7 +388,7 @@ class TestFunctional(unittest.TestCase):
             fname: Filename of .dts file to read
             outfile: Output filename for compiled device-tree binary
 
-        Returns:
+        #Returns:
             Contents of device-tree binary
         """
         tmpdir = tempfile.mkdtemp(prefix='binmant.')
@@ -419,7 +426,7 @@ class TestFunctional(unittest.TestCase):
 
     def _DoReadFileDtb(self, fname, use_real_dtb=False, use_expanded=False,
                        map=False, update_dtb=False, entry_args=None,
-                       reset_dtbs=True, extra_indirs=None):
+                       reset_dtbs=True, extra_indirs=None, threads=None):
         """Run binman and return the resulting image
 
         This runs binman with a given test file and then reads the resulting
@@ -446,6 +453,8 @@ class TestFunctional(unittest.TestCase):
                 function. If reset_dtbs is True, then the original test dtb
                 is written back before this function finishes
             extra_indirs: Extra input directories to add using -I
+            threads: Number of threads to use (None for default, 0 for
+                single-threaded)
 
         Returns:
             Tuple:
@@ -470,7 +479,8 @@ class TestFunctional(unittest.TestCase):
         try:
             retcode = self._DoTestFile(fname, map=map, update_dtb=update_dtb,
                     entry_args=entry_args, use_real_dtb=use_real_dtb,
-                    use_expanded=use_expanded, extra_indirs=extra_indirs)
+                    use_expanded=use_expanded, extra_indirs=extra_indirs,
+                    threads=threads)
             self.assertEqual(0, retcode)
             out_dtb_fname = tools.GetOutputFilename('u-boot.dtb.out')
 
@@ -4609,6 +4619,21 @@ class TestFunctional(unittest.TestCase):
         self.assertIn('Expected __bss_size symbol in vpl/u-boot-vpl',
                       str(e.exception))
 
+    def testSectionsSingleThread(self):
+        """Test sections without multithreading"""
+        data = self._DoReadFileDtb('055_sections.dts', threads=0)[0]
+        expected = (U_BOOT_DATA + tools.GetBytes(ord('!'), 12) +
+                    U_BOOT_DATA + tools.GetBytes(ord('a'), 12) +
+                    U_BOOT_DATA + tools.GetBytes(ord('&'), 4))
+        self.assertEqual(expected, data)
+
+    def testThreadTimeout(self):
+        """Test handling a thread that takes too long"""
+        with self.assertRaises(ValueError) as e:
+            self._DoTestFile('055_sections.dts', test_section_timeout=True)
+        self.assertIn("Node '/binman/section@0': Timed out obtaining contents",
+                      str(e.exception))
+
 if __name__ == "__main__":
     unittest.main()
diff --git a/tools/binman/image.py b/tools/binman/image.py
index 10778f47fe9..cdc58b39a40 100644
--- a/tools/binman/image.py
+++ b/tools/binman/image.py
@@ -36,6 +36,8 @@ class Image(section.Entry_section):
         fdtmap_data: Contents of the fdtmap when loading from a file
         allow_repack: True to add properties to allow the image to be safely
             repacked later
+        test_section_timeout: Use a zero timeout for section multi-threading
+            (for testing)
 
     Args:
         copy_to_orig: Copy offset/size to orig_offset/orig_size after reading
@@ -74,6 +76,7 @@ class Image(section.Entry_section):
         self.allow_repack = False
         self._ignore_missing = ignore_missing
         self.use_expanded = use_expanded
+        self.test_section_timeout = False
         if not test:
             self.ReadNode()
 
diff --git a/tools/binman/state.py b/tools/binman/state.py
index e4a3abd2805..809429fbd9d 100644
--- a/tools/binman/state.py
+++ b/tools/binman/state.py
@@ -7,6 +7,7 @@
 
 import hashlib
 import re
+import threading
 
 from dtoc import fdt
 import os
@@ -56,6 +57,9 @@ allow_entry_expansion = True
 # to the new ones, the compressed size increases, etc.
 allow_entry_contraction = False
 
+# Number of threads to use for binman (None means machine-dependent)
+num_threads = None
+
 def GetFdtForEtype(etype):
     """Get the Fdt object for a particular device-tree entry
 
@@ -421,3 +425,22 @@ def AllowEntryContraction():
         raised
     """
     return allow_entry_contraction
+
+def SetThreads(threads):
+    """Set the number of threads to use when building sections
+
+    Args:
+        threads: Number of threads to use (None for default, 0 for
+            single-threaded)
+    """
+    global num_threads
+
+    num_threads = threads
+
+def GetThreads():
+    """Get the number of threads to use when building sections
+
+    Returns:
+        Number of threads to use (None for default, 0 for single-threaded)
+    """
+    return num_threads
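The pattern used by GetEntryContents() above — submit one job per entry to a ThreadPoolExecutor, wait with a timeout, then check every result in entry order so exceptions surface deterministically — can be sketched standalone. The names here (`obtain_all`, `entry.obtain()`) are illustrative, not binman's real API:

```python
import concurrent.futures

def obtain_all(entries, threads, timeout=60):
    """Run entry.obtain() for each entry, optionally in a thread pool.

    threads == 0 selects single-threaded mode (easier to debug, as the
    patch notes); otherwise a pool of up to 'threads' workers is used.
    Results are collected in entry order so that any exception raised in
    a worker is re-raised here deterministically.
    """
    if threads == 0:
        return [entry.obtain() for entry in entries]
    with concurrent.futures.ThreadPoolExecutor(max_workers=threads) as executor:
        futures = {entry: executor.submit(entry.obtain) for entry in entries}
        done, not_done = concurrent.futures.wait(futures.values(),
                                                 timeout=timeout)
        if not_done:
            raise TimeoutError('Timed out obtaining contents')
        # .result() re-raises any exception from the worker thread
        return [futures[entry].result() for entry in entries]
```

Note one design choice visible in the patch: passing `max_workers=None` lets Python pick a machine-dependent default (on v3.8, `min(32, os.cpu_count() + 4)`), which is why the commit text says the `-T` default depends on the Python version.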

Applied to u-boot-dm, thanks!

The constructor should not read the node information. Move it to the ReadNode() method instead. This allows this etype to be subclassed.
Signed-off-by: Simon Glass <sjg@chromium.org>
---
 tools/binman/etype/files.py | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/tools/binman/etype/files.py b/tools/binman/etype/files.py
index 5db36abef0b..9b04a496a85 100644
--- a/tools/binman/etype/files.py
+++ b/tools/binman/etype/files.py
@@ -34,6 +34,9 @@ class Entry_files(Entry_section):
         from binman import state
 
         super().__init__(section, etype, node)
+
+    def ReadNode(self):
+        super().ReadNode()
         self._pattern = fdt_util.GetString(self._node, 'pattern')
         if not self._pattern:
             self.Raise("Missing 'pattern' property")
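The refactoring above follows a common pattern: keep the constructor trivial (store references only) and do property reading in a separate method that subclasses extend with `super()`. A minimal sketch, with made-up classes standing in for binman's real Entry hierarchy:

```python
class Entry:
    """Base type: the constructor only stores the node reference."""
    def __init__(self, node):
        self._node = node

    def ReadNode(self):
        # Read common properties here; subclasses chain via super()
        self.name = self._node.get('name')


class EntryFiles(Entry):
    """Subclass adds its own properties without touching __init__."""
    def ReadNode(self):
        super().ReadNode()
        self.pattern = self._node.get('pattern')
        if not self.pattern:
            raise ValueError("Missing 'pattern' property")
```

Because no node data is read in `__init__`, a further subclass can adjust state (or the node itself) between construction and `ReadNode()`, which is what makes the etype subclassable.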

Applied to u-boot-dm, thanks!

This is faster if data is being concatenated. Update the section and collection etypes.
Signed-off-by: Simon Glass <sjg@chromium.org>
---
 tools/binman/etype/collection.py | 2 +-
 tools/binman/etype/section.py    | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/binman/etype/collection.py b/tools/binman/etype/collection.py
index 1625575fe98..442b40b48b3 100644
--- a/tools/binman/etype/collection.py
+++ b/tools/binman/etype/collection.py
@@ -40,7 +40,7 @@ class Entry_collection(Entry):
         """
         # Join up all the data
         self.Info('Getting contents, required=%s' % required)
-        data = b''
+        data = bytearray()
         for entry_phandle in self.content:
             entry_data = self.section.GetContentsByPhandle(entry_phandle, self,
                                                            required)
diff --git a/tools/binman/etype/section.py b/tools/binman/etype/section.py
index 92d3f3add4c..e2949fc9163 100644
--- a/tools/binman/etype/section.py
+++ b/tools/binman/etype/section.py
@@ -166,7 +166,7 @@ class Entry_section(Entry):
         pad_byte = (entry._pad_byte if isinstance(entry, Entry_section)
                     else self._pad_byte)
 
-        data = b''
+        data = bytearray()
         # Handle padding before the entry
         if entry.pad_before:
             data += tools.GetBytes(self._pad_byte, entry.pad_before)
@@ -200,7 +200,7 @@ class Entry_section(Entry):
         Returns:
             Contents of the section (bytes)
         """
-        section_data = b''
+        section_data = bytearray()
 
         for entry in self._entries.values():
             entry_data = entry.GetData(required)

Applied to u-boot-dm, thanks!

If the process outputs a lot of data on stdout this can be quite slow, since the bytestring is regenerated each time. Use a bytearray instead.
Signed-off-by: Simon Glass <sjg@chromium.org>
---
 tools/patman/cros_subprocess.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/patman/cros_subprocess.py b/tools/patman/cros_subprocess.py
index efd0a5aaf72..fdd51386856 100644
--- a/tools/patman/cros_subprocess.py
+++ b/tools/patman/cros_subprocess.py
@@ -169,11 +169,11 @@ class Popen(subprocess.Popen):
             self.stdin.close()
         if self.stdout:
             read_set.append(self.stdout)
-            stdout = b''
+            stdout = bytearray()
         if self.stderr and self.stderr != self.stdout:
             read_set.append(self.stderr)
-            stderr = b''
-        combined = b''
+            stderr = bytearray()
+        combined = bytearray()
 
         input_offset = 0
         while read_set or write_set:

Applied to u-boot-dm, thanks!

One of binman's attributes is that it is extremely fast, at least for a Python program. Add some simple timing around operations that might take a while, such as reading an image and compressing it. This should help to maintain the performance as new features are added.
This is for debugging purposes only.
Signed-off-by: Simon Glass <sjg@chromium.org>
---
 tools/binman/control.py    |  3 ++
 tools/binman/etype/blob.py |  5 +++
 tools/binman/ftest.py      |  8 +++++
 tools/binman/state.py      | 72 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 88 insertions(+)
diff --git a/tools/binman/control.py b/tools/binman/control.py
index b2113b6e64f..dcba02ff7f8 100644
--- a/tools/binman/control.py
+++ b/tools/binman/control.py
@@ -646,6 +646,9 @@ def Binman(args):
 
             if missing:
                 tout.Warning("\nSome images are invalid")
+
+            # Use this to debug the time taken to pack the image
+            #state.TimingShow()
         finally:
             tools.FinaliseOutputDir()
     finally:
diff --git a/tools/binman/etype/blob.py b/tools/binman/etype/blob.py
index 018f8c9a319..fae86ca3ec0 100644
--- a/tools/binman/etype/blob.py
+++ b/tools/binman/etype/blob.py
@@ -6,6 +6,7 @@
 #
 
 from binman.entry import Entry
+from binman import state
 from dtoc import fdt_util
 from patman import tools
 from patman import tout
@@ -59,8 +60,12 @@ class Entry_blob(Entry):
         the data in chunks and avoid reading it all at once. For now
         this seems like an unnecessary complication.
         """
+        state.TimingStart('read')
         indata = tools.ReadFile(self._pathname)
+        state.TimingAccum('read')
+        state.TimingStart('compress')
         data = self.CompressData(indata)
+        state.TimingAccum('compress')
         self.SetContents(data)
         return True
 
diff --git a/tools/binman/ftest.py b/tools/binman/ftest.py
index ceaa6a8737c..d93361778ec 100644
--- a/tools/binman/ftest.py
+++ b/tools/binman/ftest.py
@@ -4634,6 +4634,14 @@ class TestFunctional(unittest.TestCase):
         self.assertIn("Node '/binman/section@0': Timed out obtaining contents",
                       str(e.exception))
 
+    def testTiming(self):
+        """Test output of timing information"""
+        data = self._DoReadFile('055_sections.dts')
+        with test_util.capture_sys_output() as (stdout, stderr):
+            state.TimingShow()
+        self.assertIn('read:', stdout.getvalue())
+        self.assertIn('compress:', stdout.getvalue())
+
 if __name__ == "__main__":
     unittest.main()
diff --git a/tools/binman/state.py b/tools/binman/state.py
index 809429fbd9d..17d72cadecb 100644
--- a/tools/binman/state.py
+++ b/tools/binman/state.py
@@ -5,8 +5,10 @@
 # Holds and modifies the state information held by binman
 #
 
+from collections import defaultdict
 import hashlib
 import re
+import time
 import threading
 
 from dtoc import fdt
@@ -60,6 +62,27 @@ allow_entry_contraction = False
 # Number of threads to use for binman (None means machine-dependent)
 num_threads = None
 
+
+class Timing:
+    """Holds information about an operation that is being timed
+
+    Properties:
+        name: Operation name (only one of each name is stored)
+        start: Start time of operation in seconds (None if not started)
+        accum: Amount of time spent on this operation so far, in seconds
+    """
+    def __init__(self, name):
+        self.name = name
+        self.start = None  # cause an error if TimingStart() is not called
+        self.accum = 0.0
+
+
+# Holds timing info for each name:
+#    key: name of Timing info (Timing.name)
+#    value: Timing object
+timing_info = {}
+
+
 def GetFdtForEtype(etype):
     """Get the Fdt object for a particular device-tree entry
 
@@ -444,3 +467,52 @@ def GetThreads():
         Number of threads to use (None for default, 0 for single-threaded)
     """
     return num_threads
+
+def GetTiming(name):
+    """Get the timing info for a particular operation
+
+    The object is created if it does not already exist.
+
+    Args:
+        name: Operation name to get
+
+    Returns:
+        Timing object for the current thread
+    """
+    threaded_name = '%s:%d' % (name, threading.get_ident())
+    timing = timing_info.get(threaded_name)
+    if not timing:
+        timing = Timing(threaded_name)
+        timing_info[threaded_name] = timing
+    return timing
+
+def TimingStart(name):
+    """Start the timer for an operation
+
+    Args:
+        name: Operation name to start
+    """
+    timing = GetTiming(name)
+    timing.start = time.monotonic()
+
+def TimingAccum(name):
+    """Stop and accumulate the time for an operation
+
+    This measures the time since the last TimingStart() and adds that to the
+    accumulated time.
+
+    Args:
+        name: Operation name to start
+    """
+    timing = GetTiming(name)
+    timing.accum += time.monotonic() - timing.start
+
+def TimingShow():
+    """Show all timing information"""
+    duration = defaultdict(float)
+    for threaded_name, timing in timing_info.items():
+        name = threaded_name.split(':')[0]
+        duration[name] += timing.accum
+
+    for name, seconds in duration.items():
+        print('%10s: %10.1fms' % (name, seconds * 1000))
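The key trick in the timing code above is making it thread-safe without locks: each timer is keyed by '<name>:<thread id>', so concurrent threads never share an accumulator, and the per-name totals are only summed when reporting. A self-contained sketch of that approach (a class rather than binman's module-level functions, purely for illustration):

```python
import threading
import time
from collections import defaultdict

class Timings:
    """Per-thread timing accumulators, merged by name when reporting.

    Keying each timer by '<name>:<thread id>' means no two threads ever
    touch the same start/accum pair, so no locking is needed for the
    common start/accum path.
    """
    def __init__(self):
        self._start = {}                  # key -> start time (monotonic)
        self._accum = defaultdict(float)  # key -> accumulated seconds

    def _key(self, name):
        return '%s:%d' % (name, threading.get_ident())

    def start(self, name):
        self._start[self._key(name)] = time.monotonic()

    def accum(self, name):
        key = self._key(name)
        # KeyError here means accum() was called without start(), which
        # mirrors the 'start = None' guard in the patch
        self._accum[key] += time.monotonic() - self._start[key]

    def totals(self):
        """Sum the per-thread accumulators back into per-name totals."""
        out = defaultdict(float)
        for key, seconds in self._accum.items():
            out[key.split(':')[0]] += seconds
        return dict(out)
```

One subtlety worth noting: the dict writes in `timing_info` (and here in `_accum`) are safe under CPython's GIL because each thread only ever writes its own key, which is why the patch can get away without a threading.Lock.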

Applied to u-boot-dm, thanks!