[PATCH v2 0/6] Port gen_compile_commands.py from Linux to U-Boot

Hello U-Boot community,
I'm submitting a patch series that ports the gen_compile_commands.py script from the Linux kernel's sources to U-Boot. This script, originally located in scripts/clang-tools/gen_compile_commands.py, generates a compile_commands.json file for improved code navigation and analysis. The series consists of six patches: the initial script import, followed by the modifications needed for U-Boot compatibility, a .gitignore entry, and documentation.
Your feedback on these contributions would be greatly appreciated.
Best regards,
Changes in v2:
- Add compile_commands.json to gitignore
- Add documentation

Joao Marcos Costa (6):
  scripts: Port Linux's gen_compile_commands.py to U-Boot
  scripts/gen_compile_commands.py: adapt _LINE_PATTERN
  scripts/gen_compile_commands.py: fix docstring
  scripts/gen_compile_commands.py: add acknowledgments
  .gitignore: add compile_commands.json
  doc: add documentation for gen_compile_commands.py

 .gitignore                           |   3 +
 doc/develop/gen_compile_commands.rst |  46 ++++++
 scripts/gen_compile_commands.py      | 230 +++++++++++++++++++++++++++
 3 files changed, 279 insertions(+)
 create mode 100644 doc/develop/gen_compile_commands.rst
 create mode 100755 scripts/gen_compile_commands.py

This script generates a database of compiler flags, namely compile_commands.json. It is quite useful for text editors that use clangd LSP (e.g. Vim, Neovim).
It was ported from Linux's sources:
- tag: v6.4
- revision 6995e2de6891c724bfeb2db33d7b87775f913ad1
Modifications for U-Boot compatibility will be added in a follow-up commit.
Signed-off-by: Joao Marcos Costa <jmcosta944@gmail.com>
---
 scripts/gen_compile_commands.py | 228 ++++++++++++++++++++++++++++++++
 1 file changed, 228 insertions(+)
 create mode 100755 scripts/gen_compile_commands.py
diff --git a/scripts/gen_compile_commands.py b/scripts/gen_compile_commands.py
new file mode 100755
index 0000000000..15ba56527a
--- /dev/null
+++ b/scripts/gen_compile_commands.py
@@ -0,0 +1,228 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) Google LLC, 2018
+#
+# Author: Tom Roeder <tmroeder@google.com>
+#
+"""A tool for generating compile_commands.json in the Linux kernel."""
+
+import argparse
+import json
+import logging
+import os
+import re
+import subprocess
+import sys
+
+_DEFAULT_OUTPUT = 'compile_commands.json'
+_DEFAULT_LOG_LEVEL = 'WARNING'
+
+_FILENAME_PATTERN = r'^\..*\.cmd$'
+_LINE_PATTERN = r'^savedcmd_[^ ]*\.o := (.* )([^ ]*\.c) *(;|$)'
+_VALID_LOG_LEVELS = ['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']
+# The tools/ directory adopts a different build system, and produces .cmd
+# files in a different format. Do not support it.
+_EXCLUDE_DIRS = ['.git', 'Documentation', 'include', 'tools']
+
+def parse_arguments():
+    """Sets up and parses command-line arguments.
+
+    Returns:
+        log_level: A logging level to filter log output.
+        directory: The work directory where the objects were built.
+        ar: Command used for parsing .a archives.
+        output: Where to write the compile-commands JSON file.
+        paths: The list of files/directories to handle to find .cmd files.
+    """
+    usage = 'Creates a compile_commands.json database from kernel .cmd files'
+    parser = argparse.ArgumentParser(description=usage)
+
+    directory_help = ('specify the output directory used for the kernel build '
+                      '(defaults to the working directory)')
+    parser.add_argument('-d', '--directory', type=str, default='.',
+                        help=directory_help)
+
+    output_help = ('path to the output command database (defaults to ' +
+                   _DEFAULT_OUTPUT + ')')
+    parser.add_argument('-o', '--output', type=str, default=_DEFAULT_OUTPUT,
+                        help=output_help)
+
+    log_level_help = ('the level of log messages to produce (defaults to ' +
+                      _DEFAULT_LOG_LEVEL + ')')
+    parser.add_argument('--log_level', choices=_VALID_LOG_LEVELS,
+                        default=_DEFAULT_LOG_LEVEL, help=log_level_help)
+
+    ar_help = 'command used for parsing .a archives'
+    parser.add_argument('-a', '--ar', type=str, default='llvm-ar', help=ar_help)
+
+    paths_help = ('directories to search or files to parse '
+                  '(files should be *.o, *.a, or modules.order). '
+                  'If nothing is specified, the current directory is searched')
+    parser.add_argument('paths', type=str, nargs='*', help=paths_help)
+
+    args = parser.parse_args()
+
+    return (args.log_level,
+            os.path.abspath(args.directory),
+            args.output,
+            args.ar,
+            args.paths if len(args.paths) > 0 else [args.directory])
+
+
+def cmdfiles_in_dir(directory):
+    """Generate the iterator of .cmd files found under the directory.
+
+    Walk under the given directory, and yield every .cmd file found.
+
+    Args:
+        directory: The directory to search for .cmd files.
+
+    Yields:
+        The path to a .cmd file.
+    """
+
+    filename_matcher = re.compile(_FILENAME_PATTERN)
+    exclude_dirs = [ os.path.join(directory, d) for d in _EXCLUDE_DIRS ]
+
+    for dirpath, dirnames, filenames in os.walk(directory, topdown=True):
+        # Prune unwanted directories.
+        if dirpath in exclude_dirs:
+            dirnames[:] = []
+            continue
+
+        for filename in filenames:
+            if filename_matcher.match(filename):
+                yield os.path.join(dirpath, filename)
+
+
+def to_cmdfile(path):
+    """Return the path of .cmd file used for the given build artifact
+
+    Args:
+        Path: file path
+
+    Returns:
+        The path to .cmd file
+    """
+    dir, base = os.path.split(path)
+    return os.path.join(dir, '.' + base + '.cmd')
+
+
+def cmdfiles_for_a(archive, ar):
+    """Generate the iterator of .cmd files associated with the archive.
+
+    Parse the given archive, and yield every .cmd file used to build it.
+
+    Args:
+        archive: The archive to parse
+
+    Yields:
+        The path to every .cmd file found
+    """
+    for obj in subprocess.check_output([ar, '-t', archive]).decode().split():
+        yield to_cmdfile(obj)
+
+
+def cmdfiles_for_modorder(modorder):
+    """Generate the iterator of .cmd files associated with the modules.order.
+
+    Parse the given modules.order, and yield every .cmd file used to build the
+    contained modules.
+
+    Args:
+        modorder: The modules.order file to parse
+
+    Yields:
+        The path to every .cmd file found
+    """
+    with open(modorder) as f:
+        for line in f:
+            obj = line.rstrip()
+            base, ext = os.path.splitext(obj)
+            if ext != '.o':
+                sys.exit('{}: module path must end with .o'.format(obj))
+            mod = base + '.mod'
+            # Read from *.mod, to get a list of objects that compose the module.
+            with open(mod) as m:
+                for mod_line in m:
+                    yield to_cmdfile(mod_line.rstrip())
+
+
+def process_line(root_directory, command_prefix, file_path):
+    """Extracts information from a .cmd line and creates an entry from it.
+
+    Args:
+        root_directory: The directory that was searched for .cmd files. Usually
+            used directly in the "directory" entry in compile_commands.json.
+        command_prefix: The extracted command line, up to the last element.
+        file_path: The .c file from the end of the extracted command.
+            Usually relative to root_directory, but sometimes absolute.
+
+    Returns:
+        An entry to append to compile_commands.
+
+    Raises:
+        ValueError: Could not find the extracted file based on file_path and
+            root_directory or file_directory.
+    """
+    # The .cmd files are intended to be included directly by Make, so they
+    # escape the pound sign '#', either as '\#' or '$(pound)' (depending on the
+    # kernel version). The compile_commands.json file is not interepreted
+    # by Make, so this code replaces the escaped version with '#'.
+    prefix = command_prefix.replace('\#', '#').replace('$(pound)', '#')
+
+    # Use os.path.abspath() to normalize the path resolving '.' and '..' .
+    abs_path = os.path.abspath(os.path.join(root_directory, file_path))
+    if not os.path.exists(abs_path):
+        raise ValueError('File %s not found' % abs_path)
+    return {
+        'directory': root_directory,
+        'file': abs_path,
+        'command': prefix + file_path,
+    }
+
+
+def main():
+    """Walks through the directory and finds and parses .cmd files."""
+    log_level, directory, output, ar, paths = parse_arguments()
+
+    level = getattr(logging, log_level)
+    logging.basicConfig(format='%(levelname)s: %(message)s', level=level)
+
+    line_matcher = re.compile(_LINE_PATTERN)
+
+    compile_commands = []
+
+    for path in paths:
+        # If 'path' is a directory, handle all .cmd files under it.
+        # Otherwise, handle .cmd files associated with the file.
+        # built-in objects are linked via vmlinux.a
+        # Modules are listed in modules.order.
+        if os.path.isdir(path):
+            cmdfiles = cmdfiles_in_dir(path)
+        elif path.endswith('.a'):
+            cmdfiles = cmdfiles_for_a(path, ar)
+        elif path.endswith('modules.order'):
+            cmdfiles = cmdfiles_for_modorder(path)
+        else:
+            sys.exit('{}: unknown file type'.format(path))
+
+        for cmdfile in cmdfiles:
+            with open(cmdfile, 'rt') as f:
+                result = line_matcher.match(f.readline())
+                if result:
+                    try:
+                        entry = process_line(directory, result.group(1),
+                                             result.group(2))
+                        compile_commands.append(entry)
+                    except ValueError as err:
+                        logging.info('Could not add line from %s: %s',
+                                     cmdfile, err)
+
+    with open(output, 'wt') as f:
+        json.dump(compile_commands, f, indent=2, sort_keys=True)
+
+
+if __name__ == '__main__':
+    main()
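For readers skimming the import, two small steps of the script can be tried in isolation: mapping a build artifact to its hidden .cmd file, and undoing Make's escaping of '#'. The sketch below is not part of the patch. It mirrors to_cmdfile() from the script, while unescape_pound() is a hypothetical helper name that wraps the replace chain from process_line(). The backslash escape is written as '\\#' here, which is the same two-character string as '\#' in the script, and the sample command fragments are made up.

```python
import os


def to_cmdfile(path):
    """Map a build artifact to its .cmd file, e.g. foo/bar.o -> foo/.bar.o.cmd."""
    directory, base = os.path.split(path)
    return os.path.join(directory, '.' + base + '.cmd')


def unescape_pound(command_prefix):
    """Hypothetical helper: undo Make's escaping of '#' in a recorded command."""
    # Make-includable .cmd files store '#' as '\#' or as '$(pound)'.
    return command_prefix.replace('\\#', '#').replace('$(pound)', '#')


# Hypothetical inputs, for illustration only.
cmdfile = to_cmdfile('arch/x86/cpu/lapic.o')
cleaned = unescape_pound('gcc -DA=\\#1 -DB=$(pound)2 ')
print(cmdfile)
print(cleaned)
```

The naming convention is what lets the script go from an archive member or a module object straight to the file that records how it was compiled.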

For U-Boot's context, the regular expression defined by _LINE_PATTERN should be adapted. Replace 'savedcmd' with 'cmd'.
Signed-off-by: Joao Marcos Costa <jmcosta944@gmail.com>
---
 scripts/gen_compile_commands.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/gen_compile_commands.py b/scripts/gen_compile_commands.py
index 15ba56527a..0227522959 100755
--- a/scripts/gen_compile_commands.py
+++ b/scripts/gen_compile_commands.py
@@ -19,7 +19,7 @@ _DEFAULT_OUTPUT = 'compile_commands.json'
 _DEFAULT_LOG_LEVEL = 'WARNING'
 
 _FILENAME_PATTERN = r'^\..*\.cmd$'
-_LINE_PATTERN = r'^savedcmd_[^ ]*\.o := (.* )([^ ]*\.c) *(;|$)'
+_LINE_PATTERN = r'^cmd_[^ ]*\.o := (.* )([^ ]*\.c) *(;|$)'
 _VALID_LOG_LEVELS = ['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']
 # The tools/ directory adopts a different build system, and produces .cmd
 # files in a different format. Do not support it.
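To see why the prefix matters, the two patterns can be compared against a line shaped like the first line of a U-Boot .cmd file. This is a minimal sketch rather than part of the patch: the sample line is hypothetical and heavily shortened, and the constant names LINUX_PATTERN and UBOOT_PATTERN are introduced only for this illustration.

```python
import re

# The pattern before and after this patch (names are illustrative only).
LINUX_PATTERN = r'^savedcmd_[^ ]*\.o := (.* )([^ ]*\.c) *(;|$)'
UBOOT_PATTERN = r'^cmd_[^ ]*\.o := (.* )([^ ]*\.c) *(;|$)'

# Hypothetical, shortened first line of arch/x86/cpu/.lapic.o.cmd.
sample = ('cmd_arch/x86/cpu/lapic.o := gcc -nostdinc -c '
          '-o arch/x86/cpu/lapic.o arch/x86/cpu/lapic.c')

# The Linux pattern expects the 'savedcmd_' prefix, so it finds nothing.
linux_match = re.match(LINUX_PATTERN, sample)

# The adapted pattern splits the line into the command prefix (group 1)
# and the trailing .c source file (group 2).
uboot_match = re.match(UBOOT_PATTERN, sample)
command_prefix = uboot_match.group(1)
source_file = uboot_match.group(2)

print(linux_match)
print(repr(command_prefix))
print(source_file)
```

With the unmodified pattern no line would match, and the script would write an empty database without reporting an error at the default log level.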

The tool in question is now in U-Boot. Replace "the Linux kernel" with "U-Boot" to make the docstring coherent.
Signed-off-by: Joao Marcos Costa <jmcosta944@gmail.com>
---
 scripts/gen_compile_commands.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/gen_compile_commands.py b/scripts/gen_compile_commands.py
index 0227522959..63d036a773 100755
--- a/scripts/gen_compile_commands.py
+++ b/scripts/gen_compile_commands.py
@@ -5,7 +5,7 @@
 #
 # Author: Tom Roeder <tmroeder@google.com>
 #
-"""A tool for generating compile_commands.json in the Linux kernel."""
+"""A tool for generating compile_commands.json in U-Boot."""
 
 import argparse
 import json

Add acknowledgments for porting and modifying the script. Of course, the license, author, and copyright notice remain the same as in the original script.
Signed-off-by: Joao Marcos Costa <jmcosta944@gmail.com>
---
 scripts/gen_compile_commands.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/gen_compile_commands.py b/scripts/gen_compile_commands.py
index 63d036a773..1a9c49b34a 100755
--- a/scripts/gen_compile_commands.py
+++ b/scripts/gen_compile_commands.py
@@ -4,6 +4,7 @@
 # Copyright (C) Google LLC, 2018
 #
 # Author: Tom Roeder <tmroeder@google.com>
+# Ported and modified for U-Boot by Joao Marcos Costa <jmcosta944@gmail.com>
 #
 """A tool for generating compile_commands.json in U-Boot."""
 

Add Clang's compilation database file (i.e. compile_commands.json) to .gitignore, at the root of the repository.
Signed-off-by: Joao Marcos Costa <jmcosta944@gmail.com>
---
 .gitignore | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/.gitignore b/.gitignore
index 002f95de4f..261a1d6754 100644
--- a/.gitignore
+++ b/.gitignore
@@ -109,3 +109,6 @@ __pycache__
 
 # moveconfig database
 /moveconfig.db
+
+# Clang's compilation database file
+/compile_commands.json

This documentation briefly explains what a compilation database is, and how to use the script to generate one.
This is not a port, as there was no original documentation in the Linux sources.
Acknowledge the documentation in the script's header.
Signed-off-by: Joao Marcos Costa <jmcosta944@gmail.com>
---
 doc/develop/gen_compile_commands.rst | 46 ++++++++++++++++++++++++++++
 scripts/gen_compile_commands.py      |  1 +
 2 files changed, 47 insertions(+)
 create mode 100644 doc/develop/gen_compile_commands.rst

diff --git a/doc/develop/gen_compile_commands.rst b/doc/develop/gen_compile_commands.rst
new file mode 100644
index 0000000000..09466938fe
--- /dev/null
+++ b/doc/develop/gen_compile_commands.rst
@@ -0,0 +1,46 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+==========
+gen_compile_commands
+==========
+
+gen_compile_commands (scripts/gen_compile_commands.py) is a script used to
+generate a compilation database (compile_commands.json). This database consists
+of an array of "command objects" describing how each translation unit was
+compiled.
+
+Example::
+
+    {
+    "command": "gcc -Wp,-MD,arch/x86/cpu/.lapic.o.d -nostdinc -isystem (...)"
+    "directory": "/home/jmcosta/u-boot",
+    "file": "/home/jmcosta/u-boot/arch/x86/cpu/lapic.c"
+    }
+
+Such information comes from parsing the respective .cmd file of each translation
+unit. In the previous example, that would be `arch/x86/cpu/.lapic.o.cmd`.
+
+The compilation database is quite useful for text editors (and IDEs) that use
+Clangd LSP. It allows jumping to definitions and declarations. Since it relies
+on parsing .cmd files, one needs to have a target (e.g. configs/*_defconfig)
+built before running the script.
+
+Example::
+
+    make sandbox_defconfig
+    make
+    ./scripts/gen_compile_commands.py
+
+The database will be in the root of the repository. No further modifications are
+needed for it to be usable by the LSP, unless you set a name for the database
+other than it's default one (compile_commands.json).
+
+Options
+=======
+
+For further details on how to use the script and its options, please refer to
+its help message, as in the example below.
+
+Help::
+
+    ./scripts/gen_compile_commands.py --help
diff --git a/scripts/gen_compile_commands.py b/scripts/gen_compile_commands.py
index 1a9c49b34a..f0c6bafdc5 100755
--- a/scripts/gen_compile_commands.py
+++ b/scripts/gen_compile_commands.py
@@ -5,6 +5,7 @@
 #
 # Author: Tom Roeder <tmroeder@google.com>
 # Ported and modified for U-Boot by Joao Marcos Costa <jmcosta944@gmail.com>
+# Briefly documented at doc/develop/gen_compile_commands.rst
 #
 """A tool for generating compile_commands.json in U-Boot."""
 
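The documentation describes the database as an array of "command objects" with "command", "directory" and "file" keys. As a rough illustration of how a tool other than clangd might consume it, the sketch below writes a one-entry database to a temporary directory and looks up the entry for a source file. It is not part of the series: find_entry() is a hypothetical helper, and the paths and compiler flags in the sample entry are made up.

```python
import json
import os
import shlex
import tempfile


def find_entry(database_path, source_suffix):
    """Return the first entry whose 'file' ends with source_suffix, else None."""
    with open(database_path, 'rt') as f:
        entries = json.load(f)
    for entry in entries:
        if entry['file'].endswith(source_suffix):
            return entry
    return None


# Hypothetical one-entry database in the shape described by the documentation.
sample = [{
    'command': 'gcc -nostdinc -Iinclude -c -o arch/x86/cpu/lapic.o '
               'arch/x86/cpu/lapic.c',
    'directory': '/home/user/u-boot',
    'file': '/home/user/u-boot/arch/x86/cpu/lapic.c',
}]

with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, 'compile_commands.json')
    # Same json.dump() arguments as the script uses for its output.
    with open(db_path, 'wt') as f:
        json.dump(sample, f, indent=2, sort_keys=True)
    entry = find_entry(db_path, 'arch/x86/cpu/lapic.c')

# Split the recorded command the way a shell would, then pick out -I flags.
argv = shlex.split(entry['command'])
include_flags = [arg for arg in argv if arg.startswith('-I')]
print(entry['directory'], include_flags)
```

Against a real tree, db_path would instead point at the compile_commands.json produced by the script in the root of the repository.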

On Fri, Sep 01, 2023 at 10:03:53PM +0200, Joao Marcos Costa wrote:
> This documentation briefly explains what a compilation database is, and how to use the script to generate one.
> This is not a port, as there was no original documentation in the Linux sources.
> Acknowledge the documentation in the script's header.
> Signed-off-by: Joao Marcos Costa <jmcosta944@gmail.com>
> doc/develop/gen_compile_commands.rst | 46 ++++++++++++++++++++++++++++
This isn't included in one of the index files, which will be the first error 'make htmldocs' throws, and I think I saw other syntax errors as well in the file. The docs themselves look fine, thanks.

Hello,
On Sat, Sep 2, 2023 at 15:11, Tom Rini <trini@konsulko.com> wrote:
> This isn't included in one of the index files, which will be the first error 'make htmldocs' throws, and I think I saw other syntax errors as well in the file. The docs themselves look fine, thanks.
> -- Tom
I'm not sure which category in 'develop/index.rst' is the most pertinent.
Would 'general' be a good idea? I thought about putting it in the same category as checkpatch, but this isn't refactoring per se.
Thanks.

On Sat, Sep 02, 2023 at 03:38:13PM +0200, João Marcos Costa wrote:
> Hello,
> On Sat, Sep 2, 2023 at 15:11, Tom Rini <trini@konsulko.com> wrote:
> > This isn't included in one of the index files, which will be the first error 'make htmldocs' throws, and I think I saw other syntax errors as well in the file. The docs themselves look fine, thanks.
> > -- Tom
> I'm not sure which category in 'develop/index.rst' is the most pertinent.
> Would 'general' be a good idea? I thought about putting it in the same category as checkpatch, but this isn't refactoring per se.
So, the main use case is to make integration with IDEs easier? Maybe put it under doc/build/ or referenced from doc/build/tools.rst with a small section like "Integration with IDEs" and a few words about how some IDEs such as ... use Clangd LSP for ... and see :doc:... for more information on how to generate these files.