Re: [gitdm PATCH 2/2] logparser.py: Try and be more robust with unicode handling

12 Jul 2022

On Tue, Jul 12, 2022 at 04:58:46AM -0600, Simon Glass wrote:
...
On Thu, 7 Jul 2022 at 13:22, Tom Rini trini@konsulko.com wrote:
...
Given the sometimes oddly formatted data that can come through when
removing code, we need to be as flexible as possible when handling it.
Set our encoding to unicode_escape and if we still run into a problem,
it's likely going to be OK to ignore it.
Signed-off-by: Tom Rini trini@konsulko.com
I've emailed this to Jonathan Corbet as well as he's the upstream for
the project, and this does work for me.  But I'm not a python guru by
any means.  But trying to run the stats for v2022.04..v2022.07-rc6 blows
up in places otherwise.
logparser.py | 1 +
 1 file changed, 1 insertion(+)
Reviewed-by: Simon Glass sjg@chromium.org
BTW I have found that using binary is helpful in many places, then
converting to UTF-8 when displaying things.
...

diff --git a/logparser.py b/logparser.py
index efbc72f868eb..d5906e97689d 100644
--- a/logparser.py
+++ b/logparser.py
@@ -37,6 +37,7 @@ class LogPatchSplitter:
         self.fd = fd
         self.buffer = None
         self.patch = []
+        sys.stdin.reconfigure(encoding='unicode_escape', errors='ignore')

     def __iter__(self):
         return self


So, I followed up with Jonathan, but hadn't yet done so on the list.
unicode_escape works, but then the results don't read right.  It turned
out utf-8 was the right encoding, but the first time I tried testing it
I had some other problem locally.
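
For anyone hitting the same thing, here is a rough sketch of why the
unicode_escape output doesn't read right. The sample bytes are made up
for illustration and are not taken from the U-Boot history, and the
errors='replace' handler is an assumption (the message above only names
the utf-8 encoding, not an error handler). The last part sketches the
"keep it binary, decode when displaying" idea from the review.

```python
# Made-up stand-in for bytes arriving from "git log -p": a UTF-8 author
# name, a C string containing a literal backslash-n, and a stray 0xff byte.
raw = (
    "Author: Pál Müller\n".encode("utf-8")
    + b'+\tprintf("done\\n");\n'
    + b"stray: \xff\n"
)

# unicode_escape maps every non-escape byte as Latin-1 and expands
# backslash escapes, so multi-byte UTF-8 names turn into mojibake and the
# "\n" inside the C string becomes a real line break.
as_escape = raw.decode("unicode_escape", errors="ignore")

# utf-8 keeps names and backslashes intact. errors="replace" (assumed here)
# turns undecodable bytes into U+FFFD instead of raising.
as_utf8 = raw.decode("utf-8", errors="replace")

# Binary-first variant: split on bytes, decode only at display time.
shown = [line.decode("utf-8", errors="replace") for line in raw.splitlines()]
```

With these inputs, as_utf8 keeps the author name readable while as_escape
does not, which matches the symptom described above. In logparser.py the
equivalent change would presumably be passing encoding='utf-8' to the
existing sys.stdin.reconfigure() call, with whichever error handler
upstream prefers.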
-- 
Tom