fix(ls): use GetACP to detect UTF-8 encoding on Windows #11722

Open
mattsu2020 wants to merge 2 commits into uutils:main from mattsu2020:ls_fix_utf-8

@mattsu2020 (Contributor) commented Apr 9, 2026

Summary

  • On Windows, locale environment variables (LC_ALL, LC_COLLATE, LANG) are typically unset, causing get_locale_from_env() to default to UEncoding::Ascii
  • This makes non-ASCII filenames display as octal escape sequences ($'\303\255') or ? characters in ls output
  • Fix by querying the system ANSI code page via GetACP() FFI when no locale variables are set — if the active code page is 65001 (UTF-8), use UEncoding::Utf8
  • This aligns with GNU coreutils' gnulib approach where locale_charset() calls GetACP() on Windows
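The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `UEncoding` enum here is a local stand-in, `encoding_from`, `active_code_page`, and `detect_encoding` are hypothetical names, and the locale parsing is deliberately simplified (it only inspects the codeset after the `.`). The only real external call is the Win32 `GetACP()` function, which returns the active ANSI code page as an unsigned integer.

```rust
use std::env;

/// Local stand-in for the encoding enum named in the PR (assumption).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum UEncoding {
    Ascii,
    Utf8,
}

/// Windows code page identifier for UTF-8.
const CP_UTF8: u32 = 65001;

/// Query the active ANSI code page on Windows.
#[cfg(windows)]
fn active_code_page() -> Option<u32> {
    #[link(name = "kernel32")]
    extern "system" {
        fn GetACP() -> u32;
    }
    // SAFETY: GetACP takes no arguments and has no preconditions.
    Some(unsafe { GetACP() })
}

/// Other platforms have no ANSI code page to consult.
#[cfg(not(windows))]
fn active_code_page() -> Option<u32> {
    None
}

/// Pure decision logic: a non-empty locale value wins; the code page is
/// consulted only when no locale value is available.
fn encoding_from(locale: Option<&str>, code_page: Option<u32>) -> UEncoding {
    match locale.filter(|v| !v.is_empty()) {
        Some(value) => {
            // Simplified parsing: "en_US.UTF-8@modifier" -> "UTF-8".
            let codeset = value.split('.').nth(1).unwrap_or("");
            let codeset = codeset.split('@').next().unwrap_or("");
            if codeset.eq_ignore_ascii_case("UTF-8") || codeset.eq_ignore_ascii_case("UTF8") {
                UEncoding::Utf8
            } else {
                UEncoding::Ascii
            }
        }
        None => {
            if code_page == Some(CP_UTF8) {
                UEncoding::Utf8
            } else {
                UEncoding::Ascii
            }
        }
    }
}

/// Read the first non-empty variable among LC_ALL, LC_COLLATE, LANG,
/// then fall back to the code page.
fn detect_encoding() -> UEncoding {
    let locale = ["LC_ALL", "LC_COLLATE", "LANG"]
        .iter()
        .find_map(|name| env::var(name).ok().filter(|v| !v.is_empty()));
    encoding_from(locale.as_deref(), active_code_page())
}
```

Keeping the decision in a pure function (`encoding_from`) means the behaviour can be checked on any platform without touching the real environment or the FFI call.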

Before (Windows PowerShell)

PS> coreutils.exe ls
'f'$'\303\255''l'$'\303\250\342\202\202'  ''$'\346\226\207\344\273\266''1'
''$'\321\204\320\260\320\271\320\273''3'
PS> coreutils.exe ls -N
f??l?????  ????????3  ??????1

After (with ACP 65001 / "Use Unicode UTF-8 for worldwide language support" enabled)

PS> coreutils.exe ls
fílè₂  файл3  文件1
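
For reference, the octal escapes in the "Before" output are the raw UTF-8 bytes of the names shown in the "After" output. The small sketch below reproduces just the byte escaping; `octal_escape` is a hypothetical helper for illustration and does not reproduce the `$'...'` shell quoting that `ls` wraps around the escapes.

```rust
/// Keep ASCII bytes as-is and render every other byte as a backslash
/// followed by its octal value. Illustrative only; the real `ls` quoting
/// additionally wraps escaped runs in $'...'.
fn octal_escape(name: &str) -> String {
    let mut out = String::new();
    for byte in name.bytes() {
        if byte.is_ascii() {
            out.push(byte as char);
        } else {
            out.push_str(&format!("\\{:o}", byte));
        }
    }
    out
}
```

For example, `octal_escape("fílè₂")` yields `f\303\255l\303\250\342\202\202`, which matches the escaped bytes in the first "Before" line.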

Fixes #11103

@sylvestre (Contributor)

please add a test to make sure we don't regress in the future
thanks

github-actions bot commented Apr 9, 2026

GNU testsuite comparison:

Skipping an intermittent issue tests/tty/tty-eof (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/cut/cut-huge-range is now passing!

}

#[cfg(test)]
mod tests {

not here please
in tests/by-util/test_ls.rs

github-actions bot commented Apr 9, 2026

GNU testsuite comparison:

GNU test failed: tests/tail/tail-n0f. tests/tail/tail-n0f is passing on 'main'. Maybe you have to rebase?
Skipping an intermittent issue tests/pr/bounded-memory (passes in this run but fails in the 'main' branch)
Note: The gnu test tests/csplit/csplit-heap is now being skipped but was previously passing.
Note: The gnu test tests/tail/pipe-f is now being skipped but was previously passing.
Congrats! The gnu test tests/cut/cut-huge-range is now passing!

On Windows, locale environment variables (LC_ALL, LC_COLLATE, LANG)
are typically unset, causing get_locale_from_env() to default to
UEncoding::Ascii. This makes non-ASCII filenames display as octal
escape sequences or `?` characters in ls output.

Fix by querying the system ANSI code page via GetACP() when no locale
variables are set. If the active code page is 65001 (UTF-8), use
UEncoding::Utf8. This aligns with GNU coreutils' gnulib approach
which calls locale_charset() -> GetACP() on Windows.

Fixes: uutils#11103
…tests

This commit simplifies string handling by removing unnecessary `expect()` calls and improves code formatting in the ls test module. The changes include:
- Removing redundant `expect()` calls for valid Unicode strings
- Consolidating multi-line method chaining into single lines for better readability
- These are purely cosmetic improvements that maintain the same functionality while making the code cleaner

github-actions bot commented Apr 9, 2026

GNU testsuite comparison:

Note: The gnu test tests/cut/bounded-memory is now being skipped but was previously passing.
Note: The gnu test tests/unexpand/bounded-memory is now being skipped but was previously passing.
Congrats! The gnu test tests/cp/link-heap is now passing!
Congrats! The gnu test tests/dd/no-allocate is now passing!
Congrats! The gnu test tests/seq/seq-epipe is now passing!

Development

Successfully merging this pull request may close these issues.

ls produce bad output in Windows PowerShell if file name contains non-ASCII characters