fix(ls): use GetACP to detect UTF-8 encoding on Windows #11722
Open
mattsu2020 wants to merge 2 commits into uutils:main
Contributor
please add a test to make sure we don't regress in the future
sylvestre reviewed Apr 9, 2026
    }

    #[cfg(test)]
    mod tests {
Contributor
Not here, please: put the tests in `tests/by-util/test_ls.rs`.
On Windows, locale environment variables (LC_ALL, LC_COLLATE, LANG) are typically unset, causing get_locale_from_env() to default to UEncoding::Ascii. This makes non-ASCII filenames display as octal escape sequences or `?` characters in ls output.

Fix by querying the system ANSI code page via GetACP() when no locale variables are set. If the active code page is 65001 (UTF-8), use UEncoding::Utf8.

This aligns with GNU coreutils' gnulib approach, which calls locale_charset() -> GetACP() on Windows.

Fixes: uutils#11103
…tests

This commit simplifies string handling by removing unnecessary `expect()` calls and improves code formatting in the ls test module. The changes include:

- Removing redundant `expect()` calls for valid Unicode strings
- Consolidating multi-line method chaining into single lines for better readability

These are purely cosmetic improvements that maintain the same functionality while making the code cleaner.
Summary
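- On Windows, locale environment variables (`LC_ALL`, `LC_COLLATE`, `LANG`) are typically unset, causing `get_locale_from_env()` to default to `UEncoding::Ascii`
- As a result, non-ASCII filenames display as octal escape sequences (`$'\303\255'`) or `?` characters in `ls` output
- Fix: query the system ANSI code page via `GetACP()` FFI when no locale variables are set; if the active code page is 65001 (UTF-8), use `UEncoding::Utf8`
- This matches GNU coreutils' gnulib, where `locale_charset()` calls `GetACP()` on Windows

Before (Windows PowerShell)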
After (with ACP 65001 / "Use Unicode UTF-8 for worldwide language support" enabled)
Fixes #11103
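
For readers who want the fallback logic in one place, here is a minimal, self-contained sketch of the approach described in the summary. It is not the code from this PR. The `UEncoding` enum below is a hypothetical stand-in for the real type, and the helper names (`active_code_page`, `locale_from_env`, `pick_encoding`) and the simplified "locale name ends in UTF-8" check are assumptions made for illustration. Only the `GetACP()` call and the code page value 65001 come from the description above.

```rust
use std::env;

/// Hypothetical stand-in for the `UEncoding` type named in the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UEncoding {
    Ascii,
    Utf8,
}

/// Windows code page identifier for UTF-8.
const CP_UTF8: u32 = 65001;

#[cfg(windows)]
#[link(name = "kernel32")]
unsafe extern "system" {
    /// Win32 API: returns the current ANSI code page identifier.
    fn GetACP() -> u32;
}

/// Returns the active ANSI code page on Windows and `None` elsewhere.
/// (Illustrative helper, not part of the PR.)
fn active_code_page() -> Option<u32> {
    #[cfg(windows)]
    {
        // SAFETY: GetACP takes no arguments and has no preconditions.
        Some(unsafe { GetACP() })
    }
    #[cfg(not(windows))]
    {
        None
    }
}

/// First non-empty value among LC_ALL, LC_COLLATE, LANG, if any.
fn locale_from_env() -> Option<String> {
    ["LC_ALL", "LC_COLLATE", "LANG"]
        .iter()
        .filter_map(|name| env::var(name).ok())
        .find(|value| !value.is_empty())
}

/// Pure decision logic: a locale variable wins if present; otherwise
/// fall back to the code page, treating 65001 as UTF-8.
/// Assumption: a locale counts as UTF-8 if its name ends in "utf-8"/"utf8".
fn pick_encoding(locale: Option<&str>, code_page: Option<u32>) -> UEncoding {
    match locale {
        Some(name) => {
            let lower = name.to_ascii_lowercase();
            if lower.ends_with("utf-8") || lower.ends_with("utf8") {
                UEncoding::Utf8
            } else {
                UEncoding::Ascii
            }
        }
        None => match code_page {
            Some(CP_UTF8) => UEncoding::Utf8,
            _ => UEncoding::Ascii,
        },
    }
}

fn main() {
    let locale = locale_from_env();
    let encoding = pick_encoding(locale.as_deref(), active_code_page());
    println!("{encoding:?}");
}
```

Keeping the decision in a pure function such as `pick_encoding` means the behavior can be unit-tested on any platform without calling the Win32 API.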