guix/guix
Tomas Volf 50927338dd
gexp: Improve support of Unicode characters.
Support for non-ASCII characters was mixed.  Some gexp forms supported them,
while others did not.  Combined with the current value of
%default-port-conversion-strategy, that sometimes led to unpleasant surprises.
For example:

    (scheme-file "utf8" #~(with-output-to-file #$output
                            (λ _ (display "猫"))))

was written to the store as:

    ((? _ (display "\u732b")))

No, that is not a font issue on your part; that is an actual #\? in place of
the lambda character, which, surprisingly, does not do what it should when
executed.

The solution is to switch to the C.UTF-8 locale where possible, since it is
now always available, or otherwise to set the port encoding explicitly.
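
The effect of a port's encoding and conversion strategy can be reproduced
outside of Guix.  The following is only a minimal sketch in plain Guile, not
the code from this commit; the helper `encode-via-port' is a hypothetical name
introduced for illustration.  It writes the lambda character to an in-memory
port twice: once through an encoding that cannot represent it, using the
`substitute' strategy, and once through UTF-8.

```scheme
(use-modules (ice-9 binary-ports))

;; Hypothetical helper (not part of Guix): write STR with `display' to an
;; in-memory port that uses ENCODING and conversion STRATEGY, then return
;; the bytes that were produced as a bytevector.
(define (encode-via-port str encoding strategy)
  (call-with-values open-bytevector-output-port
    (lambda (port get-bytes)
      (set-port-encoding! port encoding)
      (set-port-conversion-strategy! port strategy)
      (display str port)
      (force-output port)
      (get-bytes))))

;; ISO-8859-1 cannot represent λ, so with the `substitute' strategy the
;; character is silently replaced instead of raising an error.
(define lossy (encode-via-port "λ" "ISO-8859-1" 'substitute))

;; A UTF-8 port keeps the character intact: λ is U+03BB, i.e. #xCE #xBB.
(define exact (encode-via-port "λ" "UTF-8" 'error))

(format #t "lossy: ~s~%exact: ~s~%" lossy exact)
```

With Guile 3 the first bytevector is expected to hold a single #\? byte, which
matches the substitution seen in the store files, while the second holds the
two-byte UTF-8 encoding of the lambda character.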

No tests are provided, since the majority of the tests in tests/gexp.scm use
Guile 2, under which this tends to work.  The issues occur mostly with Guile 3.

I did test it locally using:

      #!/bin/sh
      set -eu
      set -x

      [ -f guix.scm ] || { echo >&2 Run from root of Guix repo.; exit 1; }
      [ -f gnu.scm  ] || { echo >&2 Run from root of Guix repo.; exit 1; }

      cat >猫.scm <<'EOF'
      (define-module (猫)
        #:export (say))

      (define (say)
        "nyaaaa~~~~!")
      EOF

      mkdir -p dir-with-utf8-file
      cp 猫.scm dir-with-utf8-file/

      cat >repro.scm <<'EOF'
      (use-modules (guix build utils)
                   (guix derivations)
                   (guix gexp)
                   (guix store)
                   (ice-9 ftw)
                   (ice-9 textual-ports))

      (define cat "猫")

      (define (drv-content drv)
        (call-with-input-file (derivation->output-path drv)
          get-string-all))

      (define (out-content out)
        (call-with-input-file out
          get-string-all))

      (define (drv-listing drv)
        (scandir (derivation->output-path drv)))

      (define (dir-listing dir)
        (scandir dir))

      (define-macro (test exp lower? report)
        (let ((type (car exp)))
          `(false-if-exception
            (let ((drv (with-store %store
                         (run-with-store %store
                           (,(if lower? lower-object identity) ,exp)))))
              (format #t "~%~a:~%" ',type)
              (when (with-store %store
                      (build-derivations %store (list drv)))
                (format #t "~a~%" (,report drv)))))))

      (test (computed-file "utf8"
                           #~(with-output-to-file #$output
                               (λ _ (display #$cat))))
            #t drv-content)

      (test (program-file "utf8"
                          #~((λ _ (display #$cat))))
            #t drv-content)

      (test (scheme-file "utf8"
                         #~((λ _ (display #$cat))))
            #t drv-content)

      (test (text-file* "utf8" cat cat cat)
            #f drv-content)

      (test (compiled-modules '((猫)))
            #f drv-listing)

      (test (file-union "utf8" `((,cat ,(plain-file "utf8" cat))))
            #t drv-listing)

      ;;; No fix needed:
      (test (imported-modules '((猫)))
            #f dir-listing)

      (test (local-file "dir-with-utf8-file" #:recursive? #t)
            #t dir-listing)

      (test (plain-file "utf8" cat)
            #t out-content)

      (test (mixed-text-file "utf8" cat cat cat)
            #t drv-content)

      (test (directory-union "utf8" (list (local-file "dir-with-utf8-file"
                                                      #:recursive? #t)))
            #t dir-listing)
      EOF

      guix shell -CWN -D guix glibc-locales -- \
           env LANG=C.UTF-8 ./pre-inst-env guix repl -- ./repro.scm

Before this commit, the output is:

      + '[' -f guix.scm ']'
      + '[' -f gnu.scm ']'
      + cat
      + mkdir -p dir-with-utf8-file
      + cp 猫.scm dir-with-utf8-file/
      + cat
      + guix shell -CWN -D guix glibc-locales -- env LANG=C.UTF-8 ./pre-inst-env guix repl -- ./repro.scm

      computed-file:
      ?

      program-file:
      #!/gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile --no-auto-compile
      !#
      ((? _ (display "\u732b")))

      scheme-file:
      ((? _ (display "\u732b")))

      text-file*:
      ???

      compiled-modules:
      building path(s) `/gnu/store/ay3jifyvliigfgnz67jf0kgngzpya5a5-module-import-compiled'
      Backtrace:
                 5 (primitive-load "/gnu/store/rn7b0dq6iqfmmqyqzamix2mjmfy?")
      In ice-9/eval.scm:
          619:8  4 (_ #f)
      In srfi/srfi-1.scm:
         460:18  3 (fold #<procedure 7ffff79245e0 at ice-9/eval.scm:336:1?> ?)
      In ice-9/eval.scm:
         245:16  2 (_ #(#(#<directory (guix build utils) 7ffff779f320>) # ?))
      In ice-9/boot-9.scm:
        1982:24  1 (_ _)
      In unknown file:
                 0 (stat "./???.scm" #<undefined>)

      ERROR: In procedure stat:
      In procedure stat: No such file or directory: "./???.scm"
      builder for `/gnu/store/dxg87135zcd6a1c92dlrkyvxlbhfwfld-module-import-compiled.drv' failed with exit code 1

      file-union:
      (. .. ?)

      imported-modules:
      (. .. 猫.scm)

      local-file:
      (. .. 猫.scm)

      plain-file:
      猫

      mixed-text-file:
      猫猫猫

      directory-union:
      (. .. 猫.scm)

I think you will agree that this is far from optimal.  After my fix, the
output changes to:

      + '[' -f guix.scm ']'
      + '[' -f gnu.scm ']'
      + cat
      + mkdir -p dir-with-utf8-file
      + cp 猫.scm dir-with-utf8-file/
      + cat
      + guix shell -CWN -D guix glibc-locales -- env LANG=C.UTF-8 ./pre-inst-env guix repl -- ./repro.scm

      computed-file:
      猫

      program-file:
      #!/gnu/store/8kbmn359jqkgsbqgqxnmiryvd9ynz8w7-guile-3.0.9/bin/guile --no-auto-compile
      !#
      ((λ _ (display "猫")))

      scheme-file:
      ((λ _ (display "猫")))

      text-file*:
      猫猫猫

      compiled-modules:
      (. .. 猫.go)

      file-union:
      (. .. 猫)

      imported-modules:
      (. .. 猫.scm)

      local-file:
      (. .. 猫.scm)

      plain-file:
      猫

      mixed-text-file:
      猫猫猫

      directory-union:
      (. .. 猫.scm)

This is what the user would actually expect.

I also added missing arguments to the documentation.

* guix/gexp.scm (computed-file):  Set LANG to C.UTF-8 by default.
(compiled-modules): Try to `setlocale'.
(gexp->script), (gexp->file): New `locale' argument defaulting to C.UTF-8.
(text-file*): Set output port encoding to UTF-8.
* doc/guix.texi (G-Expressions)[computed-file]: Document the changes.  Use
@var.  Document #:guile.
[gexp->script]: Document #:locale.  Fix default value for #:target.
[gexp->file]: Document #:locale, #:system and #:target.

Change-Id: Ib323b51af88a588b780ff48ddd04db8be7c729fb
2025-01-12 16:19:02 +01:00
build guix: build-system: cmake: Rework cross compilation. 2025-01-12 16:19:02 +01:00
build-system guix: build-system: cmake: Rework cross compilation. 2025-01-12 16:19:02 +01:00
import import: git: Do not search pre-release words in tag prefix. 2025-01-12 14:18:03 +08:00
platforms gnu: Add basic support for x86_64-pc-gnu target, aka 64bit Hurd. 2024-12-03 08:33:12 +01:00
scripts reconfigure: Make ‘load-system-for-kexec’ errors non-fatal. 2025-01-07 17:44:12 +01:00
store
tests
android-repo-download.scm
avahi.scm
base16.scm
base32.scm
base64.scm
build-system.scm
bzr-download.scm
cache.scm
channels.scm channels: Add #:verify-certificate? and honor it. 2024-12-25 23:51:10 +01:00
ci.scm
colors.scm
combinators.scm
config.scm.in
cpio.scm
cpu.scm guix: cpu: Add detection for znver5. 2024-09-30 13:38:33 +03:00
cve.scm
cvs-download.scm
d3.v3.js
deprecation.scm
derivations.scm derivations: ‘derivation-build-plan’ returns builds in topological order. 2024-11-12 23:46:49 +01:00
describe.scm
diagnostics.scm
discovery.scm
docker.scm docker: Add history field to config.json. 2025-01-07 17:51:18 +01:00
download.scm gnu-maintenance: ‘gnu-ftp’ updater excludes GnuPG-hosted packages. 2024-12-01 20:14:15 +01:00
elf.scm
ftp-client.scm
gexp.scm gexp: Improve support of Unicode characters. 2025-01-12 16:19:02 +01:00
git-authenticate.scm
git-download.scm git-download: Use C.UTF-8 and remove dependency on ‘glibc-utf8-locales’. 2024-08-31 10:44:46 +02:00
git.scm git: Allow X.509 certificate verification to be disabled. 2024-12-25 23:51:10 +01:00
glob.scm
gnu-maintenance.scm gnu-maintenance: ‘generic-html’ update honors <base href="…">. 2024-12-01 20:14:16 +01:00
gnupg.scm
grafts.scm grafts: Use C.UTF-8 and remove dependency on ‘glibc-utf8-locales’. 2024-08-31 10:44:46 +02:00
graph.js
graph.scm
hash.scm
hg-download.scm
http-client.scm
i18n.scm
inferior.scm inferior: Add #:verify-certificate? to ‘cached-channel-instance’. 2024-12-25 23:51:10 +01:00
ipfs.scm
least-authority.scm
licenses.scm licenses: Add blue-oak1.0.0. 2024-11-20 18:01:35 +08:00
lint.scm lint: Check for more types of custom 'check phases. 2024-12-18 13:22:47 +01:00
man-db.scm man-db: Correctly resolve man page links. 2024-09-30 22:52:23 +02:00
memoization.scm
modules.scm
monad-repl.scm
monads.scm
nar.scm
narinfo.scm
openpgp.scm
packages.scm packages: Add #:recursive? to ‘package-input-rewriting’. 2024-12-18 18:32:41 +01:00
pki.scm
platform.scm
profiles.scm man-db: Add support for zstd compressed man pages. 2024-08-31 10:44:22 +02:00
profiling.scm
progress.scm
quirks.scm
read-print.scm read-print: Indent ‘privileged-program’ specially. 2024-09-08 02:00:00 +02:00
records.scm records: Fix ABI check in cross-compilation context. 2024-11-18 10:58:42 +01:00
remote.scm remote: Do not double-quote the repl-command. 2024-12-12 12:52:08 +01:00
repl.scm
rpm.scm
scripts.scm
search-paths.scm search-paths: Swap $C_INCLUDE_PATH and $CPLUS_INCLUDE_PATH. 2024-08-31 10:44:20 +02:00
self.scm
serialization.scm
sets.scm
ssh.scm
status.scm
store.scm
substitutes.scm substitutes: Reduce default negative TTL values. 2024-09-16 00:16:21 +02:00
svn-download.scm guix: download-multi-svn-to-store: Allow exporting from base URL. 2024-08-31 10:45:36 +02:00
swh.scm
tests.scm
transformations.scm transformations: ‘package-with-upstream-version’ can preserve archive type. 2025-01-08 23:03:04 +01:00
ui.scm ui: It’s 2025 now! 2025-01-06 15:02:10 +01:00
upstream.scm upstream: Return #f when ‘source-urls’ is a <git-reference>. 2025-01-11 23:36:57 +01:00
utils.scm import: utils: Move downstream-package-name to (guix utils). 2024-12-30 13:49:57 +01:00
workers.scm