Caching Haskell Nix Packages

We can cache our Haskell.nix project in much the same way as in the previous chapter.

$ drv=$(nix-instantiate ./code/05-package-management/haskell-project-v3/nix/07-haskell.nix-materialized)

$ nix-build $drv && nix-store -qR --include-outputs $drv | grep -v .drv | cachix push $CACHIX_STORE
...
/nix/store/8vrdfinxxnwczn4jzknm44bsn3k5nghl-haskell-project-exe-hello-0.1.0.0
compressing and pushing /nix/store/8vrdfinxxnwczn4jzknm44bsn3k5nghl-haskell-project-exe-hello-0.1.0.0 (3.60 MiB)
compressing and pushing /nix/store/6apx83l6ss3hkn0kd4z4rkjbkgs0w4w2-default-Setup-setup (18.07 MiB)
compressing and pushing /nix/store/3pfy3dd8ch77km1wkwd6cdgqn57d4347-haskell-project-exe-hello-0.1.0.0-config (304.02 KiB)
compressing and pushing /nix/store/b2j1nrsjr8cpzmk58d476fc2snz17w75-ghc-8.10.2 (1.71 GiB)
...
All done.
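
The push pipeline above has three stages: nix-store -qR --include-outputs lists the closure of the derivation together with its outputs, grep -v .drv drops the derivation files, and cachix push uploads what remains. The following is a minimal sketch of the middle stage only, run against a hand-written sample list so that it works without Nix installed. The function name outputs_only and the anchored pattern '\.drv$' are assumptions of this sketch, not part of the original command; the anchored pattern only drops paths that actually end in .drv.

```shell
# Hypothetical helper: keep only the non-.drv store paths read from stdin.
outputs_only() {
  grep -v '\.drv$'
}

# Sample closure listing: two paths from the transcript above, plus one
# .drv path that is made up purely for illustration.
printf '%s\n' \
  /nix/store/8vrdfinxxnwczn4jzknm44bsn3k5nghl-haskell-project-exe-hello-0.1.0.0 \
  /nix/store/00000000000000000000000000000000-example.drv \
  /nix/store/b2j1nrsjr8cpzmk58d476fc2snz17w75-ghc-8.10.2 \
  | outputs_only
```

Only the paths that survive this filter are handed to cachix push, which is why everything else in the closure, including any source path, is uploaded.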

As simple as it looks, this naive approach has some flaws, especially when dealing with private projects.

Leaking Source Code

The first issue with pushing everything is source code contamination, i.e. the source code of the project leaking to the cache. For instance, suppose we modify the main function to print "Hello, World!" instead of "Hello, Haskell!":

$ sed -i 's/Hello, Haskell!/Hello, World!/g' ./code/05-package-management/haskell-project-v3/haskell/Main.hs
$ cat ./code/05-package-management/haskell-project-v3/haskell/Main.hs
module Main where

main :: IO ()
main = putStrLn "Hello, World!"

If we rebuild our Haskell project and push it to Cachix, we can see that the modified source code is pushed as well. (Note that the drv=$(nix-instantiate ...) assignment has to be re-run to get the new derivation with the modified source.)

$ drv=$(nix-instantiate ./code/05-package-management/haskell-project-v3/nix/07-haskell.nix-materialized)
$ nix-build $drv && nix-store -qR --include-outputs $drv | grep -v .drv | cachix push $CACHIX_STORE
...
/nix/store/hkqkig7y1dx96qbdwkhk0anb0xdmx6hm-haskell-project-exe-hello-0.1.0.0
compressing and pushing /nix/store/hkqkig7y1dx96qbdwkhk0anb0xdmx6hm-haskell-project-exe-hello-0.1.0.0 (3.60 MiB)
compressing and pushing /nix/store/wq6ry5x7b5x3ld0d7wd2wx3vkxp4wi66-haskell-project-src (1.49 KiB)
All done.

We can list the files in /nix/store/6a049f3fv8x2rdxv34k14cxrwi9an43f-haskell-project-src and verify that it indeed contains our modified source code. Yikes!

$ ls -la /nix/store/6a049f3fv8x2rdxv34k14cxrwi9an43f-haskell-project-src
total 180
dr-xr-xr-x 2 user user   4096 Jan  1  1970 .
drwxr-xr-x 1 user user 151552 Jan  7 15:41 ..
-r--r--r-- 1 user user     15 Jan  1  1970 .gitignore
-r--r--r-- 1 user user     65 Jan  1  1970 Main.hs
-r--r--r-- 1 user user     46 Jan  1  1970 Setup.hs
-r--r--r-- 1 user user     12 Jan  1  1970 cabal.project
-r--r--r-- 1 user user    307 Jan  1  1970 haskell-project.cabal

$ cat /nix/store/6a049f3fv8x2rdxv34k14cxrwi9an43f-haskell-project-src/Main.hs
module Main where

main :: IO ()
main = putStrLn "Hello, World!"

Pushing source code to Cachix might not be a big deal for open source projects. However, it may be an issue for proprietary projects with strict IP policies. This can be partially mitigated by paying for a private Cachix store, but we still have to be aware of the risk and be careful.

Leaking Secrets

Even for open source projects, indiscriminately pushing everything to Cachix carries another risk: accidentally leaking secrets such as authentication credentials.

Suppose that we have some security credentials stored locally in the secret.key file in the project directory. Since the file is included in .gitignore, it is not pushed to the git repository.

$ echo secret > ./code/05-package-management/haskell-project-v3/haskell/secret.key
$ ls -la ./code/05-package-management/haskell-project-v3/haskell/
total 32
drwxrwxr-x 2 user user 4096 Jan  7 15:58 .
drwxrwxr-x 4 user user 4096 Dec  8 08:23 ..
-rw-rw-r-- 1 user user   26 Jan  7 15:58 .gitignore
-rw-r--r-- 1 user user   67 Jan  7 15:45 Main.hs
-rw-r--r-- 1 user user   46 Dec  7 08:37 Setup.hs
-rw-rw-r-- 1 user user   12 Dec  7 08:37 cabal.project
-rw-rw-r-- 1 user user  307 Jan  7 09:35 haskell-project.cabal
-rw-rw-r-- 1 user user    7 Jan  7 15:58 secret.key

But is secret.key included when pushing to Cachix? Let's find out:

$ drv=$(nix-instantiate ./code/05-package-management/haskell-project-v3/nix/07-haskell.nix-materialized)
$ nix-build $drv && nix-store -qR --include-outputs $drv | grep -v .drv | cachix push $CACHIX_STORE
...
compressing and pushing /nix/store/nrmyzkww87ndyp44jkn56hrra8m9d9vy-haskell-project-exe-hello-0.1.0.0 (3.60 MiB)
compressing and pushing /nix/store/ryz8an9z9bw7j1357k9b5w99fxvnhb74-haskell-project-src (1.69 KiB)
All done.

$ ls -la /nix/store/ryz8an9z9bw7j1357k9b5w99fxvnhb74-haskell-project-src
total 188
dr-xr-xr-x 2 user user   4096 Jan  1  1970 .
drwxr-xr-x 1 user user 155648 Jan  7 16:00 ..
-r--r--r-- 1 user user     26 Jan  1  1970 .gitignore
-r--r--r-- 1 user user     67 Jan  1  1970 Main.hs
-r--r--r-- 1 user user     46 Jan  1  1970 Setup.hs
-r--r--r-- 1 user user     12 Jan  1  1970 cabal.project
-r--r--r-- 1 user user    307 Jan  1  1970 haskell-project.cabal
-r--r--r-- 1 user user      7 Jan  1  1970 secret.key

$ cat /nix/store/ryz8an9z9bw7j1357k9b5w99fxvnhb74-haskell-project-src/secret.key
secret

That's not good! Our local security credentials have been leaked to Cachix! If the Cachix store is public, the credentials can potentially be obtained by anyone!

The real culprit is in how we create our source derivation in project.nix:

src = builtins.path {
  name = "haskell-project-src";
  path = ../../haskell;
  filter = path: type:
    let
      basePath = builtins.baseNameOf path;
    in
    basePath != "dist-newstyle"
  ;
};

Previously, we made a naive attempt at filtering our source directory, excluding only the dist-newstyle directory so that local cabal runs modifying that directory would not trigger a Nix rebuild. However, if we want to push our source code to Cachix, we had better be much more careful.

Gitignore.nix

One way we can protect local secrets is by filtering out all gitignored files so that our source code is close to a fresh git checkout when copied into the Nix store. This can be done using Nix helper libraries such as gitignore.nix.

Using gitignore.nix, we can now create a new haskell-project-v4 project with the source filtered with gitignore.nix:

gitignore = (import sources."gitignore.nix" {
  inherit (nixpkgs) lib;
}).gitignoreSource;

src = nixpkgs.lib.cleanSourceWith {
  name = "haskell-project-src";
  src = gitignore ../../haskell;
};

We first add gitignore.nix to sources using niv, and then import it as above. Following that, we use gitignore ../../haskell to filter out the gitignored files in the haskell directory. We then use nixpkgs.lib.cleanSourceWith as a hack to give the filtered source the name haskell-project-src, so that we can grep for it during inspection.

Now if we try to build our derivation, we should get the project source with the local secret filtered out:

$ drv=$(nix-instantiate ./code/06-infrastructure/haskell-project-v4/nix/01-gitignore-src)
$ nix-store -qR --include-outputs $drv | grep haskell-project-src
/nix/store/mhlj5xql8g6ib1wna4g9pc6cpraiz1q8-haskell-project-src-root

$ ls -la /nix/store/mhlj5xql8g6ib1wna4g9pc6cpraiz1q8-haskell-project-src-root
total 140
dr-xr-xr-x 2 nix nix   4096 Jan  1  1970 .
drwxr-xr-x 1 nix nix 114688 Jan 11 11:21 ..
-r--r--r-- 1 nix nix     26 Jan  1  1970 .gitignore
-r--r--r-- 1 nix nix     67 Jan  1  1970 Main.hs
-r--r--r-- 1 nix nix     46 Jan  1  1970 Setup.hs
-r--r--r-- 1 nix nix     12 Jan  1  1970 cabal.project
-r--r--r-- 1 nix nix    307 Jan  1  1970 haskell-project.cabal
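
As a cross-check, git itself can report which untracked files its ignore rules exclude, and that list should roughly correspond to the files gitignore.nix leaves out. This is a minimal sketch, assuming git is installed and the directory is inside a git working tree; the function name list_ignored is hypothetical.

```shell
# Hypothetical helper: print the untracked files under directory $1 that git
# ignores (via .gitignore and the other standard exclude files).
list_ignored() {
  git -C "$1" ls-files --others --ignored --exclude-standard
}

# Example usage, with the project path used in this chapter:
#   list_ignored ./code/06-infrastructure/haskell-project-v4/haskell
```

Any file reported here that still shows up in the source store path would indicate that the filter is not doing what we expect.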

Caveats

Gitignore.nix can help us filter out files specified in .gitignore. However, developers may still add new secrets locally without adding them to .gitignore. In such cases, the secrets can still leak to Cachix.
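
Because .gitignore cannot catch secrets that nobody listed, one extra safeguard is to scan the store paths for suspicious file names before they are piped to cachix push. The sketch below is a minimal example under stated assumptions: the function name find_suspect_files and the name patterns (*.key, *.pem, .env) are illustrative choices, not a complete secret detector, and the function reads store paths from standard input.

```shell
# Hypothetical pre-push check: read store paths from stdin and print any
# files inside them whose names look like credentials.
# The patterns are illustrative assumptions; extend them for your project.
find_suspect_files() {
  while IFS= read -r path; do
    # Skip paths that do not exist locally.
    [ -e "$path" ] || continue
    find "$path" -type f \
      \( -name '*.key' -o -name '*.pem' -o -name '.env' \)
  done
}

# Example usage (requires Nix; $drv as defined earlier in this chapter):
#   nix-store -qR --include-outputs $drv | grep -v .drv | find_suspect_files
```

A name-based scan like this may report false positives from third-party dependencies and will miss secrets stored under innocuous file names, so it complements rather than replaces the measures below.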

The best way to prevent secrets from leaking is to build from a published git or tarball URL. That way it will be less likely for us to accidentally mix up and leak the secrets in our local file systems. This will however require more complex project organization, as we have to place the Nix code separately from the source code.

Otherwise, it is still recommended to avoid pushing source code to Cachix in the first place, for both proprietary and open source projects. After all, users will almost always build a Nix project with their own local source code, or with sources that are fetched directly from git or remote URLs. There is rarely a need to use Cachix to distribute source code to our users.

Filtering Out Source

One simple way to filter out the source code is to filter out the name of the source derivation using grep before pushing to Cachix:

$ nix-store -qR --include-outputs $drv \
  | grep -v .drv | grep -v haskell-project-src \
  | cachix push $CACHIX_STORE

Note, however, that this only works if no other path pushed to Cachix depends on the source code. This is because Cachix automatically pushes the whole closure of a Nix path. For instance, this would not work if we tried to push the .drv file of the build derivation to Cachix, because that would also capture the source derivation as part of its closure.
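
Whether a path would drag the source along can be checked before pushing, by looking for the source name in the closure of each candidate path. The following is a minimal sketch, assuming nix-store is on PATH; the function name paths_pulling_in and the NIX_STORE override variable are hypothetical and exist only so the sketch can be exercised without a real Nix store.

```shell
# Hypothetical check: read candidate store paths from stdin and print each
# one whose closure (nix-store -qR) contains a path matching the pattern $1.
# NIX_STORE is an assumed override for the nix-store command.
paths_pulling_in() {
  pattern=$1
  while IFS= read -r path; do
    # </dev/null keeps the query from consuming the list on stdin.
    if "${NIX_STORE:-nix-store}" -qR "$path" </dev/null \
        | grep -q -- "$pattern"; then
      printf '%s\n' "$path"
    fi
  done
}

# Example usage (requires Nix; $drv as defined earlier in this chapter):
#   nix-store -qR --include-outputs $drv | grep -v .drv \
#     | grep -v haskell-project-src \
#     | paths_pulling_in haskell-project-src
```

If the function prints nothing, none of the remaining paths has the source in its closure, so the grep -v filter is sufficient for that push.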

This approach also would not work if some intermediate derivation makes a copy of the original source code and modifies it to produce a new source derivation. The intermediate derivation may have a different name, or even a generic one, which would be difficult for us to filter out without inspecting the derivation source.

As a result, if the source code needs to be modified, it is best to do so in the patchPhase of stdenv.mkDerivation rather than in a separate intermediate derivation.

Caching Nix Shell

Another way to exclude source code from the cached derivations is to create a Nix shell derivation and cache that instead. Haskell.nix provides a shellFor function that creates a Nix shell derivation from the original Haskell.nix project we defined.

{ useMaterialization ? true }:
let
  project = import ./project.nix {
    inherit useMaterialization;
  };
in
project.shellFor {
  withHoogle = false;
}

If we inspect the derivation tree from shell.nix, we can confirm that the source code is indeed not present in the list, so we can safely push only the Haskell.nix dependencies to Cachix.

$ drv=$(nix-instantiate ./code/06-infrastructure/haskell-project-v4/nix/01-gitignore-src/shell.nix)

$ nix-store -qR --include-outputs $drv | grep haskell-project-src

We first use nix-shell --run true $drv to build only the dependencies of our shell derivation and push them to Cachix.

$ nix-shell --run true $drv && nix-store -qR --include-outputs $drv | grep -v .drv | cachix push $CACHIX_STORE
...
All done.

If we want to cache the final build artifact as well, we can still run nix-build on the project and pipe only the resulting output path to cachix push.

$ nix-build ./code/06-infrastructure/haskell-project-v4/nix/01-gitignore-src | cachix push $CACHIX_STORE
...
compressing and pushing /nix/store/9in65nlw9s255x8zh5g7hlvbnl23rqbz-haskell-project-exe-hello-0.1.0.0 (3.60 MiB)
All done.

Double Check Leaking with Code Changes

Our attempt to cache only the Nix shell derivation seems to exclude the source code, but is it really excluded? If we are not careful, we could easily let Nix give our source derivation a generic name like source. In that case, grep would not be able to tell us whether our source code has leaked through.

As a result, it is best to double check what is being cached by slightly modifying our source code, and then try pushing to Cachix again.

$ sed -i 's/Hello, Haskell!/Hello, World!/g' ./code/06-infrastructure/haskell-project-v4/haskell/Main.hs
$ cat ./code/06-infrastructure/haskell-project-v4/haskell/Main.hs
module Main where

main :: IO ()
main = putStrLn "Hello, World!"

$ drv=$(nix-instantiate ./code/06-infrastructure/haskell-project-v4/nix/01-gitignore-src/shell.nix)
$ nix-shell --run true $drv && nix-store -qR --include-outputs $drv | grep -v .drv | cachix push $CACHIX_STORE
All done.

$ nix-build ./code/06-infrastructure/haskell-project-v4/nix/01-gitignore-src | cachix push $CACHIX_STORE
these derivations will be built:
  /nix/store/52qqdj4pq564ivyawpvfzsz2s3kv9wmp-haskell-project-exe-hello-0.1.0.0.drv
...
compressing and pushing /nix/store/fdb6b3dj79gqff0lz0xf34lrs4gpb5a0-haskell-project-exe-hello-0.1.0.0 (3.60 MiB)
All done.

As we expect, even though Main.hs has been modified, there is no new source artifact being pushed to Cachix. Only nix-build produced a new binary, which is then pushed to Cachix.

You can apply the same method on your own project to double check if your source code is leaking to Cachix. Even if you do not care about the source code leaking, this can still serve as a good way to check if any secret is leaking.
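
One way to make this check repeatable is to save the list of candidate paths before and after the source change, and then print only the paths that are new. The following is a minimal sketch, assuming each list is stored with one store path per line; the function name new_paths and the file names before.txt and after.txt are hypothetical.

```shell
# Hypothetical helper: print the paths that appear in the second list but
# not in the first. Both arguments are files with one store path per line.
new_paths() {
  before_sorted=$(mktemp)
  after_sorted=$(mktemp)
  LC_ALL=C sort -u "$1" > "$before_sorted"
  LC_ALL=C sort -u "$2" > "$after_sorted"
  # comm -13 suppresses lines unique to the first file and lines common to
  # both, leaving only the lines unique to the second file.
  LC_ALL=C comm -13 "$before_sorted" "$after_sorted"
  rm -f "$before_sorted" "$after_sorted"
}

# Example usage (requires Nix):
#   nix-store -qR --include-outputs $drv | grep -v .drv > before.txt
#   (modify the source, then re-run nix-instantiate to refresh $drv)
#   nix-store -qR --include-outputs $drv | grep -v .drv > after.txt
#   new_paths before.txt after.txt
```

Any path printed after a source-only change is derived from the source in some way, so each one deserves a look before it is pushed.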

Caching Multiple Projects

The technique of caching the Nix shell only works if our project is made of a single Nix derivation. If we instead have a large project with multiple source repositories, it is much harder to filter out the source code when the derivations depend on each other.

In such cases, the simple way is to use grep -v and hope that it can filter out all the source derivations. Otherwise you may need to use project-specific techniques to make sure that only intended Nix artifacts are being cached.
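
Before relying on grep -v, it can help to audit the names in the closure with the hash prefix stripped, since a source path with a generic name such as source is easy to miss. The following is a minimal sketch, assuming the usual /nix/store/<32-character hash>-<name> layout; the function name store_names is hypothetical.

```shell
# Hypothetical helper: read store paths from stdin and print the unique
# names with the /nix/store/<hash>- prefix removed.
store_names() {
  sed 's|^/nix/store/[0-9a-z]\{32\}-||' | LC_ALL=C sort -u
}

# Sample input using two paths that appear earlier in this chapter.
printf '%s\n' \
  /nix/store/b2j1nrsjr8cpzmk58d476fc2snz17w75-ghc-8.10.2 \
  /nix/store/ryz8an9z9bw7j1357k9b5w99fxvnhb74-haskell-project-src \
  | store_names
```

Reading through the resulting list of names is a manual step, but it makes unexpected entries easier to spot than scanning full store paths.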

Conclusion

As we have seen in this chapter, caching build results is not as straightforward as it seems when there are things that we want to keep out of the cache, such as proprietary source code or local secrets. This may not look like a big issue right now, but only because many people may not even be aware that their source code and secrets are leaking!

Even without considering leaking secrets, there are still too many different ways of caching build results in Nix. While this provides more flexibility for us to control what to cache, the learning curve is way too high for new users who just want to get their Nix builds cached.

Nix and Cachix may need to implement additional features to make caching easier and to protect sensitive data. For example, Cachix could add a command line option that prevents paths matching a specific pattern from ever being pushed.