Pacoloco: A Pacman Cache Proxy Server, useful for slow internet connections and How To Prime Pacoloco with pre-installed packages

Arch Pacman

What is Pacoloco?

Pacoloco behaves like a regular Pacman mirror, except that it downloads a package only when it is first requested. It acts as a cache between your local Pacman package manager and the remote mirror. When run on a local server, it massively reduces the time Pacman needs to download new and updated packages that are already in the cache.

Pacoloco can also download new versions of cached packages automatically, for example every night; this is called “prefetching”. It works really well: even on a 50 Mbps connection, the download part of a large upgrade drops from minutes to just seconds, and the downloaded packages are shared between all machines in your local network. Very nice for rarely updated machines!

Installation

This is fairly straightforward, so I will keep it short: First, install the pacoloco package on a local server, configure it to use some mirrors of your choice, and set how often it should prefetch packages. You can also configure the directory where Pacoloco stores the files it caches.

The config file is at /etc/pacoloco.yaml

Second, update your Pacman mirrorlist to use the Pacoloco cache. By default, the URL for a regular x86 installation looks like this:

http://pacolocohost.example:9129/repo/archlinux/$repo/os/$arch
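
To make this step concrete, here is a minimal sketch that prints the matching mirrorlist entry. The helper name pacoloco_server_line is made up for illustration, and the sketch assumes the cache listens on port 9129 as in the URL above. Pacman expands $repo and $arch itself, so they stay literal in the output.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print a Pacman mirrorlist entry for a Pacoloco host.
# Usage: pacoloco_server_line HOST [PORT]   (PORT defaults to 9129)
# $repo and $arch are deliberately left unexpanded; Pacman fills them in.
pacoloco_server_line() {
    printf 'Server = http://%s:%s/repo/archlinux/$repo/os/$arch\n' "$1" "${2:-9129}"
}

pacoloco_server_line pacolocohost.example
```

Put the printed line at the top of /etc/pacman.d/mirrorlist so that Pacman asks the cache first and only falls back to the other mirrors if it is unreachable.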

Priming the Cache

Pacoloco reaches its full prefetching potential only when it knows your currently installed packages, so it’s a good idea to prime it. This is done by getting a list of currently installed packages for each repo (“core”, “extra”) and requesting the corresponding package URL, just like Pacman would when (re-)installing the package.

The size of the package cache naturally depends on the number of packages you have installed, but to give you an idea: my cache with all installed packages from core was just about 3 GB, and with all installed ones from extra added it is under 7 GB, so nothing to worry about. Isn’t zstandard a very nice and efficient compression algorithm? I LOVE it!

You can prime the cache by running this gigantic and relatively inefficient Bash one-liner. Set the REPO and CACHE_HOST variables first and, if your repo isn’t archlinux or you are running on ARM, adjust the Pacoloco repo URL as well:

export REPO=core; export CACHE_HOST='pacolocohost:9129'; curl -q -s -I -K <(pacman -Qiq | grep -Eo "Name\s+:.*|Version\s+:.*|Architecture\s+:.*" | awk '{print $3}' | grep . | awk 'ORS=NR%3?FS:RS' | (grep -of <(pacman -Slq "$REPO" | sed 's/.*/\^& .*\$/')) | awk -F" " '{print "url=\"http://"ENVIRON["CACHE_HOST"]"/repo/archlinux/"ENVIRON["REPO"]"/os/x86_64/"$1"-"$2"-"$3".pkg.tar.zst\""}')

Depending on your internet connection and the number of installed packages of that repo, this can take some time. You can monitor the progress in Pacoloco’s log.

Here is a breakdown of what this does:

  • pacman -Qiq returns the list of all installed packages, together with some info. We only need the name, version and architecture of each package, which we extract with grep -Eo "Name\s+:.*|Version\s+:.*|Architecture\s+:.*"
  • We only want the info itself and not its title in each line, so we keep only the third column using awk '{print $3}'
  • We remove any empty lines, which would interfere with the next step (there shouldn’t be any, but just to make sure): grep .
  • Take three consecutive lines with name, version and architecture and combine them into a single line with awk 'ORS=NR%3?FS:RS'
  • Now, as Pacman doesn’t return any info about which repository (core/extra/…) the packages are from, we need to filter. pacman -Slq core returns all packages from the core repo, including ones that are not installed. We prepend a ^ to each line and append a space followed by .*$ so we can use it as a regex (this is not 100% safe, as package names are now interpreted as regexes too)
  • Using grep -of, we use the aforementioned list of all packages of that repo as a list of patterns to match against our list of all installed packages
  • We parse each line with the 3 values of name, version and architecture to create a new URL we can fetch later. Here we have to use awk’s ENVIRON["ENV_VARIABLE"] syntax to access the outer environment variables.
  • We can now request the HEAD of all these URLs using curl. With this, Pacoloco will download the package and prefetch it later without us having to deal with the result in our Bash script. We use curl’s parameter -K to supply the list of URLs as a file so that curl can reuse TCP connections for all requests. This is also the reason all URLs were wrapped in url="..." earlier.
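
If you prefer something easier to read than the one-liner, the same steps can be sketched as three small functions. This is an alternative of mine, not a drop-in copy of the pipeline above: the function names are hypothetical, the field parsing assumes English labels in the pacman -Qi output, and the repo filter compares package names as exact strings instead of regexes, which sidesteps the regex caveat mentioned in the list.

```shell
#!/usr/bin/env bash

# Read `pacman -Qi` style text on stdin and print one "name version arch"
# line per package. Assumes English field labels (e.g. run with LC_ALL=C).
extract_triples() {
    awk -F' *: ' '
        $1 == "Name"         { name = $2 }
        $1 == "Version"      { version = $2 }
        $1 == "Architecture" { print name, version, $2 }
    '
}

# Keep only triples whose package name is listed in the file "$1"
# (one name per line). Names are compared as exact strings, not as regexes.
filter_by_names() {
    awk 'FILENAME == ARGV[1] { wanted[$1] = 1; next } ($1 in wanted)' "$1" -
}

# Turn "name version arch" lines into curl config entries (url="...").
# REPO and CACHE_HOST must be exported, as for the one-liner.
build_url_config() {
    awk '{
        printf "url=\"http://%s/repo/archlinux/%s/os/x86_64/%s-%s-%s.pkg.tar.zst\"\n",
               ENVIRON["CACHE_HOST"], ENVIRON["REPO"], $1, $2, $3
    }'
}

# Possible real-world use (needs pacman, curl and a running Pacoloco):
#   export REPO=core CACHE_HOST='pacolocohost:9129'
#   pacman -Slq "$REPO" > /tmp/repo-names
#   LC_ALL=C pacman -Qi | extract_triples | filter_by_names /tmp/repo-names \
#       | build_url_config | curl -s -I -K -
```

Because each function only reads stdin and writes stdout, you can try them on a few hand-written lines before pointing curl at your cache.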

Metrics

Metrics are nice and important, and Pacoloco provides an endpoint for Prometheus at

http://pacolocohost.example:9129/metrics

Most relevant are pacoloco_cache_hits_total and pacoloco_cache_size_bytes.
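
For a quick look without a full Prometheus setup, a small filter over the endpoint’s text output is enough. The sketch below only assumes the two metric names mentioned above; the function name is made up, and the real output may carry labels or additional lines.

```shell
#!/usr/bin/env bash

# Hypothetical helper: keep only the two cache metrics from Prometheus
# text output read on stdin. Comment lines (# HELP, # TYPE) are dropped
# because they do not start with the metric name.
pacoloco_cache_metrics() {
    grep -E '^pacoloco_cache_(hits_total|size_bytes)'
}

# Possible use against a running instance:
#   curl -s http://pacolocohost.example:9129/metrics | pacoloco_cache_metrics
```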

Using Pacoloco with multiple architectures at the same time

If you have both regular x86 Arch boxes and some ArchLinuxARM boxes, you can use a single Pacoloco installation for all of them. You just need to add mirrors for each architecture’s repository to your config file.

For ArchLinux on ARM and x86 your config file could look like this:

...
repos:
  archlinux:
    urls: ## add or change official mirror urls as desired, see https://archlinux.org/mirrors/status/
      - http://mirrors.kernel.org/archlinux
  archlinux_armv7h:
    url: http://de4.mirror.archlinuxarm.org
...

with the Pacman mirrorlist on your ARM machine containing this mirror URL:

http://pacolocohost.example:9129/repo/archlinux_$arch/$arch/$repo
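
To see why the repo key in the config has to be archlinux_armv7h, it helps to expand the placeholders the way Pacman does. The following is only an illustrative sketch; the function name expand_server_url is made up, and the values core and armv7h are example inputs.

```shell
#!/usr/bin/env bash

# Hypothetical helper: expand Pacman's $repo and $arch placeholders
# in a Server URL. Usage: expand_server_url URL REPO ARCH
expand_server_url() {
    printf '%s\n' "$1" | sed -e "s|\\\$repo|$2|g" -e "s|\\\$arch|$3|g"
}

expand_server_url 'http://pacolocohost.example:9129/repo/archlinux_$arch/$arch/$repo' core armv7h
```

On an armv7h machine the path after /repo/ therefore starts with archlinux_armv7h, which is the repo name from the config, and the rest of the path (armv7h/core) is what gets requested from the ArchLinuxARM mirror.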