Hello, I’m Jason. I am a researcher, teacher, programmer, sci-fi nerd, outdoorsy type, and extreme chile pepper enthusiast.

Slow DwarFS on Intel Mac

Sat, Mar 15, 2025

For background, I have been working on a pet project that involves building a single-file archive containing the directory structure used by a BorgBackup repository. The reason isn’t that important here (I’ll post about it later, maybe), but I have been evaluating a couple of things to get an idea of performance tradeoffs:

Build a tar archive with zstd compression of the original files that Borg would be backing up.
Build a tar archive with zstd compression of the Borg repository itself.
Build a SquashFS archive with zstd compression of the original files that Borg would be backing up.
Build a SquashFS archive with zstd compression of the Borg repository itself.

And then I decided to add DwarFS to the list of options, so also these:

Build a DwarFS archive with default compression of the original files that Borg would be backing up.
Build a DwarFS archive with default compression of the Borg repository itself.

The reason that I wanted to look at DwarFS is that it seems to compare favorably to SquashFS and tar+zstd in terms of both space and time (see the comparison).

But I found that, for some reason, on my testing data with my (old) laptop, it takes an order of magnitude longer to create the DwarFS archive. The plots below show the time to create the archive for each of the three methods on the raw files (~13 GiB) and the Borg repo of the same information (~8.3 GiB).
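As a rough sketch of how creation times like these could be collected, the helper below wraps any build command and prints the elapsed wall-clock seconds. The name `time_build` and the paths in the example invocations are hypothetical, and the post does not say which exact commands or options were used for the measurements, so the invocations are only plausible defaults for each tool.

```shell
# Hypothetical helper: run a command and print "<label> <elapsed seconds>".
# Not part of tar, SquashFS, or DwarFS; it only wraps whatever is passed in.
time_build() {
    local label="$1"
    shift
    local start end status
    start=$(date +%s)        # whole seconds since the epoch
    "$@" >/dev/null 2>&1     # run the build command, discarding its output
    status=$?
    end=$(date +%s)
    echo "$label $((end - start))"
    return "$status"         # pass the command's exit status through
}

# Example invocations (paths are placeholders, options are assumptions):
# time_build tar-zstd tar --zstd -cf raw.tar.zst raw/
# time_build squashfs mksquashfs raw/ raw.sqfs -comp zstd
# time_build dwarfs   mkdwarfs -i raw/ -o raw.dwarfs
```

Whole-second resolution is coarse, but it is enough to spot an order-of-magnitude gap on multi-gigabyte inputs.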

The final size is a little better than the others (see Figure 2 below), but the time it takes to get there is unacceptable. I’m sure this is a “me” problem, but I’m not sure what is causing it. I will have to dig a little deeper on this one.

Figure 2: Resulting archive size (in bytes).

Python uv

Thu, Dec 19, 2024

In my last post (two years ago!!!), I looked at PDM as a package manager because I was feeling a little frustrated with Poetry. TLDR: It didn’t work out, and I stuck with Poetry.

But recently I’ve been using uv, and I’m hooked. It is so fast, and it has a solid solver. I also appreciate the options it has for different ways of managing a project, as well as the handy uvx (or uv tool run) command to run “tools” that aren’t part of your project. I think of those as a more temporary version of what I would otherwise install with pipx.

I do have one small complaint about uv though. I want to be able to manage a project without adding a src directory, and without a README.md or hello.py file being forced on me. Mostly, I want to be able to teach this to beginners without having to explain any of these extra things…

For my own use (but not for teaching), I’ve found the following bash function helpful—I just put it into my .profile script:

# Quick `uv init` without src directory, "README.md", or "hello.py".
# Project name is optional (defaults to the directory name if not provided).
function uvi() {
    # Keep the option local so a name from an earlier call cannot leak in.
    local name_opt=()
    if [ -n "$1" ] ; then
        name_opt=("--name=$1")
    fi
    uv init --app --no-readme --no-workspace "${name_opt[@]}" && rm hello.py
}
export -f uvi

With this, I can just type uvi and get a new project in my current directory without any of the extras. I just wish there was an easy way to run it in this mode for teaching purposes.

Python PDM First Look

Wed, Mar 23, 2022

I have been using Poetry to manage my Python virtual environments for a while now, and I’ve (mostly) been ~~happy~~ OK with it. But the grass is always greener, as they say, and Poetry doesn’t always leave me feeling all warm and fuzzy.

Here is an example that currently makes me grumble at Poetry:

First, let’s try to set up a brand new project that needs Numpy and Tensorflow. Simple, right?

$ mkdir poetry-example
$ cd poetry-example/
$ poetry init -n

[... command output omitted here ...]

$ time poetry add numpy tensorflow
Creating virtualenv poetry-example in /private/tmp/poetry-example/.venv
Using version ^1.22.3 for numpy
Using version ^2.8.0 for tensorflow

Updating dependencies

Resolving dependencies... (0.0s)
Resolving dependencies... (0.1s)

  SolverProblemError

  The current project's Python requirement (>=3.9,<4.0) is not compatible with some of the required packages Python requirement:
    - tensorflow-io-gcs-filesystem requires Python >=3.7, <3.11, so it will not be satisfied for Python >=3.11,<4.0
    - tensorflow-io-gcs-filesystem requires Python >=3.7, <3.11, so it will not be satisfied for Python >=3.11,<4.0
  
  Because no versions of tensorflow-io-gcs-filesystem match >0.23.1,<0.24.0 || >0.24.0
   and tensorflow-io-gcs-filesystem (0.23.1) requires Python >=3.7, <3.11, tensorflow-io-gcs-filesystem is forbidden.
  And because tensorflow-io-gcs-filesystem (0.24.0) requires Python >=3.7, <3.11, tensorflow-io-gcs-filesystem is forbidden.
  Because no versions of tensorflow match >2.8.0,<3.0.0
   and tensorflow (2.8.0) depends on tensorflow-io-gcs-filesystem (>=0.23.1), tensorflow (>=2.8.0,<3.0.0) requires tensorflow-io-gcs-filesystem (>=0.23.1).
  Thus, tensorflow is forbidden.
  So, because poetry-example depends on tensorflow (^2.8.0), version solving failed.

[... backtrace omitted here ...]

  • Check your dependencies Python requirement: The Python requirement can be specified via the `python` or `markers` properties
    
    For tensorflow-io-gcs-filesystem, a possible solution would be to set the `python` property to ">=3.9,<3.11"
    For tensorflow-io-gcs-filesystem, a possible solution would be to set the `python` property to ">=3.9,<3.11"

    https://python-poetry.org/docs/dependency-specification/#python-restricted-dependencies,
    https://python-poetry.org/docs/dependency-specification/#using-environment-markers

real    0m1.286s
user    0m0.907s
sys     0m0.413s

Well, it was fast, but that’s because it failed. This has been an ongoing problem for some of the projects I need to set up in Poetry (often, related to either Tensorflow or Numpy)… In Poetry’s defense, they do give a very good suggestion to fix the issue - and it will fix the issue in this case. But for beginners, this might be enough frustration to simply give up. I want even a first-time user to be able to get the project started without having to fiddle with the config files for the environment manager.
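For reference, the fix Poetry suggests comes down to a one-line change in pyproject.toml. The sketch below applies it from the shell; `set_python_range` is a hypothetical helper, and it assumes the file contains a single line beginning with `python = "` (which is what `poetry init` normally writes under `[tool.poetry.dependencies]`).

```shell
# Hypothetical helper: replace the Python constraint in a Poetry pyproject.toml.
# Assumes exactly one line starts with `python = "`; other lines are untouched.
set_python_range() {
    local file="$1" range="$2"
    # Write to a temp file and move it into place, which works with both
    # GNU and BSD sed (no reliance on `sed -i`).
    sed "s|^python = \".*\"|python = \"$range\"|" "$file" > "$file.tmp" \
        && mv "$file.tmp" "$file"
}

# Example, using the range from Poetry's own suggestion:
# set_python_range pyproject.toml ">=3.9,<3.11"
# poetry add numpy tensorflow
```

This is exactly the kind of config-file fiddling a first-time user should not have to do, which is the point of the complaint above.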

Today I looked at PDM.

PDM is a modern Python package manager with PEP 582 support. It installs and manages packages in a similar way to npm that doesn’t need to create a virtualenv at all!¹

I read about PEP 582, and found that it seems to be stalled in the “Draft” status, but there is also some grass-roots community support. Essentially, it tries to take package management cues from projects like NPM and bring those into the Python ecosystem at the core level. If it passes, it might finally resolve the “environments in Python are the worst” situation. I hope it manages to pass…

Anyway, here is the same example project, but using PDM to set it up:

$ pdm init -n
Creating a pyproject.toml for PDM...
Using the last selection, add '-i' to ignore it.
Using Python interpreter: ~/.pyenv/versions/3.9.9/bin/python3.9 (3.9)
Changes are written to pyproject.toml.

$ time pdm add numpy tensorflow
Adding packages to default dependencies: numpy, tensorflow
✔ 🔒 Lock successful
Changes are written to pdm.lock.
Changes are written to pyproject.toml.
Synchronizing working set with lock file: 41 to add, 0 to update, 0 to remove

[... omitting several "Install <packagename> successful" lines here ...]

🎉 All complete!

real    2m32.090s
user    0m33.502s
sys     0m5.583s

Well, look at that! It worked. It took a while, but it figured out the dependencies.

I think I’ll try PDM for a few more things. Not only does the solver seem somewhat better (see below), but I like the idea of not needing a virtual environment (in favor of the PEP 582 way of packaging)… We’ll see how it goes after some more projects though.

Nothing is ever that easy…

I tried the same test with a few more packages. Instead of only Tensorflow and Numpy, I tried to add Tensorflow, Numpy, Pandas, matplotlib, and ipykernel to a brand new project (in a single add) with both package managers.

Poetry said:

...

  SolverProblemError

...

Exactly like before. Not surprising.

PDM said:

  [... many "successful" messages and some "failed" ones omitted here ...]
  ✔ Install zipp 3.7.0 successful
  Retry failed jobs
  ⠸ Installing h5py 2.10.0...
  ⠸ Installing tensorflow 2.2.0...
  ✖ Install scipy 1.4.1 failed
Could not find a version that satisfies the requirement tensorflow==2.2.0 (from
  ✖ Install h5py 2.10.0 failed
  ✖ Install tensorflow 2.2.0 failed
  ✖ Install scipy 1.4.1 failed

ERRORS:
[... sad and unhelpful text below omitted ...]

real    14m41.722s
user    6m2.432s
sys     0m54.376s

Well, darn. It started out so well! In this case even PDM got into trouble (I think it was the Pandas in combo with the others that did it somehow). The error messages below the ERRORS: line were also super unhelpful if you are a Python beginner (they are just a crash backtrace through the library’s codebase). And, it took a long time to do it (or… not do it). Boo, PDM. Do better.

So maybe the long Python environment manager nightmare isn’t quite over yet. But I have to say: I like some of the things PDM can do – I’m going to keep my eye on this one and hope it can become a strong recommendation soon.

¹ Intro paragraph at https://pdm.fming.dev/.

Jason L Causey

Figure 1: Creation time in seconds.