For background, I have been working on a pet project that involves building a single-file archive containing the directory structure used by a BorgBackup repository. The reason isn’t that important here (I’ll post about it later, maybe), but I have been evaluating a few options to get an idea of the performance tradeoffs:
Build a tar archive with zstd compression of the original files that Borg would be backing up.
Build a tar archive with zstd compression of the Borg repository itself.
Build a SquashFS archive with zstd compression of the original files that Borg would be backing up.
Build a SquashFS archive with zstd compression of the Borg repository itself.
Then I decided to add DwarFS to the list, so I also tried these:
Build a DwarFS archive with default compression of the original files that Borg would be backing up.
Build a DwarFS archive with default compression of the Borg repository itself.
The reason that I wanted to look at DwarFS is that it seems to compare favorably to SquashFS and tar+zstd in terms of both space and time (See the comparison.).
But I found that, for some reason, on my test data and my (old) laptop, creating the DwarFS archive takes an order of magnitude longer than the other methods. The plots below show the time to create the archive for each of the three methods on the raw files (~13 GiB) and on the Borg repo holding the same information (~8.3 GiB).
Figure 1: Creation time in seconds.
The final size is a little better than the others (see Figure 2 below), but the time it takes to get there is unacceptable. I suspect this is a “me” problem, but I don’t know yet what is causing it. I will have to dig a little deeper on this one.
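For anyone who wants to run a similar comparison, here is a minimal bash sketch of how creation times (and final sizes) could be collected. It is not the exact script behind Figure 1: it assumes GNU tar with `--zstd`, `mksquashfs` from squashfs-tools, and `mkdwarfs` are on the PATH, it uses each tool's basic invocation, and the function names `time_archive` and `benchmark_dir` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: time archive creation for tar+zstd, SquashFS, and DwarFS.
# Assumes GNU tar (with --zstd), mksquashfs, and mkdwarfs are installed.

# Run a command with its output discarded and print "<label>: <elapsed>s"
# in whole seconds; print "<label>: failed" and return 1 if it fails.
time_archive() {
    local label="$1"
    shift
    local start=$SECONDS
    if ! "$@" > /dev/null 2>&1; then
        echo "$label: failed"
        return 1
    fi
    echo "$label: $((SECONDS - start))s"
}

# Build one archive of each kind from the same source directory.
# $1 = directory to archive, $2 = output path prefix (both placeholders).
# mkdwarfs may refuse to overwrite an existing output, so remove old archives first.
benchmark_dir() {
    local src="$1" out="$2"
    time_archive "tar+zstd" tar --zstd -cf "$out.tar.zst" -C "$src" .
    time_archive "squashfs" mksquashfs "$src" "$out.sqfs" -comp zstd -noappend
    time_archive "dwarfs" mkdwarfs -i "$src" -o "$out.dwarfs"
    # Show the resulting archive sizes side by side.
    du -h "$out.tar.zst" "$out.sqfs" "$out.dwarfs"
}
```

Calling it as, say, `benchmark_dir ~/data /tmp/data-archive` once for the raw files and once for the Borg repository would print one timing line per method plus the sizes. Bash's `SECONDS` variable only has one-second resolution, which is coarse but adequate when each run takes minutes.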
In my last post (two years ago!!!), I looked at PDM as a package manager because I was feeling a little frustrated with Poetry. TL;DR: It didn’t work out, and I stuck with Poetry.
But recently I’ve been using uv, and I’m hooked. It is so fast, and it has a solid solver. I also appreciate the options it has for different ways of managing a project, as well as the handy uvx command (short for uv tool run) to run “tools” that aren’t part of your project. I think of those as a more temporary version of what I would otherwise install with pipx.
I do have one small complaint about uv though. I want to be able to manage a project without adding a src directory, and without a README.md or hello.py file being forced on me. Mostly, I want to be able to teach this to beginners without having to explain any of these extra things…
For my own use (but not for teaching), I’ve found the following bash function helpful—I just put it into my .profile script:
# Quick `uv init` without src directory, "README.md", or "hello.py".
# Project name is optional (defaults to the directory name if not provided).
function uvi() {
    local NAME=""  # reset on every call so a previous name doesn't leak in
    if [ -n "$1" ]; then
        NAME="--name=$1"
    fi
    uv init --app --no-readme --no-workspace ${NAME} && rm hello.py
}
export -f uvi
With this, I can just type uvi and get a new project in my current directory without any of the extras. I just wish uv had an easy, built-in way to work in this mode for teaching purposes.
I have been using Poetry to manage my Python virtual environments for a
while now, and I’ve (mostly) been OK with it. But the grass is
always greener, as they say, and Poetry doesn’t always leave me feeling
all warm and fuzzy.
Here is an example that currently makes me grumble at Poetry:
First, let’s try to set up a brand new project that needs Numpy and
Tensorflow. Simple, right?
$ mkdir poetry-example
$ cd poetry-example/
$ poetry init -n
[... command output omitted here ...]
$ time poetry add numpy tensorflow
Creating virtualenv poetry-example in /private/tmp/poetry-example/.venv
Using version ^1.22.3 for numpy
Using version ^2.8.0 for tensorflow
Updating dependencies
Resolving dependencies... (0.0s)
Resolving dependencies... (0.1s)
SolverProblemError
The current project's Python requirement (>=3.9,<4.0) is not compatible with some of the required packages Python requirement:
- tensorflow-io-gcs-filesystem requires Python >=3.7, <3.11, so it will not be satisfied for Python >=3.11,<4.0
- tensorflow-io-gcs-filesystem requires Python >=3.7, <3.11, so it will not be satisfied for Python >=3.11,<4.0
Because no versions of tensorflow-io-gcs-filesystem match >0.23.1,<0.24.0 || >0.24.0
and tensorflow-io-gcs-filesystem (0.23.1) requires Python >=3.7, <3.11, tensorflow-io-gcs-filesystem is forbidden.
And because tensorflow-io-gcs-filesystem (0.24.0) requires Python >=3.7, <3.11, tensorflow-io-gcs-filesystem is forbidden.
Because no versions of tensorflow match >2.8.0,<3.0.0
and tensorflow (2.8.0) depends on tensorflow-io-gcs-filesystem (>=0.23.1), tensorflow (>=2.8.0,<3.0.0) requires tensorflow-io-gcs-filesystem (>=0.23.1).
Thus, tensorflow is forbidden.
So, because poetry-example depends on tensorflow (^2.8.0), version solving failed.
[... backtrace omitted here ...]
• Check your dependencies Python requirement: The Python requirement can be specified via the `python` or `markers` properties
For tensorflow-io-gcs-filesystem, a possible solution would be to set the `python` property to ">=3.9,<3.11"
For tensorflow-io-gcs-filesystem, a possible solution would be to set the `python` property to ">=3.9,<3.11"
https://python-poetry.org/docs/dependency-specification/#python-restricted-dependencies,
https://python-poetry.org/docs/dependency-specification/#using-environment-markers
real 0m1.286s
user 0m0.907s
sys 0m0.413s
Well, it was fast, but that’s because it failed. This has been
an ongoing problem for some of the projects I need to set up in Poetry
(often related to either Tensorflow or Numpy)… In Poetry’s defense,
they do give a very good suggestion for fixing the issue, and it does
fix the issue in this case. But for beginners, this might be enough
frustration to simply give up. I want even a first-time user to be able
to get the project started without having to fiddle with the config
files for the environment manager.
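Poetry’s suggestion is simply the overlap of the two ranges in the error message: the project allows >=3.9,<4.0, while tensorflow-io-gcs-filesystem only allows >=3.7,<3.11, so the largest range satisfying both is >=3.9,<3.11 (in pyproject.toml, that means setting the `python` entry under `[tool.poetry.dependencies]` to ">=3.9,<3.11"). The toy bash sketch below only computes that overlap for two simple ">=low,<high" ranges, to make the arithmetic explicit. It is not how Poetry’s solver works, it ignores the rest of the PEP 440 specifier syntax, and the function names are made up. One detail it does get right: versions are compared component by component, so 3.11 counts as newer than 3.9, even though 3.11 < 3.9 as a decimal number.

```bash
#!/usr/bin/env bash
# Toy sketch: intersect two ">=LOW,<HIGH" Python version ranges.
# Only handles this two-bound form; real PEP 440 specifiers are much richer.

# Compare two dotted versions component by component (so 3.11 > 3.9).
# Prints -1, 0, or 1.
vercmp() {
    local -a a b
    IFS=. read -ra a <<< "$1"
    IFS=. read -ra b <<< "$2"
    local n=${#a[@]} i x y
    if (( ${#b[@]} > n )); then n=${#b[@]}; fi
    for (( i = 0; i < n; i++ )); do
        x=${a[i]:-0}   # missing components count as 0
        y=${b[i]:-0}
        if (( 10#$x < 10#$y )); then echo -1; return; fi
        if (( 10#$x > 10#$y )); then echo 1; return; fi
    done
    echo 0
}

# Print the overlap of two ranges as ">=LOW,<HIGH", or "empty" if none.
intersect_range() {
    local lo1 hi1 lo2 hi2 lo hi
    IFS=, read -r lo1 hi1 <<< "$1"
    IFS=, read -r lo2 hi2 <<< "$2"
    lo1=${lo1#">="}; hi1=${hi1#"<"}
    lo2=${lo2#">="}; hi2=${hi2#"<"}
    # The tighter lower bound is the larger of the two...
    if [ "$(vercmp "$lo1" "$lo2")" -ge 0 ]; then lo=$lo1; else lo=$lo2; fi
    # ...and the tighter upper bound is the smaller of the two.
    if [ "$(vercmp "$hi1" "$hi2")" -le 0 ]; then hi=$hi1; else hi=$hi2; fi
    if [ "$(vercmp "$lo" "$hi")" -ge 0 ]; then
        echo "empty"
    else
        echo ">=$lo,<$hi"
    fi
}
```

With the two ranges from the error, `intersect_range '>=3.9,<4.0' '>=3.7,<3.11'` prints `>=3.9,<3.11`, which matches Poetry’s hint.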
Today I looked at PDM.
PDM is a modern Python package manager
with PEP 582 support. It installs and manages packages in a similar
way to npm that doesn’t need to create a virtualenv at all![1]
I read about PEP 582 and found that it seems to be stalled in
“Draft” status, but it also has some grass-roots community support.
Essentially, it tries to take package management cues from projects like
NPM and bring those into the Python ecosystem at the core level. If it
passes, it might finally resolve the “environments in Python are the
worst” situation. I hope it manages to pass…
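The core of PEP 582 is a lookup rule: Python would look for installed packages in a `__pypackages__/X.Y/lib` directory next to the script being run (X.Y being the interpreter’s major.minor version), so there is no virtualenv to create or activate. The bash sketch below only imitates that rule by prepending the directory to `PYTHONPATH`. The helper names are made up for illustration, and this is not necessarily how PDM wires up its own PEP 582 support.

```bash
#!/usr/bin/env bash
# Toy sketch of the PEP 582 idea: per-project packages live in
# __pypackages__/<X.Y>/lib instead of a virtualenv.

# Print the PEP 582 library directory for a project dir and a Python X.Y version.
pep582_lib_dir() {
    local project_dir="$1" py_version="$2"
    printf '%s/__pypackages__/%s/lib\n' "${project_dir%/}" "$py_version"
}

# Run a command with the current directory's PEP 582 lib dir (for the
# given X.Y version) prepended to PYTHONPATH. Hypothetical helper.
with_pypackages() {
    local lib
    lib=$(pep582_lib_dir "$PWD" "$1")
    shift
    PYTHONPATH="$lib${PYTHONPATH:+:$PYTHONPATH}" "$@"
}
```

For example, `with_pypackages 3.9 python3.9 -c 'import numpy'` would let that interpreter find a numpy installed under `./__pypackages__/3.9/lib`, with nothing to activate first.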
Anyway, here is the same example project, but using PDM to set it up:
$ pdm init -n
Creating a pyproject.toml for PDM...
Using the last selection, add '-i' to ignore it.
Using Python interpreter: ~/.pyenv/versions/3.9.9/bin/python3.9 (3.9)
Changes are written to pyproject.toml.
$ time pdm add numpy tensorflow
Adding packages to default dependencies: numpy, tensorflow
✔ 🔒 Lock successful
Changes are written to pdm.lock.
Changes are written to pyproject.toml.
Synchronizing working set with lock file: 41 to add, 0 to update, 0 to remove
[... omitting several "Install <packagename> successful" lines here ...]
🎉 All complete!
real 2m32.090s
user 0m33.502s
sys 0m5.583s
Well, look at that! It worked. It took a while, but it figured out the
dependencies.
I think I’ll try PDM for a few more things. Not only does the solver
seem somewhat better (see below), but I like the idea of not needing a
virtual environment (in favor of the PEP 582 way of packaging)… We’ll
see how it goes after some more projects though.
Nothing is ever that easy…
I tried the same test with a few more packages. Instead of only
Tensorflow and Numpy, I tried to add Tensorflow, Numpy, Pandas,
matplotlib, and ipykernel to a brand new project (in a single add)
with both package managers.
Poetry said:
...
SolverProblemError
...
Exactly like before. Not surprising.
PDM said:
[... many "successful" messages and some "failed" ones omitted here ...]
✔ Install zipp 3.7.0 successful
Retry failed jobs
⠸ Installing h5py 2.10.0...
⠸ Installing tensorflow 2.2.0...
✖ Install scipy 1.4.1 failed
Could not find a version that satisfies the requirement tensorflow==2.2.0 (from
✖ Install h5py 2.10.0 failed
✖ Install tensorflow 2.2.0 failed
✖ Install scipy 1.4.1 failed
ERRORS:
[... sad and unhelpful text below omitted ...]
real 14m41.722s
user 6m2.432s
sys 0m54.376s
Well, darn. It started out so well! In this case even PDM got into
trouble (I think it was the Pandas in combo with the others that did it
somehow; the log shows it picked tensorflow 2.2.0, which predates
Python 3.9 and has no wheels for it). The error messages below the
ERRORS: line were also super
unhelpful if you are a Python beginner (they are just a crash backtrace
through the library’s codebase). And, it took a long time to do it
(or… not do it). Boo, PDM. Do better.
So maybe the long Python environment manager nightmare isn’t quite over
yet. But I have to say: I like some of the things PDM can do – I’m
going to keep my eye on this one and hope it can become a strong
recommendation soon.