One of the comments on the “The modern packager’s security nightmare” post posed a very important question: why is it bad to depend on the app developer to address security issues?
In fact, I believe it is important enough to justify a whole post discussing the problem. To clarify, the wider context is bundling dependencies, i.e. relying on the application developer to ensure that all the dependencies included with the application are free of vulnerabilities.
In my opinion, the root of security in open source software is auditing, in the broadest sense of the word. Since the code is public, everyone can read it, analyze it and test it. However, a typical system install includes thousands of packages from hundreds of different upstreams, and it is simply impossible even for large companies (not to mention individuals) to audit all that code. Instead, we assume that with a large enough number of eyes looking at the code, all vulnerabilities will eventually be found and published.
On top of auditing, we add trust. Today, CVE authorities are at the root of our vulnerability trust. We trust them to reliably publish reports of vulnerabilities found in various packages. However, once again, we can’t expect users to manually make sure that the huge number of packages they are running are free of vulnerabilities. Instead, that trust is delegated down the hierarchy to software authors and distributions.
Both software authors and distribution packagers share a common goal: ensuring that their end users are running working, secure software. Why, then, do I believe that the user’s trust is better placed in distribution packagers than in software authors? I am going to explain this in three points.
How many entities do you trust?
The following graph depicts a fragment of the dependency graph connecting a few C libraries and end-user programs. The right-hand side shows leaf packages, those intended for the user. On the left side, you can see some of the libraries they use, or that their dependent libraries use. All of this stems from OpenSSL.
Now imagine that a vulnerability is found in OpenSSL. Let’s discuss what would happen in two worlds: one where the distribution is responsible for ensuring the security of all the packages, and another one where upstreams bundle dependencies and are expected to handle it.
In the distribution world, the distribution maintainers are expected to ensure that every node of that dependency graph is secure. If a vulnerability is found in OpenSSL, it is their responsibility to realize that and update the vulnerable package. The end users effectively have to trust distribution maintainers to keep their systems secure.
In the bundled dependency world, the maintainer of every successive node needs to ensure the security of its dependencies. First, the maintainers of cURL, Transmission, Tor, Synergy and Qt5 need to realize that OpenSSL has a new vulnerability, update their bundled versions and make new releases of their software. Then, the maintainers of poppler, cmake, qemu and Transmission (again) need to update their bundled versions of cURL to transitively avoid the vulnerable OpenSSL version. The same goes for the maintainers of Synergy (again) and KeePassXC, who have to update their bundled Qt5. Finally, the maintainers of Inkscape and XeTeX need to update their bundled poppler.
Even if we disregard the amount of work involved and the resulting delay in deploying the fix, the end user needs to trust 11 different entities to fix their respective packages so that none of them (transitively) ships a vulnerable version of OpenSSL. And the most absurd part is that the user will still need to trust their distribution vendor to actually ship all these updated packages.
I think that we can mostly agree that trusting a single entity presents a much smaller attack surface than trusting tens or hundreds of different entities.
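To make the arithmetic above concrete, here is a minimal Python sketch that encodes only the fragment of the graph described in the previous paragraphs as “library → packages that bundle it” edges and walks it from OpenSSL. The table `BUNDLED_BY` and the functions `affected_maintainers()` and `release_rounds()` are hypothetical illustrations, not part of any real tool, and the real graph is of course far larger.

```python
from collections import deque

# Hypothetical reverse-dependency table: library -> packages that bundle it.
# It encodes only the fragment described in the text, not the real graph.
BUNDLED_BY = {
    "OpenSSL": ["cURL", "Transmission", "Tor", "Synergy", "Qt5"],
    "cURL": ["poppler", "cmake", "qemu", "Transmission"],
    "Qt5": ["Synergy", "KeePassXC"],
    "poppler": ["Inkscape", "XeTeX"],
}


def affected_maintainers(root):
    """Return every package that transitively bundles `root` (breadth-first)."""
    seen = set()
    queue = deque([root])
    while queue:
        pkg = queue.popleft()
        for dependent in BUNDLED_BY.get(pkg, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


def release_rounds(root):
    """Map each affected package to the earliest round its final fix can ship.

    A package can only ship a fixed release after every bundled copy on its
    path from `root` has been fixed, so its round is the length of the
    longest path from `root` (the graph is assumed to be acyclic).
    """
    rounds = {}

    def visit(pkg, depth):
        for dependent in BUNDLED_BY.get(pkg, []):
            # Only revisit a package if we found a longer path to it.
            if rounds.get(dependent, 0) < depth + 1:
                rounds[dependent] = depth + 1
                visit(dependent, depth + 1)

    visit(root, 0)
    return rounds


if __name__ == "__main__":
    print(len(affected_maintainers("OpenSSL")))       # 11 entities to trust
    print(max(release_rounds("OpenSSL").values()))    # 3 successive rounds
```

The longest-path rule matters because Transmission and Synergy bundle both a library and something that bundles it, so they cannot be done until the intermediate release arrives. The fragment has 13 edges, i.e. 13 bundled copies to update for 11 entities, since Transmission and Synergy each appear twice.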
The above example is set in the C world of relatively few, relatively large libraries. The standard C library is pretty fat, and the standard C++ library is much fatter. Typical examples of the C approach are libraries like libgcrypt (a single library providing a lot of cryptographic primitives such as ciphers along with their modes, hashes, MACs, RNGs and related functions) or OpenSSL (providing both crypto primitives and a TLS implementation on top of them). A common Rust/Cargo approach (though things are not always done this way) is to have a separate crate for every algorithm, in every variant. I’m not saying this is wrong; in fact, I actually like the idea of small libraries that do one thing well. However, it multiplies the aforementioned problem thousandfold.
The bus factor
The second part of the problem is the bus factor.
A typical ‘original’ Linux distribution involves a few hundred developers working together on software. There are much smaller distributions (or forks), but they generally build on something bigger. These distributions usually have dedicated teams in place to handle security, as well as mechanisms to help them with their work.
For example, in Gentoo, security issues can usually be tackled from two ends. On one end, there’s the package maintainer, who takes care of daily maintenance tasks and usually notices vulnerability fixes through new releases. On the other end, there’s a dedicated Security team whose members are responsible for monitoring new CVEs. Most of the time, these people work together to resolve security issues, with the maintainer contributing specific knowledge of the package in question and the Security team contributing general security experience and monitoring the process.
Should any maintainer be away or otherwise unable to fix the vulnerability quickly, the Security team can step in and take care of whatever needs to be done. These people are specifically focused on that one job, which means that the chances of things going sideways are comparatively small. Even if the whole distribution were to suddenly disappear, the users would have a good chance of noticing that.
Apart from a few very large projects, most software projects are small. It is not uncommon for a software package to be maintained by a single person. Now, how many dependencies can a single person or a small team effectively maintain? Even with the best interests at heart, a developer whose primary goal is to work on the code of their own project cannot reasonably be expected to bring the same dedication and experience to the project’s dependencies as distribution maintainers who do full-time maintenance work on them.
Even if we could reasonably assume that all upstreams will do their best to ensure that their dependencies are not vulnerable, it is inevitable that some of them will not be able to do so in a timely manner. In fact, some projects suddenly become abandoned, and then vulnerabilities are not handled at all. Now, the problem is not only that this might happen. The problem is how to detect it early, and how to deal with it. Can you reasonably be expected to monitor hundreds of upstreams for activity? Again, the responsibility falls on distribution developers, who would have to resolve these issues independently.
How much testing can you perform?
The third point is somewhat less focused on security, and more on bugs in general. Bundling dependencies defers to the application developers not only the handling of security issues in packages, but also all other upgrades. The comment author argues: “I want to ship software that I know works, and I can only do that if I know what my dependencies actually are so I can test against them.” That’s a valid point, but there’s a but: how many real-life scenarios can you actually test?
Let’s start with the most basic stuff. Your CI most likely runs on a 64-bit x86 system. Some projects test more, but it’s still a very limited set of hardware. If one of your dependencies is broken on a non-x86 architecture, your testing is unlikely to catch that. Even if the authors of that dependency are informed about the problem and release a fixed version, you won’t know that the upgrade is necessary unless someone reports the problem to you (and you may actually have trouble attributing the issue to the particular dependency).
In reality, things aren’t always this good. Not all upstreams release fixes quickly. Distribution packagers sometimes have to backport fixes or even apply custom patches to make things work. If packages bundle dependencies, it is not sufficient to apply the fix at the root, as it needs to be applied to every package bundling the dependency. In the end, it is even possible that different upstreams will start applying different patches to the same dependency to resolve the same problem, independently reported to each of them. This means more maintenance work for you, and a maintenance nightmare for distributions.
There are also other kinds of issues that CIs often don’t catch: an ‘unexpected’ system setup, a different locale, additional packages being installed (Python is sometimes quite fragile in that regard). Your testing can’t really predict all possible scenarios and protect against them. Pinning a dependency version that you know to be good for you does not actually guarantee that it will be good for all your end users. By blocking upgrades, you may actually expose them to bugs that were already known and fixed.
Summary
Bundling dependencies is bad. You can reasonably assume that you’re going to do a good job of maintaining your package and keeping its bundled dependencies secure. However, you can’t guarantee that the maintainers of the other packages involved will do the same. And you can’t reasonably expect end users to entrust the security of their systems to hundreds of different people. The stakes are high and the risk is huge.
The number of entities involved is just too great. You can’t expect anyone to reasonably monitor them, and with many projects having no more than a single developer, you can’t guarantee that fixes will be handled promptly or that the package in question will not be abandoned. A single package in the middle of a dependency chain could effectively render all its reverse dependencies vulnerable, and multiply the work of their maintainers, who have to fix the problem locally.
In the end, the vast majority of Linux users need to trust their distribution to ensure that the packages shipped to them are not vulnerable. While you might think that letting you handle security makes things easier for us, it doesn’t. We still need to monitor all the packages and their dependencies. The main difference is that we can fix a problem in one place, while upstreams have to fix it everywhere. And while they do, we need to ensure that every one of them actually did so, and it is often hard even to find all the bundled dependencies (including inline code copied from other projects), as there are no widely followed standards for recording them.
So even if we ignored all the other technical downsides of bundled dependencies, the sum total of work needed to keep packages that bundle dependencies secure is much greater than the cost of unbundling those dependencies.
You should ask yourself the following question: do you really want to be responsible for all these dependencies? As a software developer, you want to focus on writing code. You want to make sure that your application works. There are other people who are ready and willing to take care of the ecosystem your software is going to run in. Fit your program into the environment, instead of building an entirely new world for it. When you build such a world, you’re effectively replicating a subset of a distribution. Expecting every single application developer to do the necessary work (and to have the necessary knowledge) does not scale.