On Rust, Memory Safety, and Open Source Infrastructure
By Tara Tarakiyee
In STF
Addressing memory safety in critical infrastructure is a complex issue with multiple approaches. The Sovereign Tech Fund supports several initiatives, and technologist Tara Tarakiyee reflects on the long road ahead.
Early on July 19th, 2001, a computer worm (a type of malware) started to infect computers running Microsoft’s IIS web server. Within 14 hours, it had infected more than 359,000 computers around the globe. The animation below, produced by the Center for Applied Internet Data Analysis (CAIDA), shows the spread and reach of the worm, which ended up being called Code Red. Ryan Permeh and Marc Maiffret of eEye Digital Security gave it that name because, according to them, Mountain Dew Code Red was their caffeinated beverage of choice and “the only thing that kept [them] awake while [they] disassembled this exploit.”
View an animation of the spread of CodeRed
While Code Red didn’t manage to cause a lot of damage in the long term (not accounting for side-effects of caffeine overconsumption), it was the first example of a large-scale attack on internet infrastructure, and foreshadowed many large-scale vulnerabilities that followed. Code Red was also an early-warning sign and emphasized the need to secure the software we all rely upon. Code Red exploited a software vulnerability in IIS called a buffer overflow, a type of memory safety bug.
🐛 What is a memory safety bug?
Most programs need to write and retrieve data from a computer’s memory to be able to function properly. That requires that the program keeps track of where in the memory it can write to and where it has already stored data, so it can retrieve it later.
If something goes wrong, for example when the program tries to write data where it’s not supposed to, it can lead to unintentional data leaking, data loss, or crashes. Malicious actors can also exploit these vulnerabilities to force any of the above problems, or take full control of the program, and in the worst cases, take control of the computer it is running on.
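As a minimal sketch of what "writing where it's not supposed to" means, the Rust example below tries to write past the end of a four-byte buffer. The function name `checked_write` is a hypothetical helper made up for this illustration. In a language like C, the equivalent unchecked write could silently overwrite neighbouring memory; here, the out-of-bounds write is refused and reported as an error.

```rust
/// Hypothetical helper: write `value` at `index` only if the index lies
/// inside the buffer; otherwise report an error instead of touching memory.
fn checked_write(buffer: &mut [u8], index: usize, value: u8) -> Result<(), String> {
    let len = buffer.len();
    match buffer.get_mut(index) {
        Some(slot) => {
            *slot = value;
            Ok(())
        }
        None => Err(format!(
            "index {} is outside a buffer of length {}",
            index, len
        )),
    }
}

fn main() {
    let mut buffer = [0u8; 4];

    // An in-bounds write succeeds.
    let ok = checked_write(&mut buffer, 2, 0xAB);
    println!("{:?}", ok);

    // An out-of-bounds write is refused, and the buffer is left unchanged.
    let refused = checked_write(&mut buffer, 10, 0xFF);
    println!("{:?}", refused);
    println!("{:?}", buffer);
}
```

Plain indexing (`buffer[10] = 0xFF`) would not corrupt memory either: Rust checks the index at runtime and stops the program with a panic instead.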
Code Red was not the first example of a memory safety bug. In fact, memory safety bugs are not rare at all: many software projects and companies estimate that 65-70% of all vulnerabilities found in their codebases are memory safety related.
As it’s the Sovereign Tech Fund’s (STF) mission to invest in the security of critical open source infrastructure, it shouldn’t come as a surprise that we are supporting several initiatives that tackle the topic of memory safety, given how prevalent these vulnerabilities are. In this blog post, we touch upon how complex the issue of memory safety is, what approaches exist and how our investments contribute towards them, and some of the limitations we face.
Rust: The New(est) Infrastructure Language
The best way to stop memory safety vulnerabilities from spreading is to make sure they never exist in the first place. That’s been possible for a while, due to the development of newer programming languages such as Rust, Go, C#, Java, Swift, Python, and JavaScript. These languages either prevent programmers from introducing those vulnerabilities or manage memory in a way that makes them unlikely. We won’t go into the general benefits of memory safe languages, as there is a lot of literature out there on the topic, but we will speak to Rust specifically and how it applies to critical infrastructure, which is central to STF’s mission.
Older programming languages such as C, C++, and assembly don’t guarantee memory safety, which is unfortunate because much of the critical infrastructure we all rely upon is written in those languages. It’s worth noting that while C++ does include features that make it incrementally safer than C, it still doesn’t compare to the safety provided by Rust. C and C++ remain dominant in infrastructure because most memory safe languages come with overhead that makes them perform less efficiently. That can be a fair trade-off in some applications, given those languages’ additional features (including memory safety), but in infrastructure and other areas where performance is paramount, it can be undesirable.
This is where Rust comes in. Rust combines the memory safety of these newer languages with performance comparable to code written in C or C++. It is also more resilient to other types of vulnerabilities, such as integer overflow, validation vulnerabilities, and data races. This helped make Rust one of the most loved languages of 2023, and over at STF we’ve invested in several initiatives that involve developing and implementing critical infrastructure software in Rust. Some might ask: why still support critical infrastructure written in manually-managed memory (read: not memory safe) languages?
To say that Rust “protects from memory safety vulnerabilities” may be an oversimplification. Rust enforces programming practices that help ensure that memory safety vulnerabilities are not introduced into the source code. This strict enforcement means, for example, that if you attempt to compile code that could be unsafe, the Rust compiler will stop you. One implication is that Rust comes with a steeper learning curve, even for the most experienced programmers. It also means that rewriting code from C into Rust is far from a trivial exercise and might require a significant overhaul of the logic or architecture of the software.
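To illustrate what the compiler stopping you can look like, here is a small sketch. The function `longest_word` is a hypothetical helper invented for this example. The program compiles as written; the commented-out line shows a change the compiler would refuse, because it would free a string while a reference into it is still in use.

```rust
/// Hypothetical helper: return the longest word in `text`.
/// The returned reference borrows from `text`, not from a copy.
fn longest_word(text: &str) -> &str {
    text.split_whitespace()
        .max_by_key(|word| word.len())
        .unwrap_or("")
}

fn main() {
    let message = String::from("memory safety matters");
    let word = longest_word(&message); // `word` points into `message`

    // Uncommenting the next line makes compilation fail with error E0505
    // ("cannot move out of `message` because it is borrowed"): freeing
    // `message` here would leave `word` pointing at released memory.
    // drop(message);

    println!("{}", word);
}
```

The equivalent mistake in C (freeing a buffer and then reading through an old pointer) compiles without complaint and only shows up at runtime, if it shows up at all.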
There are also limitations to static analysis of code, because the code analysis tool lacks context the programmer might have. For example, there are valid use cases for direct interaction with computer memory in a way that otherwise would be potentially unsafe. Rust accounts for this by including an “unsafe” declaration, where a programmer can indicate to the Rust compiler that they are knowingly including manually-managed memory instructions that it would normally reject, and that the programmer has their reasons to do so. The advantage of this approach is that the “unsafe” declaration flags the code and invites additional scrutiny to avoid potential issues in the future.
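The sketch below shows roughly what such a declaration looks like in practice. The function `first_byte` is a hypothetical helper for illustration only. Reading through a raw pointer is something the compiler cannot verify, so the read has to be wrapped in an `unsafe` block, and by convention the programmer adds a comment explaining why it is sound.

```rust
/// Hypothetical helper: read the first byte of a slice through a raw pointer.
fn first_byte(bytes: &[u8]) -> Option<u8> {
    if bytes.is_empty() {
        return None;
    }
    let ptr: *const u8 = bytes.as_ptr();
    // SAFETY: the slice is non-empty, so `ptr` points at a valid,
    // initialized byte that stays alive for the duration of this call.
    let value = unsafe { *ptr };
    Some(value)
}

fn main() {
    println!("{:?}", first_byte(b"STF"));
    println!("{:?}", first_byte(b""));
}
```

Everything outside the `unsafe` block is still checked as usual, so a reviewer looking for memory safety problems knows exactly which lines deserve the closest attention.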
For those reasons, and because so much legacy code is written in manually-managed memory languages, C and C++ won’t be going away anytime soon, and certainly not without a significant investment in people and code. Since we don’t want vulnerabilities in our existing critical infrastructure, we have to employ additional strategies to minimize the risk posed by critical infrastructure written in those languages. These include developer tools such as fuzzers and sanitizers that can find vulnerabilities introduced during development, as well as investment in general strategies to tackle vulnerabilities in software development.
One oft-cited criticism of Rust, being a relatively new language, is that its tooling and package ecosystem are not as mature and developed as those of other programming languages. We believe that this particular issue of ecosystem maturity is becoming less relevant over time, which is indicative of how far Rust has come. In the long run, Rust’s more consolidated package repository system provides massive advantages over the more fragmented C/C++ ecosystem.
What STF is Investing In (so far)
We are thankful to be working with extremely brilliant and impactful projects in this space to help introduce more memory safety, and we thought it would be a good idea to give you a brief introduction to all of them. There’s only so much we can say in a sentence or two, so we encourage you to look further into those projects if they pique your interest. We’ve also included insights from some of them where relevant in this blog post, for which we’re very thankful.
DNS
The Domain Name System (DNS) is the phone book of the internet. Whenever you or a program you’re using calls on a domain name, DNS is the standard by which that human-readable name gets translated to a machine-readable IP address. This means you can visit your favorite website without having to memorize up to 12 digits (or even 32 hexadecimal digits) for each one. It’s an important internet standard, and a critical part of our infrastructure.
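For a sense of what that translation looks like from inside a program, here is a minimal sketch using only Rust's standard library. The function name `resolve` is a hypothetical helper. It hands the name to the operating system's resolver, which may answer from a local hosts file or cache before asking a DNS server, so it illustrates the lookup step rather than the DNS protocol itself.

```rust
use std::net::{IpAddr, ToSocketAddrs};

/// Hypothetical helper: ask the operating system's resolver for the IP
/// addresses behind `host`. The standard library API requires a port,
/// but the port does not influence the lookup, so 0 is used here.
fn resolve(host: &str) -> std::io::Result<Vec<IpAddr>> {
    let addrs = (host, 0u16).to_socket_addrs()?;
    Ok(addrs.map(|addr| addr.ip()).collect())
}

fn main() {
    // The result depends on the machine's resolver configuration.
    match resolve("localhost") {
        Ok(addrs) => println!("localhost resolves to {:?}", addrs),
        Err(err) => eprintln!("lookup failed: {}", err),
    }
}
```

Passing a literal address such as `"127.0.0.1"` returns that address unchanged, which is a convenient way to try the function without network access.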
STF is investing in the “domain” Rust library by NLnet Labs, which aims to create a comprehensive set of building blocks for interacting with DNS in applications. Since many applications these days have to interact with the internet, and therefore with DNS in some way, ensuring these interfaces are modern and safe is very important.
Another STF investment is in the Hickory DNS project. Hickory DNS is a Rust implementation of the DNS standard that consists of a client, server, and resolver. A DNS resolver is the software that handles the translation of a domain name into an IP address. Hickory DNS also supports the latest DNS security standards.
Encryption
Another important part of internet infrastructure is Transport Layer Security (TLS), the internet standard by which a majority of online communication is encrypted. It protects our privacy and data security on everything from email and instant messaging to conferencing and internet telephony, and most notably on the web. STF is investing in rustls, a modern memory safe Rust implementation of the TLS protocol. Rustls is a testament to the promise of Rust, and is already on track to outperform the most prevalent TLS implementation, OpenSSL, all while ensuring memory safety.
OpenPGP is another important encryption standard. While not as widespread or prevalent as TLS, it serves a different purpose, and is particularly significant for verifying the authenticity, provenance, and integrity of data exchanged online, such as software or emails. To drive the development of the OpenPGP standard forward, it’s essential to have multiple interoperable implementations serving a variety of use cases. This is why STF has invested in three such implementations of OpenPGP in memory-safe languages: Sequoia PGP, gopenpgp, and OpenPGP.js.
Each one of these implementations is being deployed and used in critical infrastructure, and serves distinct communities with encryption and verification technology. As a group, because they all implement and contribute to the OpenPGP standard, they can share implementation details and data, and collectively solve interoperability issues that make the standard better and more secure.
Multimedia Encoding/Decoding
Multimedia encoding and decoding technology relies heavily on memory operations and is highly susceptible to memory safety errors. Video and audio have become very prevalent online in the last two decades, and that prevalence means that multimedia encoders and decoders are used everywhere, making them critical infrastructure. STF has invested in rav1d, an open source AV1 decoder library written in Rust that focuses on efficiency and memory safety. AV1 is a video coding format that is highly desirable due to its high efficiency and compression while maintaining video quality. We’ve also invested in the widely used GStreamer multimedia framework, supporting their effort to transition critical components of their library from C to Rust, namely the Real-Time Transport Protocol, which forms the basis of their streaming and video conferencing protocols.
NTP
One Internet protocol that is vital to the functioning of our digital infrastructure but gets little mention is the Network Time Protocol (NTP). As the name implies, NTP helps provide accurate and reliable time to all sorts of devices over the network. We’re supporting Pendulum, a project to implement both NTP and the Precision Time Protocol in Rust.
Providing memory safety out of the box is a game changer for our critical systems. People tend to take all the time and energy we spend on vulnerabilities, remediations, mitigations, and breaches as a given, as something we just have to deal with. It's not, and we don't. We have the chance to eliminate an entire class of bugs and, with that, a massive amount of vulnerabilities.
Developer Tooling
Linux is one of the most popular operating systems in the world, and Debian is one of the most popular distributions of Linux. One particular package that many Linux distributions like Debian rely upon is GNU coreutils. It provides much of the basic functionality that the operating system can’t function without. We’re working with Debian developers to rewrite some of these basic and critical programs in Rust.
Speaking of developer tooling, we’re working with Trail of Bits to make a substantial contribution to the Python Package Index. A big part of that work is focused on the PyCA Cryptography package, a widely used software library whose internals are, in large part, actually developed in Rust. The goal of the effort in general is to enable more people to code safely in memory safe languages. As part of this collaboration, Trail of Bits will develop a Sigstore unified packaging client policy across the Python, Rust, and Ruby ecosystems.
Memory unsafety is a leading cause of exploitable vulnerabilities in open source codebases, as well as a major source of technical debt and maintenance risk for the open source ecosystem as a whole. Trail of Bits is proud to partner with the Sovereign Tech Fund in its mission to strengthen and secure the pillars of the open source ecosystem.
Notable Mentions: While not directly involving porting manually-managed memory implementations into memory safe languages, we do want to mention some other efforts that we support that indirectly employ the “other” strategies we mentioned in the last section. For example, we’re contributing to Rust developer tooling by supporting the Rusty SBOM project in expanding Software Bill of Materials support in the Rust ecosystem. Other memory safe languages for which we support developer tooling include Ruby and JavaScript, making it easier for developers to use these memory safe languages. With the Yocto Project, we’re supporting improvements to the build tooling, fuzzing, and development practices needed to minimize the risk of using memory unsafe languages.
Conclusion
It should be clear by now that memory safety is neither a silver bullet nor a storm in a teacup. Perhaps one good comparison is the eradication of smallpox. What was once a very common and deadly disease no longer exists. It wasn’t easy, and it took a concentrated, long, and coordinated effort by many people. And, sure, other diseases still exist, but everyone would argue that the effort was worth it, simply considering all the lives that were saved by preventing future outbreaks.
With memory safety, we can see a path to eliminating an entire class of vulnerabilities like the one Code Red exploited. This will help secure our critical infrastructure and minimize the risk of large-scale vulnerabilities in the future that could cause untold damage to our digital infrastructure, and therefore to our economy, institutions, and society, as more of their activity happens online. It will take a lot of people doing their part if we are to get there, so we would like to take a minute to acknowledge some of the ongoing efforts that we’ve come across in this regard.
- The USA’s Cybersecurity and Infrastructure Security Agency has identified memory safety as one of the pillars of their “secure by default” strategy and issued guidance on memory safe roadmaps.
- We also would like to acknowledge the Internet Security Research Group’s early advocacy and action on the issue through their Prossimo project.
- Consumer Reports also released a comprehensive report on memory safety with concrete recommendations to industry and government.
- The Linux Foundation’s OpenSSF has created an industry special interest group to encourage memory safety.
- Last but not least, we’d like to thank Alex Gaynor for his informative blog post on the topic.
This list is obviously non-exhaustive, and we’d like to end on a note of thanking the Rust community for their hard work over the past couple of years, and for the reputation they have garnered for being beginner-friendly and welcoming. We’ve seen firsthand many examples of that during the past year of scouting technologies as well as doing research for this blog post, and would like to commend this trend in the community.