How most Rust projects are organized

2022-01-07 · 2 minute read

As I'm learning Rust, I regularly ask myself how a more seasoned Rust developer would solve problem X. It's not a matter of finding a solution, but choosing the most popular one, so that my code becomes more recognisable for other Rust developers.

One such question is:

How should I organise the files and folders in my project?

To find out how most people structure their Rust projects, I decided to look at the masses. I wrote a simple Python script that collected ~1000 Rust repos from GitHub and counted files by name, looking for patterns. If you want more info on how exactly I did it, jump to the bottom of this article.

The numbers

I ran the script looking for common files and subfolders inside /src.

It's worth noting that 335 repos were skipped because they did not have a /src folder. These probably have a src folder nested inside some subfolder instead.
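That guess could be checked with a small helper like the one below. This is a minimal sketch and not part of the original script: the name `find_src_dirs` and the sample data are made up, and it assumes the input is the "tree" list returned by GitHub's Git Trees API when called with `recursive=1`.

```python
def find_src_dirs(tree_entries):
    """Return the paths of all directories named `src` in a Git tree listing.

    Each entry is assumed to look like the items in the "tree" list of
    GET /repos/{owner}/{repo}/git/trees/{sha}?recursive=1, where "type" is
    "tree" for folders and "blob" for files.
    """
    return sorted(
        entry["path"]
        for entry in tree_entries
        if entry["type"] == "tree" and entry["path"].split("/")[-1] == "src"
    )


# Hypothetical workspace-style repo without a top-level src folder.
sample_tree = [
    {"path": "Cargo.toml", "type": "blob"},
    {"path": "crates", "type": "tree"},
    {"path": "crates/core", "type": "tree"},
    {"path": "crates/core/src", "type": "tree"},
    {"path": "crates/core/src/lib.rs", "type": "blob"},
    {"path": "crates/cli/src", "type": "tree"},
]
print(find_src_dirs(sample_tree))
```

A repo for which this returns a non-empty list while the root has no src folder would be one of the skipped repos described above.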

Percentages below 2.0% are filtered out.

$ python3 analyze_repo_structure.py --subfolder /src --percentage-treshold 2
335 repos were skipped because they did not have a `src` folder.

Most common files and folders inside src in a selection of 665 projects written in `rust`:
(67%) 📄 lib.rs
(41%) 📄 main.rs
(19%) 📄 error.rs
(15%) 📄 config.rs
(10%) 📁 bin
(9%)  📄 macros.rs
(9%)  📄 util.rs
(8%)  📄 utils.rs
(5%)  📄 cli.rs
(4%)  📄 app.rs
(4%)  📄 errors.rs
(4%)  📄 prelude.rs
(4%)  📁 utils
(4%)  📄 input.rs
(3%)  📄 types.rs
(3%)  📁 util
(3%)  📄 client.rs
(3%)  📁 config
(3%)  📁 ui
(3%)  📄 context.rs
(3%)  📄 parser.rs
(3%)  📄 server.rs
(2%)  📄 args.rs
(2%)  📄 options.rs
(2%)  📄 event.rs
(2%)  📁 core
(2%)  📁 tests
(2%)  📄 buffer.rs
(2%)  📄 logger.rs
(2%)  📄 style.rs
(2%)  📄 version.rs
(2%)  📁 test
(2%)  📄 result.rs
(2%)  📄 color.rs
(2%)  📄 node.rs
(2%)  📄 request.rs
(2%)  📄 response.rs

Well, that's a long list of generic but somehow recognizable files and folders. The top two are somewhat expected. However, error.rs is new to me. Together with errors.rs, it shows up in roughly 23% of the Rust repos I went through (assuming few repos have both). This is interesting. It seems like a lot of projects put their custom Result type in this file. Actually, std::io and the serde_json crate are also doing this. I think I'll begin following this pattern, especially for library crates.

It's nice to see that config.rs made it onto the list. I often create this file myself to define my program's configuration struct.

FYI, the bin folder is for repos that have more binaries than just the default main.rs.

Following this, we have a list of kinda generic words that I can imagine people use for different purposes. I wonder what version.rs is about 🤔

But what about the root folder?

There might be interesting stuff outside src/ indeed!

We'll probably see a lot of common files like README.md and LICENSE. Anyway, let's have a look.

$ python3 analyze_repo_structure.py --percentage-treshold 2
Most common files and folders in a selection of 1000 projects written in `rust`:
(98%) 📄 .gitignore
(96%) 📄 README.md
(93%) 📄 Cargo.toml
(80%) 📁 .github
(66%) 📁 src
(56%) 📄 LICENSE
(49%) 📄 Cargo.lock
(41%) 📄 CHANGELOG.md
(39%) 📁 tests
(35%) 📁 examples
(33%) 📄 CONTRIBUTING.md
(27%) 📄 rustfmt.toml
(26%) 📄 LICENSE-MIT
(24%) 📄 LICENSE-APACHE
(23%) 📁 docs
(20%) 📄 build.rs
(18%) 📄 .travis.yml
(18%) 📄 CODE_OF_CONDUCT.md
(16%) 📄 .gitattributes
(16%) 📁 .cargo
(15%) 📄 Makefile
(14%) 📁 benches
(14%) 📁 scripts
(14%) 📄 .editorconfig
(12%) 📄 .dockerignore
(11%) 📄 .gitmodules
(10%) 📁 ci
(10%) 📄 Dockerfile
(9%)  📄 .rustfmt.toml
(9%)  📁 assets
(8%)  📄 rust-toolchain
(7%)  📄 clippy.toml
(7%)  📁 .vscode
(7%)  📄 SECURITY.md
(6%)  📁 doc
(6%)  📁 crates
(6%)  📁 tools
(5%)  📄 LICENSE.md
(5%)  📁 fuzz
(4%)  📄 deny.toml
(4%)  📁 docker
(4%)  📄 appveyor.yml
(3%)  📄 bors.toml
(3%)  📄 codecov.yml
(3%)  📄 package.json
(3%)  📄 LICENSE.txt
(3%)  📄 rust-toolchain.toml
(3%)  📁 core
(3%)  📁 test
(3%)  📁 .circleci
(3%)  📁 bin
(3%)  📁 resources
(3%)  📄 .clippy.toml
(3%)  📄 Cross.toml
(2%)  📁 cli
(2%)  📁 contrib
(2%)  📁 images
(2%)  📁 lib
(2%)  📄 release.toml
(2%)  📁 book
(2%)  📁 config
(2%)  📄 ARCHITECTURE.md
(2%)  📄 Makefile.toml
(2%)  📄 RELEASES.md
(2%)  📁 data
(2%)  📁 snap
(2%)  📄 .gitlab-ci.yml
(2%)  📄 .codecov.yml
(2%)  📄 COPYRIGHT
(2%)  📄 shell.nix
(2%)  📄 COPYING

Yet again, no surprises at the top of the list.

I guess the tests folder mostly contains integration tests, as unit tests tend to be written alongside the source code in most of the Rust projects I've looked at.

examples are nice, especially for library crates.

Having a dedicated docs folder is also quite common at 23%.

Fun fact on continuous integration: 80% of the repos have a .github folder, which suggests they are using GitHub Actions, and 18% have a .travis.yml file, so they are using (or have used) Travis CI.
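A rough sketch of how this kind of guess could be automated from the root listing. It is not part of the original script: the mapping and the name `detect_ci` are assumptions for illustration, and a marker file only hints at a CI service (a .github folder can exist without any workflows in it).

```python
# Root-level marker files/folders and the CI service they usually indicate.
# All marker names appear in the listing above; the mapping is an assumption.
CI_MARKERS = {
    ".github": "GitHub Actions",
    ".travis.yml": "Travis CI",
    "appveyor.yml": "AppVeyor",
    ".circleci": "CircleCI",
    ".gitlab-ci.yml": "GitLab CI",
}


def detect_ci(root_entries):
    """Return the sorted CI services suggested by a repo's root entries."""
    return sorted({CI_MARKERS[name] for name in root_entries if name in CI_MARKERS})


# Hypothetical root listing of a single repo.
print(detect_ci([".gitignore", "README.md", ".github", ".travis.yml", "src"]))
```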

Method

This is how I collected the data.

First, we need to collect a list of Rust projects from GitHub. I did this using the GitHub REST API endpoint /search/repositories. The maximum per_page value on this API is 100, so we need to use pagination to collect more than 100 repos.

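import requests

def get_repo_list(args):
    """Collect up to args.repo_count Rust repos from the GitHub search API."""
    repos = []

    page = 1  # GitHub's page numbers start at 1
    per_page = min(100, args.repo_count)  # the API caps per_page at 100

    while len(repos) < args.repo_count:
        response = requests.get(
            'https://api.github.com/search/repositories',
            params={
                'q': 'language:rust',
                'per_page': per_page,
                'page': page
            }
        )
        response.raise_for_status()
        result = response.json()

        if not result["items"]:
            break  # no more results, so stop instead of looping forever

        repos += result["items"]
        page += 1

    # The last page may overshoot the requested count.
    return repos[:args.repo_count]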

Now we have a large list of repos, so let's start counting the file and folder entries in each of them. We do this using the /contents/:subfolder API.

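file_count = {}    # entry name (with icon) -> number of repos containing it
total_repos = 0    # repos that have the requested subfolder
skipped_repos = 0  # repos that do not have it

# `headers` holds the request headers, including the optional access token.
for repo in repos:
    # repo["url"] is the repo's API URL as returned by the search endpoint.
    response = requests.get(
        '%s/contents/%s' % (repo["url"], args.subfolder),
        headers=headers
    )
    if response.status_code == 404:
        # The subfolder does not exist in this repo.
        skipped_repos += 1
        continue

    response.raise_for_status()
    contents = response.json()

    total_repos += 1
    for item in contents:
        # Oh yes, I'm using emoji in source code 😎
        icon = "📄" if item["type"] == "file" else "📁"
        name = "%s %s" % (icon, item["name"])
        file_count[name] = file_count.get(name, 0) + 1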
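The counts still have to be turned into the percentage listing shown earlier. Below is a minimal sketch of such a reporting step. The function name `format_report`, the rounding, and the column padding are assumptions, not taken from the original script, and the sample counts are made up.

```python
def format_report(file_count, total_repos, percentage_threshold=2.0):
    """Return report lines like '(67%) 📄 lib.rs', most common entries first."""
    lines = []
    by_count = sorted(file_count.items(), key=lambda kv: kv[1], reverse=True)
    for name, count in by_count:
        percentage = 100.0 * count / total_repos
        if percentage < percentage_threshold:
            continue  # filter out rare entries
        # Pad the label so one- and two-digit percentages line up.
        label = ("(%d%%)" % round(percentage)).ljust(5)
        lines.append("%s %s" % (label, name))
    return lines


# Made-up counts for a selection of 100 repos.
sample_count = {"📄 lib.rs": 67, "📄 main.rs": 41, "📄 macros.rs": 9, "📄 rare.rs": 1}
for line in format_report(sample_count, total_repos=100):
    print(line)
```

Dividing by the number of repos that actually had the subfolder, rather than by all fetched repos, matches how the src listing above is reported for 665 projects.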

Finally, I wrapped these loops with proper argument parsing, caching to spare GitHub excessive traffic, and some error handling.
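The argument parsing could look roughly like this sketch. The --subfolder and --percentage-treshold flags (spelled as in the commands above) and the short -g option come from this article. The long names --repo-count and --github-token, as well as all defaults, are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse command line options for the repo structure analysis."""
    parser = argparse.ArgumentParser(
        description="Count common files and folders in Rust repos on GitHub."
    )
    parser.add_argument("--subfolder", default="",
                        help="folder to inspect, e.g. /src (default: repo root)")
    # Spelled exactly as in the commands shown earlier in the article.
    parser.add_argument("--percentage-treshold", type=float, default=0.0,
                        help="hide entries found in fewer than this percentage of repos")
    # Assumed flag name; the code above only shows the attribute args.repo_count.
    parser.add_argument("--repo-count", type=int, default=1000,
                        help="number of repos to collect")
    # Only the short form -g appears in the article; the long name is assumed.
    parser.add_argument("-g", "--github-token", default=None,
                        help="personal access token for the GitHub API")
    return parser.parse_args(argv)


args = parse_args(["--subfolder", "/src", "--percentage-treshold", "2"])
print(args.subfolder, args.percentage_treshold, args.repo_count)
```

argparse turns the dashes in flag names into underscores, which is why the loops above can read `args.repo_count` and `args.subfolder`.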

NB! If you want to run this script yourself, I recommend setting up a personal access token and providing it to the script with the -g <TOKEN> option. Without authorization, GitHub limits you to 60 requests per hour, so you will only be able to fetch about 60 repos, and that would not be a large data set.
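A minimal sketch of how the token could end up in the `headers` passed to requests.get. The helper name `build_headers` is hypothetical; the `Authorization: token <TOKEN>` form is one of the schemes GitHub accepts for personal access tokens.

```python
def build_headers(token=None):
    """Build request headers for the GitHub REST API, with optional auth."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # Authenticated requests get a much higher rate limit.
        headers["Authorization"] = "token %s" % token
    return headers


print(build_headers())           # anonymous: no Authorization header
print(build_headers("<TOKEN>"))  # replace <TOKEN> with your own token
```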

Hey, I'm Magnus, a developer from Norway.

I'm currently employed at Fink AS.
