Writ

Degoogling, and Better Tech

2022-04-25

In 2019 I began the process of "degoogling" my life - making it so that I didn't use (or didn't need to use) big tech companies who are in the business of indexing my data. I sometimes mention to people that I've been on this journey, or mention some new thing I was able to reclaim control over, and this often snowballs into a thorough explanation of the steps I took to get where I am.

This is meant as an abbreviated guide to selfhosting your own copies of things, in the way that I do. I don’t claim this is the "best" way, or the "safest" way, or even the way that’s right for you. But this is what I do, and I have good reason for it.

I want to emphasize that everything here documents a journey, not just the end result. Many things I did in the beginning were suboptimal, as yours will be. There is no one-click "make me a replacement for every company product" solution; you are going to have to learn some amount of system administration to do this. If you're already experienced, I do mention where I'm currently at.

The Server

The cornerstone of any selfhosting is the actual machine you self-host with. Many people choose a cloud server; I find this to be atrociously expensive. I use an old gaming PC that is past its prime - but any PC will do just fine. In ~2013 I bought a recycled Dell OptiPlex for $100; it was a pathetically weak machine, but it could absolutely do everything necessary to selfhost for a family. Do not fool yourself into thinking that you need raw computing power; you just need at least 4GB of memory and a bunch of SATA/PCI slots to plug in hard drives. Don't worry much about how old it gets - the processor, memory, and motherboard of your server matter very little. The only things we actually care about keeping for a long time are the hard drives.

The hard drives can be any size or capacity to begin with. I started with a single 4TB drive, then upgraded to a RAID0 of three drives, and finally to a ZFS pool of six. At no point was data ever lost. What you should do depends on which of those words you already know - if you've set up a RAID before, do that. If you know of ZFS, trust me when I say it is unparalleled and a must-have.
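As a rough sketch of what a six-drive ZFS setup looks like (pool name, device paths, and dataset layout here are all hypothetical examples, not my actual configuration):

```shell
# Hypothetical sketch: create a raidz1 pool named "tank" from six drives.
# Device names are examples; prefer stable /dev/disk/by-id/ paths over /dev/sdX.
zpool create tank raidz1 \
  /dev/disk/by-id/ata-DRIVE1 /dev/disk/by-id/ata-DRIVE2 /dev/disk/by-id/ata-DRIVE3 \
  /dev/disk/by-id/ata-DRIVE4 /dev/disk/by-id/ata-DRIVE5 /dev/disk/by-id/ata-DRIVE6

# Enable cheap transparent compression and carve out a dataset for app data.
zfs set compression=lz4 tank
zfs create tank/apps

# Check pool health at any time.
zpool status tank
```

raidz1 survives a single drive failure; with six drives, raidz2 (two-drive redundancy) is also a reasonable choice.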

Some might advise that you set up multiple machines, possibly with an orchestrator like Kubernetes or Nomad, so that you can keep your applications as pods on VMs and easily migrate the compute around. I've never found this to be valuable for selfhosting. For actual enterprise development, where you're deploying and iterating on an application that needs to talk with others and revolves around a database, sure. But selfhosting isn't about databases and making an API; it's about file storage and having the bare minimum of compute. Adding boxes defeats the purpose and complicates the picture. Add drives, not boxes.

The General Pattern

Any flavor of Linux will do (or BSD, if you're like that). I use Debian, and periodically upgrade when they release a major version. The OS itself matters quite little: the data lives on a ZFS array (previously RAID), and all of the applications run via docker-compose files, one per application. All of this is in a git repo which lives on the array itself, meaning that any machine you connect that set of drives to can be used to fully rehydrate the entire server without much hassle.

Every application I list further down is going to follow that pattern - a docker-compose.yml and some small configuration files that are mounted from the array. I won't spend time explaining the setup of each one. For people who know how to apt get but not docker run: do yourself the favor and make that jump. Package upgrades and reinstalls are a nightmare by comparison.
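To illustrate the pattern (the service name, image, and paths here are hypothetical), each application directory on the array holds a compose file shaped like this:

```yaml
# /tank/apps/someapp/docker-compose.yml - hypothetical example of the pattern
version: "3"
services:
  someapp:
    image: someorg/someapp:latest
    restart: unless-stopped
    volumes:
      - ./config:/config   # small config files, tracked in the git repo
      - ./data:/data       # application data, lives on the array
```

Rehydrating after a rebuild is then just a matter of running `docker-compose up -d` in each application directory.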

Storage

If the cornerstone of selfhosting is the server, the keystone is the storage. You don't need much processing power, but you do need a way to access your data securely over the internet and intranet. I tried ownCloud early on, but found it to be painfully slow, buggy, and not well-served by being as multi-tenant as it was. It was a horrible experience. I advise not doing that at all, and instead focusing on a single protocol that lets you manipulate a limited part of your filesystem from anywhere. Don't look for a full-featured solution.

I personally use WebDAV, via a server I wrote called boji. For the most part it's a straightforward WebDAV server, but if you provide a password with a colon in it (e.g., "password:ultraSecretKey") it will symmetrically encrypt all files you write with that key. This means the files are encrypted at rest and the key is never stored on the server; even if someone kicks down the door and steals the whole machine, your data is still unreadable to them, but not to you. I won't go too deeply into dismissing myths about WebDAV; suffice it to say this is the best way I've found.
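One of WebDAV's virtues is that plain HTTP tooling works against it. A sketch of basic usage with curl (the hostname is a hypothetical example; the colon-in-password convention is boji's, but to curl it's just HTTP basic auth either way):

```shell
# Upload a file (WebDAV PUT). With boji, the ":ultraSecretKey" suffix
# would also enable at-rest encryption with that key.
curl -u 'me:password:ultraSecretKey' -T notes.txt \
  https://dav.example.com/notes.txt

# Download it back (plain GET).
curl -u 'me:password:ultraSecretKey' \
  https://dav.example.com/notes.txt -o notes.txt
```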

Domains, subdomains

In the beginning I just opened ports for the various things I hosted and remembered which numbers corresponded to which service. After a dozen or so services, and becoming increasingly uneasy with port scanners, I opted to have a subdomain for each service, all of which CNAME to the same A record pointing at my server. I use ez-ipupdate to update that master A record in case my IP changes, and nginx-proxy for SSL termination and routing to specific containers. For this you'll need certs, and of course Let's Encrypt is the right way forward. I cron a headless certbot each night that generates/validates an ECDSA key, generates a CSR, and handles the challenges from LE to acquire/refresh a cert. Long story short, any cert will do, but ECDSA is better.
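A minimal version of that nightly job could be a single crontab entry (a sketch, not my exact setup - the deploy hook name assumes an nginx-proxy container; certbot only renews certs close to expiry, so running it daily is harmless):

```shell
# Hypothetical root crontab entry: renew ECDSA certs nightly at 3am,
# then reload the proxy so it picks up the fresh certificates.
0 3 * * * certbot renew --quiet --key-type ecdsa --deploy-hook "docker restart nginx-proxy"
```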

I used to use dyn.com to manage my DNS, but they got bought by Oracle and subsequently dismantled. These days I go with easyDNS due to their long commitment to privacy (yes, I know how the front page looks).

Backups

Selfhosting is great, but everyone worries about losing the server in some catastrophe. I ran without backups for a very long time, and then with a very hacky set of tar.xz.gpg backups uploaded to Google Drive. Both are terrible options - restic is dead easy to set up and absolutely the correct way to have encrypted, compressed, delta-encoded backups with multiple snapshots in a minimal disk footprint. I use rsync.net for offsite storage, because they're dead cheap and actually offer the rsync and restic protocols. They're cloud storage specifically for nerds; they don't even offer a file browser for your files, because you're expected to know wtf you're doing. For some limited time there was even a discount code buried in their technical documentation, on the logic that anyone that deep in the docs wouldn't need support and would be a cheaper customer. They also have a warrant canary, which I verify with a weekly cronjob.
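The restic workflow is short enough to sketch in full (repository path, password file location, and retention policy here are hypothetical examples):

```shell
# Point restic at an sftp repository on rsync.net; the password
# encrypts everything client-side before it leaves the machine.
export RESTIC_REPOSITORY=sftp:user@user.rsync.net:backups
export RESTIC_PASSWORD_FILE=/root/.restic-pass

restic init                      # run once to create the repository
restic backup /tank/apps         # each run is a delta-encoded snapshot

# Prune old snapshots on an example retention schedule.
restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 12 --prune
```

Cron the `backup` and `forget` lines nightly and restores are just `restic restore latest --target /some/dir`.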

Intermission

Everything up to this point was the bare minimum necessary to securely and scalably selfhost; it's a lot of work, and it's why the process takes years. None of it is particularly "bonus" - each piece really is necessary. If you're skimming, consider this the point where the article switches from "how to set up the foundation of selfhosting" to "what you can actually do with that foundation".

Applications

Hosted sites

Some websites are built well enough that you don't even need their servers to run them: just hit ctrl-s, host the saved page on your own nginx, and you've got the same functionality forever with none of the ads or analytics.
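Serving a saved page needs nothing more than a static server block (hostname and paths are hypothetical examples, and this assumes the nginx-proxy/cert setup from earlier handles TLS):

```nginx
# Hypothetical nginx config for a saved copy of a site.
server {
    listen 80;
    server_name saved.example.com;
    root /tank/apps/saved-sites/somesite;  # the ctrl-s output lives here
    index index.html;
}
```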

On the phone

Split into its own section, because running your own services is only part of it. Many apps can be wholly replaced with open source versions once you can so easily sync files to your server.

DNS

While not on the server itself (for obvious reasons), I also run DNS over TLS with AdGuard Home on a Pi to prevent any sort of man-in-the-middle snooping of DNS queries originating from my network, which also blocks out a number of trackers and ads. The upstream I prefer is Quad9, since Cloudflare likes to put its thumb on political and anticompetitive scales a little too much, and Google's resolver is obviously not an option.

Together, these protect the entire house against swaths of ads, trackers, ISP tracking, and any MITM attempts.

I used to use stubby-docker with Pi-hole, but the setup itself was questionable. Stubby-docker wasn't originally meant to run on ARM, and required a couple of compiler flags to make it work; the author even expressed surprise that it worked at all on a Pi. Pi-hole is respectable but not great software, being a huge mishmash of Python scripts and other daemons running in a container. You get the sense it isn't even meant to run in a container, since the documentation often encourages you to restart specific daemons. AdGuard is much cleaner, easier to use, gives more information, has a lower footprint, and tends to just work better.
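AdGuard Home fits the same compose-file-per-app pattern as everything else. A sketch of what that could look like on the Pi (ports and volume paths are illustrative; the web UI port in particular varies between first-run setup and normal operation):

```yaml
# Hypothetical docker-compose.yml for AdGuard Home on a Pi.
version: "3"
services:
  adguardhome:
    image: adguard/adguardhome:latest
    restart: unless-stopped
    ports:
      - "53:53/udp"    # plain DNS for devices on the LAN
      - "853:853/tcp"  # DNS over TLS upstream-facing port
      - "3000:3000"    # web UI / initial setup
    volumes:
      - ./work:/opt/adguardhome/work
      - ./conf:/opt/adguardhome/conf
```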

What about communications?

Notably missing from all this is how to talk with other people, or use social media. I use Signal and run a Matrix server to call / talk with friends and family. As for social media, I don't have accounts anymore. Not for any particular reason, but because I was around in the early days when social media was fantastic, enlightening, and useful - and I've seen the downfall of every site I used to respect. From Stack Overflow to Reddit, I was there at the best and left at the worst. But I do sometimes read them when linked by others, and for those, I want a JS-free, privacy-respecting, open source alternative, such as…

The two above are standouts, but I can't recommend LibRedirect enough: a browser extension that handles the URL redirection to these sorts of open-source, privacy-respecting alternatives for you.

Email

Protonmail. Not selfhosted, but if you go through the work of creating MX DNS records you can use them as the email provider for your domain. Running your own mailserver is a nightmare (I've tried); someday maybe I'll try again.

As a security trick, use plus-aliased emails when signing up for a service. E.g., if I create an account at amazon.com, I'd use Enpass to generate a new password (and store it encrypted on my server, sync'd via WebDAV), then for the email I'd use business+amazon@mydomain.invalid. Everything sent to that address lands in the mailbox for business@mydomain.invalid, regardless of what came after the +. This means you have a "different email per service", and any security breach in one is immediately obvious when some other sender starts emailing that alias. If you get an email claiming to be from Google, but it's sent to your +amazon address, something is phishy.

Almost every mail service supports plus-aliasing, and almost every website happily sends to it.
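As a quick illustration of how such an address decomposes (using the article's example address), shell parameter expansion pulls it apart the same way a mail server does:

```shell
#!/bin/sh
# Split a plus-aliased address into mailbox, tag, and domain.
addr="business+amazon@mydomain.invalid"

local_part="${addr%%@*}"      # business+amazon
domain="${addr#*@}"           # mydomain.invalid
mailbox="${local_part%%+*}"   # business  <- where mail is actually delivered
tag="${local_part#*+}"        # amazon    <- identifies which service leaked

echo "delivered to ${mailbox}@${domain}, tagged ${tag}"
```

Delivery keys off the mailbox part; the tag survives in the headers, which is what makes a leak traceable.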

I still have an old gmail account which sometimes receives stuff I care about (although it's mostly spam), and which I use FairEmail to view. It's careful not to load remote images and trackers from senders unless you specifically want it to.

Not selfhosting, but worth mentioning for privacy

Privacy.com allows you to create virtual prepaid cards all tied to the same debit card. Think of it like a password manager for your money - if one site is compromised and your financial data is stolen, the attackers only get the one virtual card, which you can close/pause/set limits on anytime you like, and is tied to one merchant (and can’t be used at other merchants). This is a godsend for budgeting and security.

Combined with the "Email" section above, almost every site I register at has a unique email, password, and payment info, all of which can be changed at my discretion.

All site content protected by CC-BY-4.0 license