Writ

Degoogling, and Better Tech

2022-04-25

In 2019 I began the process of "degoogling" my life - making it so that I didn't use (or didn't need to use) big tech companies who are in the business of indexing my data. I sometimes mention to people that I've been on this journey, or mention some new thing I was able to reclaim control over, and this often snowballs into a thorough explanation of the steps I took to get where I am.

This is meant as an abbreviated guide to selfhosting your own copies of things, in the way that I do. I don’t claim this is the "best" way, or the "safest" way, or even the way that’s right for you. But this is what I do, and I have good reason for it.

I want to emphasize that everything here documents a journey, not just the end result. Many things I did in the beginning were suboptimal, as yours will be. There is no one-click "make me a replacement for every company product" solution; you are going to have to learn some amount of system administration to do this. If you're already experienced, I do mention where I'm currently at.

The Server

The cornerstone of any selfhosting is the actual machine you self-host with. Many people choose a cloud server; I find this to be atrociously expensive. I use an old gaming PC that is past its prime - but any PC will do just fine. In ~2013 I bought a recycled Dell OptiPlex for $100; it was a pathetically weak machine, but it could absolutely do everything necessary to selfhost for a family. Do not fool yourself into thinking that you need raw computing power; you just need at least 4GB of memory and a bunch of SATA/PCI slots to plug in hard drives. Don't worry much about how old it gets - the processor, memory, and motherboard of your server matter very little. The only things we actually care about keeping for a long time are the hard drives.

The hard drives can be any size or capacity to begin with. I started with a single 4TB drive, then upgraded to a RAID0 of three drives, and finally to a ZFS pool of six. At no point was data ever lost. What you should do depends on which of those words you already know - if you've set up a RAID before, do that. If you know of ZFS, trust me when I say it is unparalleled and a must-have.
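As a rough sketch of what a six-drive ZFS setup looks like (pool name, device paths, and dataset layout here are all hypothetical examples, not my actual configuration):

```shell
# Hypothetical sketch: create a raidz1 pool named "tank" from six drives.
# Device names are examples; prefer stable /dev/disk/by-id/ paths over /dev/sdX.
zpool create tank raidz1 \
  /dev/disk/by-id/ata-DRIVE1 /dev/disk/by-id/ata-DRIVE2 /dev/disk/by-id/ata-DRIVE3 \
  /dev/disk/by-id/ata-DRIVE4 /dev/disk/by-id/ata-DRIVE5 /dev/disk/by-id/ata-DRIVE6

# Enable cheap transparent compression and carve out a dataset for app data.
zfs set compression=lz4 tank
zfs create tank/apps

# Check pool health at any time.
zpool status tank
```

raidz1 survives a single drive failure; with six drives, raidz2 (two-drive redundancy) is also a reasonable choice.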

Some might advise that you set up multiple machines, possibly with an orchestrator like Kubernetes or Nomad, so that you can keep your applications as pods on VMs and easily migrate the compute around. I've never found this to be valuable for selfhosting. For actual enterprise development, where you're deploying and iterating on an application that needs to talk with others and revolves around a database, sure. But selfhosting isn't about databases and making an API; it's about file storage and having the bare minimum of compute. Adding boxes defeats the purpose and complicates the picture. Add drives, not boxes.

The General Pattern

Any flavor of Linux will do (or BSD, if you're like that). I use Debian, and periodically upgrade when they release a major version. The OS itself matters quite little: the data lives on a ZFS array (previously RAID), and all of the applications run via docker-compose files, one per application. All of this is in a git repo which lives on the array itself, meaning that any machine you connect that set of drives to can be used to fully rehydrate the entire server without much hassle.

Every application I list further down is going to follow that pattern - a docker-compose.yml and some small configuration files that are mounted from the array. I won't spend time explaining the setup of each one. For people who know how to apt get but not docker run: do yourself the favor and make that jump. Package upgrades and reinstalls are a nightmare by comparison.
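To illustrate the pattern (the service name, image, and paths here are hypothetical), each application directory on the array holds a compose file shaped like this:

```yaml
# /tank/apps/someapp/docker-compose.yml - hypothetical example of the pattern
version: "3"
services:
  someapp:
    image: someorg/someapp:latest
    restart: unless-stopped
    volumes:
      - ./config:/config   # small config files, tracked in the git repo
      - ./data:/data       # application data, lives on the array
```

Rehydrating after a rebuild is then just a matter of running `docker-compose up -d` in each application directory.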

Storage

If the cornerstone of selfhosting is the server, the keystone is the storage. You don't need much processing power, but you do need a way to access your data securely over the internet and intranet. I tried ownCloud early on, but found it to be painfully slow, buggy, and not well-served by being as multi-tenant as it was. It was a horrible experience. I advise not doing that at all, and instead focusing on a single protocol that lets you manipulate a limited part of your filesystem from anywhere. Don't look for a full-featured solution.

I personally use WebDAV, via a server I wrote called boji. For the most part it's a straightforward WebDAV server, but if you provide a password with a colon in it (e.g., "password:ultraSecretKey") it will symmetrically encrypt all files you write with that key. This means the files are encrypted at rest and the key is never stored on the server; even if someone kicks down the door and steals the whole machine, your data is still unreadable to them, but not to you. I won't go too deeply into dismissing myths about WebDAV; suffice it to say this is the best way I've found.
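One of WebDAV's virtues is that plain HTTP tooling works against it. A sketch of basic usage with curl (the hostname is a hypothetical example; the colon-in-password convention is boji's, but to curl it's just HTTP basic auth either way):

```shell
# Upload a file (WebDAV PUT). With boji, the ":ultraSecretKey" suffix
# would also enable at-rest encryption with that key.
curl -u 'me:password:ultraSecretKey' -T notes.txt \
  https://dav.example.com/notes.txt

# Download it back (plain GET).
curl -u 'me:password:ultraSecretKey' \
  https://dav.example.com/notes.txt -o notes.txt
```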

Domains, subdomains

In the beginning I just opened ports for the various things I hosted and remembered which numbers corresponded to which service. After a dozen or so services, and becoming increasingly uneasy with port scanners, I opted to have a subdomain for each service, all of which CNAME to the same A record pointing at my server. I use ez-ipupdate to update that master A record in case my IP changes, and nginx-proxy for SSL termination and routing to specific containers. For this you'll need certs, and of course Let's Encrypt is the right way forward. I cron a headless certbot each night that generates/validates an ECDSA key, generates a CSR, and handles the challenges from LE to acquire/refresh a cert. Long story short, any cert will do, but ECDSA is better.
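A minimal version of that nightly job could be a single crontab entry (a sketch, not my exact setup - the deploy hook name assumes an nginx-proxy container; certbot only renews certs close to expiry, so running it daily is harmless):

```shell
# Hypothetical root crontab entry: renew ECDSA certs nightly at 3am,
# then reload the proxy so it picks up the fresh certificates.
0 3 * * * certbot renew --quiet --key-type ecdsa --deploy-hook "docker restart nginx-proxy"
```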

I used to use dyn.com to manage my DNS, but they got bought by Oracle and subsequently dismantled. These days I go with easyDNS due to their long commitment to privacy (yes, I know how the front page looks).

Backups

Selfhosting is great, but everyone worries about losing the server in some catastrophe. I ran without backups for a very long time, and then with a very hacky set of tar.xz.gpg backups uploaded to Google Drive. Both are terrible options - restic is dead easy to set up and absolutely the correct way to have encrypted, compressed, delta-encoded backups with multiple snapshots in a minimal disk footprint. I use rsync.net for offsite storage, because they're dead cheap and actually offer the rsync and restic protocols. They're cloud storage specifically for nerds; they don't even offer a file browser for your files, because you're expected to know wtf you're doing. For some limited time there was even a discount code buried in their technical documentation, on the logic that anyone that deep in the docs wouldn't need support and would be a cheaper customer. They also have a warrant canary, which I verify with a weekly cronjob.
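The restic workflow is short enough to sketch in full (repository path, password file location, and retention policy here are hypothetical examples):

```shell
# Point restic at an sftp repository on rsync.net; the password
# encrypts everything client-side before it leaves the machine.
export RESTIC_REPOSITORY=sftp:user@user.rsync.net:backups
export RESTIC_PASSWORD_FILE=/root/.restic-pass

restic init                      # run once to create the repository
restic backup /tank/apps         # each run is a delta-encoded snapshot

# Prune old snapshots on an example retention schedule.
restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 12 --prune
```

Cron the `backup` and `forget` lines nightly and restores are just `restic restore latest --target /some/dir`.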

Intermission

Everything up to this point was the bare minimum necessary to securely and scalably selfhost; it's a lot of work, and it's why the process takes years. None of it is particularly "bonus" - each piece really is necessary. If you're skimming, consider this the point where the article switches from "how to set up the foundation of selfhosting" to "what you can actually do with that foundation".

Applications

Hosted sites

Some websites are built well enough that you don't even need their servers to run them: just hit ctrl-s, host the saved page on your own nginx, and you've got the same functionality forever with none of the ads or analytics.
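Serving a saved page needs nothing more than a static server block (hostname and paths are hypothetical examples, and this assumes the nginx-proxy/cert setup from earlier handles TLS):

```nginx
# Hypothetical nginx config for a saved copy of a site.
server {
    listen 80;
    server_name saved.example.com;
    root /tank/apps/saved-sites/somesite;  # the ctrl-s output lives here
    index index.html;
}
```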

On the phone

Split into its own section, because running your own services is only part of it. Many apps can be wholly replaced with open source versions once you can so easily sync files to your server.

DNS

While not on the server itself (for obvious reasons), I also run DNS over TLS with AdGuard Home on a Pi to prevent any sort of man-in-the-middle snooping of DNS queries originating from my network, which also blocks out a number of trackers and ads. The upstream I prefer is Quad9, since Cloudflare likes to put its thumb on political and anticompetitive scales a little too much, and Google's resolver is obviously not an option.

Together, these protect the entire house against swaths of ads, trackers, ISP tracking, and any MITM attempts.

I used to use stubby-docker with Pi-hole, but the setup itself was questionable. Stubby-docker wasn't originally meant to run on ARM, and required a couple of compiler flags to make it work; the author even expressed surprise that it worked at all on a Pi. Pi-hole is respectable but not great software, being a huge mishmash of Python scripts and other daemons running in a container. You get the sense it isn't even meant to run in a container, since the documentation often encourages you to restart specific daemons. AdGuard is much cleaner, easier to use, gives more information, has a lower footprint, and tends to just work better.
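AdGuard Home fits the same compose-file-per-app pattern as everything else. A sketch of what that could look like on the Pi (ports and volume paths are illustrative; the web UI port in particular varies between first-run setup and normal operation):

```yaml
# Hypothetical docker-compose.yml for AdGuard Home on a Pi.
version: "3"
services:
  adguardhome:
    image: adguard/adguardhome:latest
    restart: unless-stopped
    ports:
      - "53:53/udp"    # plain DNS for devices on the LAN
      - "853:853/tcp"  # DNS over TLS upstream-facing port
      - "3000:3000"    # web UI / initial setup
    volumes:
      - ./work:/opt/adguardhome/work
      - ./conf:/opt/adguardhome/conf
```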

What about communications?

Notably missing from all this is how to talk with other people, or use social media. I use Signal and run a Matrix server to call / talk with friends and family. As for social media, I don't have accounts anymore. Not for any particular reason, but because I was around in the early days when social media was fantastic, enlightening, and useful - and I've seen the downfall of every site I used to respect. From Stack Overflow to Reddit, I was there at the best and left at the worst. But I do sometimes read them when linked by others, and for those, I want a JS-free, privacy-respecting, open source alternative, such as…

The two above are standouts, but I can't recommend LibRedirect enough: a browser extension that handles the URL redirection to these sorts of open-source, privacy-respecting alternatives for you.

Email

Protonmail. Not selfhosted, but if you go through the work of creating MX DNS records you can use them as the email provider for your domain. Running your own mailserver is a nightmare (I've tried); someday maybe I'll try again.

As a security trick, use plus-aliased emails when signing up for a service. E.g., if I create an account at amazon.com, I'd use Enpass to generate a new password (and store it encrypted on my server, sync'd via WebDAV), then for the email I'd use business+amazon@mydomain.invalid. Everything sent to that address lands in the mailbox for business@mydomain.invalid, regardless of what came after the +. This means you have a "different email per service", and any security breach in one is immediately obvious when some other sender starts emailing that alias. If you get an email claiming to be from Google, but it's sent to your +amazon address, something is phishy.

Almost every mail service supports plus-aliasing, and almost every website happily sends to it.
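As a quick illustration of how such an address decomposes (using the article's example address), shell parameter expansion pulls it apart the same way a mail server does:

```shell
#!/bin/sh
# Split a plus-aliased address into mailbox, tag, and domain.
addr="business+amazon@mydomain.invalid"

local_part="${addr%%@*}"      # business+amazon
domain="${addr#*@}"           # mydomain.invalid
mailbox="${local_part%%+*}"   # business  <- where mail is actually delivered
tag="${local_part#*+}"        # amazon    <- identifies which service leaked

echo "delivered to ${mailbox}@${domain}, tagged ${tag}"
```

Delivery keys off the mailbox part; the tag survives in the headers, which is what makes a leak traceable.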

I still have an old gmail account which sometimes receives stuff I care about (although it's mostly spam), and which I use FairEmail to view. It's careful not to load remote images and trackers from senders unless you specifically want it to.

Not selfhosting, but worth mentioning for privacy

Privacy.com allows you to create virtual prepaid cards all tied to the same debit card. Think of it like a password manager for your money - if one site is compromised and your financial data is stolen, the attackers only get the one virtual card, which you can close/pause/set limits on anytime you like, and is tied to one merchant (and can’t be used at other merchants). This is a godsend for budgeting and security.

Combined with the "Email" section above, almost every site I register at has a unique email, password, and payment info, all of which can be changed at my discretion.

All site content protected by CC-BY-4.0 license