I just deleted 1,372 disks from Google Cloud and 7 project spaces.
-
Scott Williams 🐧 replied to spmatich :blobcoffee:
@spmatich I did OpenShift at my last two employers, but I'm currently using Rancher here, on prem, with paid support. We have a team that supports it in addition to other infrastructure.
-
Sass, David replied to Scott Williams 🐧
@vwbusguy I just have one question.
How is backup being done and stored, and is the storage holding those backups included in that cost?
The last time I designed an on-prem storage system to store backups of the data alone, WITHOUT the infrastructure backups, we ended up at roughly triple the cost of the live system.
-
@vwbusguy @Viss @arichtman @mttaggart what do you think about Docker Swarm today? I tried k8s in my homelab and I hated it. Just not a great fit for such a low scale. Now I run Docker Swarm and I hate it much less. Still not great though, but I see no alternative...
-
Scott Williams 🐧 replied to DrRac27
@DrRac27 @Viss @arichtman @mttaggart If you want a small scale lightweight k8s, then I recommend k3s. You can run k3s on one node.
-
@vwbusguy @Viss @arichtman @mttaggart that's what I tried first, but I liked it even less. With k8s I at least had to learn how it works, and every upgrade has a defined path. With k3s the install is `curl | sh`, and what about upgrades? Just swap out the binary and hope nothing breaks? I got it up and running with Ansible, but I wasn't feeling great about it and expected it to break all the time. With Swarm I just install the Debian package and use the community.docker.docker_swarm Ansible module.
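Roughly the manual steps that module automates, as a sketch (the manager IP is a placeholder):

```sh
# On the first manager node:
docker swarm init --advertise-addr 192.0.2.10

# Print the worker join command, then run it on each worker node:
docker swarm join-token worker
docker swarm join --token <token-from-above> 192.0.2.10:2377
```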
-
Scott Williams 🐧 replied to DrRac27
@DrRac27 @Viss @arichtman @mttaggart Upgrading k3s is just running that same script again; it upgrades the components for you. You can also revert versions, and you can back up etcd in case you want to start fresh. On single-node k3s, the "etcd" is actually just a SQLite database.
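As a sketch, with an example version pin and a placeholder backup path:

```sh
# Install, or upgrade in place, by re-running the installer; pinning a
# release (this version is only an example) keeps upgrades deliberate:
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.30.4+k3s1 sh -

# On a single node the default datastore is SQLite, so a cold backup is
# just copying the state file while k3s is stopped:
systemctl stop k3s
cp -a /var/lib/rancher/k3s/server/db/state.db /backup/k3s-state.db
systemctl start k3s
```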
-
Scott Williams 🐧 replied to Scott Williams 🐧
@DrRac27 @Viss @arichtman @mttaggart Coincidentally, Ansible is the reason I got into using k3s. I've been running AWX on it for years in my day job, for an environment where I didn't have k8s established but just wanted to run Ansible AWX there.
-
Scott Williams 🐧 replied to Sass, David
@sassdawe That's a valid question. It's important context that we weren't starting from scratch on prem but have plenty of existing infrastructure. Backups are both local to cluster storage (eg, Longhorn) and in a completely external Ceph environment (RGW and/or RBD). Longhorn, etcd snapshots, Rancher Backup, etc, are backed up to Ceph RGW. Basically, if we have the node token secret and RGW secrets for a cluster, we can recreate everything.
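As a rough sketch of the k3s flavor of that, since RGW speaks the S3 API (endpoint, bucket, and keys are placeholders):

```sh
# Push an etcd snapshot straight to Ceph RGW over its S3-compatible API
# (on an etcd-backed cluster):
k3s etcd-snapshot save \
  --s3 \
  --s3-endpoint rgw.example.internal \
  --s3-bucket cluster-backups \
  --s3-access-key "$RGW_ACCESS_KEY" \
  --s3-secret-key "$RGW_SECRET_KEY"
```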
-
Scott Williams 🐧 replied to Scott Williams 🐧
@sassdawe Needless to say, paying for Longhorn commercial support was part of the cost factor. Even though it's 100% FOSS, running it in production without commercial support is too much of a continuity risk. Doing it this way was also a small fraction of the cost of ODF from Red Hat.
-
Scott Williams 🐧 replied to Scott Williams 🐧
@sassdawe If you mean host hardware contingency, we're using Rancher Elemental to provision hardware with SLE Micro and assign it to clusters as necessary. There are other ways to do this with k8s, such as metal3, which is what OpenShift uses under the hood.
https://elemental.docs.rancher.com/
https://metal3.io/
I definitely recommend doing reproducible immutable #Linux for #Kubernetes hosts, whether that's Sidero, SUSE, RHEL, or Fedora.
-
DrRac27 replied to Scott Williams 🐧
@vwbusguy @Viss @arichtman @mttaggart ok, good to know. I still don't think it's right for me, but at least I learned something, thanks!
-
Scott Williams 🐧 replied to DrRac27
@DrRac27 @Viss @arichtman @mttaggart For things that I run in a container that don't need all the overhead of Kubernetes, I use podman with systemd to manage them, so they end up running more like traditional Linux services, but get updates through `podman pull` instead of `yum update`. Podman plays nicer with rootless, firewalld, cgroups2, etc., and has a fairly straightforward migration path to k8s if you end up needing to go bigger down the road.
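A minimal sketch of that pattern with a Quadlet unit, assuming a reasonably recent podman; the unit name and image are placeholders:

```sh
# Drop a .container unit in place; systemd then manages the container
# like any other service:
cat > /etc/containers/systemd/myapp.container <<'EOF'
[Container]
Image=registry.example.com/myapp:latest
PublishPort=8080:8080
# Opt in to image updates via `podman auto-update`:
Label=io.containers.autoupdate=registry

[Service]
Restart=always

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl start myapp.service

# Later, instead of `yum update`, pull newer images and restart units:
podman auto-update
```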
-
Scott Williams 🐧 replied to Scott Williams 🐧
@DrRac27 @Viss @arichtman @mttaggart My general opinion is that podman with a proxy in front (eg, caddy, nginx) can do most of what swarm can with less overhead, and if you really need more than that, then you should probably be thinking about Kubernetes anyway.
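As a sketch with caddy (names and image are placeholders):

```sh
# Run the app rootless on a high port:
podman run -d --name myapp -p 8080:8080 registry.example.com/myapp:latest

# Front it with caddy's zero-config proxy; with a real DNS name this
# also gets you automatic HTTPS:
caddy reverse-proxy --from myapp.example.com --to localhost:8080
```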
-
Scott Williams 🐧 replied to Scott Williams 🐧
@DrRac27 @Viss @arichtman @mttaggart And if multitenancy with security is your end goal, then check out Kata Containers.
It lets you orchestrate container workloads as tiny VMs.
Kata Containers - Open Source Container Runtime Software: https://katacontainers.io/
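On the Kubernetes side, Kata usually shows up as a RuntimeClass; a sketch, assuming the kata handler is already configured in containerd on the nodes:

```sh
# Register Kata as a RuntimeClass; pods opt in via runtimeClassName:
kubectl apply -f - <<'EOF'
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata
EOF

# Any pod spec that sets `runtimeClassName: kata` now runs in its own
# lightweight VM instead of a plain namespace.
```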
-
DrRac27 replied to Scott Williams 🐧
@vwbusguy @Viss @arichtman @mttaggart I would love to use podman or kata, but then I have no orchestration, right? If one node goes down for whatever reason (reboot, crash, I want to change hardware or reinstall), no other node picks up that node's tasks? Can I build a sane failover with something like keepalived? If I had more time I would just write something myself; I can't believe nobody has done it yet...
-
@DrRac27 @vwbusguy @Viss @arichtman Yeah, so this is why I teach starting with Swarm for orchestration, then moving to Podman/k3s once the need arises.
I like Podman a lot, but your concerns are real. I'd also add that while much of Swarm's functionality is achievable to a degree with Podman and a reverse proxy, that is additional deployment complexity for a solution designed to reduce it.
-
@DrRac27 @Viss @arichtman @mttaggart That's an absolutely fair point, and you're generally right. I would use Ansible to automate it, and while systemd can trigger a restart on a failed container process, a podman healthcheck mostly just notifies journald that there might be a problem; it doesn't proactively do anything about a container whose process is running but unhealthy.
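As a sketch (image and probe URL are placeholders):

```sh
# Define a healthcheck at run time; by default a failing check only
# flips the health status and logs an event, it restarts nothing:
podman run -d --name myapp \
  --health-cmd 'curl -sf http://localhost:8080/healthz || exit 1' \
  --health-interval 30s \
  registry.example.com/myapp:latest

# Exit code reflects current health; acting on it is up to you:
podman healthcheck run myapp
```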
-
Scott Williams 🐧 replied to Taggart :donor:
@mttaggart @DrRac27 @Viss @arichtman That's a valid point. In my setup, I have config management and monitoring services, which makes podman more practical, but if you don't already have those things, podman is less useful. It also ultimately depends on your SLA. IOW, can you afford the downtime vs. added complexity trade-off?
-
@vwbusguy @Viss @arichtman @mttaggart I think I don't fully understand. How would you automate failover with Ansible?
I've been working a lot on my homelab these last few days, and if I weren't already so invested in Swarm I would have tried k8s again. Swarm doesn't even support devices like GPUs or Zigbee sticks (without hacking), and I wanted to run a registry that is only reachable on localhost (so inside the whole cluster via the built-in load balancer), but that isn't supported in swarm mode either.
-
@DrRac27 @Viss @arichtman @mttaggart Hey, so I was wrong about this. They actually did add support for this as of Podman 4.3.
Podman at the edge: Keeping services alive with custom healthcheck actions (www.redhat.com)
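A sketch of that 4.3 behavior (image and probe are placeholders):

```sh
# Since Podman 4.3, the healthcheck itself can take action on failure;
# here an unhealthy container gets restarted automatically:
podman run -d --name myapp \
  --health-cmd 'curl -sf http://localhost:8080/healthz || exit 1' \
  --health-interval 30s \
  --health-on-failure restart \
  registry.example.com/myapp:latest
```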