this week I'm reading Human Factors in Systems Engineering
there are so many gems I've highlighted already but really vibed with how the author clearly and simply expressed the impact of writing docs "early" here
Are you looking for a new remote job? Browse 400+ remote positions from open source companies including @acquia @grafana @mozilla @wikimediafoundation and more on #OSJH
https://opensourcejobhub.com/jobs/?q=remote&utm_source=mosjh
#career #OpenSource #engineer #sales #security #marketing #CloudNative #developer #DevSecOps #SRE #FOSS
Want to grow your open source career? The LiFT Scholarship offers training & certs to help you level up—whether you're starting out or advancing.
Apply by April 30: https://app.smarterselect.com/programs/102338-Linux-Foundation-Education
Running an Incidents 101 training tomorrow. Including two games, both involving some dice rolling, should be fun. I don't feel nervous, I know what process and common ground to cover.
Trying my best to keep the material interesting between getting some interaction and showing the necessary slides with steps and rules on them.
In one activity we throw gaming dice and build a context, randomizing things like customers affected, size of response, time of day, etc. Then use the rules for gauging severity. That's the whole game!
I hope above all that the activities go well and the people unfamiliar with the process will have an opportunity to learn something. I cannot make it be everything for everybody, but I hope to the right people it is a help.
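For anyone who wants to try the dice activity without physical dice, here is a minimal sketch of the scenario roll. The tables and the die sizes are hypothetical placeholders, since the post only names the dimensions being randomized, and the severity rules themselves are left to whatever process your team uses.

```python
import random

# Hypothetical tables: the post names these dimensions but not their values.
CUSTOMERS_AFFECTED = [
    "internal only", "one customer", "a few customers",
    "one region", "most customers", "all customers",
]
RESPONSE_SIZE = ["one responder", "one team", "several teams", "whole org"]


def roll_scenario(rng):
    """Roll one die per dimension and return the resulting incident context."""
    return {
        # d6: index into the customers table
        "customers_affected": CUSTOMERS_AFFECTED[rng.randint(1, 6) - 1],
        # d4: index into the response-size table
        "response_size": RESPONSE_SIZE[rng.randint(1, 4) - 1],
        # hypothetical d24: hour of day, 0 through 23
        "hour_of_day": rng.randrange(24),
    }


if __name__ == "__main__":
    print(roll_scenario(random.Random()))
```

Participants then apply the severity rules to the rolled context by hand, which keeps the discussion part of the game intact.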
a short lil blog post sharing how re-reading the evergreen etsy Debriefing Facilitation Guide helped me better investigate a mysterious sound....
Not sure if I asked this before: Does anyone use anything in particular to inject #apache logs into #SQL databases? I have been looking around and asking around and the only solid answer I got was "do not expect an apache module for that; it would introduce too much latency to each request" in #httpd@libera.chat.
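One way to stay out of the request path is to load the log file in batches after the fact. The following is a minimal sketch, not a recommended tool: it assumes the Common Log Format, uses SQLite as a stand-in for whichever SQL database you run, and the table name `access_log` is made up for illustration.

```python
import re
import sqlite3

# Matches the leading Common Log Format fields; extra fields after them
# (such as referer and user agent in the combined format) are ignored here.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def load_lines(conn, lines):
    """Parse log lines and insert them in one batch; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS access_log "
        "(host TEXT, time TEXT, request TEXT, status INTEGER, size INTEGER)"
    )
    rows = []
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m is None:
            continue  # skip lines that do not look like access log entries
        size = 0 if m["size"] == "-" else int(m["size"])
        rows.append((m["host"], m["time"], m["request"], int(m["status"]), size))
    conn.executemany("INSERT INTO access_log VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
              '"GET /apache_pb.gif HTTP/1.0" 200 2326')
    conn = sqlite3.connect(":memory:")
    print(load_lines(conn, [sample]))
```

Run from cron or against a rotated log file, this keeps the database write entirely separate from serving requests.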
System Administration
Week 10, Backups: Core Concepts
In this video, we begin our discussion of backups by covering some core concepts and terminology, looking at full vs. incremental vs. differential backups and the difference between long-term storage and disaster recovery of files due to more localized data loss.
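The difference between the three backup types comes down to which reference point a file's modification time is compared against. A small sketch, using a hypothetical helper and made-up timestamps rather than anything from the course materials:

```python
def files_to_back_up(mtimes, kind, last_full, last_backup):
    """Select files for a backup run.

    mtimes:      dict mapping path -> modification time (any comparable number)
    last_full:   time of the most recent full backup
    last_backup: time of the most recent backup of any kind
    """
    if kind == "full":
        # A full backup copies everything, regardless of modification time.
        return sorted(mtimes)
    if kind == "differential":
        # Everything changed since the last FULL backup.
        cutoff = last_full
    elif kind == "incremental":
        # Only what changed since the last backup of ANY kind.
        cutoff = last_backup
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(path for path, mtime in mtimes.items() if mtime > cutoff)


if __name__ == "__main__":
    mtimes = {"a": 1, "b": 5, "c": 9}
    for kind in ("full", "differential", "incremental"):
        print(kind, files_to_back_up(mtimes, kind, last_full=3, last_backup=7))
```

With a full backup at time 3 and an incremental at time 7, the differential run picks up both `b` and `c`, while the incremental run picks up only `c`. That is why a differential restore needs two sets (full plus latest differential) and an incremental restore needs the full set plus every increment since.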
And here’s the big reveal:
Virtual flash cards for the key terms for all of DevOps Institute’s exams. I took the glossaries from all their public study guides, deduplicated them, converted the courses they appear in to tags and added an exam they missed.
https://github.com/ajn142/DOI-Exam-Glossary
Reposting because I forgot the number one rule of chronological timelines (don’t post when everyone’s asleep lol).
Site Reliability Engineering is often like Cassandra (not the database), where you tell devs the kinds of scaling issues they'll see if they continue following clever shortsighted patterns — you're frequently correct but they never believe you.
System Administration
Week 9, Writing System Tools
This week we're going on a side-quest to discover solid #programming best practices that apply across simple scripting, prototyping, growing your tools, and owning a software product. We don't have videos for this topic, but the slides below include a lot of hopefully useful links ranging from coding style to ticket management and commit messages.
https://stevens.netmeister.org/615/09-writing-system-tools.pdf
If you've tried both Thanos and Mimir, which do you prefer? Feel free to comment why below
So, I've been using Thanos to receive and store my prometheus metrics long term in a self hosted S3 bucket. Thanos also acts as a datasource for my dashboards in Grafana, and provides a Ruler, which evaluates alerting rules and forwards alerts to my alertmanager. It's ok. It's certainly got its downsides, which I can go into later, but I've been thinking... what about Mimir?
How do you all feel about Grafana's Mimir (source on GitHub)? It's AGPL and seems to literally be a replacement of Thanos, which is Apache 2.0.
Thanos description from their website:
Open source, highly available Prometheus setup with long term storage capabilities.
Mimir description from their website:
...open source software project that provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus and OpenTelemetry metrics.
Both will work with alloy and prometheus alike. Both require you to configure initially confusing hashrings and replication parameters. Both have a bunch of large companies adopting them, so... now I feel conflicted. Should I try mimir? Poll in reply.
Hello, hachyderm! we've been working hard on building up our ansible runbooks and improving hachyderm's overall resilience. Recently, we've been focusing on database resilience.
We're getting close to retiring our original database server (finally!) and preparing to move to a fully ansible-managed set of database servers, primary and replica on new hardware. We'll send another announcement when we do the cut over. The team has done excellent work to make this highly automated, quick, and painless!
Done:
author ansible roles for managing postgresql, pgbackrest (backups), pgbouncer, and primary/replica failover
decide to continue with pgbouncer and *not* use pgcat
rotate database passwords
order new replica database hardware
order new future primary database hardware
To do soon:
rebuild replica database with ansible scripts
prepare primary database with ansible scripts
start replicating to new database replica
cut over to new database server
We're also planning on open-sourcing our ansible roles in the coming weeks - just a little housekeeping & tidying up before we do!
System Administration
Week 8, HTTPS & TLS
After discussing HTTP in the previous week and seeing how we used STARTTLS in the context of #SMTP, we are now quickly reviewing HTTPS, TLS, and the WebPKI. While we don't have a video segment for this, here are slides, including this handy diagram illustrating the CSR process:
hey, fediverse friends - i'm excited that we're finally announcing our Fediverse Security Fund over at @nivenly to help make fedi software more secure.
we're starting off super small to see if the Fund is a thing that can help. along the way we'll learn and improve our intake/payout process. and if there's solid interest and we see good impact, we'll hold a member vote near the end of the experiment to decide if we'll renew/expand the program.
thanks to @thisismissem for her contributions and being the first disclosure to validate the process.
let's close some vulns!
If you're at #KubeCon today (or the rest of the week?) do go say hi to the #CUE folks. I help write their open-source docs (https://cuelang.org/docs) 'cos I *really* want the awesome tech to succeed!
If you're a #SysAdmin or #DevOps in #YAML config hell go and have a chat with them - next to the CNCF corner store at stall S761
System Administration
Week 8, The Simple Mail Transfer Protocol
Shared by a student of mine: Email vs Capitalism, or, Why We Can't Have Nice Things, a talk given by Dylan Beattie at NDC Oslo 2023. Covers a lot of our materials and adds some additional context.
Pushing core workout lately and being rewarded with more mornings free of migraine.
I played deeply into my music the past few nights, awaking the next morning scrubbed of a migraine.
Having those who listen and witness allows me to let go of emotions when I am having them, not carry them around. Less migraine activity ensues.
This week I learned that my anxiety about others is entwined with a particularly evil symptom of religious trauma. I saw both but never saw how they were connected.
I can recognize it now. And the feeling of not needing to "save" someone is a really powerful emotion - or lack of one - that, today, I am thankful for contributing to a clear head and no migraine.
Also feeling self-assured that fixing failures in our systems looks a lot more like treating a migraine than using quick fixes and low-hanging fruit.
System Administration
Week 8, The Simple Mail Transfer Protocol, Part III
In this video, we look at ways to combat Spam. In the process, we learn about email headers, the Sender Policy Framework (#SPF), DomainKeys Identified Mail (#DKIM), and Domain-based Message Authentication, Reporting and Conformance (#DMARC). #SMTP doesn't seem quite so simple any more...
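A DMARC policy is published as a DNS TXT record made of semicolon-separated `tag=value` pairs. As a rough illustration of that shape, here is a minimal parser sketch. The function name and the example record are made up for this note, and the sketch does not do the DNS lookup or validate individual tag values.

```python
def parse_dmarc(record):
    """Split a DMARC TXT record into a dict of tag -> value."""
    tags = {}
    for part in record.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing semicolon
        name, sep, value = part.partition("=")
        if not sep:
            continue  # not a tag=value pair
        tags[name.strip().lower()] = value.strip()
    # The version tag identifies the record as DMARC.
    if tags.get("v") != "DMARC1":
        raise ValueError("not a DMARC record")
    return tags


if __name__ == "__main__":
    # Hypothetical record for illustration only.
    record = "v=DMARC1; p=reject; rua=mailto:dmarc@example.com"
    print(parse_dmarc(record))
```

The `p` tag carries the requested policy (`none`, `quarantine`, or `reject`) and `rua` says where aggregate reports should go.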
System Administration
Week 8, The Simple Mail Transfer Protocol, Part II
In this video, we observe the incoming mail on our MTA, look at how STARTTLS can help protect information in transit, how MTA-STS can help defeat a MitM performing a STARTTLS-stripping attack, and how DANE can be used to verify the authenticity of the mail server's certificate.
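To make the MTA-STS part more concrete: the policy is a small text file of `key: value` lines that a sending MTA fetches over HTTPS and then uses to decide which MX hosts it may deliver to, and whether TLS is mandatory. The sketch below parses such a policy and checks an MX name against it. The function names and the example policy are made up for this note, and the HTTPS fetch and caching are left out.

```python
def parse_mta_sts_policy(text):
    """Parse an MTA-STS policy file into a dict; 'mx' may repeat."""
    policy = {"mx": []}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        key, value = key.strip(), value.strip()
        if key == "mx":
            policy["mx"].append(value.lower())
        else:
            policy[key] = value
    if policy.get("version") != "STSv1":
        raise ValueError("not an MTA-STS policy")
    policy["max_age"] = int(policy.get("max_age", 0))
    return policy


def mx_allowed(policy, host):
    """Return True if host matches one of the policy's mx patterns."""
    host = host.lower()
    for pattern in policy["mx"]:
        if pattern.startswith("*."):
            # A wildcard stands in for exactly one leftmost label.
            if host.split(".", 1)[-1] == pattern[2:]:
                return True
        elif host == pattern:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical policy for illustration only.
    text = "version: STSv1\nmode: enforce\nmx: mail.example.com\nmx: *.example.net\nmax_age: 604800\n"
    policy = parse_mta_sts_policy(text)
    print(policy["mode"], mx_allowed(policy, "mx1.example.net"))
```

With `mode: enforce`, a sender that cannot negotiate TLS with an allowed MX should refuse to deliver, which is what defeats the STARTTLS-stripping attack.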