Introduction to Six Birds Theory
I recently read about something called Six Birds theory, and it's basically a way to talk about "emergence" without making it sound mystical. People say things like "order emerges" or "meaning emerges," which is vague, but this paper tries to turn it into a system you can actually reason about.
The main idea is that a lot of what we call "things" are not fundamental objects you discover; they're what you get after you compress information and then force the compressed description to be consistent.
So imagine you have a really complicated system with tons of tiny details. In CS terms, the full state is too big, like a massive struct with fields you can’t even store or observe.
You don’t work with that directly.
You build an abstraction. You map many micro-states into one macro-state. That's the first move: you decide what distinctions you care about and you drop the rest. The paper calls this coarse-graining, but it's essentially what you do in logging, monitoring, ML feature engineering, and any time you summarize data.
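To make that concrete, here's a minimal sketch of the coarse-graining move in Python. The micro-state fields and the summary rule are invented for illustration; the point is just that many detailed states collapse into one label:

```python
from dataclasses import dataclass

# Hypothetical micro-state: far more detail than we want to reason about.
@dataclass(frozen=True)
class MicroState:
    positions: tuple        # per-particle detail we will throw away
    velocities: tuple
    temperature_sensor: float

def coarse_grain(micro: MicroState) -> str:
    """Map many micro-states onto one macro-state: here, just a temperature band."""
    t = micro.temperature_sensor
    if t < 0:
        return "cold"
    elif t < 100:
        return "warm"
    return "hot"

# Many different micro-states collapse to the same macro label.
a = MicroState(positions=(1, 2), velocities=(0.1, -0.3), temperature_sensor=25.0)
b = MicroState(positions=(9, 4), velocities=(2.0, 0.7), temperature_sensor=40.0)
assert coarse_grain(a) == coarse_grain(b) == "warm"
```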
Then there’s a second move that matters a lot: after you compress, you usually normalize. Like you format code until it looks standard, or you canonicalize something so there aren’t multiple representations of the “same” thing.
The paper uses math words like closure operator and fixed point, but the simple version is: you apply a "complete the description" rule, and when applying it again no longer changes anything, you have a stable object. That's what the paper wants to call an emergent object: a stable fixed point of your own description process. So in this picture emergence is not "new stuff appears"; it's more like "my interface and my normalizer settle into a stable pattern."
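Here's a toy sketch of that second move, assuming a made-up "complete the description" rule. The only thing that matters is the shape of the loop: keep normalizing until nothing changes, and then normalizing again is a no-op:

```python
def normalize(desc: set) -> set:
    """Toy 'complete the description' rule: if two facts are present, add the implied one.
    The implication rules are invented; the point is the closure behavior."""
    rules = {("wet", "cold"): "icy", ("icy", "sunny"): "melting"}
    out = set(desc)
    for (a, b), implied in rules.items():
        if a in out and b in out:
            out.add(implied)
    return out

def close(desc: set) -> set:
    """Apply the normalizer until it stops changing: the fixed point is the stable object."""
    while True:
        nxt = normalize(desc)
        if nxt == desc:
            return desc
        desc = nxt

stable = close({"wet", "cold", "sunny"})
print(stable)                   # {'wet', 'cold', 'sunny', 'icy', 'melting'}
assert close(stable) == stable  # applying it again changes nothing
```

That final assertion is the whole definition of a fixed point here: the description has stopped moving under its own completion rule.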

Six Birds Theory: Explanation
The "six birds" are six patterns the paper claims show up when you do this kind of modeling. I'm not going to pretend the names helped me, but the patterns are basically about what can go wrong or what you're forced to include. One big one is that you want macro-level rules that stand on their own, without reaching back down into the micro details, but sometimes they don't.
If you update the micro-system and then compress, you might get a different result than if you compress first and then try to update at the macro level. That mismatch is important because it means your abstraction is not closed under the dynamics. In software terms, your API is leaky.
You thought you built a nice interface, but the underlying behavior depends on details you threw away. The paper treats that mismatch as a diagnostic: either refine the abstraction, change the macro rule, or accept that this level can’t have a clean law.
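The mismatch is easy to state as a check. In this sketch the micro rule, the parity summary, and the candidate macro rule are all invented; the diagnostic is just whether the two paths agree:

```python
def micro_step(x: int) -> int:
    """Hypothetical micro dynamics: integer halving."""
    return x // 2

def coarse_grain(x: int) -> str:
    """Compress to a parity label, throwing away the actual value."""
    return "even" if x % 2 == 0 else "odd"

def macro_step(label: str) -> str:
    """A candidate macro rule someone might propose: 'parity stays the same'."""
    return label

# Commutation check: update-then-compress vs compress-then-update.
leaks = [x for x in range(20)
         if coarse_grain(micro_step(x)) != macro_step(coarse_grain(x))]
print(leaks)  # non-empty: e.g. 6 is even but 6 // 2 == 3 is odd
```

In this example no macro rule on parity alone can repair the failure (halving sends 4 to an even result but 6 to an odd one), so the abstraction genuinely leaks rather than just being paired with a badly chosen macro rule.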
Another idea is that not every macro-state is actually possible. Like, you might describe some summary you want, but there might not exist any micro-state that produces it. This is kind of like having a type or schema that looks valid but has no actual values that satisfy all constraints. When you compress, you carve out a subset of summaries that are “representable,” and the rest are fake states that only exist in your head.
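A tiny sketch of that "representable subset" idea, with an invented micro system (three switches, at most two on at once) and an invented summary (how many are on):

```python
from itertools import product

# Hypothetical micro-states: three binary switches, with a constraint
# that at most two can be on at the same time.
micro_states = [s for s in product([0, 1], repeat=3) if sum(s) <= 2]

def coarse_grain(s) -> int:
    """Summary: the number of switches that are on."""
    return sum(s)

# Macro-states you can write down vs macro-states that actually have a micro-state behind them.
describable = {0, 1, 2, 3}
representable = {coarse_grain(s) for s in micro_states}

print(representable)                # {0, 1, 2}
print(describable - representable)  # {3}: a summary with no micro-state behind it
```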
The part that felt most practical to me is the stuff about irreversibility, like the arrow of time. People sometimes say that coarse-graining “creates” irreversibility because when you lose information you can’t reconstruct the past.
But the paper is picky and says you shouldn’t confuse “I forgot details” with “the system is physically irreversible.” It uses an information-theory audit based on comparing forward histories to reversed histories.
The point is: if you do honest processing, throwing away information should not increase whatever measure of time-asymmetry you're tracking. If it looks like it increased, you probably hid some state, or you have an external driving protocol you didn't model, like rules that change over time while you pretend the system is autonomous. This is basically the paper calling out a modeling bug and saying "no, include the clock in the state."
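Here's one way that audit could look in code, assuming the time-asymmetry measure is the KL divergence between forward and time-reversed transition pairs of a Markov chain (the chain and the lumping are made up, and the paper's exact measure may differ). The data-processing inequality is what makes the "should not increase" claim checkable: lumping states is just more processing, so it can only lose asymmetry:

```python
import numpy as np

# Hypothetical 3-state Markov chain (rows = current state, columns = next state).
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3],
              [0.3, 0.3, 0.4]])

# Stationary distribution via power iteration.
pi = np.ones(3) / 3
for _ in range(1000):
    pi = pi @ P

def time_asymmetry(joint_fwd, joint_rev):
    """KL divergence between the forward and time-reversed pair distributions."""
    mask = joint_fwd > 0
    return np.sum(joint_fwd[mask] * np.log(joint_fwd[mask] / joint_rev[mask]))

# Micro level: distribution over (x_t, x_{t+1}) pairs, forward vs reversed.
fwd = pi[:, None] * P   # fwd[i, j] = Pr(x_t = i, x_{t+1} = j)
rev = fwd.T             # reversed pairs: Pr(x_t = j, x_{t+1} = i)
micro_asym = time_asymmetry(fwd, rev)

# Macro level: lump states {0, 1} -> A and {2} -> B, then redo the same audit.
lump = [[0, 1], [2]]
fwd_macro = np.array([[fwd[np.ix_(a, b)].sum() for b in lump] for a in lump])
rev_macro = fwd_macro.T
macro_asym = time_asymmetry(fwd_macro, rev_macro)

# Honest coarse-graining can only lose time-asymmetry, never create it.
print(micro_asym, macro_asym)
assert macro_asym <= micro_asym + 1e-12
```

If that final check failed on real data, that would be the signal the paper is talking about: hidden state or an unmodeled external drive, not emergence.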
There’s also a claim about novelty that I think is kind of straightforward: if your “theory” is basically a limited language, most possible new properties you might want to talk about won’t be definable in the old language.
So strictly new properties are easy to find once the underlying system is complex enough. But you don't get endless novelty by just running the same closure again and again: if you keep applying the same normalization step, you'll just land on the same fixed points.
To get genuinely new concepts, you have to change the lens, like add new predicates or new measurements. That matches how tech works too: you don’t get a better debugger by staring harder at the same log output. You add new instrumentation.
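A small sketch of that last point: any property defined purely through the old lens has to agree on micro-states the lens can't distinguish, so the only way to get the new predicate is to add a measurement (everything here is invented for illustration):

```python
# Two hypothetical micro-states that the current lens cannot tell apart.
micro_1 = {"mass": 5.0, "charge": +1, "spin": "up"}
micro_2 = {"mass": 5.0, "charge": +1, "spin": "down"}

def old_lens(micro):
    """The current 'language': only mass and charge are observable."""
    return (micro["mass"], micro["charge"])

# Any property defined purely in terms of the old lens must agree on these two states,
# so 'spin is up' is simply not definable in the old language.
assert old_lens(micro_1) == old_lens(micro_2)

def new_lens(micro):
    """Change the lens: add a new measurement instead of re-running the old closure."""
    return (micro["mass"], micro["charge"], micro["spin"])

assert new_lens(micro_1) != new_lens(micro_2)  # the new predicate distinguishes them
```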
Summary
So if I had to explain what Six Birds is trying to do, it's like a framework for layered modeling where "things" are stable compressed descriptions, and it gives you checks so you don't accidentally claim something deep when it's just an artifact of your abstraction pipeline.
It’s not trying to be poetic. It’s trying to make emergence feel like something you can test, like you’d test whether a system property is real or just a result of your tooling.
As always, thank you so much for reading How to Learn Machine Learning and have a wonderful day!

