Can ChatGPT solve NCSC's hard problems?
prompt: NCSC has a research problem book. One such problem is the following:
CC2 - How do we make system security assessments more data driven?
How do we make system security assessments more data driven?
Along with many private sector consultancy companies, the NCSC offers security architecture reviews as a service, by highlighting flaws in system designs so they can be corrected. The NCSC has a small number of highly skilled and experienced security architects who make professional judgements about whether a system design is secure enough to protect against certain threats. The demand for these architects will always be greater than their capacity. Another problem is that as threat-actor tradecraft is always evolving, it’s not possible to have in-depth and current knowledge of all threats. So unless we evolve the approach, mistakes will be inevitable, and systems that could have been defended will be breached.
We need to apply more rigour to the approach to system security assessments, to improve efficacy and repeatability, while also enabling a larger workforce. We also need to be able to apply knowledge of threat actor tradecraft to guide these security assessments, or perhaps even automatically validate designs as being secure against known threats.
Strands or sub-problems
Data science and modelling
It's important that we can map system architectures, component properties and security mitigations in a way that is suitable for automated reasoning. We also need to validate if any given system can defend against threat actor TTPs (tactics, techniques and procedures). We need research and analysis on modelling techniques for this problem, along with real-world examples of their application.
Product assurance
Security products are designed to mitigate certain risks, but which ones do they mitigate, and how well? A codified model here would be another vital component necessary for automated reasoning on overall system security where multiple security products are in use. As well as security products, we also need to consider the security properties of other products or components when securing a wider system: how can we better understand the provenance and supply chains that make up the hardware and software bills of materials for a component in the architecture, and how that affects the confidence we can have in that component?
Socio-technical factors
People using or operating the system also pose or mitigate risks, and it's important to factor this into a model for system security – how can these risks be codified?
Security architecture
This sub-strand is about modelling architectural mitigations rather than security products or configurations: for example, an architecture with more independent layers of defence. This area is also a logical place to be able to apply the above models to consider the different mitigation options – what are the practical tools and assessment methodologies needed to do that?
Measuring software security quality
This is about developing a taxonomy and objective measures for how difficult it's likely to be for security researchers to find vulnerabilities in any given software project, to inform the level of confidence we then have when that software is present in a system architecture. In other words, lack of knowledge of specific vulnerabilities shouldn’t be taken as a clean bill of health, as it may just be that no one has looked for vulnerabilities yet. So can we come up with an objective measure of how easy it might be to find vulnerabilities?
Attack simulation
Breach or attack simulation tools are another method to gain evidence of resilience to defend against an attack. How can we make tooling representative of real-world threat actors and their TTPs?
Risk
Finally, there is a need to bring together all of these strands to support a system owner’s risk decision about whether the defences in place are sufficient. How can we help a system owner make that decision confidently? And how can this more data-driven approach combine with other risk assessment approaches such as system theoretic process analysis (STPA)?
Why this is important
“As one of the people responsible for defending the NCSC’s own technology from cyber attacks, I grapple with making decisions that balance security, usability and cost. Much of this decision hinges on my own judgement and the advice from security advisors, as there is little good-quality data about whether the security mitigations in place on a given system are sufficient to defend against current threats. This research challenge represents the best chance we have to move security assessments from an art to science, helping people like me make more confident decisions that the defences in place are sufficient. Ideally, I’d have data and tooling telling me automatically when we need to take some action because threat actor tradecraft has changed, and my particular systems are now at risk.”
Carolyn A, Chief Engineer, NCSC
Can you propose a solution to this problem?
A “data-driven” security assessment isn’t just about adding dashboards; it requires rethinking how systems, threats, and mitigations are *represented*, *measured*, and *reasoned about*. If you don’t formalise those layers first, you’ll just automate subjective judgement at scale, which doesn’t solve the core problem.
Here’s a concrete, research-backed direction that could actually move this from craft to engineering.
---
# 1) Core idea: a **machine-reasonable security model stack**
Build a unified, layered model where:
* **System architecture** is encoded as a graph
* **Threat actor behaviour** is encoded as structured TTPs
* **Mitigations** are encoded as measurable controls
* **Outcomes** are probabilistic (not binary “secure/insecure”)
Think of it as a “digital twin” of system security.
### Key components
### A. System graph model
Represent systems as typed graphs:
* Nodes: services, users, devices, data stores
* Edges: trust relationships, data flows, control flows
* Attributes: configurations, exposure, privileges
This aligns well with:
* attack graphs
* dependency graphs
* zero trust architectures
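As a rough illustration of the typed-graph idea above, the sketch below encodes a small system and enumerates the paths an attacker could follow from the internet to a data store. The `SystemGraph` class, the node and edge kinds, and the example topology are all hypothetical and chosen only for illustration; a real schema would need far richer attributes.

```python
from dataclasses import dataclass, field


@dataclass
class SystemGraph:
    """Typed directed graph: nodes and edges both carry a kind."""
    nodes: dict = field(default_factory=dict)  # name -> {"kind": ..., **attrs}
    edges: dict = field(default_factory=dict)  # name -> [(edge_kind, target), ...]

    def add_node(self, name, kind, **attrs):
        self.nodes[name] = {"kind": kind, **attrs}
        self.edges.setdefault(name, [])

    def add_edge(self, src, dst, kind):
        self.edges[src].append((kind, dst))

    def simple_paths(self, src, dst, _seen=()):
        """Yield every cycle-free path from src to dst as a list of node names."""
        seen = _seen + (src,)
        if src == dst:
            yield list(seen)
            return
        for _kind, nxt in self.edges[src]:
            if nxt not in seen:
                yield from self.simple_paths(nxt, dst, seen)


# Hypothetical three-tier system plus an admin laptop.
g = SystemGraph()
g.add_node("internet", "network", exposure="public")
g.add_node("web", "service", exposure="public")
g.add_node("app", "service", exposure="internal")
g.add_node("db", "data_store", exposure="internal")
g.add_node("admin_laptop", "device", privileges="db_admin")
g.add_edge("internet", "web", "data_flow")
g.add_edge("web", "app", "data_flow")
g.add_edge("app", "db", "data_flow")
g.add_edge("admin_laptop", "db", "trust")

paths_to_db = list(g.simple_paths("internet", "db"))
print(paths_to_db)
```

In this toy topology the only route from the internet to the database runs through the web and application tiers; the admin laptop has a trust edge to the database but is not itself reachable from the internet.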
---
### B. Threat modelling as executable knowledge
Map threat actor tradecraft using structured frameworks like:
* MITRE ATT&CK
Convert TTPs into:
* Preconditions (what must be true)
* Actions (what attacker does)
* Effects (what changes in system state)
This enables **automated reasoning like**:
> “Can TTP X succeed given current architecture?”
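One way to make the preconditions/actions/effects framing concrete is simple forward chaining over a set of facts about the system. The sketch below is a minimal example under that assumption; the `TTP` class, the fact names, and the three techniques are illustrative labels and not taken from any real threat library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTP:
    name: str
    preconditions: frozenset  # facts that must hold before the technique works
    effects: frozenset        # facts that become true if it succeeds


def attacker_closure(initial_facts, ttps):
    """Apply TTPs repeatedly until no technique adds a new fact."""
    state = set(initial_facts)
    applied = []
    changed = True
    while changed:
        changed = False
        for t in ttps:
            if t.preconditions <= state and not t.effects <= state:
                state |= t.effects
                applied.append(t.name)
                changed = True
    return state, applied


# Hypothetical technique library.
ttps = [
    TTP("phishing", frozenset({"user_reads_email"}), frozenset({"credentials_stolen"})),
    TTP("valid_account_login", frozenset({"credentials_stolen", "no_mfa"}),
        frozenset({"vpn_foothold"})),
    TTP("lateral_movement", frozenset({"vpn_foothold", "flat_network"}),
        frozenset({"db_access"})),
]

segmented, _ = attacker_closure({"user_reads_email", "no_mfa"}, ttps)
flat, chain = attacker_closure({"user_reads_email", "no_mfa", "flat_network"}, ttps)
print("db_access" in segmented, "db_access" in flat, chain)
```

The same technique library gives different answers for the two architectures: without the `flat_network` fact the lateral movement step never becomes applicable, so the database is not reached.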
---
### C. Control effectiveness modelling
Instead of “control present = good”, define:
* Coverage: which TTPs it mitigates
* Strength: probability of stopping/detecting
* Dependencies: what must work for it to work
For example:
* MFA ≠ “secure login”
* MFA = reduces the success probability of attacks that rely on stolen credentials by X%, depending on implementation
This is similar in spirit to:
* NIST Cybersecurity Framework
but extended into quantitative form.
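The coverage, strength, and dependency attributes could be sketched as follows. This assumes, purely for illustration, that controls act independently and that a control whose dependencies are down contributes nothing; the `Control` class and every number here are hypothetical.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Control:
    name: str
    coverage: dict          # TTP name -> probability the control stops it
    depends_on: tuple = ()  # components that must be working


def residual_probability(ttp, base_probability, controls, working):
    """Attack success probability after applying every usable control."""
    factors = [
        1.0 - c.coverage[ttp]
        for c in controls
        if ttp in c.coverage and all(d in working for d in c.depends_on)
    ]
    return base_probability * prod(factors)


# Hypothetical figures: MFA stops 90% of stolen-credential logins,
# but only while the identity provider is functioning.
mfa = Control("mfa", {"stolen_credential_login": 0.9}, depends_on=("identity_provider",))

with_idp = residual_probability("stolen_credential_login", 0.5, [mfa], {"identity_provider"})
without_idp = residual_probability("stolen_credential_login", 0.5, [mfa], set())
print(round(with_idp, 3), without_idp)
```

Recording the dependency explicitly is what lets the model show the control silently losing its value when the component it relies on fails.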
---
# 2) Move from checklists → probabilistic reasoning
Deterministic compliance models don’t reflect reality. Instead:
### Use Bayesian or probabilistic models
* Each attack path has a likelihood
* Each control modifies that likelihood
* Overall system risk = distribution, not a score
This enables:
* “What’s the probability of compromise in 30 days?”
* “Which control reduces risk most per £ spent?”
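To show what "a distribution, not a score" might look like, the sketch below runs a small Monte Carlo simulation. It assumes each attack step has an uncertain success probability described by a Beta distribution, that steps and paths are independent, and that the Beta parameters shown are invented for illustration rather than measured.

```python
import random
from math import prod
from statistics import quantiles


def simulate_compromise(paths, n=5000, seed=0):
    """Sample the overall compromise probability n times.

    paths: list of attack paths; each path is a list of (alpha, beta)
    pairs describing a Beta distribution over that step's success rate.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # A path succeeds only if every step on it succeeds.
        path_probs = [prod(rng.betavariate(a, b) for a, b in steps) for steps in paths]
        # The system is compromised if at least one path succeeds.
        samples.append(1.0 - prod(1.0 - p for p in path_probs))
    return samples


# Hypothetical attack paths with made-up Beta parameters per step.
paths = [
    [(2, 8), (5, 5), (3, 7)],  # phishing -> login -> lateral movement
    [(1, 9), (2, 8)],          # exploit web tier -> privilege escalation
]

samples = simulate_compromise(paths)
cuts = quantiles(samples, n=20)  # 19 cut points at 5% intervals
p05, p50, p95 = cuts[0], cuts[9], cuts[18]
print(f"5th={p05:.3f} median={p50:.3f} 95th={p95:.3f}")
```

Reporting the 5th, 50th, and 95th percentiles rather than a single number makes the uncertainty in the inputs visible to the person taking the risk decision.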
---
# 3) Data pipeline: what makes it “data-driven”
You need continuous ingestion of:
### A. Threat intelligence
* Observed TTP usage
* Exploit prevalence
* Campaign data
### B. System telemetry
* Logs, detections, incidents
* Vulnerability scan data
* Configuration drift
### C. Product assurance data
* SBOMs / HBOMs
* Known weaknesses
* Vendor trust metrics
---
# 4) Attack simulation that actually reflects reality
Many breach-and-attack simulation tools fall short of representing real threat actors because they:
* use static playbooks
* don’t adapt to defences
Instead:
### Build adversarial simulation agents
* Reinforcement learning attackers
* Goal-driven (data exfiltration, persistence, etc.)
* Constrained by real-world TTPs
These agents:
* explore novel paths
* validate whether your model is realistic
* generate new “unknown but plausible” attacks
---
# 5) Socio-technical modelling (often ignored, but critical)
Purely technical models miss major risks.
Include:
* user susceptibility (phishing, error rates)
* operator behaviour (misconfigurations)
* organisational maturity
Model as:
* probabilities of failure or bypass
* conditional on workload, training, incentives
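A conditional failure probability of this kind could be expressed with a simple logistic model. The sketch below is only an illustration: the function name, the choice of a logistic form, and all coefficients are assumptions, and real values would have to be estimated from data such as phishing exercise results.

```python
from math import exp


def phishing_click_probability(workload, trained,
                               base_logit=-2.0,
                               workload_weight=1.5,
                               training_weight=-1.0):
    """Probability a user clicks a phishing link.

    workload: 0.0 (light) to 1.0 (heavy); trained: whether the user has
    had recent awareness training. All coefficients are hypothetical.
    """
    z = base_logit + workload_weight * workload + training_weight * (1.0 if trained else 0.0)
    return 1.0 / (1.0 + exp(-z))


busy_untrained = phishing_click_probability(workload=1.0, trained=False)
busy_trained = phishing_click_probability(workload=1.0, trained=True)
calm_trained = phishing_click_probability(workload=0.0, trained=True)
print(round(busy_untrained, 3), round(busy_trained, 3), round(calm_trained, 3))
```

The resulting probability can then feed into the technical model as the success rate of a human-dependent step, so that people are treated as part of the system and not as an afterthought.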
---
# 6) Measuring software security quality
You’re right to challenge “no vulns found = safe”.
Instead define **exploitability surface metrics**:
* code complexity
* unsafe language usage
* historical vulnerability density
* test coverage
* fuzzing depth
Use ML models trained on:
* historical CVE data
* repo characteristics
Output:
> “Expected vulnerability discovery rate”
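The text above suggests ML models; the sketch below swaps in a much simpler Gamma-Poisson estimate to illustrate one narrow point, namely why "nothing found" should be weighed against how much scrutiny the code has received. The function, the use of review hours as the exposure measure, and the prior values are all assumptions made for illustration.

```python
def expected_discovery_rate(found, review_hours, prior_alpha=2.0, prior_beta=100.0):
    """Posterior mean vulnerabilities found per review hour.

    Assumes discoveries follow a Poisson process with a Gamma(prior_alpha,
    prior_beta) prior on the rate. The prior stands in for what is known
    about comparable projects; its values here are hypothetical.
    """
    return (prior_alpha + found) / (prior_beta + review_hours)


# Zero findings means little when nobody has looked.
unreviewed = expected_discovery_rate(found=0, review_hours=0)
# Zero findings after sustained scrutiny is real evidence.
well_reviewed = expected_discovery_rate(found=0, review_hours=1900)
print(unreviewed, well_reviewed)
```

With no review effort the estimate stays at the prior mean, while the same zero count after many hours of scrutiny pulls the expected rate down substantially. Features such as code complexity or unsafe language usage could plausibly inform the prior.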
---
# 7) Decision layer: turning models into usable outputs
This is where many research efforts fail.
You need to produce:
### A. Actionable outputs
* “Top 5 risk-reducing actions”
* “Controls with highest ROI”
### B. Confidence levels
* where data is weak
* where assumptions dominate
### C. What-if analysis
* “What if we add network segmentation?”
* “What if attacker capability increases?”
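A what-if comparison can be reduced to re-running the risk model under each candidate change and ranking by risk reduction per pound spent. The sketch below assumes the post-change risk figures have already been produced by some model; the action names, risk values, and costs are invented for illustration.

```python
def rank_actions(baseline_risk, candidates):
    """Rank candidate actions by risk reduction per unit cost.

    candidates: action name -> (risk after the action, cost in pounds).
    Returns (name, reduction_per_pound) pairs, best first.
    """
    ranked = [
        (name, (baseline_risk - risk_after) / cost)
        for name, (risk_after, cost) in candidates.items()
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical what-if results from a risk model.
candidates = {
    "network_segmentation": (0.12, 40_000),
    "mfa_rollout": (0.18, 10_000),
    "edr_deployment": (0.20, 50_000),
}

ranking = rank_actions(baseline_risk=0.30, candidates=candidates)
for name, value in ranking:
    print(f"{name}: {value:.2e} risk reduction per pound")
```

In this made-up example segmentation removes the most risk in absolute terms, but the MFA rollout ranks first once cost is taken into account, which is the distinction a "highest ROI" output is meant to surface.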
---
# 8) Human-in-the-loop (don’t eliminate experts—amplify them)
Full automation is unrealistic.
Instead:
* junior analysts use model outputs
* experts validate edge cases
* feedback loops improve models
This creates **scalable expertise**, not replacement.
---
# 9) Suggested architecture (end-to-end)
1. **Ingest**
* system design (IaC, diagrams → graph)
* threat intel feeds
* product assurance data
2. **Model layer**
* system graph
* TTP library
* control effectiveness database
3. **Reasoning engine**
* attack path generation
* probabilistic simulation
4. **Simulation layer**
* adversarial agents
5. **Output layer**
* risk metrics
* recommendations
* alerts on tradecraft changes
---
# 10) What makes this realistically achievable
This isn’t science fiction—pieces already exist:
* attack graph research
* probabilistic risk modelling
* threat frameworks like ATT&CK
* SBOM standards
* ML on vulnerability prediction
The gap is **integration + standardisation**.
---
# 11) Research priorities (high-impact)
If you were turning this into a research programme, focus on:
### 1. Formal system representation
A standard schema for architectures that supports reasoning
### 2. Quantifying control effectiveness
Currently the weakest link
### 3. TTP → system mapping
Bridging abstract threat intel to concrete systems
### 4. Validation datasets
Ground truth is rare—need shared benchmarks
### 5. Explainability
Security decisions must be auditable
---
# Final thought
The real shift isn’t “more data”—it’s:
> turning security from **expert opinion under uncertainty**
> into **structured probabilistic reasoning with measurable assumptions**
If done right, you don’t replace the NCSC’s scarce experts—you give them leverage over thousands of systems instead of dozens.
---
If you want, I can turn this into a concrete prototype design (e.g., data schemas, algorithms, or even a minimal working architecture).