timb-machine · April 25, 2026 21:02
diff --git a/Can ChatGPT solve NCSC's hard problems? b/Can ChatGPT solve NCSC's hard problems?
 prompt: NCSC has a research problem book. One such problem is the following:

 CC2 - How do we make system security assessments more data driven?

 Subquestions:

    How do we make system security assessments more data driven?
    Strands or sub-problems
    Why this is important

 How do we make system security assessments more data driven?

 Along with many private sector consultancy companies, the NCSC offers security architecture reviews as a service, by highlighting flaws in system designs so they can be corrected. The NCSC has a small number of highly skilled and experienced security architects who make professional judgements about whether a system design is secure enough to protect against certain threats. The demand for these architects will always be greater than their capacity. Another problem is that as threat-actor tradecraft is always evolving, it’s not possible to have in-depth and current knowledge of all threats. So unless we evolve the approach, mistakes will be inevitable, and systems that could have been defended will be breached.

 We need to apply more rigour to the approach to system security assessments, to improve efficacy and repeatability, while also enabling a larger workforce. We also need to be able to apply knowledge of threat actor tradecraft to guide these security assessments, or perhaps even automatically validate designs as being secure against known threats.

 Strands or sub-problems

    Data science and modelling

    It's important that we can map system architectures, component properties and security mitigations in a way that is suitable for automated reasoning. We also need to validate if any given system can defend against threat actor TTPs (tactics, techniques and procedures). We need research and analysis on modelling techniques for this problem, along with real-world examples of their application.

    Product assurance

    Security products are designed to mitigate certain risks, but which ones do they mitigate, and how well? A codified model here would be another vital component necessary for automated reasoning on overall system security where multiple security products are in use. As well as security products, we also need to consider the security properties of other products or components when securing a wider system: how can we better understand the provenance and supply chains that make up the hardware and software bills of materials for a component in the architecture, and how that affects the confidence we can have in that component?

    Socio-technical factors

    People using or operating the system also pose or mitigate risks, and it's important to factor this into a model for system security – how can these risks be codified?

    Security architecture

    This sub-strand is about modelling architectural mitigations rather than security products or configurations: for example, an architecture with more independent layers of defence. This area is also a logical place to be able to apply the above models to consider the different mitigation options – what are the practical tools and assessment methodologies needed to do that?

    Measuring software security quality

    This is about developing a taxonomy and objective measures for how difficult it's likely to be for security researchers to find vulnerabilities in any given software project, to inform the level of confidence we then have when that software is present in a system architecture. In other words, lack of knowledge of specific vulnerabilities shouldn’t be taken as a clean bill of health, as it may just be that no one has looked for vulnerabilities yet. So can we come up with an objective measure of how easy it might be to find vulnerabilities?

    Attack simulation

    Breach or attack simulation tools are another method to gain evidence of resilience to defend against an attack. How can we make tooling representative of real-world threat actors and their TTPs?

    Risk

    Finally, there is a need to bring together all of these strands to support a system owner’s risk decision about whether the defences in place are sufficient. How can we help a system owner make that decision confidently? And how can this more data-driven approach combine with other risk assessment approaches such as system theoretic process analysis (STPA)?

 Why this is important

 “As one of the people responsible for defending the NCSC’s own technology from cyber attacks, I grapple with making decisions that balance security, usability and cost. Much of this decision hinges on my own judgement and the advice from security advisors, as there is little good-quality data about whether the security mitigations in place on a given system are sufficient to defend against current threats. This research challenge represents the best chance we have to move security assessments from an art to science, helping people like me make more confident decisions that the defences in place are sufficient. Ideally, I’d have data and tooling telling me automatically when we need to take some action because threat actor tradecraft has changed, and my particular systems are now at risk.”

 Carolyn A, Chief Engineer, NCSC

 Can you propose a solution to this problem?

 A purely “data-driven” security assessment isn’t just about adding dashboards—it requires rethinking how systems, threats, and mitigations are *represented*, *measured*, and *reasoned about*. If you don’t formalise those layers first, you’ll just automate subjective judgement at scale, which doesn’t solve the core problem.

 Here’s a concrete, research-backed direction that could actually move this from craft to engineering.

 ---

 # 1) Core idea: a **machine-reasonable security model stack**

 Build a unified, layered model where:

 * **System architecture** is encoded as a graph
 * **Threat actor behaviour** is encoded as structured TTPs
 * **Mitigations** are encoded as measurable controls
 * **Outcomes** are probabilistic (not binary “secure/insecure”)

 Think of it as a “digital twin” of system security.

 ### Key components

 ### A. System graph model

 Represent systems as typed graphs:

 * Nodes: services, users, devices, data stores
 * Edges: trust relationships, data flows, control flows
 * Attributes: configurations, exposure, privileges

 This aligns well with:

 * attack graphs
 * dependency graphs
 * zero trust architectures
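
 A minimal sketch of such a typed graph, assuming plain Python dataclasses rather than any particular graph library. The class names (`Node`, `Edge`, `SystemGraph`), the node names and the `reachable_from` helper are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str                      # e.g. "service", "user", "device", "data_store"
    attributes: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    kind: str                      # e.g. "trust", "data_flow", "control_flow"


@dataclass
class SystemGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.name] = node

    def add_edge(self, source: str, target: str, kind: str) -> None:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append(Edge(source, target, kind))

    def reachable_from(self, start: str) -> set:
        """Names of nodes reachable from `start` along edges of any kind."""
        seen, frontier = {start}, [start]
        while frontier:
            current = frontier.pop()
            for edge in self.edges:
                if edge.source == current and edge.target not in seen:
                    seen.add(edge.target)
                    frontier.append(edge.target)
        return seen - {start}


# Hypothetical example system (all names are made up).
graph = SystemGraph()
graph.add_node(Node("internet", "external", {"exposure": "public"}))
graph.add_node(Node("web_app", "service", {"exposure": "public"}))
graph.add_node(Node("customer_db", "data_store", {"exposure": "internal"}))
graph.add_node(Node("hr_laptop", "device", {"exposure": "internal"}))
graph.add_edge("internet", "web_app", "data_flow")
graph.add_edge("web_app", "customer_db", "trust")

exposed = graph.reachable_from("internet")
```

 Here `exposed` holds the nodes an external party could reach by following edges, which is the kind of query an attack-graph generator would build on.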

 ---

 ### B. Threat modelling as executable knowledge

 Map threat actor tradecraft using structured frameworks like:

 * MITRE ATT&CK

 Convert TTPs into:

 * Preconditions (what must be true)
 * Actions (what attacker does)
 * Effects (what changes in system state)

 This enables **automated reasoning like**:

 > “Can TTP X succeed given current architecture?”
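
 One way to make that question executable is to treat each TTP as a rule over a set of facts and apply the rules until nothing new can be derived. The sketch below assumes that simplification; the `TTP` class, the fact names and the three example techniques are hypothetical and only loosely named after ATT&CK-style techniques.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTP:
    name: str
    preconditions: frozenset   # facts that must hold before the technique works
    effects: frozenset         # facts the attacker gains if it succeeds


def can_succeed(ttp: TTP, state: set) -> bool:
    """True when every precondition of the TTP is present in the state."""
    return ttp.preconditions <= state


def forward_chain(ttps: list, initial_state: set) -> set:
    """Apply every applicable TTP until no new facts appear."""
    state = set(initial_state)
    changed = True
    while changed:
        changed = False
        for ttp in ttps:
            if can_succeed(ttp, state) and not ttp.effects <= state:
                state |= ttp.effects
                changed = True
    return state


# Hypothetical TTP library (illustrative names, not real ATT&CK entries).
ttps = [
    TTP("phishing", frozenset({"user_reads_email"}),
        frozenset({"workstation_foothold"})),
    TTP("credential_dumping", frozenset({"workstation_foothold", "local_admin"}),
        frozenset({"domain_credentials"})),
    TTP("lateral_movement", frozenset({"domain_credentials"}),
        frozenset({"file_server_access"})),
]

# Users do not have local admin rights in this architecture.
hardened = forward_chain(ttps, {"user_reads_email"})
# Same architecture, but users are local administrators.
weakened = forward_chain(ttps, {"user_reads_email", "local_admin"})
```

 In the hardened case the chain stops at the workstation foothold, while in the weakened case the attacker reaches the file server, so the difference between the two runs is attributable to a single architectural fact.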

 ---

 ### C. Control effectiveness modelling

 Instead of “control present = good”, define:

 * Coverage: which TTPs it mitigates
 * Strength: probability of stopping/detecting
 * Dependencies: what must work for it to work

 For example:

 * MFA ≠ “secure login”
 * MFA = reduces success probability of credential theft by X%, depending on implementation

 This is similar in spirit to the NIST Cybersecurity Framework, but extended into quantitative form.
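
 The coverage, strength and dependency attributes could be recorded as in the sketch below. It assumes that controls act independently, which is a strong simplification, and the `Control` class, the TTP names and the 0.9 strength figure are hypothetical placeholders rather than measured values.

```python
from dataclasses import dataclass, field


@dataclass
class Control:
    name: str
    strength: dict                          # TTP name -> P(stop or detect), in [0, 1]
    depends_on: list = field(default_factory=list)


def effective_strength(control, ttp, working):
    """Strength against `ttp`, or 0.0 if uncovered or a dependency is down."""
    if not all(dep in working for dep in control.depends_on):
        return 0.0
    return control.strength.get(ttp, 0.0)


def residual_success(base_probability, controls, ttp, working):
    """Attacker success probability after all controls, assumed independent."""
    p = base_probability
    for control in controls:
        p *= 1.0 - effective_strength(control, ttp, working)
    return p


# Hypothetical control: MFA that relies on the identity provider being up.
mfa = Control("mfa", {"credential_theft": 0.9}, depends_on=["identity_provider"])

with_idp = residual_success(0.5, [mfa], "credential_theft", {"identity_provider"})
without_idp = residual_success(0.5, [mfa], "credential_theft", set())
uncovered = residual_success(0.5, [mfa], "sql_injection", {"identity_provider"})
```

 With the dependency working the success probability falls from 0.5 to about 0.05; with the dependency down, or for a TTP the control does not cover, it stays at 0.5.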

 ---

 # 2) Move from checklists → probabilistic reasoning

 Deterministic compliance models don’t reflect reality. Instead:

 ### Use Bayesian or probabilistic models

 * Each attack path has a likelihood
 * Each control modifies that likelihood
 * Overall system risk = distribution, not a score

 This enables:

 * “What’s the probability of compromise in 30 days?”
 * “Which control reduces risk most per £ spent?”
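
 A rough sketch of what "a distribution, not a score" could look like. It assumes attack steps and attack paths are independent, and that uncertainty about each step is expressed as a Beta distribution; both are modelling assumptions of this example, and every number in it is invented for illustration.

```python
import random
import statistics


def path_probability(step_probabilities):
    """Probability that every step on one attack path succeeds."""
    p = 1.0
    for step in step_probabilities:
        p *= step
    return p


def compromise_probability(paths):
    """P(at least one path succeeds), assuming paths are independent."""
    p_none = 1.0
    for steps in paths:
        p_none *= 1.0 - path_probability(steps)
    return 1.0 - p_none


def sample_risk(uncertain_paths, draws=2000, seed=0):
    """Each step is an (alpha, beta) pair; returns sampled compromise probabilities."""
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        paths = [[rng.betavariate(a, b) for a, b in steps]
                 for steps in uncertain_paths]
        samples.append(compromise_probability(paths))
    return samples


# Point estimate: two paths, one with two steps and one with a single step.
point = compromise_probability([[0.5, 0.2], [0.1]])

# Same structure, but each step probability is uncertain (hypothetical Beta parameters).
samples = sample_risk([[(5, 5), (2, 8)], [(1, 9)]])
cuts = statistics.quantiles(samples, n=20)
low, high = cuts[0], cuts[-1]        # roughly the 5th and 95th percentiles
```

 The point estimate is 0.19, while `low` and `high` bound where most of the sampled risk falls. Reporting that interval rather than a single number makes the weakness of the underlying estimates visible.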

 ---

 # 3) Data pipeline: what makes it “data-driven”

 You need continuous ingestion of:

 ### A. Threat intelligence

 * Observed TTP usage
 * Exploit prevalence
 * Campaign data

 ### B. System telemetry

 * Logs, detections, incidents
 * Vulnerability scan data
 * Configuration drift

 ### C. Product assurance data

 * SBOMs / HBOMs
 * Known weaknesses
 * Vendor trust metrics
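
 As one illustration of how ingested telemetry could feed the model, the sketch below keeps a Beta estimate of a control's strength and updates it from counts of stopped and missed attempts. The Beta-Binomial update is an assumption of this example, not something the pipeline above prescribes, and the counts are invented.

```python
from dataclasses import dataclass


@dataclass
class StrengthEstimate:
    alpha: float = 1.0   # prior pseudo-count of stopped attempts
    beta: float = 1.0    # prior pseudo-count of missed attempts

    def update(self, stopped: int, missed: int) -> "StrengthEstimate":
        """Return a new estimate that incorporates the observed counts."""
        return StrengthEstimate(self.alpha + stopped, self.beta + missed)

    @property
    def mean(self) -> float:
        """Expected probability that the control stops an attempt."""
        return self.alpha / (self.alpha + self.beta)


# Uniform prior: no evidence about the control yet.
prior = StrengthEstimate()

# Hypothetical telemetry: 18 attempts stopped, 2 missed.
posterior = prior.update(stopped=18, missed=2)
```

 The mean moves from 0.5 to 19/22 (about 0.86), and the same update can be rerun whenever new detections or incidents arrive.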

 ---

 # 4) Attack simulation that actually reflects reality

 Most breach-and-attack simulation tools fail because they:

 * use static playbooks
 * don’t adapt to defences

 Instead:

 ### Build adversarial simulation agents

 * Reinforcement learning attackers
 * Goal-driven (data exfiltration, persistence, etc.)
 * Constrained by real-world TTPs

 These agents:

 * explore novel paths
 * validate whether your model is realistic
 * generate new “unknown but plausible” attacks
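
 A toy sketch of a goal-driven attacker trained with tabular Q-learning. The environment is deliberately tiny and deterministic, the node names and rewards are hypothetical, and the allowed moves stand in for "constrained by real-world TTPs"; a realistic agent would need a far richer state and action space.

```python
import random

# Hypothetical toy environment: node -> nodes the attacker may move to next.
MOVES = {
    "internet": ["vpn_gateway", "web_app"],
    "web_app": ["app_server"],
    "app_server": ["database"],
}
# Reaching the database is the goal; probing the VPN gateway gets the attacker caught.
REWARDS = {"database": 1.0, "vpn_gateway": -1.0}
TERMINAL = set(REWARDS)


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-learning with an epsilon-greedy policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s, actions in MOVES.items() for a in actions}
    for _ in range(episodes):
        state = "internet"
        while state not in TERMINAL:
            actions = MOVES[state]
            if rng.random() < epsilon:
                action = rng.choice(actions)                         # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])   # exploit
            reward = REWARDS.get(action, 0.0)
            future = 0.0
            if action not in TERMINAL:
                future = max(q[(action, nxt)] for nxt in MOVES[action])
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = action
    return q


def greedy_path(q, start="internet"):
    """Follow the highest-valued move from each node until a terminal node."""
    path, state = [start], start
    while state not in TERMINAL:
        state = max(MOVES[state], key=lambda a: q[(state, a)])
        path.append(state)
    return path


q_table = train()
learned_path = greedy_path(q_table)
```

 After training, `learned_path` is the route the agent prefers, here the one that avoids the monitored gateway. Comparing such learned routes with the paths the static model predicts is one way to check whether the model is realistic.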

 ---

 # 5) Socio-technical modelling (often ignored, but critical)

 Purely technical models miss major risks.

 Include:

 * user susceptibility (phishing, error rates)
 * operator behaviour (misconfigurations)
 * organisational maturity

 Model as:

 * probabilities of failure or bypass
 * conditional on workload, training, incentives
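
 One possible encoding of "probability of failure, conditional on workload and training" is a logistic adjustment to a base rate. The function name, the two factors' scaling and the default weights below are hypothetical; in practice the weights would have to be fitted to observed data such as phishing-exercise results.

```python
import math


def failure_probability(base_rate, workload, months_since_training,
                        workload_weight=0.8, training_weight=0.05):
    """Logistic model: shift the log-odds of a base human failure rate.

    `workload` is assumed to be scaled to [0, 1]; the weights are placeholders.
    """
    log_odds = math.log(base_rate / (1.0 - base_rate))
    log_odds += workload_weight * workload
    log_odds += training_weight * months_since_training
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical 10% base click rate under ideal conditions.
rested_and_trained = failure_probability(0.10, workload=0.0, months_since_training=0)
busy_and_stale = failure_probability(0.10, workload=1.0, months_since_training=12)
```

 The resulting probabilities can then be attached to the relevant steps of an attack path, for example the step where a user opens a malicious attachment.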

 ---

 # 6) Measuring software security quality

 You’re right to challenge “no vulns found = safe”.

 Instead define **exploitability surface metrics**:

 * code complexity
 * unsafe language usage
 * historical vulnerability density
 * test coverage
 * fuzzing depth

 Use ML models trained on:

 * historical CVE data
 * repo characteristics

 Output:

 > “Expected vulnerability discovery rate”
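
 A sketch of how the metrics above might feed an expected discovery rate, assuming a log-linear (Poisson-style) model. The `RepoFeatures` fields mirror the listed metrics, but the feature scaling, the weights and the intercept are made up for illustration; a real model would learn them from historical CVE and repository data.

```python
import math
from dataclasses import dataclass, astuple


@dataclass
class RepoFeatures:
    complexity: float               # all features assumed pre-scaled to [0, 1]
    unsafe_language_share: float
    historical_vuln_density: float
    test_coverage: float
    fuzzing_depth: float


def discovery_rate(features, weights, intercept):
    """Expected vulnerabilities found per year under a log-linear model."""
    score = intercept + sum(w * x for w, x in zip(weights, astuple(features)))
    return math.exp(score)


def p_at_least_one(rate, years):
    """Poisson probability of one or more discoveries within `years`."""
    return 1.0 - math.exp(-rate * years)


# Placeholder coefficients: positive raises the rate, negative lowers it.
weights = (1.0, 1.5, 2.0, -1.0, -1.0)
intercept = -1.0

legacy_c_service = RepoFeatures(0.8, 0.9, 0.7, 0.2, 0.0)
fuzzed_rust_library = RepoFeatures(0.3, 0.0, 0.1, 0.9, 0.8)

legacy_rate = discovery_rate(legacy_c_service, weights, intercept)
fuzzed_rate = discovery_rate(fuzzed_rust_library, weights, intercept)
legacy_risk_one_year = p_at_least_one(legacy_rate, years=1.0)
```

 The absolute numbers mean nothing with invented weights; the point is that the output is a rate that can be compared across components and plugged into the system-level model.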

 ---

 # 7) Decision layer: turning models into usable outputs

 This is where many research efforts fail.

 You need to produce:

 ### A. Actionable outputs

 * “Top 5 risk-reducing actions”
 * “Controls with highest ROI”

 ### B. Confidence levels

 * where data is weak
 * where assumptions dominate

 ### C. What-if analysis

 * “What if we add network segmentation?”
 * “What if attacker capability increases?”
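
 The ranking and what-if outputs could be produced by re-running the risk model once per candidate action and sorting by risk reduction per unit cost. In the sketch below the residual risks and costs are hypothetical inputs, standing in for results the reasoning engine would supply.

```python
def rank_actions(baseline_risk, actions):
    """Sort candidate actions by risk reduction per unit cost.

    `actions` is a list of (name, risk_after, cost) tuples.
    Returns (name, reduction, reduction_per_cost) tuples, best first.
    """
    scored = []
    for name, risk_after, cost in actions:
        reduction = baseline_risk - risk_after
        scored.append((name, reduction, reduction / cost))
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Hypothetical what-if results: compromise probability after each action, and cost in £.
baseline = 0.30
candidates = [
    ("network_segmentation", 0.18, 60_000),
    ("phishing_resistant_mfa", 0.20, 20_000),
    ("edr_rollout", 0.24, 50_000),
]

ranking = rank_actions(baseline, candidates)
best_first = [name for name, _, _ in ranking]
```

 In this invented example segmentation gives the largest absolute reduction, but MFA ranks first once cost is taken into account, which is the distinction between "top risk-reducing actions" and "controls with highest ROI".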

 ---

 # 8) Human-in-the-loop (don’t eliminate experts—amplify them)

 Full automation is unrealistic.

 Instead:

 * junior analysts use model outputs
 * experts validate edge cases
 * feedback loops improve models

 This creates **scalable expertise**, not replacement.

 ---

 # 9) Suggested architecture (end-to-end)

 1. **Ingest**

   * system design (IaC, diagrams → graph)
   * threat intel feeds
   * product assurance data

 2. **Model layer**

   * system graph
   * TTP library
   * control effectiveness database

 3. **Reasoning engine**

   * attack path generation
   * probabilistic simulation

 4. **Simulation layer**

   * adversarial agents

 5. **Output layer**

   * risk metrics
   * recommendations
   * alerts on tradecraft changes

 ---

 # 10) What makes this realistically achievable

 This isn’t science fiction—pieces already exist:

 * attack graph research
 * probabilistic risk modelling
 * threat frameworks like ATT&CK
 * SBOM standards
 * ML on vulnerability prediction

 The gap is **integration + standardisation**.

 ---

 # 11) Research priorities (high-impact)

 If you were turning this into a research programme, focus on:

 ### 1. Formal system representation

 A standard schema for architectures that supports reasoning

 ### 2. Quantifying control effectiveness

 Currently the weakest link

 ### 3. TTP → system mapping

 Bridging abstract threat intel to concrete systems

 ### 4. Validation datasets

 Ground truth is rare—need shared benchmarks

 ### 5. Explainability

 Security decisions must be auditable

 ---

 # Final thought

 The real shift isn’t “more data”—it’s:

 > turning security from **expert opinion under uncertainty**
 > into **structured probabilistic reasoning with measurable assumptions**

 If done right, you don’t replace the NCSC’s scarce experts—you give them leverage over thousands of systems instead of dozens.

 ---

 If you want, I can turn this into a concrete prototype design (e.g., data schemas, algorithms, or even a minimal working architecture).