Incident Management for CSPs

Overview

Cloud service providers (CSPs) need robust incident management processes to detect, respond to, and recover from security incidents.

Learning Objectives

Understand CSP incident response requirements
Implement incident detection mechanisms
Define escalation procedures
Manage customer communication
Apply ISO 27017 incident controls

ISO 27017 Incident Controls

A.16.1.1 - Responsibilities and Procedures

CSP Incident Response Framework:

┌─────────────────────────────────────┐
│    1. Detection & Identification    │
│    - SIEM monitoring                │
│    - Automated alerts               │
│    - Customer reports               │
├─────────────────────────────────────┤
│    2. Triage & Classification       │
│    - Severity assessment            │
│    - Impact analysis                │
│    - Assignment to team             │
├─────────────────────────────────────┤
│    3. Containment                   │
│    - Isolate affected systems       │
│    - Prevent spread                 │
│    - Preserve evidence              │
├─────────────────────────────────────┤
│    4. Eradication                   │
│    - Remove threat                  │
│    - Patch vulnerabilities          │
│    - Restore integrity              │
├─────────────────────────────────────┤
│    5. Recovery                      │
│    - Restore services               │
│    - Verify functionality           │
│    - Monitor for recurrence         │
├─────────────────────────────────────┤
│    6. Post-Incident Review          │
│    - Root cause analysis            │
│    - Lessons learned                │
│    - Process improvement            │
└─────────────────────────────────────┘
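The six phases above form a strict sequence, which can be sketched as a small state machine. This is a minimal illustration, not a prescribed implementation: the `Phase` and `Incident` names and the rule that phases are never skipped are assumptions made for the example.

```python
from enum import Enum


class Phase(Enum):
    """The six framework phases, numbered in the order shown above."""
    DETECTION = 1
    TRIAGE = 2
    CONTAINMENT = 3
    ERADICATION = 4
    RECOVERY = 5
    POST_INCIDENT_REVIEW = 6


class Incident:
    """Tracks which phase an incident is in (hypothetical helper class)."""

    def __init__(self, incident_id: str) -> None:
        self.incident_id = incident_id
        self.phase = Phase.DETECTION
        self.history = [Phase.DETECTION]

    def advance(self) -> Phase:
        """Move to the next phase; refuse to go past the final review."""
        if self.phase is Phase.POST_INCIDENT_REVIEW:
            raise ValueError("incident is already in its final phase")
        self.phase = Phase(self.phase.value + 1)
        self.history.append(self.phase)
        return self.phase


incident = Incident("INC-2024-001")
incident.advance()  # Detection -> Triage
incident.advance()  # Triage -> Containment
print(incident.phase.name)  # CONTAINMENT
```

Keeping the phase history on the incident record gives the post-incident review a simple audit trail of the order in which phases were entered.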

Incident Classification

Severity Levels

Level	Criteria	Response Time	Customer Notification
P1 - Critical	Service down, data breach	15 minutes	Immediate
P2 - High	Major degradation	1 hour	Within 4 hours
P3 - Medium	Minor impact	4 hours	Within 24 hours
P4 - Low	No customer impact	24 hours	As needed
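The table above can be turned into concrete deadlines once the detection time is known. The sketch below is one possible encoding: the names `SeverityPolicy`, `POLICIES`, and `deadlines` are hypothetical, "Immediate" is modeled as a zero delay, and "As needed" is modeled as no deadline. Both modeling choices are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class SeverityPolicy:
    response_time: timedelta
    # None means customer notification is discretionary ("As needed").
    notify_within: Optional[timedelta]


# Values copied from the severity table; keys are the level prefixes.
POLICIES = {
    "P1": SeverityPolicy(timedelta(minutes=15), timedelta(0)),  # "Immediate"
    "P2": SeverityPolicy(timedelta(hours=1), timedelta(hours=4)),
    "P3": SeverityPolicy(timedelta(hours=4), timedelta(hours=24)),
    "P4": SeverityPolicy(timedelta(hours=24), None),
}


def deadlines(severity: str, detected_at: datetime) -> Tuple[datetime, Optional[datetime]]:
    """Return (respond_by, notify_by) for an incident detected at detected_at."""
    policy = POLICIES[severity]
    respond_by = detected_at + policy.response_time
    notify_by = None
    if policy.notify_within is not None:
        notify_by = detected_at + policy.notify_within
    return respond_by, notify_by


detected = datetime(2024, 1, 15, 14, 30, tzinfo=timezone.utc)
respond_by, notify_by = deadlines("P2", detected)
print(respond_by.isoformat())  # 2024-01-15T15:30:00+00:00
print(notify_by.isoformat())   # 2024-01-15T18:30:00+00:00
```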

Customer Communication

Notification Template

SECURITY INCIDENT NOTIFICATION

Incident ID: INC-2024-001
Severity: P1 - Critical
Status: Investigating
Date Detected: 2024-01-15 14:30 UTC

SUMMARY:
We are investigating a potential unauthorized access
attempt to infrastructure in the US-East region.

CUSTOMER IMPACT:
- Services remain operational
- No evidence of data access
- Investigation ongoing

ACTIONS TAKEN:
- Isolated affected systems
- Enhanced monitoring activated
- Security team engaged

NEXT UPDATE:
Within 2 hours or sooner if status changes

CONTACT:
[email protected]
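
Filling in the template by hand under time pressure invites omissions, so one option is to generate it from structured fields. The following is a minimal sketch: the `Notification` class and its field names are hypothetical, and the contact value is a placeholder to be replaced with the provider's real address.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Notification:
    incident_id: str
    severity: str
    status: str
    detected: str
    summary: str
    impact: List[str]
    actions: List[str]
    next_update: str
    contact: str

    def render(self) -> str:
        """Return the notification as plain text in the template's layout."""
        def bullets(items: List[str]) -> str:
            return "\n".join(f"- {item}" for item in items)

        sections = [
            "SECURITY INCIDENT NOTIFICATION",
            f"Incident ID: {self.incident_id}\n"
            f"Severity: {self.severity}\n"
            f"Status: {self.status}\n"
            f"Date Detected: {self.detected}",
            f"SUMMARY:\n{self.summary}",
            f"CUSTOMER IMPACT:\n{bullets(self.impact)}",
            f"ACTIONS TAKEN:\n{bullets(self.actions)}",
            f"NEXT UPDATE:\n{self.next_update}",
            f"CONTACT:\n{self.contact}",
        ]
        # Sections are separated by one blank line, as in the template.
        return "\n\n".join(sections)


notice = Notification(
    incident_id="INC-2024-001",
    severity="P1 - Critical",
    status="Investigating",
    detected="2024-01-15 14:30 UTC",
    summary="We are investigating a potential unauthorized access attempt.",
    impact=["Services remain operational", "No evidence of data access"],
    actions=["Isolated affected systems", "Security team engaged"],
    next_update="Within 2 hours or sooner if status changes",
    contact="<security contact address>",  # placeholder, supply your own
)
print(notice.render())
```

Because every field is required by the dataclass constructor, a notification cannot be rendered with a section silently missing.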

Incident Response Team Structure

┌──────────────────────────────────┐
│   Incident Commander             │
│   (Overall coordination)         │
└────────┬─────────────────────────┘
         │
    ┌────┴────────────┬──────────┐
    │                 │          │
┌───▼────┐  ┌────────▼───┐  ┌───▼────────┐
│Security│  │ Operations │  │ Communic.  │
│ Team   │  │   Team     │  │   Team     │
└───┬────┘  └────────┬───┘  └───┬────────┘
    │                │          │
    └────────┬───────┴──────────┘
             │
    ┌────────▼────────────┐
    │  Legal / Compliance │
    └─────────────────────┘

Detection Mechanisms

Automated Monitoring

Security Events:

Failed authentication attempts (example threshold: more than 10 in 5 minutes)
Privilege escalation attempts
Unusual data access patterns
Configuration changes
Network anomalies

SIEM Integration:

Example alert rule:
{
  "rule": "Multiple failed logins",
  "condition": "failed_logins > 10 in 5 minutes",
  "action": "create_incident",
  "severity": "high",
  "notify": ["[email protected]"]
}
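
The rule's condition, more than 10 failed logins in 5 minutes, is typically evaluated over a sliding time window. The sketch below shows one way to do that in plain Python; it does not represent any particular SIEM product. The `FailedLoginDetector` class is hypothetical, counting per account is an assumption (the rule does not say how events are grouped), and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class FailedLoginDetector:
    """Sliding-window counter for failed logins (illustrative only)."""

    def __init__(self, threshold: int = 10, window_seconds: float = 300.0) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        # One queue of event timestamps per account.
        self._events = defaultdict(deque)

    def record_failure(self, account: str, timestamp: float) -> bool:
        """Record a failed login; return True when the rule should fire."""
        events = self._events[account]
        events.append(timestamp)
        # Discard events that have aged out of the window.
        while events and events[0] <= timestamp - self.window_seconds:
            events.popleft()
        # The rule reads "> 10", so the 11th failure in the window fires.
        return len(events) > self.threshold


detector = FailedLoginDetector()
results = [detector.record_failure("alice", float(t)) for t in range(11)]
print(results[9], results[10])  # False True
```

A deque keeps both the append and the expiry of old events cheap, so the check stays fast even for accounts under sustained attack.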

Evidence Collection

A.16.1.7 - Collection of Evidence

Forensic Procedures:

Preserve logs and system state
Create forensic images (if applicable)
Document the incident timeline
Maintain chain of custody
Apply legal hold procedures
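
Preserving evidence and maintaining chain of custody both depend on being able to show that a file has not changed since collection. A common approach is to record a cryptographic hash at collection time and re-check it later. The sketch below uses SHA-256 from the Python standard library; the function names and the shape of the custody record are assumptions for illustration, not a forensic standard.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large log files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_entry(path: Path, handler: str, action: str) -> dict:
    """Build one chain-of-custody record (hypothetical record layout)."""
    return {
        "file": str(path),
        "sha256": sha256_of(path),
        "handler": handler,
        "action": action,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def is_unaltered(path: Path, entry: dict) -> bool:
    """Re-hash the file and compare against the recorded digest."""
    return sha256_of(path) == entry["sha256"]
```

A typical use would be to call `custody_entry` when a log file is collected, store the record somewhere the evidence handler cannot modify, and call `is_unaltered` whenever the file changes hands.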

Post-Incident Activities

Root Cause Analysis

5 Whys Analysis Example

Incident: Unauthorized API access

Why 1: API key was compromised
Why 2: Key was committed to public GitHub repo
Why 3: Developer was unaware of best practices for handling secrets
Why 4: Security training was outdated
Why 5: Training program lacked cloud-specific content

ROOT CAUSE: Inadequate cloud security training

CORRECTIVE ACTIONS:
1. Update security training (immediate)
2. Implement secret scanning in CI/CD
3. Rotate all API keys
4. Conduct security awareness campaign
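
Corrective action 2, secret scanning in CI/CD, can be illustrated with a small pattern-based check that a pipeline step might run against changed files. This is a rough sketch only: the two patterns are illustrative assumptions (the first follows the widely documented `AKIA` prefix of AWS access key IDs, the second is a generic guess at hard-coded API keys), and a production pipeline would normally rely on a maintained scanning tool rather than hand-written rules.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; real scanners ship far larger rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_api_key": re.compile(
        r"(?i)\bapi[_-]?key\b\s*[:=]\s*['\"]([A-Za-z0-9_\-]{16,})['\"]"
    ),
}


def scan_text(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, pattern_name) for every suspected secret."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings


sample = 'name = "demo"\nAPI_KEY = "abcd1234abcd1234abcd"\n'
print(scan_text(sample))  # [(2, 'generic_api_key')]
```

In a pipeline, a non-empty result would fail the build so the key never reaches a public repository, which addresses Why 2 directly.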

Key Takeaways

Rapid detection and response are critical
Clear severity classification guides response
Customer communication must be timely
Evidence collection supports investigation
Post-incident review drives improvement
Automation enhances detection capabilities

Self-Assessment

What are the six phases of incident response?
What is a P1 incident?
When should customers be notified?
What is the purpose of root cause analysis?
What evidence should be collected?