Troubleshooting Guide
This guide helps you diagnose and resolve common issues with ZeroTrustKerberosLink. Each section focuses on a specific area where problems might occur.
Table of Contents
- Authentication Issues
- AWS Integration Issues
- Configuration Issues
- Performance Issues
- Security Issues
- Logging and Monitoring Issues
- Deployment Issues
Authentication Issues
Kerberos Authentication Failures
Symptoms:
- Error message: "Kerberos authentication failed"
- HTTP 401 Unauthorized responses
- Client unable to obtain service tickets
Common Causes and Solutions:
- Invalid Keytab File
  - Verify that the keytab file exists and has the correct permissions:
    ls -la /etc/zerotrustkerberos/krb5.keytab
    # Should show: -r-------- 1 zerotrustkerberos zerotrustkerberos
  - Check the keytab contents:
    klist -kt /etc/zerotrustkerberos/krb5.keytab
- Clock Skew
  - Verify time synchronization:
    chronyc tracking
    # or
    ntpq -p
  - Ensure the time difference between the client, the KDC, and ZeroTrustKerberosLink is less than 5 minutes, the default Kerberos clock-skew tolerance
- Incorrect Service Principal
  - Confirm that the service principal in the configuration matches an entry in the keytab:
    grep service_principal /etc/zerotrustkerberos/config.yaml
    klist -kt /etc/zerotrustkerberos/krb5.keytab | grep HTTP
- Network Connectivity Issues
  - Test connectivity to the Kerberos KDC:
    telnet kdc.example.com 88
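The keytab permission check can be scripted so that it fails loudly in a pre-flight step. The following is a minimal sketch in POSIX shell. The function name `check_keytab_mode` is hypothetical, and it only inspects the mode string reported by `ls`, not the file's owner or contents.

```shell
# check_keytab_mode FILE
# Hypothetical helper: succeeds only if FILE is a regular file whose mode
# is exactly -r-------- (readable by the owner, no access for anyone else).
check_keytab_mode() {
    if [ ! -f "$1" ]; then
        echo "keytab not found: $1" >&2
        return 1
    fi
    # The first 10 characters of `ls -ld` output are the type and mode bits.
    mode=$(ls -ld "$1" | cut -c1-10)
    if [ "$mode" != "-r--------" ]; then
        echo "unexpected keytab mode: $mode (expected -r--------)" >&2
        return 1
    fi
}

# Example: check_keytab_mode /etc/zerotrustkerberos/krb5.keytab
```

If the check fails, `chmod 400` on the keytab restores the expected mode. Ownership still needs to be verified separately.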
User Authentication Failures
Symptoms:
- Users can't authenticate despite valid Kerberos tickets
- Error message: "User not authorized"
Common Causes and Solutions:
- Missing Role Mapping
  - Check the role mappings in the configuration:
    grep -A 20 role_mappings /etc/zerotrustkerberos/config.yaml
  - Add an appropriate mapping for the user principal
- Principal Format Mismatch
  - Ensure the principal format matches exactly, including case:
    # Configuration: admin@EXAMPLE.COM
    # User ticket: Admin@EXAMPLE.COM (case mismatch)
- Context Evaluation Failure
  - Check the context evaluation logs:
    grep "context evaluation" /var/log/zerotrustkerberos/app.log
  - Verify that the user meets all context requirements (IP, time, etc.)
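A case-only principal mismatch is easy to miss by eye. The sketch below compares a configured principal with the one on the user's ticket and reports whether they differ only in case. It assumes, as the example in this section implies, that principals are matched case-sensitively. The function name `compare_principals` is hypothetical.

```shell
# compare_principals CONFIGURED PRESENTED
# Hypothetical helper: prints "match", "case-mismatch", or "different".
compare_principals() {
    if [ "$1" = "$2" ]; then
        echo "match"
        return 0
    fi
    # Lower-case both sides to see whether case is the only difference.
    configured_lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    presented_lower=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    if [ "$configured_lower" = "$presented_lower" ]; then
        echo "case-mismatch"
    else
        echo "different"
    fi
}

compare_principals "admin@EXAMPLE.COM" "Admin@EXAMPLE.COM"   # prints: case-mismatch
```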
AWS Integration Issues
Role Assumption Failures
Symptoms:
- Error message: "Failed to assume role"
- HTTP 403 Forbidden responses
Common Causes and Solutions:
- Invalid IAM Role ARN
  - Verify that the role exists:
    aws iam get-role --role-name AdminRole
  - Check the role ARN format in the configuration
- Trust Relationship Issues
  - Examine the trust policy:
    aws iam get-role --role-name AdminRole --query 'Role.AssumeRolePolicyDocument'
  - Update the trust policy to allow ZeroTrustKerberosLink to assume the role
- Permission Issues
  - Verify that ZeroTrustKerberosLink has the sts:AssumeRole permission:
    aws iam simulate-principal-policy \
      --policy-source-arn arn:aws:iam::123456789012:role/ZeroTrustKerberosLinkRole \
      --action-names sts:AssumeRole
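As an illustration of the trust-policy update, the sketch below writes a policy document that lets one IAM role assume another. It assumes ZeroTrustKerberosLink runs as the `ZeroTrustKerberosLinkRole` shown in the simulation command and that the target role is `AdminRole`; substitute your own account ID and role names. The file name `trust-policy.json` is arbitrary. The `aws iam update-assume-role-policy` call is left commented out because it replaces the role's existing trust policy.

```shell
# Write a trust policy that allows the (assumed) ZeroTrustKerberosLink role
# to call sts:AssumeRole on the target role.
cat > trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/ZeroTrustKerberosLinkRole"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Review the file, then apply it. This overwrites the current trust policy:
# aws iam update-assume-role-policy \
#   --role-name AdminRole \
#   --policy-document file://trust-policy.json
```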
AWS Credential Issues
Symptoms:
- Error message: "Unable to retrieve AWS credentials"
- AWS API calls fail after authentication
Common Causes and Solutions:
- AWS Region Configuration
  - Verify the AWS region in the configuration:
    grep region /etc/zerotrustkerberos/config.yaml
  - Ensure the region is valid and accessible
- AWS API Throttling
  - Check for throttling errors in the logs (case-insensitive, since AWS error codes are capitalized):
    grep -i "throttling" /var/log/zerotrustkerberos/app.log
  - Implement exponential backoff for API calls
- Temporary Credential Expiration
  - Verify the session duration settings:
    grep -A 5 session /etc/zerotrustkerberos/config.yaml
  - Adjust the session duration as needed (max 12 hours)
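Exponential backoff means waiting twice as long after each failed attempt before retrying. The following is a minimal sketch of that idea as a shell wrapper around any command. The function name `retry_with_backoff` is hypothetical, and this guide does not describe how ZeroTrustKerberosLink itself retries AWS calls. Production retry logic usually also adds random jitter, which is omitted here for brevity.

```shell
# retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS is
# reached, doubling the delay after every failure.
retry_with_backoff() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        "$@" && return 0
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}

# Example: up to 5 attempts, waiting 1, 2, 4, then 8 seconds between them.
# retry_with_backoff 5 1 aws sts get-caller-identity
```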
Configuration Issues
Syntax Errors
Symptoms:
- Application fails to start
- Error message: "Invalid configuration"
Common Causes and Solutions:
- YAML Syntax Errors
  - Validate the YAML syntax:
    yamllint /etc/zerotrustkerberos/config.yaml
  - Fix indentation and formatting issues
- Missing Required Fields
  - Check that all required configuration fields are present:
    zerotrustkerberos-cli validate-config --config /etc/zerotrustkerberos/config.yaml
  - Add any missing required fields
- Invalid Values
  - Verify that all configuration values are valid:
    zerotrustkerberos-cli validate-config --strict --config /etc/zerotrustkerberos/config.yaml
  - Correct any invalid values
Environment Variable Issues
Symptoms:
- Configuration doesn't match expected values
- Overrides not taking effect
Common Causes and Solutions:
- Case Sensitivity
  - Environment variable names are case-sensitive:
    # Correct:
    ZEROTRUST_SERVER_PORT=8443
    # Incorrect:
    zerotrust_server_port=8443
- Variable Format
  - Check the variable format:
    # Correct:
    ZEROTRUST_REDIS_HOST=redis.example.com
    # Incorrect:
    ZEROTRUST_REDIS.HOST=redis.example.com
- Variable Precedence
  - Understand the precedence order, highest priority first:
    - Command-line arguments
    - Environment variables
    - Configuration file
    - Default values
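The precedence order can be pictured as "take the first source that supplies a value". The sketch below models that with nested shell parameter expansion for a single setting. It illustrates the layering only and is not how ZeroTrustKerberosLink is implemented; the function name `resolve_server_port` and the default of 8443 are assumptions.

```shell
# resolve_server_port CLI_VALUE FILE_VALUE
# Hypothetical helper: prints the first non-empty value in precedence order:
# command-line argument, environment variable, configuration file, default.
resolve_server_port() {
    cli_value=$1
    file_value=$2
    default_value=8443   # assumed default, for illustration only
    echo "${cli_value:-${ZEROTRUST_SERVER_PORT:-${file_value:-$default_value}}}"
}

# No command-line value is given, so the environment variable overrides
# the configuration file value.
ZEROTRUST_SERVER_PORT=9443
resolve_server_port "" 8080   # prints: 9443
```

When an override seems to be ignored, check whether a higher-priority source, such as a command-line argument, is also setting the same value.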
Performance Issues
High Latency
Symptoms:
- Authentication takes more than 1 second
- AWS role assumption is slow
Common Causes and Solutions:
- Insufficient Resources
  - Check CPU and memory usage:
    # -d, joins multiple matching PIDs with commas, as top -p expects
    top -p "$(pgrep -d, -f zerotrustkerberos)"
  - Increase allocated resources
- Network Latency
  - Test network latency to AWS and the KDC:
    ping kdc.example.com
    ping sts.amazonaws.com
  - Consider deploying closer to the AWS region
- Inefficient Caching
  - Enable or optimize Redis caching:
    cache:
      enabled: true
      ttl: 300 # seconds
Connection Pooling Issues
Symptoms:
- Increasing latency under load
- Connection-related errors
Common Causes and Solutions:
- Insufficient Connection Pool Size
  - Increase the connection pool size:
    aws:
      connection_pool_size: 20
- Connection Leaks
  - Check for connection leaks in the logs:
    grep "connection" /var/log/zerotrustkerberos/app.log
  - Implement proper connection closing
- Connection Timeouts
  - Adjust the timeout settings:
    aws:
      timeout: 5 # seconds
Security Issues
TLS Configuration Issues
Symptoms:
- TLS handshake failures
- Certificate validation errors
Common Causes and Solutions:
- Invalid Certificate
  - Verify the certificate's validity:
    openssl x509 -in /etc/zerotrustkerberos/tls/certificate.crt -text -noout
  - Ensure the certificate has not expired
- Cipher Suite Compatibility
  - Check the cipher suite configuration:
    grep -A 5 tls /etc/zerotrustkerberos/config.yaml
  - Update the cipher suites for better compatibility
- TLS Version Issues
  - Verify the minimum TLS version:
    server:
      tls:
        min_version: "TLSv1.2"
  - Update clients to support the required TLS version
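Reading expiry dates out of `-text` output by eye is error-prone. As a sketch, the certificate check can be automated with OpenSSL's `-checkend` option, which exits non-zero if the certificate expires within the given number of seconds. The function name `cert_valid_for` is hypothetical.

```shell
# cert_valid_for FILE SECONDS
# Hypothetical helper: succeeds if the certificate in FILE will still be
# valid SECONDS from now.
cert_valid_for() {
    openssl x509 -in "$1" -noout -checkend "$2" >/dev/null
}

# Example: warn if the certificate expires within 30 days (2592000 seconds).
# cert_valid_for /etc/zerotrustkerberos/tls/certificate.crt 2592000 ||
#     echo "certificate expires within 30 days" >&2
```

Passing `0` as the number of seconds checks whether the certificate has already expired.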
Security Header Issues
Symptoms:
- Security scanners report missing headers
- Browser security warnings
Common Causes and Solutions:
- Missing Headers
  - Check the header configuration:
    grep -A 10 headers /etc/zerotrustkerberos/config.yaml
  - Add any missing security headers
- Invalid Header Values
  - Verify the header values:
    curl -I https://zerotrustkerberos.example.com/auth/test
  - Correct any invalid header values
- CSP Configuration
  - Test the Content Security Policy:
    curl -I https://zerotrustkerberos.example.com/auth/test | grep Content-Security-Policy
  - Adjust the CSP for required functionality
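To check several headers in one pass, the response headers can be piped through a small filter that lists whichever expected headers are absent. This is a minimal sketch: the function name `missing_security_headers` is hypothetical, and the four header names are a commonly recommended set chosen for illustration, not a list taken from ZeroTrustKerberosLink's configuration. Matching is case-insensitive because HTTP/2 responses report header names in lower case.

```shell
# missing_security_headers
# Hypothetical helper: reads HTTP response headers on stdin and prints the
# name of each expected security header that is not present.
missing_security_headers() {
    headers=$(cat)
    for name in Strict-Transport-Security Content-Security-Policy \
                X-Content-Type-Options X-Frame-Options; do
        printf '%s\n' "$headers" | grep -qi "^$name:" || echo "$name"
    done
}

# Example:
# curl -sI https://zerotrustkerberos.example.com/auth/test | missing_security_headers
```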
Logging and Monitoring Issues
Missing Logs
Symptoms:
- Events not appearing in logs
- Incomplete audit trail
Common Causes and Solutions:
- Log Level Too High
  - Check the log level configuration:
    grep level /etc/zerotrustkerberos/config.yaml
  - Set an appropriate log level (debug, info, warn, error)
- Log Destination Issues
  - Verify the log destination:
    grep output /etc/zerotrustkerberos/config.yaml
  - Ensure the log destination is writable
- Log Rotation Issues
  - Check the log rotation configuration:
    cat /etc/logrotate.d/zerotrustkerberos
  - Configure proper log rotation
Monitoring Integration Issues
Symptoms:
- Metrics not appearing in the monitoring system
- Alerts not triggering
Common Causes and Solutions:
- Prometheus Endpoint Issues
  - Verify the Prometheus endpoint:
    curl http://localhost:8080/metrics
  - Check the Prometheus scrape configuration
- Metric Format Issues
  - Validate the metric format:
    curl http://localhost:8080/metrics | grep auth_requests
  - Correct any metric format issues
- Alert Configuration Issues
  - Review the alert configuration:
    grep -A 10 alerts /etc/zerotrustkerberos/config.yaml
  - Adjust alert thresholds and conditions
Deployment Issues
Docker Deployment Issues
Symptoms:
- Container fails to start
- Container exits unexpectedly
Common Causes and Solutions:
- Volume Mount Issues
  - Check the volume mounts:
    docker inspect zerotrustkerberos | grep -A 10 Mounts
  - Correct volume mount paths and permissions
- Environment Variable Issues
  - Verify the environment variables:
    docker inspect zerotrustkerberos | grep -A 20 Env
  - Set missing environment variables or correct invalid ones
- Container Resource Limits
  - Check the resource limits:
    docker stats zerotrustkerberos
  - Adjust CPU and memory limits as needed
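`docker stats` reports usage only while a container is running, so it cannot explain why a container has already exited. The sketch below reads the recorded exit state with `docker inspect --format` and turns it into a hint. The function name `explain_container_exit` and the wording of the hints are hypothetical; the container name `zerotrustkerberos` is the one used in the commands in this section.

```shell
# explain_container_exit CONTAINER
# Hypothetical helper: prints a short hint based on the container's
# recorded OOMKilled flag and exit code.
explain_container_exit() {
    state=$(docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' "$1") || return 1
    oom_killed=${state%% *}   # text before the space: true or false
    exit_code=${state##* }    # text after the space: numeric exit code
    if [ "$oom_killed" = "true" ]; then
        echo "OOM-killed: raise the container memory limit"
    elif [ "$exit_code" = "0" ]; then
        echo "exited cleanly (code 0)"
    else
        echo "exited with code $exit_code: inspect 'docker logs $1'"
    fi
}

# Example: explain_container_exit zerotrustkerberos
```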
Kubernetes Deployment Issues
Symptoms:
- Pods fail to start
- Pods crash loop
Common Causes and Solutions:
- ConfigMap Issues
  - Verify the ConfigMap:
    kubectl describe configmap zerotrustkerberos-config -n zerotrustkerberos
  - Update the ConfigMap with the correct configuration
- Secret Issues
  - Check the Secret:
    kubectl describe secret zerotrustkerberos-secrets -n zerotrustkerberos
  - Ensure the Secret contains the required data
- Resource Constraints
  - Examine the pod resources:
    kubectl describe pod -l app=zerotrustkerberos -n zerotrustkerberos
  - Adjust resource requests and limits
Diagnostic Tools
ZeroTrustKerberosLink includes several built-in diagnostic tools:
Health Check
curl http://localhost:8080/health
The health check returns detailed information about:
- Service status
- Component health
- Resource utilization
- Recent errors
Configuration Validator
zerotrustkerberos-cli validate-config --config /etc/zerotrustkerberos/config.yaml
The validator checks:
- Configuration syntax
- Required fields
- Value constraints
- Security best practices
Connectivity Tester
zerotrustkerberos-cli test-connectivity
The tester checks connectivity to:
- Kerberos KDC
- AWS services
- Redis cache
- Monitoring endpoints
Getting Help
If you're unable to resolve an issue using this guide:
- Check Documentation
  - Review the Implementation Guide
  - Check the FAQ
- Search Known Issues
  - Visit the GitHub Issues page
  - Search for similar problems
- Contact Support
  - Email: support@zerotrustkerberos.com
  - Include:
    - Detailed description of the issue
    - Configuration files (with sensitive data removed)
    - Relevant log excerpts
    - Steps to reproduce the issue