@@ -1081,3 +1081,109 @@ CloudFormation.
10811081- Stack Update Fails:
10821082 - The stack automatically rolls back to the previous known working state
10831083 - Ability to see in the log what happened and error messages
1084+
1085+ --------------------------------------
1086+
1087+ # CloudWatch
1088+
1089+ CloudWatch is used for monitoring.
1090+
1091+ ### Why is monitoring important?
1092+ - To deploy applications
1093+ - Safely
1094+ - Automatically
1095+ - Using Infrastructure as Code
1096+ - Leveraging AWS components
1097+ - Because applications are deployed, and users don’t care what services we've used
1098+ - Users only care that the application is working!
1099+ - Application latency: will it increase over time?
1100+ - Application outages: customer experience should not be degraded
1101+ - Users contacting the IT department or complaining is not a good outcome
1102+ - Troubleshooting and remediation
1103+ - Internal monitoring:
1104+ - Can we prevent issues before they happen?
1105+ - Performance and Cost
1106+ - Trends (scaling patterns)
1107+ - Learning and Improvement
1108+
1109+ ### Monitoring in AWS
1110+ - AWS CloudWatch:
1111+ - Metrics: Collect and track key metrics
1112+ - Logs: Collect, monitor, analyze and store log files
1113+ - Events: Send notifications when certain events happen in your AWS account
+ - Alarms: React in real time to metrics / events
1114+ - AWS X-Ray:
1115+ - Troubleshooting application performance and errors
1116+ - Distributed tracing of microservices
1117+ - AWS CloudTrail:
1118+ - Internal monitoring of API calls being made
1119+ - Audit changes to AWS Resources by your users
1120+
1121+ ### CloudWatch Metrics
1122+ - CloudWatch provides metrics for every service in AWS
1123+ - **Metric** is a variable to monitor (CPUUtilization, NetworkIn...)
1124+ - Metrics belong to **namespaces**
1125+ - **Dimension** is an attribute of a metric (instance id, environment, etc...).
1126+ - Up to 30 dimensions per metric (the limit was 10 before 2022)
1127+ - Metrics have **timestamps**
1128+ - Can create CloudWatch dashboards of metrics
1129+
1130+ ### CloudWatch EC2 Detailed monitoring
1131+ - By default, EC2 instance metrics are reported every 5 minutes
1132+ - With detailed monitoring (for a cost), you get data every 1 minute
1133+ - Use detailed monitoring if you want your ASG to scale more promptly!
1134+ - The AWS Free Tier allows us to have 10 detailed monitoring metrics
1135+ - **Note:** EC2 memory usage is not pushed by default (it must be pushed from inside the instance as a custom metric)
1137+
1138+ ### AWS CloudWatch Custom Metrics
1139+ - Possibility to define and send your own custom metrics to CloudWatch
1140+ - Ability to use dimensions (attributes) to segment metrics
1141+ - Instance.id
1142+ - Environment.name
1143+ - Metric resolution:
1144+ - Standard: 1 minute
1145+ - High Resolution: up to 1 second (StorageResolution API parameter - Higher cost)
1146+ - Use API call **PutMetricData**
1147+ - Use exponential back off in case of throttle errors
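The custom-metric bullets above can be sketched in Python. This is a minimal illustration using only the standard library, not a definitive implementation: `build_metric_datum`, `call_with_backoff` and `ThrottledError` are hypothetical names introduced here, and `send` stands for whatever function actually performs the PutMetricData call.

```python
import random
import time
from datetime import datetime, timezone


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error returned by the API."""


def build_metric_datum(name, value, dimensions, high_resolution=False):
    """Build one entry of the MetricData list sent with PutMetricData."""
    return {
        "MetricName": name,
        # Dimensions segment the metric, e.g. by instance id or environment.
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "Timestamp": datetime.now(timezone.utc),
        "Value": value,
        "Unit": "Count",
        # 1 = high resolution (1 second), 60 = standard resolution (1 minute).
        "StorageResolution": 1 if high_resolution else 60,
    }


def call_with_backoff(send, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call send(); on ThrottledError wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return send()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))
```

With boto3, `send` could be something like `lambda: cloudwatch.put_metric_data(Namespace="MyApp", MetricData=[datum])`, where the namespace `MyApp` is an assumption. The jitter is a common addition to plain exponential backoff so that many throttled clients do not retry at the same instant.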
1148+
1149+ ### CloudWatch Alarms
+ - Alarms are used to trigger notifications for any metric
1150+ - Alarms can go to Auto Scaling, EC2 Actions, SNS notifications
1151+ - Various options (sampling, %, max, min, etc...)
1152+ - Alarm States:
1153+ - OK
1154+ - INSUFFICIENT_DATA
1155+ - ALARM
1156+ - Period:
1157+ - Length of time in seconds to evaluate the metric
1158+ - High resolution custom metrics: the period can be 10 sec, 30 sec, or a multiple of 60 sec
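To make the three alarm states concrete, here is a simplified sketch of how a "greater than threshold" alarm could be evaluated. It is an illustration only: the function `alarm_state` is hypothetical, and the real service has more options (for example, how missing data is treated) that are not modeled here.

```python
def alarm_state(datapoints, threshold, evaluation_periods):
    """Return "OK", "ALARM" or "INSUFFICIENT_DATA" for a greater-than-threshold alarm.

    datapoints: one aggregated value per period, oldest first;
    None marks a period with no data.
    """
    recent = datapoints[-evaluation_periods:]
    # Not enough periods, or a gap in the data: the alarm cannot decide.
    if len(recent) < evaluation_periods or any(v is None for v in recent):
        return "INSUFFICIENT_DATA"
    # Every recent period breached the threshold.
    if all(v > threshold for v in recent):
        return "ALARM"
    return "OK"
```

For example, with a CPU threshold of 80 and three evaluation periods, the datapoints `[10, 85, 90, 95]` put the alarm in the `ALARM` state, while `[85, 90, 50]` leave it in `OK`.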
1159+
1160+ ### AWS CloudWatch Logs
1161+ - Applications can send logs to CloudWatch using the SDK
1162+ - CloudWatch can collect logs from:
1163+ - Elastic Beanstalk: collection of logs from application
1164+ - ECS: collection from containers
1165+ - AWS Lambda: collection from function logs
1166+ - VPC Flow Logs: VPC specific logs
1167+ - API Gateway
1168+ - CloudTrail based on filter
1169+ - CloudWatch log agents: for example on EC2 machines
1170+ - Route 53: Log DNS queries
1171+ - CloudWatch Logs can go to:
1172+ - Batch exporter to S3 for archival
1173+ - Stream to an Elasticsearch cluster for further analytics
1174+ - CloudWatch Logs can use filter expressions
1175+ - Logs storage architecture:
1176+ - Log groups: arbitrary name, usually representing an application. The log expiration policy should be defined at this level.
1177+ - Log stream: instances within application / log files / containers
1178+ - Can define log expiration policies (never expire, 30 days, etc..)
1179+ - Using the AWS CLI we can tail CloudWatch logs
1180+ - To send logs to CloudWatch, make sure IAM permissions are correct!
1181+ - Security: encryption of logs using KMS at the Group Level
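The storage architecture above (groups containing streams, with expiration set on the group) can be modeled with a small in-memory sketch. This is only a toy model to show the relationship; the class `LogGroup` and its methods are hypothetical and are not part of any AWS SDK.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple


@dataclass
class LogGroup:
    name: str
    # Expiration is defined at the group level; None models "never expire".
    retention_days: Optional[int] = None
    # Stream name (instance, container, log file) -> list of (timestamp, message).
    streams: Dict[str, List[Tuple[datetime, str]]] = field(default_factory=dict)

    def put(self, stream, timestamp, message):
        """Append one log event to a stream, creating the stream if needed."""
        self.streams.setdefault(stream, []).append((timestamp, message))

    def expire(self, now):
        """Drop events older than the group's retention, in every stream."""
        if self.retention_days is None:
            return
        cutoff = now - timedelta(days=self.retention_days)
        for name, events in self.streams.items():
            self.streams[name] = [(t, m) for t, m in events if t >= cutoff]
```

A group created as `LogGroup("/my-app/web", retention_days=30)` would keep one stream per instance and discard events older than 30 days when `expire` is called; the group name here is an assumption for illustration.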
1182+
1183+ ### AWS CloudWatch Events
1184+ - Schedule: Cron jobs
1185+ - Event Pattern: Event rules to react to a service doing something
1186+ - Ex: CodePipeline state changes
1187+ - Triggers to Lambda functions, SQS/SNS/Kinesis Messages
1188+ - CloudWatch Events creates a small JSON document that describes the change
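As an illustration of event patterns, the sketch below shows a pattern for the CodePipeline example and a simplified matcher. The `matches` function is hypothetical and covers only exact-value matching; real event patterns support more operators. The pipeline name in the sample event is also an assumption.

```python
def matches(pattern, event):
    """Return True if every field in pattern is satisfied by event.

    A list in the pattern means "any of these values";
    a dict means "recurse into the nested object".
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Pattern: react when a CodePipeline execution fails.
pattern = {
    "source": ["aws.codepipeline"],
    "detail-type": ["CodePipeline Pipeline Execution State Change"],
    "detail": {"state": ["FAILED"]},
}

# A trimmed-down event document of the kind the rule would receive.
event = {
    "source": "aws.codepipeline",
    "detail-type": "CodePipeline Pipeline Execution State Change",
    "detail": {"pipeline": "my-pipeline", "state": "FAILED"},  # hypothetical name
}
```

Here `matches(pattern, event)` is true, so a rule with this pattern would forward the event to its target (a Lambda function, an SNS topic, and so on). Fields not named in the pattern, such as `pipeline`, are ignored.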