Monitoring
Alarms
There are several alarms setup in CloudWatch to monitor the health of the PASS system. Triggered and cleared alarms use an SNS topic to send email to PASS devops. The following are a subset of Alarms to monitor the PASS application:
Relational Database Service (RDS)
CPUUtilization - The CPU Utilization of the RDS instance. Normally will alarm if this metric is higher than some threshold for an amount of time.
Elastic Compute Cloud (EC2)
CPUUtilization - The percentage of physical CPU time that Amazon EC2 uses to run the EC2 instance, which includes time spent to run both the user code and the Amazon EC2 code.
CloudWatch Log Errors - Monitor will alarm if an ERROR log entry appears in PASS logs in cloudwatch.
Elastic Cloud Service (ECS)
MemoryUtilized - The memory being used by tasks in the resource that is specified by the dimension set that you're using.
CpuUtilized - The CPU units used by tasks in the resource that is specified by the dimension set that you're using.
NetworkTxBytes - The number of bytes transmitted by the resource that is specified by the dimensions that you're using. This metric is obtained from the Docker runtime.
CloudWatch Log Errors - Monitor will alarm if an ERROR log entry appears in PASS logs in cloudwatch.
Application Load Balancer (ALB)
HTTPCode_ELB_4XX_Count - The number of HTTP 4XX client error codes that originate from the load balancer.
TargetResponseTime - The time elapsed, in seconds, after the request leaves the load balancer until the target starts to send the response headers.
UnHealthyHostCount - The number of unhealthy instances registered with your load balancer. An instance is considered unhealthy after it exceeds the unhealthy threshold configured for health checks.
Simple Queue Service (SQS)
NumberOfMessagesSent - The number of messages added to a queue.
ApproximateAgeOfOldestMessage - The approximate age of the oldest non-deleted message in the queue.
Simple Notification Service (SNS)
NumberOfNotificationsFailed - The number of messages that Amazon SNS failed to deliver.
Synthetic Canary
Logs
CloudWatch Logs are used for logging from all PASS Docker containers. The Access Logs have been enabled on the Public ALB. These logs are written to an S3 bucket. In order to view/query the access logs, an AWS Athena database has been created. You can query the Athena database using standard SQL in the Query Editor.
Dashboard
Data Loader Failure Notifications
It is important to know when one of the PASS Data Loader jobs fail to run. This can be done in EventBridge by creating a Rule that fires on AWS Batch Job failures and sends a notification using AWS SNS.
Last updated