Integration with Amazon CloudWatch Logs - Amazon ParallelCluster

Integration with Amazon CloudWatch Logs

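Starting with Amazon ParallelCluster version 2.6.0, common logs are stored in CloudWatch Logs by default. For more information about CloudWatch Logs, see the Amazon CloudWatch Logs User Guide. To configure CloudWatch Logs integration, see the [cw_log] section and the cw_log_settings setting.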
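
To make the relationship between cw_log_settings and the [cw_log] section concrete, the following Python sketch parses a small sample configuration with the standard configparser module. It assumes the ParallelCluster 2.x INI layout, in which a cluster section points at a labeled [cw_log] section through cw_log_settings, and it assumes the enable and retention_days settings. The label custom-cw and the 14-day value are illustrative, not defaults taken from this page.

```python
import configparser

# Sample configuration (assumption: ParallelCluster 2.x INI layout).
# "custom-cw" is a hypothetical label; 14 days is an illustrative value.
SAMPLE_CONFIG = """
[cluster default]
cw_log_settings = custom-cw

[cw_log custom-cw]
enable = true
retention_days = 14
"""


def read_cw_log_settings(config_text, cluster_section="cluster default"):
    """Return (enabled, retention_days) for the cw_log section a cluster refers to."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # The cluster section names the cw_log section by its label.
    label = parser[cluster_section]["cw_log_settings"]
    section = parser[f"cw_log {label}"]
    return section.getboolean("enable"), section.getint("retention_days")


print(read_cw_log_settings(SAMPLE_CONFIG))
```

The helper only reads the file; it does not validate the values against what ParallelCluster accepts.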

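A log group is created for each cluster, named /aws/parallelcluster/cluster-name (for example, /aws/parallelcluster/testCluster). Each log (or set of logs, if the path contains a *) on each node has a log stream named {hostname}.{instance_id}.{logIdentifier} (for example, ip-172-31-10-46.i-02587cf29cc3048f3.nodewatcher). Log data is sent to CloudWatch by the CloudWatch agent, which runs as root on all cluster instances.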
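
The naming scheme above can be sketched in a few lines of Python. The function names are hypothetical helpers, not part of any ParallelCluster tool. The parser assumes that the hostname contains no dots, which holds for names of the form ip-172-31-10-46 but may not hold for custom hostnames.

```python
def log_group_name(cluster_name):
    """Build the per-cluster log group name."""
    return f"/aws/parallelcluster/{cluster_name}"


def log_stream_name(hostname, instance_id, log_identifier):
    """Build a log stream name as {hostname}.{instance_id}.{logIdentifier}."""
    return f"{hostname}.{instance_id}.{log_identifier}"


def parse_log_stream_name(stream_name):
    """Split a stream name back into its three parts.

    Assumption: the hostname itself contains no dots.
    """
    hostname, instance_id, log_identifier = stream_name.split(".", 2)
    return hostname, instance_id, log_identifier


print(log_group_name("testCluster"))
print(log_stream_name("ip-172-31-10-46", "i-02587cf29cc3048f3", "nodewatcher"))
```

Parsing a stream name this way is useful when filtering the streams of one node or one log type within a cluster's log group.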

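Starting with Amazon ParallelCluster version 2.10.0, an Amazon CloudWatch dashboard is created when the cluster is created. This dashboard makes it easy to review the logs stored in CloudWatch Logs. For more information, see Amazon CloudWatch dashboard.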

This list contains the paths of the logs and the logIdentifier used for each of them.

  • /opt/sge/default/spool/qmaster/messages (sge-qmaster)

  • /var/log/cfn-init.log (cfn-init)

  • /var/log/chef-client.log (chef-client)

  • /var/log/cloud-init.log (cloud-init)

  • /var/log/cloud-init-output.log (cloud-init-output)

  • /var/log/dcv/agent.*.log (dcv-agent)

  • /var/log/dcv/dcv-xsession.*.log (dcv-xsession)

  • /var/log/dcv/server.log (dcv-server)

  • /var/log/dcv/sessionlauncher.log (dcv-session-launcher)

  • /var/log/dcv/Xdcv.*.log (Xdcv)

  • /var/log/jobwatcher (jobwatcher)

  • /var/log/messages (system-messages)

  • /var/log/nodewatcher (nodewatcher)

  • /var/log/parallelcluster/clustermgtd (clustermgtd)

  • /var/log/parallelcluster/computemgtd (computemgtd)

  • /var/log/parallelcluster/pcluster_dcv_authenticator.log (dcv-authenticator)

  • /var/log/parallelcluster/pcluster_dcv_connect.log (dcv-ext-authenticator)

  • /var/log/parallelcluster/slurm_resume.log (slurm_resume)

  • /var/log/parallelcluster/slurm_suspend.log (slurm_suspend)

  • /var/log/slurmctld.log (slurmctld)

  • /var/log/slurmd.log (slurmd)

  • /var/log/sqswatcher (sqswatcher)

  • /var/log/supervisord.log (supervisord)

  • /var/log/syslog (syslog)

  • /var/spool/sge/*/messages (sge-exec-daemon)

  • /var/spool/torque/client_logs/* (torque-client)

  • /var/spool/torque/server_logs/* (torque-server)
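
As an illustration of how a concrete file on a node corresponds to a logIdentifier, the Python sketch below matches a path against a subset of the patterns in the list. It uses the standard fnmatch module, whose wildcard rules are only an approximation here: the CloudWatch agent's own matching may differ. The table and the function name are hypothetical.

```python
from fnmatch import fnmatchcase

# A subset of the path patterns from the list above, paired with their logIdentifier.
LOG_PATTERNS = {
    "/var/log/cfn-init.log": "cfn-init",
    "/var/log/dcv/agent.*.log": "dcv-agent",
    "/var/log/messages": "system-messages",
    "/var/log/slurmctld.log": "slurmctld",
    "/var/spool/sge/*/messages": "sge-exec-daemon",
    "/var/spool/torque/server_logs/*": "torque-server",
}


def log_identifier_for(path):
    """Return the logIdentifier whose pattern matches path, or None if none does."""
    for pattern, identifier in LOG_PATTERNS.items():
        if fnmatchcase(path, pattern):
            return identifier
    return None


print(log_identifier_for("/var/log/dcv/agent.alice.log"))
print(log_identifier_for("/var/spool/sge/ip-10-0-0-5/messages"))
```

Paths that match a pattern containing * share a single log stream per node, which is why several files can appear under one logIdentifier.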

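Jobs in clusters that use Amazon Batch store the output of jobs that reached a RUNNING, SUCCEEDED, or FAILED state in CloudWatch Logs. The log group is /aws/batch/job, and the log stream name format is jobDefinitionName/default/ecs_task_id. By default, these logs are set to never expire, but you can modify the retention period. For more information, see Change log data retention in CloudWatch Logs in the Amazon CloudWatch Logs User Guide.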
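
The following Python sketch shows the stream name format for Batch jobs and one way to change the retention period. It assumes that the caller passes in a boto3 CloudWatch Logs client (created with boto3.client("logs")) and relies on that client's put_retention_policy call. The helper names and the 30-day value are illustrative.

```python
BATCH_LOG_GROUP = "/aws/batch/job"


def batch_log_stream_name(job_definition_name, ecs_task_id):
    """Build a stream name as jobDefinitionName/default/ecs_task_id."""
    return f"{job_definition_name}/default/{ecs_task_id}"


def set_batch_log_retention(logs_client, days):
    """Set the retention period of the Batch log group.

    logs_client is expected to be a boto3 CloudWatch Logs client.
    CloudWatch Logs accepts only specific retention values (for example 1, 7, 30).
    """
    logs_client.put_retention_policy(
        logGroupName=BATCH_LOG_GROUP,
        retentionInDays=days,
    )
```

With boto3 installed and credentials configured, a call such as set_batch_log_retention(boto3.client("logs"), 30) would make the job logs expire after 30 days. Passing the client in as a parameter keeps the helper easy to exercise with a stub.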

Note

chef-client, cloud-init-output, clustermgtd, computemgtd, slurm_resume, and slurm_suspend were added in Amazon ParallelCluster version 2.9. For Amazon ParallelCluster version 2.6, /var/log/cfn-init-cmd.log (cfn-init-cmd) and /var/log/cfn-wire.log (cfn-wire) were also stored in CloudWatch Logs.