How to Announce Downtime
Overview
SiteRM sites can announce planned or emergency downtime by adding entries to the site’s FE/downtime.yaml file in the rm-configs repository.
This file is intended to give site administrators a simple, site-controlled way to tell SENSE consumers that a SiteRM Frontend, a SiteRM Agent, or a full site should not receive new requests for a defined period of time. Consumers such as SENSE-O and Autogole monitoring can use the file to disable drivers, suppress or annotate monitoring alerts, and show the reason why the endpoint is unavailable.
When to Use Downtime
Use downtime.yaml any time a site expects SENSE service to be unavailable or unstable, especially when new requests should be paused.
Common examples:
- SiteRM Frontend upgrades or restarts
- SiteRM Agent upgrades or host maintenance
- Switch maintenance or network upgrades
- Site power work or power cuts
- Data center maintenance windows
- Routing, BGP, VLAN, or optical path work
- Debugging incidents where new requests should be temporarily stopped
- Planned outages that should be visible to Orchestrators and monitoring systems
If the site can still serve existing circuits but should not accept new requests, announce downtime for the affected service and explain that in disable_reason.
File Location
Each site with a Frontend directory should have:
<SITE_NAME>/FE/downtime.yaml
Examples:
T2_US_Caltech/FE/downtime.yaml
NRM_CENIC/FE/downtime.yaml
An empty file means there is no announced downtime.
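To find sites that are missing the file, a short script can walk an rm-configs checkout. This is a minimal sketch and is not part of rm-configs. The helper names `find_missing_downtime_files` and `is_empty_downtime_file` are hypothetical. The script also assumes that every top-level directory in the checkout is a site directory, and it treats a whitespace-only file as empty without parsing any YAML.

```python
from pathlib import Path


def find_missing_downtime_files(repo_root: str) -> list[str]:
    """Return site names that have an FE directory but no FE/downtime.yaml (hypothetical helper)."""
    missing = []
    for site_dir in sorted(Path(repo_root).iterdir()):
        fe_dir = site_dir / "FE"
        # Only sites with a Frontend directory are expected to carry the file.
        if fe_dir.is_dir() and not (fe_dir / "downtime.yaml").is_file():
            missing.append(site_dir.name)
    return missing


def is_empty_downtime_file(path: str) -> bool:
    """An empty (or whitespace-only) downtime.yaml means no downtime is announced."""
    return Path(path).read_text(encoding="utf-8").strip() == ""
```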
File Format
The file has one top-level key: downtimes. This key maps to a list of downtime entries.
Downtime entries are append-only operational history. Once an entry is added, do not remove it. When a new maintenance window or outage needs to be announced, add a new item to the downtimes list with a new id.
downtimes:
  - id: 0
    enabled: false
    disable_reason: "Frontend upgrade and switch maintenance."
    start: "2026 01 01 00:00"
    end: "2026 01 01 04:00"
    services: All
    hostname: sense-example.example.edu
    downfor:
      - sense-o.es.net
      - autogolemon.nrp.edu
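Because entries are append-only, each new downtime needs an ID that is not already in the list. The sketch below assumes the file has already been loaded into Python, for example with PyYAML's `yaml.safe_load`, which returns `None` for an empty file. The helpers `next_downtime_id` and `append_downtime` are hypothetical. Taking the largest existing ID plus one is one simple way to keep IDs unique; rm-configs does not state it as a rule.

```python
from typing import Any, Optional


def next_downtime_id(data: Optional[dict[str, Any]]) -> int:
    """Return one more than the largest existing id, or 0 when there are no entries."""
    entries = (data or {}).get("downtimes") or []
    return max((entry["id"] for entry in entries), default=-1) + 1


def append_downtime(data: Optional[dict[str, Any]], **fields: Any) -> dict[str, Any]:
    """Add a new entry at the end of the list; existing entries are never modified."""
    if "id" in fields:
        raise ValueError("id is assigned automatically")
    data = {} if data is None else data
    if data.get("downtimes") is None:
        data["downtimes"] = []
    new_entry = {"id": next_downtime_id(data), **fields}
    data["downtimes"].append(new_entry)
    return data
```

This approach uses the largest ID plus one rather than the list length. As a result, it still produces a unique ID if earlier IDs were not contiguous.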
Fields
- id: Integer identifier for this downtime entry. Keep it unique in the downtimes list, and use a new ID for each new downtime entry.
- enabled: Boolean value. Use false when the service should be treated as unavailable. Use true when keeping a placeholder or recording that the endpoint is available.
- disable_reason: Human-readable reason shown to operators and consumers. This should explain what is happening and, when useful, what impact is expected.
- start: Start time in yyyy mm dd hh:mm format.
- end: End time in yyyy mm dd hh:mm format. The end time must be after the start time.
- services: Service scope for the downtime. Valid values are:
  - FE: SiteRM Frontend only
  - Agent: SiteRM Agent or host-side service only
  - All: Full SiteRM service impact
- hostname: FQDN of the affected SiteRM component or site endpoint.
- downfor: List of consumers that should apply this downtime. Include the Orchestrator and monitoring systems that should stop sending new requests, disable a driver, or annotate alerting.
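The start and end format maps directly onto a Python `strptime` pattern. The following is a minimal sketch that assumes the entry has already been loaded into a dict; `parse_downtime_time` and `downtime_window` are hypothetical names. The format carries no timezone, so the sketch returns naive datetimes and leaves timezone handling to the caller.

```python
from datetime import datetime

# "yyyy mm dd hh:mm" expressed as a strptime pattern.
DOWNTIME_TIME_FORMAT = "%Y %m %d %H:%M"


def parse_downtime_time(value: str) -> datetime:
    """Parse a start/end value such as "2026 01 01 00:00" into a naive datetime."""
    return datetime.strptime(value, DOWNTIME_TIME_FORMAT)


def downtime_window(entry: dict) -> tuple[datetime, datetime]:
    """Return (start, end) for an entry, raising ValueError if end is not after start."""
    start = parse_downtime_time(entry["start"])
    end = parse_downtime_time(entry["end"])
    if end <= start:
        raise ValueError(f"downtime {entry.get('id')}: end must be after start")
    return start, end
```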
Example: Frontend Upgrade
downtimes:
  - id: 0
    enabled: false
    disable_reason: "SiteRM Frontend upgrade. New SENSE requests are paused during the maintenance window."
    start: "2026 02 10 15:00"
    end: "2026 02 10 17:00"
    services: FE
    hostname: sense-fe.example.edu
    downfor:
      - sense-o.es.net
      - autogolemon.nrp.edu
Example: Site Power Maintenance
This example keeps the completed Frontend upgrade as entry 0, now marked enabled: true, and appends the power maintenance as a new entry with id 1.
downtimes:
  - id: 0
    enabled: true
    disable_reason: "Previous Frontend upgrade has completed."
    start: "2026 02 10 15:00"
    end: "2026 02 10 17:00"
    services: FE
    hostname: sense-fe.example.edu
    downfor:
      - sense-o.es.net
      - sense-o-dev.es.net
  - id: 1
    enabled: false
    disable_reason: "Data center power maintenance. Frontend, agents, and network devices may be unavailable."
    start: "2026 03 05 02:00"
    end: "2026 03 05 08:00"
    services: All
    hostname: sense-fe.example.edu
    downfor:
      - sense-o.es.net
      - sense-o-dev.es.net
Example: Multiple Downtimes
A site can keep multiple downtime records in the same list. Add a new entry for every new event. Do not remove older entries.
downtimes:
  - id: 0
    enabled: true
    disable_reason: "Previous network upgrade has completed."
    start: "2026 04 01 10:00"
    end: "2026 04 01 12:00"
    services: All
    hostname: sense-fe.example.edu
    downfor:
      - sense-o.es.net
      - sense-o-dev.es.net
  - id: 1
    enabled: false
    disable_reason: "Network upgrade affecting SENSE production and development requests."
    start: "2026 04 15 10:00"
    end: "2026 04 15 12:00"
    services: All
    hostname: sense-fe.example.edu
    downfor:
      - sense-o.es.net
      - sense-o-dev.es.net
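A consumer reading a file like this must decide which entries apply right now. The sketch below shows one hypothetical interpretation; it is not the logic that SENSE-O or Autogole monitoring actually use. Under this interpretation, an entry applies only when enabled is false, the consumer's hostname appears in downfor, and the current time falls within the half-open window [start, end). The function name `active_downtimes` is illustrative.

```python
from datetime import datetime
from typing import Any, Optional

TIME_FORMAT = "%Y %m %d %H:%M"  # yyyy mm dd hh:mm


def active_downtimes(data: Optional[dict[str, Any]], consumer: str,
                     now: datetime) -> list[dict[str, Any]]:
    """Return the entries that mark the endpoint unavailable for `consumer` at `now`."""
    active = []
    for entry in (data or {}).get("downtimes") or []:
        # Assumption: only enabled: false marks unavailability; a missing flag counts as available.
        if entry.get("enabled", True):
            continue
        # Entries apply only to consumers listed in downfor.
        if consumer not in entry.get("downfor", []):
            continue
        start = datetime.strptime(entry["start"], TIME_FORMAT)
        end = datetime.strptime(entry["end"], TIME_FORMAT)
        if start <= now < end:  # half-open window [start, end)
            active.append(entry)
    return active
```

Applied to the Multiple Downtimes example above, sense-o.es.net would see entry 1 as active at 2026 04 15 11:00. At 2026 04 01 11:00 it would see nothing, because entry 0 has enabled: true.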
Operational Guidance
Before downtime starts:
- Add or update the site’s FE/downtime.yaml file.
- Include a clear disable_reason that operators can understand without extra context.
- Set services to the narrowest accurate scope.
- Include all affected consumers in downfor.
- Submit the change to rm-configs and allow time for SiteRM consumers to pick up the update.
During downtime:
- Keep the entry active for the full maintenance window.
- Extend end if the work runs longer than expected.
- Update disable_reason if the impact changes.
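To extend a window, edit the existing entry rather than adding a new one. The following is a minimal sketch with a hypothetical `extend_downtime` helper, which assumes the entry is a dict loaded from the file. The helper refuses to move end earlier and writes the new value back in the same yyyy mm dd hh:mm format.

```python
from datetime import datetime
from typing import Optional

TIME_FORMAT = "%Y %m %d %H:%M"  # yyyy mm dd hh:mm


def extend_downtime(entry: dict, new_end: str, new_reason: Optional[str] = None) -> dict:
    """Move `end` later in place and optionally update `disable_reason`."""
    current_end = datetime.strptime(entry["end"], TIME_FORMAT)
    proposed_end = datetime.strptime(new_end, TIME_FORMAT)
    if proposed_end <= current_end:
        raise ValueError("new end must be later than the current end")
    # Re-format so the stored value always uses the zero-padded layout.
    entry["end"] = proposed_end.strftime(TIME_FORMAT)
    if new_reason is not None:
        entry["disable_reason"] = new_reason
    return entry
```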
After downtime:
- Do not remove the downtime entry.
- Set enabled: true when the downtime is over, if consumers use this flag to decide current availability.
- For the next downtime, add a new list item with a new id.
- Confirm that SENSE-O, Autogole, and other consumers see the endpoint as available again.
- Confirm that the SiteRM Frontend and Agents are healthy before accepting new requests.
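To close a downtime, change the entry in place and never delete it. The following is a minimal sketch with a hypothetical `close_downtime` helper, which assumes the loaded file is a dict. This step only matters for consumers that read enabled to decide current availability.

```python
from typing import Any


def close_downtime(data: dict[str, Any], downtime_id: int) -> dict[str, Any]:
    """Mark one entry as finished by setting enabled: true; the entry stays in the list."""
    for entry in data.get("downtimes") or []:
        if entry["id"] == downtime_id:
            entry["enabled"] = True
            return entry
    raise KeyError(f"no downtime entry with id {downtime_id}")
```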
Validation
The rm-configs validator checks FE/downtime.yaml files. Empty files are valid. Populated files must define only the top-level downtimes list. Each entry must use the expected fields and types:
- id must be an integer
- id must be unique in the downtimes list
- enabled must be true or false
- disable_reason must be a string
- start and end must use yyyy mm dd hh:mm
- start must be before end
- services must be FE, Agent, or All
- hostname must be set
- downfor must be a non-empty list
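The checks above can be approximated in a few lines of Python. This hypothetical sketch is meant for reasoning about the rules; it is not the code in validate_config.py, and the repository's validator remains the authoritative check. The sketch assumes the file was loaded with a YAML parser such as PyYAML's `yaml.safe_load`, which returns `None` for an empty file. As an additional assumption, it rejects any entry whose fields do not exactly match the expected set.

```python
from datetime import datetime
from typing import Any

TIME_FORMAT = "%Y %m %d %H:%M"  # yyyy mm dd hh:mm
VALID_SERVICES = ("FE", "Agent", "All")
EXPECTED_FIELDS = {"id", "enabled", "disable_reason", "start", "end",
                   "services", "hostname", "downfor"}


def validate_downtimes(data: Any) -> list[str]:
    """Return error messages for a loaded downtime.yaml; [] means it passed."""
    if data is None:
        return []  # an empty file is valid: no announced downtime
    if not isinstance(data, dict) or set(data) != {"downtimes"}:
        return ["top level must contain only the 'downtimes' key"]
    entries = data["downtimes"]
    if not isinstance(entries, list):
        return ["'downtimes' must be a list"]

    errors: list[str] = []
    seen_ids: set = set()
    for index, entry in enumerate(entries):
        where = f"downtimes[{index}]"
        if not isinstance(entry, dict) or set(entry) != EXPECTED_FIELDS:
            errors.append(f"{where}: must define exactly {sorted(EXPECTED_FIELDS)}")
            continue
        entry_id = entry["id"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(entry_id, int) or isinstance(entry_id, bool):
            errors.append(f"{where}: id must be an integer")
        elif entry_id in seen_ids:
            errors.append(f"{where}: duplicate id {entry_id}")
        else:
            seen_ids.add(entry_id)
        if not isinstance(entry["enabled"], bool):
            errors.append(f"{where}: enabled must be true or false")
        if not isinstance(entry["disable_reason"], str):
            errors.append(f"{where}: disable_reason must be a string")
        try:
            start = datetime.strptime(str(entry["start"]), TIME_FORMAT)
            end = datetime.strptime(str(entry["end"]), TIME_FORMAT)
        except ValueError:
            errors.append(f"{where}: start and end must use yyyy mm dd hh:mm")
        else:
            if start >= end:
                errors.append(f"{where}: start must be before end")
        if entry["services"] not in VALID_SERVICES:
            errors.append(f"{where}: services must be FE, Agent, or All")
        if not entry["hostname"]:
            errors.append(f"{where}: hostname must be set")
        if not isinstance(entry["downfor"], list) or not entry["downfor"]:
            errors.append(f"{where}: downfor must be a non-empty list")
    return errors
```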
Run validation from the rm-configs repository:
python3 validate_config.py