The tl;dr is at the bottom, after the story-mode writeup. Feel free to CTRL+END or hyper-scroll your way there, or read through the machinations of my troubleshooting along the way.
When you make configuration changes on Cisco IOS devices like switches and routers, you're essentially updating the running-config, the configuration that is loaded into memory and currently active. If you want your changes to persist, you need to copy them to the startup-config as well, or you can lose them after a reboot.
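As a rough sketch of that save step on a typical IOS device (the `Switch#` prompt is just a placeholder hostname), either of the following copies the running-config to the startup-config:

```
! Long form: copy the active config to the one loaded at boot
Switch# copy running-config startup-config

! Legacy form; "wr" is the commonly typed abbreviation
Switch# write memory
```

Both do the same job. Which one you type is mostly a matter of habit.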
At MSP1 we had lots of hotels for clients. Indeed that was our industry specialization, and we were pretty good at supporting their needs. It's not uncommon for hotels to have several vendors across the technology spectrum, each managing a different piece like the phones, guest wireless, TVs, etc. We were often tasked with troubleshooting issues with any and all vendors when issues arose.
One night I received a call from my supervisor that the Hilton Del Mar had an outage on their back-office network following a brief power outage, so we loaded up and made the trip down in the evening.
When we arrived onsite we confirmed the back-office (BO) network outage and started working our way upstream to locate the issue. All of the BO network gear seemed to be working, so we worked our way back to the equipment where the uplink was located.
The BO network uplink came from an AT&T managed switch, which was not unusual. Most hotels have a managed network switch provided by the ISP, which sets up specific ports for each network that needs an uplink. For example, if a property had a branded network (Hilton), a BO network and a guest wireless network (managed by a different provider), all three might connect back to the ISP switch for their uplink to the internet, and each would be given its own public IP address.
We weren't able to get any link on the uplink port. Not even the link lights. We called AT&T's network support and attempted to troubleshoot the issue with them, but didn't manage to make much progress. We were told that the switch config was fine, that the port was supposed to be admin-down (meaning the port was shut down in the configuration) and that they would need an LOA (letter of authorization) for us to request network changes for the property.
We tried to reason with them, explaining that the port had been working up until a few hours before and that a power outage had led to a reboot, but we were given essentially the same response, so we started looking for an alternative solution.
I'm not going to advocate randomly plugging uplinks into random ports as a best practice, but there are times when you have to get a little creative with SOP in order to get your client up and running again.
We started plugging one of our laptops into empty ports and checking for connectivity, and we found one that was live.
So we plugged the BO uplink cable into that port, confirmed the BO network was working normally and called it a day.
Our best guess is that at some point someone with authorization requested that the port be made live for the BO network and AT&T made the config changes, but never typed in the "wr" command that would have written the config to the startup-config. This would explain why, after a reboot, the port was admin-down and another was up. It would also explain why the rest of the network devices seemed to work normally.
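To illustrate that guess, here is a sketch of what the sequence might have looked like. We never had access to the AT&T switch, so the hostname `Switch` and the interface `GigabitEthernet0/5` are hypothetical stand-ins, not the real config:

```
! Bring the port up for the BO uplink (this only changes the running-config)
Switch# configure terminal
Switch(config)# interface GigabitEthernet0/5
Switch(config-if)# no shutdown
Switch(config-if)# end

! The step we suspect was skipped; without it the port reverts to
! whatever the startup-config says after the next reboot
Switch# write memory

! Compare what is active now with what will load at boot
Switch# show running-config interface GigabitEthernet0/5
Switch# show startup-config | begin interface GigabitEthernet0/5
```

If the startup-config still shows `shutdown` under an interface that is up in the running-config, the change has not been saved and a power blip will undo it.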
We also *highly* recommended that the client allow us to purchase and install a UPS for that network switch, as there was none, and it likely would have prevented this scenario from happening at all.
tl;dr: always "wr" your Cisco IOS config when it's confirmed good and working. Also, put a UPS behind all of your infrastructure for God's sake.