I got an email from one of our users yesterday – unable to connect to their test database. I have a small universe of test databases. There’s the near-clone of the current production database; the near-clone of the previous version of the production database, which was upgraded to the current version as a final test of the upgrade procedures; the database being used to test the new module supporting the National Emissions Inventory; and a clone of that database being used for performance testing. There’s the database being used to test the next version of the application, and the database being used for Oracle10g verification and SQL/XML development. I finally managed to kill off two databases that were being used by two different groups doing separate draft permit development. And those are just the test databases for our primary application. Then there’s the primary production database itself, and three databases supporting other applications, and two test databases for those. It’s not like we’re that big a place! I need a database of passwords for all the different databases.
So, I received an email from one of the users of one of these test databases. They’re in an office in another part of town, and they couldn’t connect to the particular test database they wanted to hit. I had them try a couple of things, with no success, so after lunch today I went to that office. Start the application, “cannot resolve service name”. Check the database ID, it’s OK. Ping the server, OK fine. Tnsping the database, it’s OK. Try the app again, make sure I saw the error correctly, yes, I did. Change the database ID to a different database on the same server. OK fine. Change back, nada. Telnet to the server, stop the listener, restart the listener. Try the app again. Annnkkkk! Check the tnsnames file. It’s good. Fire up SQL*Plus, try to connect to the target database. Nothing. Try to connect to another database on that server. OK fine. Fire up the Oracle net config utility. Test the connection. Can’t test it, I’m using a redirect in the tnsnames file. Rats! Fire up the app again, because I can’t think of what the problem might be, stall for time while I think. Still won’t connect. Not surprised. Check the connection info again, compare the port numbers. OK fine. At this point I’m pretty baffled. I telnet to the server again, decide to connect to the database directly.
The database was down. The database was down. I’d shut it down a week or so ago when I was setting up another database on that server, and hadn’t brought it back online. I spent an hour trying all sorts of remedies to various connectivity hypotheses, and the problem was the thing I should have checked first, the most obvious and therefore the simplest explanation. The database was down. Occam’s Razor. Numquam ponenda est pluralitas sine necessitate. You should never focus on the solutions to a problem before you’ve focused on the problem itself. Forests and trees.
Numquam ponenda est pluralitas sine necessitate. Indeed.
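For what it’s worth, the lesson can be sketched as an ordered checklist: run the cheapest, most obvious check first and stop at the first one that fails. This is only an illustration in Python. The check functions are hypothetical stand-ins that return canned answers matching the story (they do not talk to Oracle or the network), and `first_failure` is a name made up for this sketch.

```python
from typing import Callable, List, Optional, Tuple

# A check is a human-readable name plus a function returning True when healthy.
Check = Tuple[str, Callable[[], bool]]


def first_failure(checks: List[Check]) -> Optional[str]:
    """Run checks in order; return the name of the first that fails, else None."""
    for name, check in checks:
        if not check():
            return name
    return None


# Hypothetical stand-ins with canned results that mirror the story.
# Real versions would query the instance, ping the host, and so on.
def instance_is_up() -> bool:
    return False  # the database had been shut down a week earlier


def server_answers_ping() -> bool:
    return True


def listener_answers_tnsping() -> bool:
    return True


def tnsnames_entry_looks_right() -> bool:
    return True


# Most obvious question first: is the database actually running?
CHECKS: List[Check] = [
    ("database instance is up", instance_is_up),
    ("server answers ping", server_answers_ping),
    ("listener answers tnsping", listener_answers_tnsping),
    ("tnsnames entry looks right", tnsnames_entry_looks_right),
]

if __name__ == "__main__":
    failed = first_failure(CHECKS)
    print(failed if failed else "all checks passed")
```

With the instance check at the top of the list, the run stops at the first step instead of an hour later.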