The server crashed just after 12:00 a.m. CST overnight Thursday/Friday. I was able to get everything up and running again Friday night, but I had to restore from a backup because I couldn't get the database server to start again. Since the total downtime was over 11 hours, I've made some changes that I hope will keep this from happening again.
First, right after I got everything running again, I signed up for a service that automatically checks the site every 15 minutes and notifies me if it's down. This should keep a future outage from going unnoticed for hours, although there may still be times when I can't troubleshoot and fix the problem right away.
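I haven't said which monitoring service I'm using, and I don't know exactly how it works internally. Roughly, though, a check like this boils down to something like the sketch below. It assumes a plain HTTP request counts as "the site is up" if it gets a non-error response. The names `site_is_up`, `check_once`, and `monitor` are made up for illustration.

```python
import http.client
import time
import urllib.request
from typing import Callable

# Check interval: 15 minutes, matching the service described above.
CHECK_INTERVAL_SECONDS = 15 * 60


def site_is_up(url: str, timeout: float = 10.0) -> bool:
    """Hypothetical helper: True if the site answers with a non-error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # urlopen raises HTTPError for 4xx/5xx, so getting here means success.
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, and timeouts are all OSError subclasses.
        return False


def check_once(is_up: Callable[[], bool], notify: Callable[[str], None]) -> bool:
    """Run one check and send a notification if the site looks down."""
    up = is_up()
    if not up:
        notify("Site appears to be down")
    return up


def monitor(url: str, notify: Callable[[str], None],
            interval: float = CHECK_INTERVAL_SECONDS) -> None:
    """Hypothetical loop: check the site forever, pausing between checks."""
    while True:
        check_once(lambda: site_is_up(url), notify)
        time.sleep(interval)
```

Calling something like `monitor("https://example.com/", print)` (placeholder URL) would print a warning each time a check fails. A real service sends email or text messages instead and usually checks from several locations, so one flaky network path doesn't trigger a false alarm.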
Second, I put a caching layer in front of the site. How does this work? When an unauthenticated user (i.e., someone not logged into the site) visits a page, a copy of that page is cached automatically, and other anonymous visitors are served the cached copy. Those requests never reach the web server or the database, so neither has to do any work to answer them. The cache is cleared regularly, but until a page's cached copy expires, anonymous users may not see recent changes to it.
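I haven't said which caching software is in front of the site, and this isn't its actual configuration. It's just a minimal sketch of the idea: an in-memory page cache where each entry expires after a fixed time-to-live (TTL). The TTL value and the `PageCache` and `render` names are assumptions for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Minimal page cache: entries expire after ttl_seconds (assumed value)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        # path -> (time the page was cached, rendered page body)
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get_page(self, path: str, render: Callable[[str], str]) -> str:
        now = self.clock()
        entry = self._entries.get(path)
        if entry is not None and now - entry[0] < self.ttl:
            # Cache hit: the web server and database are never touched.
            return entry[1]
        # Miss or expired copy: ask the backend (web + database) to build the page.
        body = render(path)
        self._entries[path] = (now, body)
        return body
```

This also shows why anonymous visitors can see stale pages. If a page changes one minute after it was cached with a five-minute TTL, visitors keep getting the old copy until the entry expires and `render` runs again.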
Authenticated (logged-in) users are unaffected. The cache checks for cookies, which are small pieces of data your browser stores to remember information such as the fact that you're logged into a website. When the cache sees a cookie, it steps aside and passes the request through to the web and database servers, so you see the most up-to-date content available.
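Here's a rough sketch of that bypass rule, assuming the cache treats any cookie at all as a reason to skip it. This is a simplification: real setups often look only for a specific login/session cookie, and I haven't said which one the site uses. `has_cookies`, `serve`, and `render` are hypothetical names, and the cache here is just a plain dictionary.

```python
from http.cookies import SimpleCookie
from typing import Callable, Dict


def has_cookies(cookie_header: str) -> bool:
    """True if the request's Cookie header contains at least one cookie."""
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    return len(cookies) > 0


def serve(path: str, cookie_header: str, cache: Dict[str, str],
          render: Callable[[str], str]) -> str:
    if has_cookies(cookie_header):
        # Cookie present (e.g. logged in): bypass the cache, always fresh.
        return render(path)
    # No cookies: anonymous visitor, so serve (and fill) the shared cache.
    if path not in cache:
        cache[path] = render(path)
    return cache[path]
```

Checking for *any* cookie is the safe choice, because it never shows a logged-in user someone else's cached page. The trade-off is that visitors with unrelated cookies also skip the cache.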