Sorry it's been down for so long, but it's back now! =)
Unfortunately, one of our South African servers broke. It's easy to fix, but it's 10pm in South Africa right now and everyone at the data center has gone home. I expect it to be fixed in the next 10-12 hours, when they get to work in the morning.
After yesterday's blog post, I got emails with words of praise from several humans and one ice/shadow dragon! Thank you! =)
It's all good, I'm not walking away anytime soon! It was just a shitty couple of days and I needed to vent.
It's been a fun week.
It's taken me several days to piece together what was happening so I could write about it. Around Jan 2, the support mailbox started getting an increasing number of complaints about lagging servers -- on the order of dozens of messages per day, which is still tiny as a percentage of our customer base, but very unusual nonetheless. A typical complaint would read something like "my lag has been crazy, sometimes everything locks up for a full minute and then rubberbands to catch up!"
Over the next few days, here is what I was able to piece together:
As of this morning, things seem to have calmed down. Several patches from Studio Wildcard have reduced Ragnarok's average memory usage back to where it used to be. And we've put enough new hardware online that everyone has a stable game without lag.
Good for us, right?
Here's what sucks, though...
Why do some people have to write in and actively threaten me? I mean, it's cool that people write support and let us know that they're having a problem, but why do they have to yell? Type in all caps? Threaten lawsuits? Why do they have to say things like "now that your company is big, you're cutting back on server specs and customer support to screw everyone" when I'm spending more than ever on both?
We fixed ark70, one of the South Africa machines. It's all good again!
One of our South Africa machines, ark70, is down for about an hour. Some of its hardware broke and we are replacing it.
One of our servers, ark55, froze up this morning and was rebooted. Everyone's game servers should be back now.
Back in October, Studio Wildcard ran a brief Halloween event, which was implemented in a way they had never used before. Rather than adding a command line flag as they had in the past, they published a beta branch of the game on Steam that you had to download by passing "-beta halloween" to steamcmd. Some people saw references to "-beta halloween" on forums and thought that all we had to do was add this command line flag to their game. Those people then got mad at me for not doing that. =(
Our system works by downloading the game from Steam once onto each machine, and then copying that into everyone's home directory. This was designed to be faster and save bandwidth, but it means that we can't just download a different branch from Steam for a particular customer. I would have been happy to change the system, but since that event only ran for a few days, it would have been over by the time the change was done.
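For the curious, the "download once, copy to everyone" idea can be sketched roughly like this in Python. This is only an illustration, not the actual provisioning code: the function name, the "ark" subdirectory, and the directory layout are all made up for the example.

```python
import shutil
from pathlib import Path
from typing import Iterable


def provision(shared_install: Path, home_dirs: Iterable[Path]) -> None:
    """Copy one shared game install into each customer's home directory.

    Hypothetical layout: every customer gets the files under <home>/ark.
    """
    for home in home_dirs:
        target = home / "ark"  # assumed per-customer location
        # dirs_exist_ok lets a later run refresh an existing copy (Python 3.8+)
        shutil.copytree(shared_install, target, dirs_exist_ok=True)
```

The catch is visible right in the signature: there is a single shared install per machine, so every customer on that machine gets whatever branch it happens to contain.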
I guess I should have done it after all, because Studio Wildcard just dropped this bomb on us again. They're running a Christmas event, which is available in the beta branch "-beta holidayevent". As before, the forums are filled with mistaken posts from people who think that this is a command line flag for Ark, when it's actually one for steamcmd.
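To make the distinction concrete, here is a rough Python sketch of where "-beta" actually goes: it is part of the steamcmd download command, not part of the command that launches the game server. The helper name and install path are hypothetical, and the app ID (376030, which I believe is the ARK dedicated server on Steam) should be treated as an assumption.

```python
from typing import List, Optional

# Assumption: Steam app ID of the ARK dedicated server.
ARK_SERVER_APP_ID = "376030"


def steamcmd_args(install_dir: str, branch: Optional[str] = None) -> List[str]:
    """Build a steamcmd argument list, optionally selecting a beta branch."""
    args = [
        "steamcmd",
        "+force_install_dir", install_dir,
        "+login", "anonymous",
        "+app_update", ARK_SERVER_APP_ID,
    ]
    if branch:
        # "-beta <name>" is an option of steamcmd's app_update command,
        # so it changes which files get downloaded.
        args += ["-beta", branch]
    args += ["validate", "+quit"]
    return args
```

Calling `steamcmd_args("/srv/ark", "holidayevent")` yields a download command for the event branch, while omitting the branch yields the normal one. Nothing here touches the game server's own command line, which is why adding "-beta holidayevent" there does nothing.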
Bottom line, I screwed up again and we can't do the event. I do deserve some blame for this one because I didn't fix our system after the failed Halloween event. But I think Studio Wildcard deserves some blame too, since they dropped this on us with no warning.
I'll have to change our system to be able to download these beta branches and make them available to people, since it sounds like this won't be the last time.
So someone at the data center mistakenly unplugged it while racking another machine. Sorry guys! =(
Everything should be back online now!
We just had a machine (known to us as ark71) die within the last hour. We're looking into it.