
Vent Forum Downtime - Apologies


CPSDarren
06-01-2008, 02:54 PM
Our provider had two issues overnight. The first was that

"Some of the databases on this server were corrupted. It is pretty rare but it happens once in a long time."

That was resolved in a couple hours, but then early this morning they had,

"One of the servers has developed a problem in the cluster. We are dropping it from the cluster and rerouting the traffic."

and:


"Our first fix did not work, the data re-corrupted.

We're dropping the replication and doing a restore from backups."




These things do happen from time to time and I apologize for any inconvenience to our members. I know personally that car-seat.org withdrawal is a serious problem when the forums go down! I'm not quite sure of the extent of any loss of data in terms of posts, new accounts or other information, but some information was certainly lost. So, there will probably be some missing threads, new registrations, subscriptions and private messages from much of Saturday since the previous backup very early Saturday morning.

Though this was completely beyond my control, I am going to extend all currently active donor subscriptions by one month. I will begin to apply these credits on Monday for any member account that had an active subscription at any time today, Sunday, June 1st. This will include any subscriptions or renewals made today from current or new donors. The one month extension is not only an apology and credit for the inconvenience of today's downtime (and any future downtime), but also a big THANK YOU for your support of Car-Seat.Org!

So, if you are a donor as of today and do not see a month added to your term by Monday night, please send me a PM and I will look into it.

I apologize again and hope our provider has the issue under control. The latest update said:


"We have most of our crew in now so for the next couple of hours, we will be working on a resolution to avoid what happened to the cluster. Our ETA's were also way off and we will also work on that."


So far, the overall speed, reliability and service of this provider have been much better than those of the previous ones, so I'm not shopping for a new host just yet. I am making a list of suitable hosting companies just in case there are future problems.

Wineaux
06-01-2008, 03:49 PM
Yeah, lost a really long PM to CRS that was a lot of brainstorming on a new business. Grumble, grumble, grumble... I know it's no one's fault, but damn! That was a PM that I didn't really want to lose. :(

HEVY
06-01-2008, 03:55 PM
Withdrawal? Yeah there was some of that. ;)

crunchierthanthou
06-01-2008, 04:40 PM
"Yeah, lost a really long PM to CRS that was a lot of brainstorming on a new business. Grumble, grumble, grumble... I know it's no one's fault, but damn! That was a PM that I didn't really want to lose. :("

Does she get PM notification by email? She may be able to forward that back to you.

CPSDarren
06-01-2008, 05:14 PM
Seems to be some ongoing issue. I hope it doesn't go down again :/

MissKatie
06-01-2008, 05:37 PM
OMG, I gotta tell you, I almost died without cs.org for so long. This afternoon I've been refreshing every ten minutes waiting for it to come back, LMBO! Should have stayed down longer, my homework isn't done yet....

brightredmtn
06-01-2008, 06:19 PM
Oh so that's what happened to the sunscreen thread. I didn't think it was possible that thread got controversial and got deleted.

Mama!
06-01-2008, 06:57 PM
Withdrawal plus posts lost. yep.

Sorry this happened, Darren. :( I know it's beyond your control.

azgirl71
06-01-2008, 07:36 PM
I was waiting to put things on Swap before I go to eBay with them. I am so glad it is back up! I was definitely having withdrawal.

UlrikeDG
06-02-2008, 02:35 AM
"OMG, I gotta tell you, I almost died without cs.org for so long."

You think that's bad, I was offline all day yesterday for my sister's wedding. I got online today to catch up, and there was NOTHING HERE!!!! :eek:

:whistle:

CRS
06-02-2008, 06:15 AM
"Yeah, lost a really long PM to CRS that was a lot of brainstorming on a new business. Grumble, grumble, grumble... I know it's no one's fault, but damn! That was a PM that I didn't really want to lose. :("

That's ok! Thank goodness I get an email copy of PMs when they're sent to me! I replied :)

Wineaux
06-02-2008, 09:03 AM
I lost a little rep due to the posts I got the rep from disappearing, but it's no biggie. Just more of an FYI that it happened.

CPSDarren
06-02-2008, 09:11 AM
Yup. Basically all of Saturday was nuked as if it didn't exist.

I am investigating options for more frequent or live backups. As it is now, we stand to lose 24 hours of everything if a crash happens late at night like it did Saturday. If I can cut that to 6 hours or less for a reasonable effort and cost, I will.

CPSDarren
06-02-2008, 09:13 AM
As of now, all donors (supporters, benefactors and sponsors) should see an extra month on their paid subscription. Thank you!

If you think an error was made, please contact me!


"Though this was completely beyond my control, I am going to extend all currently active donor subscriptions by one month. I will begin to apply these credits on Monday for any member account that had an active subscription at any time today, Sunday, June 1st. This will include any subscriptions or renewals made today from current or new donors. The one month extension is not only an apology and credit for the inconvenience of today's downtime (and any future downtime), but also a big THANK YOU for your support of Car-Seat.Org!

So, if you are a donor as of today and do not see a month added to your term by Monday night, please send me a PM and I will look into it."

singingpond
06-02-2008, 10:14 AM
Oh, this explains lack of response on a couple threads I started Saturday; in fact, it explains total lack of threads :p. I may try to resurrect one of them, as I had meant to bookmark some useful responses I saw during the day on Saturday -- hopefully people will respond again with the same information.

I wonder whether some of the new members who posted questions, had them answered, and then had the answers disappear, are really confused right about now....

Katrin

CPSDarren
06-02-2008, 10:52 PM
This just in from the provider:

As a result of our discussions from the crash of the mysql.imountain.com cluster, we are implementing a few new backup procedures.

This crash was a fluke: we did an update to the MySQL cluster, and it looked fine at the time and didn't develop issues until several hours later.

So for future operations:

1) Prior to doing updates, we will always do a backup instead of relying on our early morning backups.
2) We will have a live server running a copy of the database at all times. This will be an unattached server which will have the most recent backup of the database on-line.

Fix #1 will ensure that if we do an update that goes badly, we'll have a fresh copy of the database to restore, not one that is up to 24 hours old.

Fix #2 will provide a server we can switch you to in case #1 isn't available. This will eliminate the long wait it takes for MySQL databases to restore.

Database replication, which we use, is usually extremely reliable, and it's what keeps things running smoothly even if 1 server fails. In this circumstance, 1 problem was replicated across all the servers, and that broke everything.

We feel these changes will provide a quality hosting environment.

Thank you for your patience through this ordeal.
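
For anyone curious why replication alone did not save the data, here is a rough Python sketch of the idea behind Fix #1. It is only a toy model, not the provider's actual setup: the `Cluster` class, the `update_with_backup` function and the `posts` counter are all made-up names for illustration. It shows a bad update being copied to every node, while a snapshot taken just before the update stays intact and can be restored.

```python
import copy


class Cluster:
    """Toy replicated store (hypothetical): every write goes to all nodes."""

    def __init__(self, node_count, data):
        # Each node starts with its own independent copy of the data.
        self.nodes = [copy.deepcopy(data) for _ in range(node_count)]

    def write(self, key, value):
        # Replication: the same write is applied to every node,
        # so a bad write is faithfully copied everywhere too.
        for node in self.nodes:
            node[key] = value

    def restore(self, snapshot):
        # Replace every node's data with a copy of the snapshot.
        self.nodes = [copy.deepcopy(snapshot) for _ in self.nodes]


def update_with_backup(cluster, apply_update):
    """Take a snapshot first, then apply the update (the idea in Fix #1)."""
    snapshot = copy.deepcopy(cluster.nodes[0])
    apply_update(cluster)
    # The snapshot is returned so it can be restored later, even if
    # problems only show up hours after the update.
    return snapshot


cluster = Cluster(3, {"posts": 100})
snapshot = update_with_backup(cluster, lambda c: c.write("posts", None))

# The bad update reached all three nodes.
print(all(node["posts"] is None for node in cluster.nodes))

# Restoring from the pre-update snapshot brings every node back.
cluster.restore(snapshot)
print(cluster.nodes)
```

The point of the sketch is that replication protects against a server dying, not against bad data, so a separate copy taken before the change is what makes recovery possible.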


In addition, I am upgrading our services to include a full backup 3 times daily instead of the single overnight backup. That should mean that even in a catastrophic failure of their SQL cluster and backup server, we would only lose 8 hours of activity at the most.
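
The arithmetic behind those numbers can be sketched in a few lines of Python. This assumes the backups are spread evenly across the day, which is an assumption on my part and not something the provider has stated; the function names are made up for illustration.

```python
import math


def worst_case_loss_hours(backups_per_day):
    """Longest gap between evenly spaced backups, in hours."""
    if backups_per_day < 1:
        raise ValueError("need at least one backup per day")
    return 24 / backups_per_day


def backups_needed(max_loss_hours):
    """Fewest evenly spaced daily backups that keep the gap within the limit."""
    if max_loss_hours <= 0:
        raise ValueError("max_loss_hours must be positive")
    return math.ceil(24 / max_loss_hours)


print(worst_case_loss_hours(1))  # single overnight backup
print(worst_case_loss_hours(3))  # three backups a day
print(backups_needed(6))         # backups per day for a 6-hour limit
```

With one backup a day the worst case is 24 hours, with three it is 8 hours, and getting down to the 6-hour figure mentioned earlier in the thread would take four backups a day.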