Contingency Plans for RAMP Customers?

As Tessitura RAMP access is down for the second time this month and fourth time since we put tickets on sale in January, I'd like to hear what, if any, contingency plans others have set up for system downtime.  We've only had one instance of being down during our actual Festival, but we were also down for a few hours during our Donor Presale, and it was a nightmare.

  • Former Member, in reply to Andrew Recinos

    Andrew,

    We appreciate the response clarifying your best intentions, but our CEO has asked me to make an additional request for some information.  I have opened a support ticket for this request, but am copying it in below to gauge the interest of fellow community contacts on a proposal like this.

    We would like an update on the issue and an estimated time for resolution.  We believe the support, and the communication in particular, in these instances is below average, and we would like to have emailed updates every 15 minutes to Primary, Business, and IT contacts in our organization.  I suspect other organizations will echo this.  The priority is resolution, but we can plan our day and those ahead immediately if we have initial ballpark estimates and timely updates.  I don't think this is too much to ask for.  We are in season renewals and have a Broadway production in residence, one whose booking agent happens to represent the show that was in residence during our last incident.  I have opened a support ticket (11479,15) and placed calls to emergency support this morning, and have been directed to the service status page on the website, which has been updated twice in three hours.  This is not good enough.  Please let me know the latest as soon as possible so I may pass the information on to our CEO and everyone in the organization who is impacted right now.

    Regards,
    Chris
    Bass Performance Hall

  • Adding some more things that need to be done by Tessitura in outages like this:

    For TNEW affected clients - Tessitura needs to provide a popup page that says the website is experiencing difficulties

    Tessitura NEEDS to create a redundant failover system so if the primary servers go down, the backups automatically start up and we barely notice an issue OR allow licensees to create their own failover when RAMP goes down

    There needs to be FAR better communication about what is happening and where the status of the outage is

    What you are describing, Andrew, are great plans for the future of RAMP, but I would like to know what Tessitura is intending to do now to fix this major problem. This is completely unacceptable, and our patrons don't care that the database we are using is planning all this stuff. They just want to be able to call or log in and buy their tickets, make their donations, or just interact with our organizations. When Tessitura goes down, Tessitura doesn't look bad; we, the licensees, look bad.

  • I sure hope so, Gawain. As I've said... on-prem organizations rarely, if ever, experience this level of outage.

  • To add to what Chris De Leon said, we here at the Walton Arts Center would like to know what happened to communicating the fact that there is an outage. We found out RAMP was down by trying to log in to Tessitura. The status page isn’t enough. Giving RAMP customers a heads up and subsequent updates on outages would be a nice start.
     
  • I agree. I subscribe to the RSS feed and never get an email when RAMP is down. Clearly that doesn't work. They have primary, secondary, and tertiary contacts at all member organizations. Why those email addresses aren't attached to a down time email group is beyond me. We as licensees are often accused of over communicating issues and I feel the exact opposite is true from the company of the product we pay quite a bit to use.

  • I'd like to jump in and echo what Gawain has said.

    Having self hosted our Tessitura environment for many years, both on premise, and in colocation environments, we've had plenty of opportunities to sit down with engineers and discuss what it would take to eliminate all single points of failure and build in redundancy wherever possible.

    At the end of the day it's hard to accomplish these goals at costs that are justifiable because of the complexity of the end-to-end Tessitura environment.  Building truly redundant transactional data environments involves much more than backup and restore.

    I can't imagine that simplification of the overall Tessitura environment is not a top level goal for the Network, whether they talk much about it or not.

    In the meantime, I like having the power to take action and do what we feel needs to be done to keep our operation running 24x7.  But despite all the planning and work we do, things happen that are out of our control.  We have our single points of failure and sometimes they fail.  Over the course of a year we typically run around 3 nines for our uptime.  Being up 99.9% of the time means that we'll experience unplanned outages for a total duration of about 9 hours a year.  That sounds pretty reasonable until 4 of those hours happen in the 3 hours before and the 1 hour after the start of a huge on sale event.

    No one likes to deal with outages such as this.  But failing to plan for them is the same as planning to fail.  You will have unexpected downtime where all or part of your ability to conduct transactions is unavailable.  It doesn't matter if you use a Managed Service such as RAMP or self host.  It will happen.  Plan for it.
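For anyone who wants to check the "nines" arithmetic in the post above, here is a minimal sketch. The function names are made up for illustration, and a 365-day year is assumed. It converts an availability percentage into the hours of downtime it allows per year, and shows how much of that allowance a single outage uses up.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (an assumption)


def allowed_downtime_hours(availability_pct, period_hours=HOURS_PER_YEAR):
    """Hours of downtime permitted in a period at the given availability percentage."""
    return period_hours * (1 - availability_pct / 100)


def share_of_budget(outage_hours, availability_pct):
    """Fraction of the annual downtime allowance consumed by one outage."""
    return outage_hours / allowed_downtime_hours(availability_pct)


if __name__ == "__main__":
    # "3 nines" works out to roughly 8.76 hours per year, the "about 9 hours" in the post.
    for pct in (99.0, 99.9, 99.99):
        print(f"{pct}% uptime allows {allowed_downtime_hours(pct):.2f} hours down per year")

    # The 4-hour on-sale outage from the post, measured against a 99.9% target.
    print(f"A 4-hour outage uses {share_of_budget(4, 99.9):.0%} of that allowance")
```

The percentage alone says nothing about when the downtime lands, which is the point of the on-sale example: a single 4-hour outage takes close to half of a 99.9% year.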

  • I completely hear what you are saying and push back with this simple statement: what is a small organization with a limited staff, and a person thrown into an IT role who is not an IT person, supposed to do? How are they supposed to plan for failover when they are just trying to keep their organization running? They rely on Tessitura and RAMP to ensure their uptime. Do I expect 100% uptime all the time... yes. Do I realize that any system, hosted locally or remotely, will go down and cause downtime? Yes. I have worked with both styles of hosting for Tessitura. RAMP hosting has, by far, been the MOST frustrating.

  • I really think that RAMP should move to Azure and trust in the MS built redundancy to maintain uptime.
     
    Doug
     
  • Does anyone know when this outage began? There is no time stamp on the outage to say when it began. I'd like to keep track locally of the total time down for each outage.

  • We don't have any web orders after 5:14 am PST, so it probably began sometime after that.

  • If you “realize that any system, hosted locally or remotely, will go down and cause downtime”, then you don’t really expect 100% uptime.  You can’t have it both ways.
     
    I think I understand what you’re saying though.  100% uptime is the goal.  Even if we acknowledge that it’s not actually achievable, it’s still the goal we shoot for.
     
    However, shooting for an unachievable goal isn’t a wise business choice.  Instead, business leaders should set aside the time and resources necessary to determine how much downtime is actually acceptable.  In order to do that you must also determine how much that downtime will cost. 
     
    I know it’s hard to do.  We’ve had previous senior managers (no longer here!) at CSO who have repeatedly stated that NO downtime is acceptable.  Yet, they also refused to support expending additional resources to help build necessary redundancy to lower downtime.
     
    Reality is what it is.  Know how much it costs to be down.  Know how much you can actually afford to lose to downtime on an annualized basis.  Put that on a rolling timeline, and then see if your service provider is falling within those guidelines.   Then judge.
     
    Having been self-hosted for a long time I understand how frustrating it can be to have a single outage that trashes that goal.  It’s especially frustrating when it hits one of those single points of failure where redundancy was simply not cost justified. 
     
    I’m sure I have users here who would say that they would love to have our systems hosted by someone else because of the unplanned downtime we’ve experienced.  Most years we experience less than 4 hours of planned and unplanned downtime combined.  But… there have been times when we’ve experienced the 12 hour outage.  One of those rare events that is not likely to happen again for another 15 years.  Is it worth the hundreds of thousands of dollars it would cost over that time to build around such an event?  Nope.  It just isn’t.
     
    Down time is a fact of life.  Don’t fear it.  Accept it.  But, plan for it.  Measure it.  And hold your providers accountable if they fail to meet their uptime goals as stated in their SLA.
     
    Since we’re not a RAMP client, I am curious.  Are the RAMP SLAs being met?
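One way to follow the "measure it, then judge" advice in the post above is to keep a local outage log and compare it against a target on a rolling window. The sketch below is only an illustration: the outage timestamps, the 99.9% target, and the cost-per-hour figure are all made-up placeholders, not actual RAMP outage data or actual SLA terms. Substitute your own log and whatever your contract states.

```python
from datetime import datetime, timedelta


def downtime_in_window(outages, window_end, window_days=365):
    """Total downtime (a timedelta) inside the rolling window ending at window_end."""
    window_start = window_end - timedelta(days=window_days)
    total = timedelta()
    for start, end in outages:
        # Clip each outage to the window so partial overlaps are counted correctly.
        overlap_start = max(start, window_start)
        overlap_end = min(end, window_end)
        if overlap_end > overlap_start:
            total += overlap_end - overlap_start
    return total


def sla_report(outages, window_end, sla_pct, cost_per_hour, window_days=365):
    """Compare measured downtime against the allowance implied by an uptime target."""
    window_hours = window_days * 24
    down_hours = downtime_in_window(outages, window_end, window_days).total_seconds() / 3600
    allowed_hours = window_hours * (1 - sla_pct / 100)
    return {
        "down_hours": down_hours,
        "allowed_hours": allowed_hours,
        "measured_uptime_pct": 100 * (1 - down_hours / window_hours),
        "estimated_cost": down_hours * cost_per_hour,
        "within_sla": down_hours <= allowed_hours,
    }


# Hypothetical outage log: (start, end) pairs. These are NOT real outage times.
outages = [
    (datetime(2024, 1, 10, 8, 30), datetime(2024, 1, 10, 12, 30)),  # 4 hours
    (datetime(2024, 3, 2, 5, 0), datetime(2024, 3, 2, 8, 0)),       # 3 hours
    (datetime(2024, 3, 20, 9, 0), datetime(2024, 3, 20, 11, 30)),   # 2.5 hours
]

# The 99.9% target and the 5000-per-hour cost are placeholder assumptions.
report = sla_report(outages, datetime(2024, 4, 1), sla_pct=99.9, cost_per_hour=5000)

if __name__ == "__main__":
    for key, value in report.items():
        print(f"{key}: {value}")
```

Keeping the log as plain start/end pairs also answers the earlier question in this thread about tracking total time down per outage locally, as long as someone writes the times down when the outage happens.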
     
     
  • Sometime between 8:30 and 9 am EDT; I can't get into anything to see when our last web order was to confirm a closer window.

  • Former Member, in reply to Andrew Recinos

    I understand what you're saying, Andrew, but the reality is that if we can't sell tickets because your system is unreliable or down, you are not just crippling our organization but also impacting our visitor experience. Ticketing is the first place people go when they walk in the door, and when our Box Office staff are selling wristbands with a tally on a piece of paper and are unable to process credit cards, we are giving a poor first impression that we are not the world class destination we claim to be.

    We trusted the RAMP hosted product, and the reality is that we cannot do that anymore. We have been hearing the same information you posted above for over a year. There is a point where we need to make a change when we hear that our neighbors who also use Tessitura are not experiencing the frequent issues we are.

    Heidi Quicksilver

    Vice President - Technology

    Rock & Roll Hall of Fame

  • on-prem organizations rarely, if ever, experience this level of outage

    I don't have hard numbers, but I feel like RAMP outages are more frequent than we experienced self hosting.  We also get a lot of connectivity "blips" where we'll lose connection as an organization for a few minutes before we start reconnecting.  But on the other hand, the outages are usually much shorter.  When our server room flooded, for instance, we weren't back online in an hour...

    But I wouldn't be surprised if the RAMP outages were more frequent than someone with a standard installation because of all the additional "cloud" and virtualization pieces.  And I haven't been keeping track, but I feel like nine times out of ten, those are the fail points when there is an outage, rather than hard drive failures or routers dying.