Contingency Plans for RAMP Customers?

As Tessitura RAMP access is down for the second time this month and fourth time since we put tickets on sale in January, I'd like to hear what if any contingency plans others have set up for system downtime.  We've only had one instance of being down during our actual Festival, but we did have a few hours during our Donor Presale that we were down and it was a nightmare. 

  • I want to jump in here following Dan's lead. We have been on Tessitura since v3 (which I installed on local SQL servers 13 years ago). We moved to RAMP about 7 years ago. Yes, there has been downtime, and yes, it has generally occurred at the worst possible moment. Any technology will have outages. Every client, RAMP or not, must have documented and practiced plans to continue operations while systems are down.

  • According to our Vantiv portal, our last web transaction took place at 7:37am CT

  • No one likes to deal with outages such as this.  But, failing to plan for them is the same as planning to fail.

    We can still take phone calls, and even scrape cards at the window if necessary, but we do have an issue when it comes to TNEW.  In this case end users were just seeing a failure to resolve the site: that's very troubling.  Other times, when the issues are more subtle, they might come to the site, but just not be able to find any performances.  We don't have any power that I know of in this situation to implement a temporary redirect to a proper "site down" page with alternative contact information or other messaging, and we definitely need it.

    We also need (and this was another thread recently) better monitoring of services generally.  We can often have subtle problems that are not as simple as servers falling over but result in the site being effectively non-functional, and if they happen on a Sunday night after the box office closes, we are basically only going to hear about it before Tuesday if a patron complains to the right channels.  We're working towards a protocol where someone is tasked with logging in and buying a ticket every day in lieu of an automated process, but you'd think that this would be a service better suited to RAMP itself.

    And the general communication about these outages needs a better channel.  RSS isn't going to alert me in a timely manner (it is, an hour or more after the fact, how I heard about this): I think we need, at the least, a technical contact email list that gets these messages.
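
    For the "subtle problems" case above, a rough sketch of an automated check, in Python, might look like the following.  It is not anything RAMP or TNEW provides: the function names and the idea of looking for marker text (say, a performance title or an "Add to cart" label you expect on a healthy page) are assumptions made for illustration.  It only fetches a page, so it is a much weaker test than actually buying a ticket.

```python
import urllib.error
import urllib.request


def classify(status, body, markers):
    """Classify one check result as 'down', 'degraded', or 'ok'.

    status  -- HTTP status code, or None if the request never completed
    body    -- response text (may be empty)
    markers -- strings that a healthy page is expected to contain
    """
    if status is None or status >= 500:
        return "down"
    if status != 200:
        return "degraded"
    # The page loaded, but is the content we expect actually there?
    missing = [m for m in markers if m not in body]
    return "degraded" if missing else "ok"


def probe(url, markers, timeout=10):
    """Fetch url once and classify the outcome (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return classify(resp.status, body, markers)
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return classify(exc.code, "", markers)
    except OSError:
        # DNS failure, refused connection, timeout, and similar.
        return classify(None, "", markers)
```

    Run from a scheduler every few minutes against a page that lists performances, anything other than "ok" could trigger an email or text to whoever is on call, including on a Sunday night.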

  • In addition, we might want to take into account historical context and the pledge that has been stated to improve going forward.  Our organization has also been on RAMP for 7+ years (in fact, since our inception).  There was some downtime in the first year or so, but following that, RAMP up-time was fairly impeccable (for us at least) for about 5 straight years, and only in this last year has it started to become more of an issue again.

    Yes, that does not solve any issues today and can still grab us at some inconvenient moments, but it does help to give hope to and retain confidence in the hosted service.  Prior to this last year, I had no complaints about the RAMP service, and again, while it does nothing for today's issues, the fact that these issues are being taken seriously gives me confidence.

    And of course, we shall continue to plan for any eventualities that we can.  And that is why we all love our Box Offices so much.

  • Our site seems to be up, but you cannot cart anything because the cc server is not available. The front-end app will allow you to log in, but it just hangs and then errors out.
     
  • We were up for a second, but we're back down.  But that's a classic example of where we need RAMP to know what is or isn't working, and for TNEW to provide appropriate messaging to our customers.

    Hey, look at that:

    Progress, I suppose.  Mangles our new bootstrapish headers though...

  • We have dealt with the core issue and are now working through residual issues as quickly as we can.  We will continue to update the Status Page as we have more information.  We have historically relied on the Status Page for updates but recognize the need for email communications.  While not today, we will take that feedback and look to improve that as soon as possible.

  • Former Member, in reply to Dan Spees:

    Being an attraction and a museum, our experience with these frequent outages is much different. We have a mobile (not fixed location) box office where people buy admission when they arrive. We more often than not have a line of people waiting to enter. 85% of our tickets are still sold onsite. Tessitura going down cripples our entire entrance experience. It requires our staff to be tethered to a wired credit card machine in a fixed location. We have to manually track tickets sold to enter in the system later, creating more work for box office staff. It creates issues for the finance team in reconciling payments, as they cannot be run through Tessitura after the fact. Etc. When we have a busy summer day with eight thousand people visiting us, and our ticketing system goes down for an hour, any contingency plan is a lot of extra work no matter how you look at it.

  • Disruptions in service are always frustrating, not only for the network and our individual organizations, but, most importantly, for our patrons.  And, of course, they never occur at "convenient" times.  At the Kimbell Art Museum, we have implemented two back-up systems (wired and wireless) which enable us to continue selling on-site admissions, audios, and memberships during outages.  

    We appreciate the network's continued efforts to resolve and minimize these outages, but we certainly share everyone's concerns about coping with the realities of service disruptions. 

  • Our TNMP site is up, and it would be nice if we could easily redirect TNEW to TNMP ... for situations like today.

  • Gawain,


    Regarding partially down web services...


    We experience this from time to time too.  There are so many pieces involved.  Front end, load balanced server clusters; content management systems; back end connections to Tessitura WebAPI servers with multiple web applications running on them; connections to databases in different security zones...  Even self-hosted environments have to deal with these things.


    One potential solution to the issue of partial site down situations is to host a simple web server somewhere else with a rather obscure name.  This could be a server hosted by your DNS provider, many of which provide basic web server hosting as part of your DNS hosting package.  When you find it necessary, you can change your DNS records and point your Tessitura-enabled web server's hostname to this other server, which can display customer service information about your outage.  As long as you keep the TTLs on your Tessitura web server DNS records fairly short, you can effect a switch within a reasonable period of time.
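
    As a minimal sketch of the kind of stand-in server described above, assuming Python is available on the fallback host: every request gets the same outage page with a 503 status, so browsers and search engines treat it as temporary.  The phone number, wording, port, and the names OutageHandler and make_server are placeholders, not anything from Tessitura.  If your ticketing hostname is normally served over HTTPS, the fallback would also need a valid certificate for that name, which this sketch does not handle.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Placeholder page: replace the wording and phone number with your own.
OUTAGE_HTML = b"""<!doctype html>
<html><head><title>Ticketing temporarily unavailable</title></head>
<body>
<h1>Online ticketing is temporarily unavailable</h1>
<p>Please call the box office at 555-0100 or try again shortly.</p>
</body></html>"""


class OutageHandler(BaseHTTPRequestHandler):
    """Answer every path with the same 503 outage page."""

    def _send(self, include_body):
        self.send_response(503)  # Service Unavailable: a temporary condition
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(OUTAGE_HTML)))
        self.send_header("Retry-After", "600")  # hint: retry in 10 minutes
        self.send_header("Cache-Control", "no-store")  # don't cache the outage
        self.end_headers()
        if include_body:
            self.wfile.write(OUTAGE_HTML)

    def do_GET(self):
        self._send(True)

    def do_HEAD(self):
        self._send(False)

    def log_message(self, fmt, *args):
        pass  # keep the console quiet; remove to see request logs


def make_server(host="0.0.0.0", port=8080):
    """Build (but do not start) the outage server."""
    return ThreadingHTTPServer((host, port), OutageHandler)
```

    Starting it is one line, make_server().serve_forever(), and it can sit idle until the DNS record is pointed at it.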



  • That's ironic.  Of course, TNMP is going away soon...

  • Heidi,


    It sounds as if you have a use case that would justify self-hosting.  As long as you're hosted off-premises, your connectivity from end to end will most likely be your Achilles heel.  The more systems and hops you put between those end points, the more likely you are to experience a disruption.  And, as you pointed out... when you have a disruption you lose complete system functionality because everything relies on your connection to the destination host environment.



  • TNEW does offer an option to shut down the entire site and replace it with a nice message (see LTR_TN_EXPRESS_WEB_DEFAULTS, columns Is Shut Down, Shutdown Title, and Shutdown Text). Of course it only works when TNEW is working at least a little bit (as it is now, last I checked).

    In our case we are lucky enough (because we have major onsales going now...) to have a waiting room solution (Queue-it) deployed, so we were able to put everyone in a queue. And yet, because it's not a DNS-based solution (like CrowdHandler) but instead is implemented directly in TNEW, it requires TNEW to be working a bit, just like the built-in shutdown message.

  • "When you find it necessary, you can change your DNS records and point your Tessitura enabled web server to this other server which can display customer service information about your outage."

    That can get pretty ugly, though, right?  In my experience DNS changes can often take hours to propagate.  So by the time all clients are seeing the right page the problem might be fixed, and now it'll be an hour before they can get back to the site again.  IANANE (I am not a network engineer).
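
    To put rough numbers on the propagation worry: a resolver that cached the old record just before the change can keep serving it for up to the record's TTL, so the TTL sets the worst case for well-behaved resolvers (some resolvers and clients hold records longer, which this ignores).  The function below is only back-of-the-envelope arithmetic with a made-up name, and the detection and change times are assumed values.

```python
def worst_case_cutover_minutes(ttl_seconds, detect_minutes, change_minutes):
    """Upper bound on how long a patron might still reach the broken site.

    ttl_seconds    -- TTL on the DNS record being switched
    detect_minutes -- time until someone notices the outage (assumed)
    change_minutes -- time to log in and edit the record (assumed)
    """
    return detect_minutes + change_minutes + ttl_seconds / 60


# A 5-minute TTL versus a 24-hour TTL, with the same human delays.
print(worst_case_cutover_minutes(300, 10, 5))    # 20.0
print(worst_case_cutover_minutes(86400, 10, 5))  # 1455.0
```

    Switching back after the fix costs up to one more TTL, so the "hours to propagate" experience usually means the record carried a long TTL at the moment it was changed.  The TTL has to be shortened well before an outage for the short value to help.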