Contingency Plans for RAMP Customers?

As Tessitura RAMP access is down for the second time this month and the fourth time since we put tickets on sale in January, I'd like to hear what, if any, contingency plans others have set up for system downtime.  We've only had one instance of being down during our actual Festival, but we were down for a few hours during our Donor Presale and it was a nightmare.

Parents
  • I echo this sentiment. RAMP prohibits us, the clients, from having ANY sort of contingency for when RAMP is down. We are NOT allowed to, say, host a backup locally should RAMP go down. RAMP keeps our access VERY limited (even for VERY seasoned users). The silence and lack of response from the RAMP team is simply unacceptable on this front. We are losing money because of this. And with Tessitura going to a completely web based product, how is Tessitura going to ensure that RAMP and TNEW clients don't experience this level of service disruption on a regular basis? I recently saw a posting from Jack about customer service. We are the customers here and the level of service we are receiving is sub-par at best.

  • And with Tessitura going to a completely web based product, how is Tessitura going to ensure that RAMP and TNEW clients don't experience this level of service disruption on a regular basis

    My sense is that many of the outages (and performance issues) revolve around the complex, Jenga-like stack of services RAMP has to maintain to run an application that still uses PowerBuilder for core elements as a cloud service, so a proper web services product should involve a radical simplification of the architecture and fewer of these "issue to host machines connect to storage devices" type failures.

  • I sure hope so Gawain. As I've said....on-prem organizations rarely, if ever, experience this level of outage.

  • I'd like to jump in and echo what Gawain has said.

    Having self hosted our Tessitura environment for many years, both on premise, and in colocation environments, we've had plenty of opportunities to sit down with engineers and discuss what it would take to eliminate all single points of failure and build in redundancy wherever possible.

    At the end of the day it's hard to accomplish these goals at costs that are justifiable because of the complexity of the end-to-end Tessitura environment.  Building truly redundant transactional data environments involves much more than backup and restore.

    I can't imagine that simplification of the overall Tessitura environment is not a top-level goal for the Network, whether they talk much about it or not.

    In the meantime, I like having the power to take action and do what we feel needs to be done to keep our operation running 24x7.  But despite all the planning and work we do, things happen that are out of our control.  We have our single points of failure and sometimes they fail.  Over the course of a year we typically run around 3 nines for our uptime.  Being up 99.9% of the time means that we'll experience unplanned outages for a total duration of about 9 hours a year.  That sounds pretty reasonable until 4 of those hours happen in the 3 hours before and the 1 hour after the start of a huge on sale event.

    No one likes to deal with outages such as this.  But failing to plan for them is the same as planning to fail.  You will have unexpected downtime where all or part of your ability to conduct transactions is unavailable.  It doesn't matter if you use a Managed Service such as RAMP or self host.  It will happen.  Plan for it.
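For anyone who wants to check the "nines" arithmetic above, here is a minimal sketch. It assumes a 365-day year and treats availability as a simple percentage of total hours; the function name is made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours of downtime per year implied by an availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Two, three, and four nines side by side.
    for pct in (99.0, 99.9, 99.99):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% uptime allows about {hours:.2f} hours down per year")
```

Three nines works out to roughly 8.76 hours, which matches the "about 9 hours a year" figure. The number says nothing about when those hours fall, which is the real problem during an on sale.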

  • I completely hear what you are saying and push back with this simple statement: what is a small organization with a limited staff and a person thrown into an IT role who is not an IT person supposed to do? How are they supposed to plan for failover when they are just trying to keep their organization running? They rely on Tessitura and RAMP to ensure their uptime. Do I expect 100% uptime all the time...yes. Do I realize that any system, hosted locally or remotely, will go down and cause downtime? Yes. I have worked with both styles of hosting for Tessitura. RAMP hosting has, by far, been the MOST frustrating.

  • on-prem organizations rarely, if ever, experience this level of outage

    I don't have hard numbers, but I feel like RAMP outages are more frequent than we experienced self hosting.  We also get a lot of connectivity "blips" where we'll lose connection as an organization for a few minutes before we start reconnecting.  But on the other hand, the outages are usually much shorter.  When our server room flooded, for instance, we weren't back online in an hour...

    But I wouldn't be surprised if the RAMP outages were more frequent than someone with a standard installation because of all the additional "cloud" and virtualization pieces.  And I haven't been keeping track, but I feel like nine times out of ten, those are the fail points when there is an outage, rather than hard drive failures or routers dying.

  • No one likes to deal with outages such as this.  But, failing to plan for them is the same as planning to fail.

    We can still take phone calls, and even scrape cards at the window if necessary, but we do have an issue when it comes to TNEW.  In this case end users were just seeing a failure to resolve the site: that's very troubling.  Other times, when the issues are more subtle, they might come to the site, but just not be able to find any performances.  We don't have any power that I know of in this situation to implement a temporary redirect to a proper "site down" page with alternative contact information or other messaging, and we definitely need it.

    We also need (and this was another thread recently) better monitoring of services generally.  We can often have subtle problems that are not as simple as servers falling over but result in the site being effectively non-functional, and if they happen on a Sunday night after the box office closes, we are basically only going to hear about it before Tuesday if a patron complains to the right channels.  We're working towards a protocol where someone is tasked with logging in and buying a ticket every day in lieu of an automated process, but you'd think that this would be a service better suited to RAMP itself.

    And the general communication about these outages needs a better channel.  RSS isn't going to alert me in a timely manner (it is how I heard about this, an hour or more after the fact): I think at the least we need a technical contact email list that gets these messages.
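As a rough illustration of the automated check mentioned above, here is a minimal sketch using only the Python standard library. It tells apart the failure modes described in this post: the site not resolving or responding at all, the server returning an error, and the page loading without any performances. The marker text and the function names are assumptions for illustration, not anything TNEW or RAMP provides, and it stops well short of actually buying a ticket.

```python
import urllib.error
import urllib.request

# Hypothetical marker: text that only appears when performances are listed.
EXPECTED_MARKER = "Buy tickets"


def classify(status, body, marker=EXPECTED_MARKER):
    """Turn one fetch result into a health verdict.

    status is the HTTP status code, or None if no response arrived at all.
    """
    if status is None:
        return "unreachable"        # DNS failure, refused connection, timeout
    if status >= 500:
        return "server_error"
    if status != 200:
        return "unexpected_status"
    if marker not in body:
        return "up_but_empty"       # page loads, but no performances shown
    return "ok"


def probe(url, timeout=10):
    """Fetch the page once and classify the outcome."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return classify(resp.status, body)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        return classify(err.code, "")
    except OSError:
        # Covers URLError (including DNS failures) and socket timeouts.
        return classify(None, "")
```

Run from a scheduler on a machine outside RAMP, anything other than "ok" could trigger an email or text to the technical contact, which would cover the Sunday night case.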

  • TNEW does offer an option to shut down the entire site and replace it with a nice message (see LTR_TN_EXPRESS_WEB_DEFAULTS, columns Is Shut Down, Shutdown Title, and Shutdown Text). Of course it only works when TNEW is working at least a little bit (as it is now, last I checked).

    In our case we are lucky enough (because we have major onsales going now...) to have a waiting room solution (Queue-it) deployed, so we were able to put everyone in a queue. And yet, because it's not a DNS-based solution (like CrowdHandler) but instead is implemented directly in TNEW, it requires TNEW to be working a bit, just like the built-in shutdown message.

  • Of course it only works when TNEW is working at least a little bit

    That's the rub, isn't it?  But also, surely TNEW is generally in the best position to make determinations of when to display such messages, rather than waiting for someone to notice a customer email, and then call our web admin at 11 at night to log in to Tessitura from wherever they are to change a system table, again assuming some part of RAMP is functioning.

  • I agree completely, though to play devil's advocate for a moment – in the case of a variable outage like this, how do they know whose sites are the "most" down? What about the collateral damage of shutting down sites that were working relatively well, if any? </devilsadvocate>

    In the absence of a TNEW automatic shutdown notice, I'm wondering about setting up our own go-between service (similar to Dan Spees' idea below) that fields requests to our tickets URL, is smart enough to monitor TNEW, and redirects to it when it's working and doesn't when it's not. But also as Dan says, there may be better uses of time and resources to deal with outages effectively (including holding Tess responsible where they should be).
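The routing decision for a go-between like this could be sketched as below. This is only a sketch under assumptions: both URLs are placeholders, the class name is invented, and it assumes something else (a periodic health check) feeds it results and that the service is hosted somewhere that stays up when RAMP does not. It switches to a holding page on the first failed check and only sends traffic back after several consecutive good checks, so a site that is flapping up and down doesn't bounce patrons around.

```python
TNEW_URL = "https://tickets.example.org/"           # hypothetical TNEW site
HOLDING_URL = "https://www.example.org/site-down"   # hypothetical "site down" page


class GoBetween:
    """Decide where to send ticketing traffic based on health check results."""

    def __init__(self, required_successes=3):
        self.required_successes = required_successes
        self.tnew_up = True   # assume TNEW is working until a check fails
        self.streak = 0       # consecutive good checks since the last failure

    def record(self, healthy):
        """Feed in the result of one health check."""
        if not healthy:
            self.tnew_up = False
            self.streak = 0
        elif not self.tnew_up:
            self.streak += 1
            if self.streak >= self.required_successes:
                self.tnew_up = True

    def target(self):
        """URL that incoming requests should be redirected to right now."""
        return TNEW_URL if self.tnew_up else HOLDING_URL
```

The web layer that issues the actual redirect is left out; any small HTTP server returning a temporary (302) redirect to `target()` would do.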

  • in the case of a variable outage like this, how do they know whose sites are the "most" down? What about the collateral damage of shutting down sites that were working relatively well, if any

    My most recent experiences are of services coming up partially, then going down again.  If it's variable enough that it's moving around like that for everyone, then it probably would be safer to just take everyone down rather than hope that everyone who happens to be up now will remain up.

    And if it is fundamentally linked to different organizations, then it would be good if they had a mechanism to assess who is up and who is down (even if it's just calling people, or manually trying to buy tickets on their sites).

    We could do more: our pre-commerce site is our own, and we should have something posted there, but I don't think I'm getting through to our web admin, so single point of failure there.

  • Exactly. But we can't forget this is poor planning on Tessitura's part and we're starting to see the real issues come from it.

    What's worse is the performance is terrible. The choices are all over the place and never make sense.

    Now we are seeing Angular pop up in the client as if TN is going to be able to keep up with its turnover. TN is going to get stuck supporting Angular 4 or 5 on older machines. It's as if they've learned nothing from their software rot over the last few years.

    Tessitura is a horrible choice for a small organization. Its strong focus on marketing features (Dashboards? Tessitura OTG?) and technical dependence on older software commonly baked into the service (PowerBuilder, Silverlight) over actual technological development (support for cloud virtualization, decentralization of services) are all horrible news for any customer. I commonly find myself playing goalie to updates just because some department wanted to implement something they saw as a feature.

    Now Tessitura has jumped on the "Agile" bandwagon without knowing how to properly handle its legacy products. It's just no place for any organization, small or large.

Children
  • This exactly.  It is extremely hard to architect a highly fault-tolerant application with all the legacy software and requirements wrapped around Tessitura.  Until a move is made to switch to a more micro-services approach, this won't change.  The REST API is a move in the right direction here but from what I've heard, its performance doesn't make that switch so appealing.