SQL High Availability

We are currently reviewing several options in restructruing our server infrastructrue as it relates to high availability for all Tessitura applications.  We would like to know how other organizations have successfully addressed fault tolerance within their Tessitura server configurations. 

Parents
  • I'll take a crack at this.  We have set up a clustered pair of servers for each function (2 SQL Servers, 2 seat/report servers, 2 API Gateways) using MS Clustering.  The idea is that each server is backed up by an identical twin server; during normal operations, the "Live" node of the cluster is on one machine and the "Test" node is on the other, but both can run both Live and Test, and indeed would in the event of failover.

    I would call it a qualified success; very often the clustering service itself is a source of trouble and requires some expertise.  I've found that SQL Server clustering is the most tightly integrated and easiest to setup and maintain; one machine goes down and the other picks it right up.  Clustering the seat and web server services takes some time and testing and a good knowledge of networking.  We do not have our credit card server clustered.

    Ideally we would have separated each pair of the cluster into two physical locations but this was not an option for us, so for disaster recovery we are looking into Tessitura's RADR solution.

  • Just a question.

    Are any of you using or plan to use SQL always on? I'm also looking at availability and would appreciate any advice or learning's that you have.

  • Dmitrij (now at Sadler's Wells) and I set up an Always On SQL Cluster (MSSQL 2012) at the National Theatre. We were doing it as part of a larger project that was addressing PCI issues - so had separated Test and Live environments, moved clients to Remote Apps, load balanced terminal servers and also for client updates (UAC issues on windows 7), and load balanced the API server for performance and stability.

    Had it with 1 Active and 1 Read-only (just so us DBAs could run queries even during on-sales)

    It did come in useful as one of the updates that incuded a SP release failed on our main DB on a sunday and I was able to get the second DB up and active in about 15 minutes - admittedly took a couple of days to figure out a few of the steps I had missed  (like reapplying the MASTER key encryption) - this also gave us time to look at the otehr DB and get that fixed and back in the cluster before our next on-sale.

    After that we also worked with the SA team so that we were patched the secondary in hours and then the next day or so we flipped that to be the primary primary so that the other DB could then be patched. This meant a DBA could be around when this was happening.

    Mark

Reply
  • Dmitrij (now at Sadler's Wells) and I set up an Always On SQL Cluster (MSSQL 2012) at the National Theatre. We were doing it as part of a larger project that was addressing PCI issues - so had separated Test and Live environments, moved clients to Remote Apps, load balanced terminal servers and also for client updates (UAC issues on windows 7), and load balanced the API server for performance and stability.

    Had it with 1 Active and 1 Read-only (just so us DBAs could run queries even during on-sales)

    It did come in useful as one of the updates that incuded a SP release failed on our main DB on a sunday and I was able to get the second DB up and active in about 15 minutes - admittedly took a couple of days to figure out a few of the steps I had missed  (like reapplying the MASTER key encryption) - this also gave us time to look at the otehr DB and get that fixed and back in the cluster before our next on-sale.

    After that we also worked with the SA team so that we were patched the secondary in hours and then the next day or so we flipped that to be the primary primary so that the other DB could then be patched. This meant a DBA could be around when this was happening.

    Mark

Children
  • The Science Museum of Minnesota is also HA on the production side. There were some things we needed to do for efficiency. One, was moving the SSRS reporting services to another server to centralize reporting. We also ran into an issue where the SIDS would not syncronize after a failover. We finally found a solution here: http://www.msbicoe.com/post/2012/04/05/Automatically-Transfer-and-Synchronize-SQL-Server-logins-SIDs-Passwords-and-Permissions.aspx for that.

  • The logins can be an issue, but as Tessitura stores active logins in a table I was able to write a script to re-enable those that were already on the server

    declare @table as table (
       username varchar(50),
       usersid varbinary(4000)
    )

    insert into @table
    exec sp_change_users_login 'Report'

    delete @table
    where username not in (
       select username
       from @table u
       join impresario.dbo.T_METUSER m on u.username=m.db_userid and m.inactive='N'
    )

    declare @login varchar(50)

    select @login=(
       select top 1 username
       from @table
       order by username
    )


    while @login is not null
    begin
       BEGIN try
          exec sp_change_users_login 'Update_One', @login, @login
          print 'Updated ' + @login
       END TRY
       BEGIN CATCH
          print 'Updated failed for ' + @login
       end catch

       select @login=(
        select top 1 username
        from @table
        where username>@login
          order by username
       )
    end

    Needs to be run on both impresario and impresario_cci

    For newer users who didn't have a login on the secondary I just used the security program to create their accounts, but at least that number was a lot easier to manage.

    Due to issues like this we didn't switch on automatic failover and left it is Manual.

    Mark