SQL High Availability

David Lane $organization over 17 years ago

We are currently reviewing several options in restructruing our server infrastructrue as it relates to high availability for all Tessitura applications. We would like to know how other organizations have successfully addressed fault tolerance within their Tessitura server configurations.

Parents

Michael Reisman $organization over 17 years ago

I'll take a crack at this. We have set up a clustered pair of servers for each function (2 SQL Servers, 2 seat/report servers, 2 API Gateways) using MS Clustering. The idea is that each server is backed up by an identical twin server; during normal operations, the "Live" node of the cluster is on one machine and the "Test" node is on the other, but both can run both Live and Test, and indeed would in the event of failover.

I would call it a qualified success; very often the clustering service itself is a source of trouble and requires some expertise. I've found that SQL Server clustering is the most tightly integrated and easiest to setup and maintain; one machine goes down and the other picks it right up. Clustering the seat and web server services takes some time and testing and a good knowledge of networking. We do not have our credit card server clustered.

Ideally we would have separated each pair of the cluster into two physical locations but this was not an option for us, so for disaster recovery we are looking into Tessitura's RADR solution.
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
David Cooper $organization over 9 years ago in reply to Michael Reisman

Just a question.

Are any of you using or plan to use SQL always on? I'm also looking at availability and would appreciate any advice or learning's that you have.
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
Mark Ridley $organization over 9 years ago in reply to David Cooper

Dmitrij (now at Sadler's Wells) and I set up an Always On SQL Cluster (MSSQL 2012) at the National Theatre. We were doing it as part of a larger project that was addressing PCI issues - so had separated Test and Live environments, moved clients to Remote Apps, load balanced terminal servers and also for client updates (UAC issues on windows 7), and load balanced the API server for performance and stability.

Had it with 1 Active and 1 Read-only (just so us DBAs could run queries even during on-sales)

It did come in useful as one of the updates that incuded a SP release failed on our main DB on a sunday and I was able to get the second DB up and active in about 15 minutes - admittedly took a couple of days to figure out a few of the steps I had missed (like reapplying the MASTER key encryption) - this also gave us time to look at the otehr DB and get that fixed and back in the cluster before our next on-sale.

After that we also worked with the SA team so that we were patched the secondary in hours and then the next day or so we flipped that to be the primary primary so that the other DB could then be patched. This meant a DBA could be around when this was happening.

Mark
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel

Reply

Mark Ridley $organization over 9 years ago in reply to David Cooper

Dmitrij (now at Sadler's Wells) and I set up an Always On SQL Cluster (MSSQL 2012) at the National Theatre. We were doing it as part of a larger project that was addressing PCI issues - so had separated Test and Live environments, moved clients to Remote Apps, load balanced terminal servers and also for client updates (UAC issues on windows 7), and load balanced the API server for performance and stability.

Had it with 1 Active and 1 Read-only (just so us DBAs could run queries even during on-sales)

It did come in useful as one of the updates that incuded a SP release failed on our main DB on a sunday and I was able to get the second DB up and active in about 15 minutes - admittedly took a couple of days to figure out a few of the steps I had missed (like reapplying the MASTER key encryption) - this also gave us time to look at the otehr DB and get that fixed and back in the cluster before our next on-sale.

After that we also worked with the SA team so that we were patched the secondary in hours and then the next day or so we flipped that to be the primary primary so that the other DB could then be patched. This meant a DBA could be around when this was happening.

Mark
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel

Children

Troy Nelson $organization over 9 years ago in reply to Mark Ridley

The Science Museum of Minnesota is also HA on the production side. There were some things we needed to do for efficiency. One, was moving the SSRS reporting services to another server to centralize reporting. We also ran into an issue where the SIDS would not syncronize after a failover. We finally found a solution here: http://www.msbicoe.com/post/2012/04/05/Automatically-Transfer-and-Synchronize-SQL-Server-logins-SIDs-Passwords-and-Permissions.aspx for that.
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
Mark Ridley $organization over 9 years ago in reply to Troy Nelson

The logins can be an issue, but as Tessitura stores active logins in a table I was able to write a script to re-enable those that were already on the server

declare @table as table (
username varchar(50),
usersid varbinary(4000)
)

insert into @table
exec sp_change_users_login 'Report'

delete @table
where username not in (
select username
from @table u
join impresario.dbo.T_METUSER m on u.username=m.db_userid and m.inactive='N'
)

declare @login varchar(50)

select @login=(
select top 1 username
from @table
order by username
)

while @login is not null
begin
BEGIN try
exec sp_change_users_login 'Update_One', @login, @login
print 'Updated ' + @login
END TRY
BEGIN CATCH
print 'Updated failed for ' + @login
end catch

select @login=(
select top 1 username
from @table
where username>@login
order by username
)
end

Needs to be run on both impresario and impresario_cci

For newer users who didn't have a login on the secondary I just used the security program to create their accounts, but at least that number was a lot easier to manage.

Due to issues like this we didn't switch on automatic failover and left it is Manual.

Mark
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel