Tuesday, May 9, 2017

T-SQL Tuesday #90 – Shipping Database Changes

Last month I finally figured out what is a TSQLTuesday after I have "accidentally" used a hashtag that I did not fully understand. This time around I want to contribute to the topic of a month: "Shipping Database Changes."

As a full-time developer, this particular subject is close and dear to my heart. When it comes to Shipping Database Changes, it is an absolute necessity for all developers. Since this is such a broad subject, I want to focus on specifics of how I manage it in my current work environment.

First a little bit of a background on my current role and the company I where I work. My official title is a Senior Software Developer for CivicaCMI, my primary responsibility is to developer new software. Even though I end up writing my share of .NET code at the end of the day I am a SQL Server developer who writes hundreds of lines of T-SQL a week to produce software that matches specification.

Most of my work is broken down into ticket items that request a specific feature or address a particular bug. I would say that 90% of update requests take only from several minutes to several hours to complete. As long as changes restricted to a single object, it is very easy to generate a script file. I have adapted a habit of creating change scripts as soon as I finish development of the item. This can be both good and bad. It is good that I already have a file that I can check-in to version control, but bad if QA finds a problem and I need to make multiple changes before it passes testing.

Another approach that I have to resort to is running Redgate Schema Compare software to identify all changes between development and test servers. Depending on developers working on the same database and number of unfinished changes it is possible to generate too many changes that do not need to promotion. Thus it is possible to spend extra time sifting through before creating correct change script. After having to do this process multiple times as a last resort. I have implemented a rule for each developer to generate their scripts. Otherwise, I have found myself spending almost entire day packaging their changes, and in the end, I still had to verify that work was complete.

After trying many things that did not seem to work, I realized that it is a combination of software and procedures that enable streamlined process for shipping database changes.

Software:

I have found through trial and error the following list of applications enabled me to achieve desired results.
  • Redgate Source Control - Very useful to keep track of all changes between builds. Moreover, very often I use it to rollback or troubleshoot my changes.
  • Redgate Schema Compare - Primary software I use to identify differences between development and test servers
  • SSMS 10.50 - Even though I use latest SSMS (14.0) for T-SQL development, whenever I need to generate DROP and CREATE script only SSMS 10.50 produces by default IF EXISTS DROP scripts for all objects.
Now before I go any further I would like to put a disclaimer about, DROP/CREATE of SQL objects. Since the application is connecting with specific SQL User, I do not have to worry about the issue with dropping permission in production. Additionally, because users are out of the application during the upgrade, I do not need to do anything special for handling multiple version of the same object running at the same time.

Procedures/Rules:

Now, these are not in any particular order.
  • Your own scripts - Each developer is responsible for generating their own scripts. If time permits I still verify it but I do not spend extended amount of time going over it.
  • Script files - First of all I require that only one SQL Object is modified in a file. This way if there is an issue with that file during deployment, the new version can be checked in to fix the problem.
  • File names - I have implemented the following standard.
    • 10_schema_objectname - this is used only for scripting changes to tables and other base object types.
    • 20_schame_objectname - this is used for all views because files applied in alphabetical order it is guaranteed that all base tables are in the database before any change to views.
    • 30_schema_objectname - this files include all data changes, like updates to lookup tables and other static data that is not usually changed from within application
    • 40_schema_objectname - this is done for all SP, FN and other SQL Server objects. If there is a dependency on another SP or FN, it is required to change a dependent object to a higher number like 41 or 42. In some cases of multiple nested dependencies, it is possible to go higher in numbers. Nonetheless, scripts arranged in a manner that ensures all dependent objects created in right order.
  • DROP/CREATE - All views, SP and FN are scripted as IF EXISTS DROP GO CREATE. This guarantees that if a script run on a client with version mismatch it does not fail. Alternative for other people who have to worry about losing permission in production system they can script IF NOT EXISTS CREATE GO ALTER

When it comes to actual deployment to Test and production servers, it is handled by application update program that runs scripts on the target server one by one in alphabetical order. Since we have clients running different versions, scripts always have to be applied in order, for example, if the customer is on version 1.5 before the could get 2.5 they need 2.0. This ensures that database changes are applied in correct order, and I don't have to worry about something breaking.

One last problem that I have to deal with on a regular basis is Version-drift. This is caused when I manually patch a client for a fix without going through the proper build process. In those cases, I just have to manually merge changes into development to guarantee that it will make it out to other clients. Once in a while, it becomes quite complicated to keep track of different clients running different versions and how to ensure that if they need a fix, it is not something that could be resolved through update versus manual code changes.

Conclusion

Even though this is not a full-proof plan, it seems to work for me and organization where I work. Over last 12 month of my employment, I have been part of multiple major and minor build of the software. I realize that it is possible that for some developers none of the things I have described are of any use while for others they can easily relate to them and possibly even adopt some into their process.

Thanks for reading. Don't forget to leave comments, questions or concerns. I always welcome constructive criticism.

Tuesday, April 11, 2017

THROW versus RAISERROR

Error handling is just as important as verifying that your code runs and produces correct results. For developers, error handling is last chance to prepare code to fail gracefully instead of just blowing up and having to pick up pieces later.

When it comes to SQL Server, there are few options for error handling. Basics of error handling are wrapping the code that is likely to fail inside the TRY block and then dealing with failures inside the CATCH block.

Let's look at a most basic scenario below.
 
SELECT 1 / 0

SELECT 1
The first statement generates the error division by 0, but the second statement still executed as nothing is controlling what happens when errors encountered. In some cases, this is perfectly fine, but in others, we want to stop the execution of our script in the event of encountered errors. We might even want to roll back the transaction to prevent partial data change.

Now if we take the same code and place it inside TRY block we get different behavior. In this case, the first statement generates the error but the second statement is not run.
 
BEGIN TRY
SELECT 1 / 0

SELECT 1
END TRY
BEGIN CATCH

END CATCH

Now that we covered the basics let's look at controlling a flow of the code through the use of RAISERROR and THROW. In some cases, we need to fail code based on business rules which pass syntax test. For example, if our procedure is responsible for withdrawing money from an account, we might have a business rule that prevents going into negative. For the first example, we are just going to try to withdraw money.
 
CREATE PROCEDURE dbo.WithdrawMoney
    @AccountId INT
  , @Amount    DECIMAL(18, 2)
AS
    BEGIN
        UPDATE  dbo.Accounts
        SET Amount = Amount - @Amount
        WHERE   AccountID = @AccountId
    END
In this example we can keep withdrawing money until arighmetic overflow error will be thrown for creating large negative number. Since we want to avoid negative balance we are going to add code to where clause.
 
CREATE PROCEDURE dbo.WithdrawMoney
    @AccountId INT
  , @Amount    DECIMAL(18, 2)
AS
    BEGIN
        UPDATE  dbo.Accounts
        SET Amount = Amount - @Amount
        WHERE   AccountID              = @AccountId
                AND (Amount - @Amount) > 0
    END
Now, whenever you try to withdraw money that results in negative amount update will "fail." This failure is only reflected by the number of rows affected. In this instance, it is zero. Now we need something in front end application that reads the number of rows affected by stored procedure to understand what happened.

Another way of handling this would be to run a check first and to generate the error that is sent back to front end. So we write two stored procedures one using RAISERROR and another one using THROW and expect the same result.
 
CREATE PROCEDURE dbo.WithdrawMoney_RAISERROR
    @AccountId INT
  , @Amount    DECIMAL(18, 2)
AS
    BEGIN
        IF EXISTS
            (
                SELECT  1
                FROM    dbo.Accounts
                WHERE   AccountID              = @AccountId
                        AND (Amount - @Amount) > 0
            )
            RAISERROR( 'Insufficient Amount in account to perform withdrawal', 16, 1 )

        UPDATE  dbo.Accounts
        SET Amount = Amount - @Amount
        WHERE   AccountID = @AccountId
    END
GO
CREATE PROCEDURE dbo.WithdrawMoney_THROW
    @AccountId INT
  , @Amount    DECIMAL(18, 2)
AS
    BEGIN
        IF EXISTS
            (
                SELECT  1
                FROM    dbo.Accounts
                WHERE   AccountID              = @AccountId
                        AND (Amount - @Amount) > 0
            )
            THROW 51000, 'Insufficient Amount in account to perform withdrawal', 1

        UPDATE  dbo.Accounts
        SET Amount = Amount - @Amount
        WHERE   AccountID = @AccountId
    END
Upon executing the first procedure, we get the error message back to the front end, but after checking balance, we find that money withdrawn from the account, but in the case of the second procedure, the same error returned to the front end but money still there.

Now we begin to scratch our head trying to figure out why we lost the money even though we got errors in both cases. The truth behind is the fact that RAISERROR does not stop the execution of code if it is outside of TRY CATCH block. To get same behavior out of RAISERROR, we would need to rewrite procedure to look something like following example.
 
CREATE PROCEDURE dbo.WithdrawMoney_RAISERROR
    @AccountId INT
  , @Amount    DECIMAL(18, 2)
AS
    BEGIN
        BEGIN TRY
            IF EXISTS
                (
                    SELECT  1
                    FROM    dbo.Accounts
                    WHERE   AccountID              = @AccountId
                            AND (Amount - @Amount) > 0
                )
                RAISERROR( 'Insufficient Amount in account to perform withdrawal', 16, 1 )


            UPDATE  dbo.Accounts
            SET Amount = Amount - @Amount
            WHERE   AccountID = @AccountId
        END TRY
        BEGIN CATCH
            DECLARE @ErrorMessage  VARCHAR(4000)
                  , @ErrorSeverity INT
                  , @ErrorState    INT

            SELECT  @ErrorMessage  = ERROR_MESSAGE()
                  , @ErrorSeverity = ERROR_SEVERITY()
                  , @ErrorState    = ERROR_STATE()

            RAISERROR( @ErrorMessage, @ErrorSeverity, @ErrorState )
        END CATCH
    END
Now we added another complexity to the procedure because if we wrap RAISERROR in TRY CATCH block, we suppress the error and the front end left clueless as to what happened inside stored procedure. Therefore we need to "rethrow" it by catching the information about it and doing another RAISERROR. If we choose to do the same with THROW, it is a whole lot simpler. Just look at the next example.
 
CREATE PROCEDURE dbo.WithdrawMoney_THROW
    @AccountId INT
  , @Amount    DECIMAL(18, 2)
AS
    BEGIN
        BEGIN TRY
            IF EXISTS
                (
                    SELECT  1
                    FROM    dbo.Accounts
                    WHERE   AccountID              = @AccountId
                            AND (Amount - @Amount) > 0
                )
                THROW 51000, 'Insufficient Amount in account to perform withdrawal', 1


            UPDATE  dbo.Accounts
            SET Amount = Amount - @Amount
            WHERE   AccountID = @AccountId
        END TRY
        BEGIN CATCH
            THROW
        END CATCH
    END
In this case, we simply add THROW to CATCH block which acts as "RETHROW." Unfortunately for me, I write most of the code for SQL Server 2008 R2 which did not have THROW command. It is only available starting with SQL Server 2012. Therefore I have to use RAISERROR and always remember to wrap it inside TRY CATCH block to ensure that it always stops execution when I want it to do so.

Be sure to leave a comment and thank you so much for reading.

Tuesday, March 21, 2017

What kind of JOIN syntax is this?

I am not often puzzled when looking at TSQL syntax, but this one time I could not figure out the syntax. Syntax looked similar to the example below.
 
SELECT  t.*
      , s.*
FROM    Teacher AS t
LEFT OUTER JOIN dbo.Student AS s
JOIN dbo.Grades AS gr
    ON s.ID = gr.StudentID
       AND  gr.GradeLetter = 'A'
    ON s.TeacherID = t.ID


My first gut reaction was that this code is broken and would not run. To my amazement code ran just fine. Now came the hard part, which was to figure out what the code was doing because I have never seen this syntax before. Since I did not understand what I was looking at I could not BING "weird join syntax" to get an answer. As a developer, I learned long time ago to break down code into smallest possible chunks to get the answer.

After I have figured out the relationship between tables, I was able to understand what query was doing. To be able to read query better it can be rewritten in the following way.
 
SELECT  t.*
      , s.*
FROM    Teacher AS t
LEFT OUTER JOIN
            (
                SELECT  s.*
                      , gr.*
                FROM    dbo.students AS s
                JOIN dbo.Grades AS gr
                    ON s.ID                = gr.StudentID
                       AND  gr.GradeLetter = 'A'
            ) AS s
    ON s.TeacherID = t.ID
This code is much easier to read and to understand. Students filtered by Grades and we are only interested in Students who have "A" grade. Therefore we get all records from Teacher table but only get Students who have good grades. If we would try to rewrite the query in the following matter, we would only get Teachers that have Students with perfect grades as we specified inner join; therefore, it takes precedence over the left outer join.
 
SELECT  t.*
      , s.*
FROM    Teacher AS t
LEFT OUTER JOIN dbo.Student AS s
    ON s.TeacherID         = t.ID
JOIN dbo.Grades AS gr
    ON s.ID                = gr.StudentID
       AND  gr.GradeLetter = 'A'
Now that I knew what code was doing I wanted to know performance because I do not care how the code looks as long as it does not perform fast it is useless. The execution plans of the two queries are almost identical with the only difference being that when I rewrote the original query as sub-query, I got additional operation added to the plan. Physical joins were identical and execution time was almost exact.


I could not believe I learned something new after writing T-SQL for many years. All of the sudden I got another tool in my toolbelt to write better code. Now if I ever need to filter tables that should not effect result I can write with this new syntax. To be honest, I have not taken this to the extreme to see how many JOIN operations I can nest before it becomes unreadable. However, I am sure it would not take many before someone else could not figure out what was going on and would have to break it down piece by piece to see it clear.

Just imagine having to support code that looks like example 1 below instead of example 2. Even though the code in the following case produces same execution plan, it becomes hard to read and therefore hard to maintain, and if you have junior developers who are not strong in SQL, they are more likely to make mistakes.
 
--Example 1:
SELECT  t.FullName AS TeacherName
      , s.FullName AS StudentName
      , gr.Class
FROM    dbo.Teacher AS t
JOIN dbo.Student AS s
JOIN dbo.Grades AS gr
JOIN dbo.Class AS c
JOIN dbo.Room AS r
    ON c.ID = gr.ClassID
    ON r.ClassID = c.ID
    ON s.ID = gr.StudentID
    ON s.TeacherID = t.ID

--Example 2:
SELECT  t.FullName AS TeacherName
      , s.FullName AS StudentName
      , gr.Class
FROM  dbo.Teacher AS t
JOIN dbo.Student AS s
    ON t.ID = s.TeacherID
JOIN dbo.Grades AS gr
    ON s.ID = gr.StudentID
JOIN dbo.Class AS c
    ON c.ID = r.ClassID
JOIN dbo.Room AS r
    ON c.ID = gr.ClassID
P.S. I still don't know if this JOIN syntax has special name, but at least I now know what it does.