Tuesday, March 21, 2017

What kind of JOIN syntax is this?

I am not often puzzled when looking at TSQL syntax, but this one time I could not figure out the syntax. Syntax looked similar to the example below.
 
SELECT  t.*
      , s.*
FROM    Teacher AS t
LEFT OUTER JOIN dbo.Student AS s
JOIN dbo.Grades AS gr
    ON s.ID = gr.StudentID
       AND  gr.GradeLetter = 'A'
    ON s.TeacherID = t.ID


My first gut reaction was that this code is broken and would not run. To my amazement code ran just fine. Now came the hard part, which was to figure out what the code was doing because I have never seen this syntax before. Since I did not understand what I was looking at I could not BING "weird join syntax" to get an answer. As a developer, I learned long time ago to break down code into smallest possible chunks to get the answer.

After I have figured out the relationship between tables, I was able to understand what query was doing. To be able to read query better it can be rewritten in the following way.
 
SELECT  t.*
      , s.*
FROM    Teacher AS t
LEFT OUTER JOIN
            (
                SELECT  s.*
                      , gr.*
                FROM    dbo.students AS s
                JOIN dbo.Grades AS gr
                    ON s.ID                = gr.StudentID
                       AND  gr.GradeLetter = 'A'
            ) AS s
    ON s.TeacherID = t.ID
This code is much easier to read and to understand. Students filtered by Grades and we are only interested in Students who have "A" grade. Therefore we get all records from Teacher table but only get Students who have good grades. If we would try to rewrite the query in the following matter, we would only get Teachers that have Students with perfect grades as we specified inner join; therefore, it takes precedence over the left outer join.
 
SELECT  t.*
      , s.*
FROM    Teacher AS t
LEFT OUTER JOIN dbo.Student AS s
    ON s.TeacherID         = t.ID
JOIN dbo.Grades AS gr
    ON s.ID                = gr.StudentID
       AND  gr.GradeLetter = 'A'
Now that I knew what code was doing I wanted to know performance because I do not care how the code looks as long as it does not perform fast it is useless. The execution plans of the two queries are almost identical with the only difference being that when I rewrote the original query as sub-query, I got additional operation added to the plan. Physical joins were identical and execution time was almost exact.


I could not believe I learned something new after writing T-SQL for many years. All of the sudden I got another tool in my toolbelt to write better code. Now if I ever need to filter tables that should not effect result I can write with this new syntax. To be honest, I have not taken this to the extreme to see how many JOIN operations I can nest before it becomes unreadable. However, I am sure it would not take many before someone else could not figure out what was going on and would have to break it down piece by piece to see it clear.

Just imagine having to support code that looks like example 1 below instead of example 2. Even though the code in the following case produces same execution plan, it becomes hard to read and therefore hard to maintain, and if you have junior developers who are not strong in SQL, they are more likely to make mistakes.
 
--Example 1:
SELECT  t.FullName AS TeacherName
      , s.FullName AS StudentName
      , gr.Class
FROM    dbo.Teacher AS t
JOIN dbo.Student AS s
JOIN dbo.Grades AS gr
JOIN dbo.Class AS c
JOIN dbo.Room AS r
    ON c.ID = gr.ClassID
    ON r.ClassID = c.ID
    ON s.ID = gr.StudentID
    ON s.TeacherID = t.ID

--Example 2:
SELECT  t.FullName AS TeacherName
      , s.FullName AS StudentName
      , gr.Class
FROM  dbo.Teacher AS t
JOIN dbo.Student AS s
    ON t.ID = s.TeacherID
JOIN dbo.Grades AS gr
    ON s.ID = gr.StudentID
JOIN dbo.Class AS c
    ON c.ID = r.ClassID
JOIN dbo.Room AS r
    ON c.ID = gr.ClassID
P.S. I still don't know if this JOIN syntax has special name, but at least I now know what it does.