10 SQL Questions and Answers.

1. What is the difference between HAVING and WHERE?
WHERE: is used to check conditions before the aggregation takes place.

SELECT state, COUNT(city) as city_count
FROM Address
WHERE country = 'IN'
GROUP BY state

This gives you a table of all states in 'IN' and the number of cities in each state.

HAVING: is used to check conditions after the aggregation takes place.

SELECT state, COUNT(city) as city_count
FROM Address
WHERE country = 'IN'
GROUP BY state
HAVING COUNT(city)>5

Gives you a table of states in 'IN' with more than 5 cities and the number of cities in each state.

2a) SELECT COUNT(DISTINCT name) vs SELECT DISTINCT COUNT(name).
2b) When are COUNT(name) and COUNT DISTINCT(name) equal?
a) SELECT COUNT(DISTINCT name) returns the number of rows that have unique names. For e.g. For a table with names : [Lakshmi, Karthik, Karthik, Rahul, Rahul, Rahul], the response is 3. SELECT DISTINCT COUNT(name) returns the number of unique counts of names present. For the same set of names as the previous example, the result would be [1, 2, 3]
b) If the table column does not have duplicate data, then the response would be the same for both SELECT COUNT(name) and SELECT COUNT(DISTINCT name)

3. Can left and inner join return the same results?
LEFT JOIN and INNER JOIN might provide the same result when, all the records of left table has at least one match on the right table. The main difference between INNER JOIN and LEFT JOIN is that LEFT JOIN still displays the records on the LEFT side even if they have no match on the RIGHT side table.

4. What is the difference between RANK and DENSE RANK?
RANK gives you the ranking within your ordered partition. Ties are assigned the same rank, with the next ranking(s) skipped. So, if you have 3 items at rank 2, the next rank listed would be ranked 5.
DENSE_RANK again gives you the ranking within your ordered partition, but the ranks are consecutive. No ranks are skipped if there are ranks with multiple items.

SELECT employee_name, department, salary,
RANK() OVER (partition by department ORDER BY salary nulls first) r,
DENSE_RANK() OVER (partition by department ORDER BY salary nulls first) dr,
DENSE_RANK() OVER (partition by department ORDER BY salary nulls last) dr2
FROM Employees

employee_name  department  salary   r    dr    dr2  
-------------- ------------ ------ ---- ---- ----
Rahul         10           null    1    1    4  
Ajay          10           10000   2    2    1  
Siva          10           40000   3    3    2
Akshaya       10           40000   3    3    2
Shruti        10           50000   5    4    3
Sabari        10           50000   5    4    3
Vignesh       11           5000    1    1    1
Karthi        11           20000   2    2    2
Ravi          12           30000   1    1    1

5. Can UNION and UNION ALL return the same results?
UNION removes duplicate records (where all columns in the results are the same), UNION ALL does not. Both these commands result the same results when there are no duplicate rows present (both data sets combined).

6. What are the best scenarios to use a self join?
You use a self join when a table references data in itself.

E.g., an Employee table of an Employee Portal app, may have a supervisor_id column that points to the employee that is the boss of the current employee. Or a Catogory table of an e-commerce website may have a sub_category_id column that points to a category present in the same table.

7. How can you use SQL to dedupe rows?
Dedupe here refers to removal of duplicate rows based on a certain number of columns.
Solution 1: One can use CTE's(Common Table Expressions) to delete using analytic function ROW_NUMBER().

WITH CTE (col1, col2, dupcnt)
AS
(
SELECT col1, col2,
ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col1) AS dupcnt
FROM MyTable
)
DELETE
FROM CTE
WHERE dupcnt > 1
GO

Solution 2: Use SELF JOINs

SELECT emp_name, emp_address, sex, marital_status
from MyTable a
WHERE NOT EXISTS (select 1 
         from MYTable b
         where b.emp_name = a.emp_name and
               b.emp_address = a.emp_address and
               b.sex = a.sex and
               b.create_date >= a.create_date)

Solution 3: Use GROUP BY

SELECT first_name, last_name, mobile_no
FROM  Customer
GROUP BY first_name, last_name, mobile_no;

8. In what scenarios are the LAG function useful? SQL Server provides LAG() function which is very useful in case the current row values need to be compared with the data/value of the previous record or any record before the previous record. Syntax:

LAG (scalar_expression [, offset] [, default])  
OVER ( [ partition_by ] order_by )

For Example

SELECT organisation, year, revenue,
LAG (revenue, 1, 0) 
OVER (PARTITION BY organisation ORDER BY year) AS PrevYearRevenue  
FROM OrgTable 
ORDER BY organisation, year;

9. Use a window function to calculate a running total.
As simple as

WITH data AS (
  SELECT
    date(rental_date) as day,
    count(rental_id) as rental_count
  FROM Rental
  GROUP BY day
)
SELECT  day, rental_count,
   SUM(rental_count) OVER (ORDER BY day) as cumulative_sum
from data;

Note: The sum(), lead(), lag() functions are supported only in MySQL version >= 8

10. Why would you use GROUP BY ROLLUP?
The ROLLUP is an extension of the GROUP BY clause. The ROLLUP option allows you to include extra rows that represent the subtotals, which are commonly referred to as super-aggregate rows, along with the grand total row. By using the ROLLUP option, you can use a single query to generate multiple grouping sets. For e.g. when we run the following query, we get:

SELECT
    warehouse, 
    SUM (quantity) qty
FROM
    inventory
GROUP BY
    warehouse;

warehouse      qty
------------------
Chennai         11 
Mumbai          10

whereas, the same query with ROLLUP gives an extra row with the sum of the individual counts.

SELECT 
    warehouse, SUM(quantity) qty
FROM
    inventory
GROUP BY ROLLUP (warehouse);

warehouse      qty
------------------
Chennai         11 
Mumbai          10  
null            21

If you learned something new from the post, leave a 👍. நன்றி.

~Rahul

10 SQL Questions and Answers.

Some intriguing SQL questions and their answers