1. What is the difference between HAVING and WHERE?
WHERE: is used to check conditions before the aggregation takes place.
SELECT state, COUNT(city) as city_count
FROM Address
WHERE country = 'IN'
GROUP BY state
This gives you a table of all states in 'IN' and the number of cities in each state.
HAVING: is used to check conditions after the aggregation takes place.
SELECT state, COUNT(city) as city_count
FROM Address
WHERE country = 'IN'
GROUP BY state
HAVING COUNT(city)>5
Gives you a table of states in 'IN' with more than 5 cities and the number of cities in each state.
2a) SELECT COUNT(DISTINCT name) vs SELECT DISTINCT COUNT(name).
2b) When are COUNT(name) and COUNT DISTINCT(name) equal?
a) SELECT COUNT(DISTINCT name)
returns the number of rows that have unique names. For e.g. For a table with names : [Lakshmi, Karthik, Karthik, Rahul, Rahul, Rahul], the response is 3.
SELECT DISTINCT COUNT(name)
returns the number of unique counts of names present.
For the same set of names as the previous example, the result would be [1, 2, 3]
b) If the table column does not have duplicate data, then the response would be the same for both SELECT COUNT(name)
and SELECT COUNT(DISTINCT name)
3. Can left and inner join return the same results?
LEFT JOIN and INNER JOIN might provide the same result when, all the records of left table has at least one match on the right table.
The main difference between INNER JOIN and LEFT JOIN is that LEFT JOIN still displays the records on the LEFT side even if they have no match on the RIGHT side table.
4. What is the difference between RANK and DENSE RANK?
RANK gives you the ranking within your ordered partition. Ties are assigned the same rank, with the next ranking(s) skipped. So, if you have 3 items at rank 2, the next rank listed would be ranked 5.
DENSE_RANK again gives you the ranking within your ordered partition, but the ranks are consecutive. No ranks are skipped if there are ranks with multiple items.
SELECT employee_name, department, salary,
RANK() OVER (partition by department ORDER BY salary nulls first) r,
DENSE_RANK() OVER (partition by department ORDER BY salary nulls first) dr,
DENSE_RANK() OVER (partition by department ORDER BY salary nulls last) dr2
FROM Employees
employee_name department salary r dr dr2
-------------- ------------ ------ ---- ---- ----
Rahul 10 null 1 1 4
Ajay 10 10000 2 2 1
Siva 10 40000 3 3 2
Akshaya 10 40000 3 3 2
Shruti 10 50000 5 4 3
Sabari 10 50000 5 4 3
Vignesh 11 5000 1 1 1
Karthi 11 20000 2 2 2
Ravi 12 30000 1 1 1
5. Can UNION and UNION ALL return the same results?UNION
removes duplicate records (where all columns in the results are the same), UNION ALL
does not. Both these commands result the same results when there are no duplicate rows present (both data sets combined).
6. What are the best scenarios to use a self join?
You use a self join when a table references data in itself.
E.g., an Employee table of an Employee Portal app, may have a supervisor_id column that points to the employee that is the boss of the current employee. Or a Catogory table of an e-commerce website may have a sub_category_id column that points to a category present in the same table.
7. How can you use SQL to dedupe rows?
Dedupe here refers to removal of duplicate rows based on a certain number of columns.
Solution 1: One can use CTE's(Common Table Expressions) to delete using analytic function ROW_NUMBER()
.
WITH CTE (col1, col2, dupcnt)
AS
(
SELECT col1, col2,
ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col1) AS dupcnt
FROM MyTable
)
DELETE
FROM CTE
WHERE dupcnt > 1
GO
Solution 2: Use SELF JOINs
SELECT emp_name, emp_address, sex, marital_status
from MyTable a
WHERE NOT EXISTS (select 1
from MYTable b
where b.emp_name = a.emp_name and
b.emp_address = a.emp_address and
b.sex = a.sex and
b.create_date >= a.create_date)
Solution 3: Use GROUP BY
SELECT first_name, last_name, mobile_no
FROM Customer
GROUP BY first_name, last_name, mobile_no;
8. In what scenarios are the LAG function useful? SQL Server provides LAG() function which is very useful in case the current row values need to be compared with the data/value of the previous record or any record before the previous record. Syntax:
LAG (scalar_expression [, offset] [, default])
OVER ( [ partition_by ] order_by )
For Example
SELECT organisation, year, revenue,
LAG (revenue, 1, 0)
OVER (PARTITION BY organisation ORDER BY year) AS PrevYearRevenue
FROM OrgTable
ORDER BY organisation, year;
9. Use a window function to calculate a running total.
As simple as
WITH data AS (
SELECT
date(rental_date) as day,
count(rental_id) as rental_count
FROM Rental
GROUP BY day
)
SELECT day, rental_count,
SUM(rental_count) OVER (ORDER BY day) as cumulative_sum
from data;
Note: The sum(), lead(), lag() functions are supported only in MySQL version >= 8
10. Why would you use GROUP BY ROLLUP
?
The ROLLUP
is an extension of the GROUP BY
clause. The ROLLUP
option allows you to include extra rows that represent the subtotals, which are commonly referred to as super-aggregate rows, along with the grand total row. By using the ROLLUP
option, you can use a single query to generate multiple grouping sets.
For e.g. when we run the following query, we get:
SELECT
warehouse,
SUM (quantity) qty
FROM
inventory
GROUP BY
warehouse;
warehouse qty
------------------
Chennai 11
Mumbai 10
whereas, the same query with ROLLUP
gives an extra row with the sum of the individual counts.
SELECT
warehouse, SUM(quantity) qty
FROM
inventory
GROUP BY ROLLUP (warehouse);
warehouse qty
------------------
Chennai 11
Mumbai 10
null 21
If you learned something new from the post, leave a 👍. நன்றி.
~Rahul