10 SQL questions and answers [2]

1. Why use the WITH statement? What does it help you do?
The WITH clause in SQL was introduced in standard SQL to simplify complex long queries, especially those with JOINs and subqueries. Often interchangeably called CTE (Common Table Expressions) or subquery refactoring, a WITH clause defines a temporary data set whose output is available to be referenced in subsequent queries. For e.g.

WITH cte_table
AS
(SELECT
    SUM(quantity) as total
FROM OrderDetails
GROUP BY product_id)
SELECT
    AVG(total) average_product_quantity
FROM cte_table;

2. Can you use COUNT DISTINCT on two columns? How?
If we want to count distinct column combinations, we must either clumsily concatenate values (and be very careful to choose the right separator):

select count(distinct col1 || '-' || col2) from mytable;

or use a subquery:

select count(*) from (select distinct col1, col2 from mytable);

Or rather use the columns in the GROUP BY clause.

select col1, col2, count(*) from from mytable GROUP BY col1, col2;

The COUNT function does not table multiple parameters.

3. When does a full outer join actually make sense to use?
It's rare, but there are a few cases. For example, it allows us to see each employee who is in a department and each department that has an employee, but also see each employee who is not part of a department and each department which doesn't have an employee. It is used mostly when attempting to find mismatched, orphaned data, from both tables and get all the results (including unmatched rows) for manual transformation. 4. How to decide which rank function to use, RANK or DENSE RANK?
RANK continues the rank count in line with the ROW_NUMBER series, while DENSE_RANK does not, and instead continues the rank count with the next value after the duplicate rank. To answer the question, it depends on our requirement. If we are reporting winners, and always needed to report a first, second, and third place, regardless of ties for each place, you would use DENSE_RANK. Otherwise, we could use RANK, which might mean that there will be no second or third place.

5. Difference between UNION and UNION ALL?
UNION removes duplicate records (where all columns in the results are the same), UNION ALL does not. Both these commands result the same results when there are no duplicate rows present (both data sets combined).

6. When is it a bad idea to use a subquery?
Subqueries are usually fine unless they are dependent subqueries (also known as correlated subqueries). If you are only using independent subqueries and they are using appropriate indexes then they should run quickly. If you have a dependent subquery you might run into performance problems because a dependent subquery typically needs to be run once for each row in the outer query. So if your outer query has 1000 rows, the subquery will be run 1000 times. On the other hand an independent subquery typically only needs to be evaluated once.

7. What is the best approach to doing a running total?
Using window functions. For e.g.

WITH data AS (
  SELECT
    date(rental_date) as day,
    count(rental_id) as rental_count
  FROM Rental
  GROUP BY day
)
SELECT  day, rental_count,
   SUM(rental_count) OVER (ORDER BY day) as cumulative_sum
from data;

8. What does the LEAD function do?
The LEAD function can be very useful for calculating the difference between the value of the current row and the value of the following row. It is an analytic function that lets you query more than one row in a table at a time without having to join the table to itself.

9. If an error is thrown using SQL, where do you start looking?
The line of error references in the error response can be used to pinpoint the source of error. Most query languages have their own syntax for Error Handling as well.

10. Inner joining tables A and B returns 50 rows. Left join Table B to A returns 125 rows. What's going on here?
Table A has 125 rows, out of which 50 overlap with Table B.

10 SQL questions and answers [2]

Ace your SQL interview!