Solving a LinkedIn Data Science SQL Interview Question

 https://www.youtube.com/watch?v=d2920ysu2hQ&ab_channel=JayFeng
 01:37

peepoThink (1)

  • we could use GROUP BY to see how many posting share the same uid
  • we could then SELECT WHERE count == 1
    (2)
  • use DATE_TRUNC to group by year or
  • use WHERE to split each into 180 day intervals from now()
 02:13

visualize the output and work backwards, make an output table, split into intermediate tables to process

 05:55
WITH user_job AS (
	SELECT u_id, j_id, COUNT(DISTINCT date_posted) AS num_posted
	FROM jobs
	GROUP BY u_id, j_id
)
 08:06
SELECT
	SUM(CASE WHEN avg_posted > 1 THEN 1 END) AS posted_multiple
	, SUM(CASE WHEN avg_posted = 1 THEN 1 END) AS posted_once
FROM (
	SELECT
		user_id, AVG(num_posted) AS avg_posted
	FROM user_job
	GROUP BY user_id
) AS t
 10:41

HAVING won't work because it will filter values

Post Mortem

WITH - named function
COUNT(DISTINCT field_name) - get count of distinct values in field
GROUP_BY creates sub tables on different keys
CASE WHEN - if
END - end the if
ELSE - else
AVG() - get average of field

Use intermediate tables to get to the final table