Solving a LinkedIn Data Science SQL Interview Question
https://www.youtube.com/watch?v=d2920ysu2hQ&ab_channel=JayFeng
01:37
peepoThink (1)
- we could use
GROUP BY
to see how many posting share the sameuid
- we could then
SELECT WHERE
count == 1
(2) - use
DATE_TRUNC
to group by year or - use WHERE to split each into 180 day intervals from
now()
02:13
visualize the output and work backwards, make an output table, split into intermediate tables to process
05:55
WITH user_job AS (
SELECT u_id, j_id, COUNT(DISTINCT date_posted) AS num_posted
FROM jobs
GROUP BY u_id, j_id
)
08:06
SELECT
SUM(CASE WHEN avg_posted > 1 THEN 1 END) AS posted_multiple
, SUM(CASE WHEN avg_posted = 1 THEN 1 END) AS posted_once
FROM (
SELECT
user_id, AVG(num_posted) AS avg_posted
FROM user_job
GROUP BY user_id
) AS t
10:41
HAVING
won't work because it will filter values
Post Mortem
WITH
- named functionCOUNT(DISTINCT field_name)
- get count of distinct values in fieldGROUP_BY
creates sub tables on different keysCASE WHEN
- ifEND
- end the ifELSE
- elseAVG()
- get average of field
Use intermediate tables to get to the final table