JUST DO IT!

SQL ํ•™์Šต - Transaction(for Python) & ์ถ”๊ฐ€ SQL ๋ฌธ๋ฒ• (TIL 230512) ๋ณธ๋ฌธ

TIL

SQL ํ•™์Šต - Transaction(for Python) & ์ถ”๊ฐ€ SQL ๋ฌธ๋ฒ• (TIL 230512)

sunhokimDev 2023. 5. 12. 22:06

๐Ÿ“š KDT WEEK 6 DAY 5 TIL

  • Transaction
  • Python์—์„œ ํŠธ๋žœ์žญ์…˜ ์‚ฌ์šฉํ•˜๊ธฐ
  • ์œ ์šฉํ•œ SQL ๋ฌธ๋ฒ•

 

๐ŸŸฅ Transaction

Atomicํ•˜๊ฒŒ ์‹คํ–‰๋˜์–ด์•ผ ํ•˜๋Š” SQL๋“ค์„ ๋ฌถ์–ด์„œ ํ•˜๋‚˜์˜ ์ž‘์—…์ฒ˜๋Ÿผ ์ฒ˜๋ฆฌํ•˜๋Š” ๋ฐฉ๋ฒ•

 

  • DDL ๋˜๋Š” DML ์ค‘ ๋ ˆ์ฝ”๋“œ๋ฅผ ์ˆ˜์ •, ์ถ”๊ฐ€, ์‚ญ์ œํ•œ ๊ฒƒ์—๋งŒ ์˜๋ฏธ๊ฐ€ ์žˆ์Œ
  • SELECT์—๋Š” ํŠธ๋žœ์žญ์…˜ ๋ถˆ๊ฐ€
  • BEGIN ~ END or BEGIN ~ COMMIT ์‚ฌ์ด์— ์ฟผ๋ฆฌ๋ฌธ ์ž‘์„ฑ
  • Atomic : ํŠธ๋ž™์žญ์…˜์˜ ๊ณผ์ •์ด ๋ชจ๋‘ ์„ฑ๊ณต๋˜๊ฑฐ๋‚˜ ๋ชจ๋‘ ์‹คํŒจ๋˜์–ด์•ผํ•จ
  • ROLLBACK : ์ด์ „ ์ƒํƒœ๋กœ ๋Œ์•„๊ฐ

ex) ์€ํ–‰์˜ ๊ณ„์ขŒ ์ธ์ถœ๊ณผ ์ž…๊ธˆ์ด ๋ฌถ์—ฌ์„œ ์‹คํ–‰๋˜์–ด์•ผ ํ•˜๋ฏ€๋กœ ํŠธ๋ž™์žญ์…˜์ด ํ•„์š”ํ•จ

 

BEGIN;
    A์˜ ๊ณ„์ขŒ๋กœ๋ถ€ํ„ฐ ์ธ์ถœ;
    B์˜ ๊ณ„์ขŒ๋กœ ์ž…๊ธˆ;
END; -- COMMIT๊ณผ ๋™์ผ
  • ๊ณ„์ขŒ ์ธ์ถœ๊ณผ ์ž…๊ธˆ์ด ๋งˆ์น˜ ํ•˜๋‚˜์˜ ๋ช…๋ น์–ด์ฒ˜๋Ÿผ ์ฒ˜๋ฆฌ๋œ๋‹ค.
  • BEGIN ์ด์ „์˜ ์ƒํƒœ๋กœ ๋Œ์•„๊ฐ€๋ ค๋ฉด ROLLBACK ์‹คํ–‰

commit mode

  • autocommit
    • True : ๋ชจ๋“  ๋ ˆ์ฝ”๋“œ์˜ ์ˆ˜์ •/์‚ญ์ œ/์ถ”๊ฐ€ ์ž‘์—…์ด ๋ฐ”๋กœ DB์— ์“ฐ์—ฌ์ง
    • False : ๋ชจ๋“  ๋ ˆ์ฝ”๋“œ์˜ ์ˆ˜์ •/์‚ญ์ œ/์ถ”๊ฐ€ ์ž‘์—…์ด COMMIT์ด ํ˜ธ์ถœ๋  ๋•Œ๊นŒ์ง€ ์ €์žฅ X
      • False์˜ ๊ฒฝ์šฐ .commit(), .rollback()์œผ๋กœ DB ์ €์žฅ ์กฐ์œจ

 


 

DELETE FROM vs TRUNCATE

 

DELETE FROM table_name (DELETE * FROM ๊ณผ๋Š” ๋‹ค๋ฆ„)

  • ํ…Œ์ด๋ธ”์—์„œ ๋ชจ๋“  ๋ ˆ์ฝ”๋“œ๋ฅผ ์‚ญ์ œ ๊ฐ€๋Šฅ
  • WHERE ์‚ฌ์šฉ๊ฐ€๋Šฅ โžก๏ธ ํŠน์ • ๋ ˆ์ฝ”๋“œ๋งŒ ์‚ญ์ œ ๊ฐ€๋Šฅ (DROP TABLE๊ณผ์˜ ์ฐจ์ด์ !)

 

TRUNCATE table_name

  • ํ…Œ์ด๋ธ”์—์„œ ๋ชจ๋“  ๋ ˆ์ฝ”๋“œ๋ฅผ ์‚ญ์ œ
  • DELETE FROM ๋ณด๋‹ค ์†๋„๊ฐ€ ๋น ๋ฅด๋‹ค
  • WHERE, Transaction ์‚ฌ์šฉ๋ถˆ๊ฐ€ โžก๏ธ ๋กค๋ฐฑ๋ถˆ๊ฐ€

 


๐ŸŸฆ ์‹ค์Šต

 

Redshift Connet in Python

.connect()์— ์ƒ์„ฑํ–ˆ๋˜ Redshift ์ •๋ณด๋ฅผ ์ฑ„์›Œ๋„ฃ์œผ๋ฉด ๋œ๋‹ค. (autocommit = False)

import psycopg2

conn = psycopg2.connect("dbname={dbname} user={user} host={host} password={password} port={port}".format(
    dbname=dbname,
    user=redshift_user,
    password=redshift_pass,
    host=host,
    port=port
))
conn.set_session(autocommit=autocommit)
cur = conn.cursor()
cur.execute("์ฟผ๋ฆฌ๋ฌธ")
res = cur.fetchall()
for r in res:
  print(r)

cur.execute() ์•ˆ์˜ ์ฟผ๋ฆฌ๋ฌธ์„ ํ†ตํ•ด ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์ ธ์™€์„œ ์ถœ๋ ฅํ•˜๋Š” ์ฝ”๋“œ์ด๋‹ค.

 

1. autocommit = False์ด๋ฏ€๋กœ DB ๋ฐ์ดํ„ฐ ์กฐ์ž‘์ด ํ˜„์žฌ ์„ธ์…˜์—์„œ๋Š” ์ด๋ค„์ง€๋‚˜, ์‹ค์ œ ์ €์žฅ๋˜์ง€๋Š” ์•Š๋Š”๋‹ค.

2. conn.commit() ์œผ๋กœ ์‹ค์ œ DB์— ์ €์žฅํ•  ์ˆ˜ ์žˆ๋‹ค.

3. conn.rollback() ์œผ๋กœ ์‹คํ–‰๋˜๊ธฐ ์ด์ „ ์ƒํƒœ๋กœ ๋Œ์•„๊ฐˆ ์ˆ˜ ์žˆ๋‹ค.

 

๋‹ค์Œ์€ transaction์˜ ์˜ˆ์‹œ์ด๋‹ค.

try:
  cur.execute("DELETE FROM test_name_gender;") 
  cur.execute("INSERT INTO test_name_gender VALUES ('Claire', 'Female');")
  conn.commit()
except (Exception, psycopg2.DatabaseError) as error:
  print(error)
  conn.rollback()
finally :
  conn.close()

test_name_gender ํ…Œ์ด๋ธ”์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์‚ญ์ œ, ์ถ”๊ฐ€ํ•˜๋Š” ๋“ฑ ๋ฐ์ดํ„ฐ๋ฅผ ์กฐ์ž‘ํ•˜์˜€๋‹ค.

ํ•˜์ง€๋งŒ try ๊ณผ์ •์—์„œ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ•œ๋‹ค๋ฉด conn.commit()์ด ์‹คํ–‰๋˜์ง€ ์•Š๊ณ  conn.rollback()์ด ์‹คํ–‰๋œ๋‹ค.

 

 

autocommit=True์ธ ๊ฒฝ์šฐ, ๋ฐ์ดํ„ฐ ์กฐ์ž‘์–ด๋Š” ์ž๋™์œผ๋กœ commit๋˜๋ฏ€๋กœ ํŠธ๋žœ์žญ์…˜์„ ๋‹ค๋ฃจ๋ ค๋ฉด ๋‹ค์Œ์˜ ๋ฐฉ์‹์„ ๋”ฐ๋ฅธ๋‹ค.

cur.execute("BEGIN;")
cur.execute("DELETE FROM test_name_gender;")
cur.execute("INSERT INTO test_name_gender VALUES ('Benjamin', 'Male');")
cur.execute("END;")
  • cur.execute()๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ•œ ์ค„์”ฉ ๋ช…๋ น์–ด๋ฅผ ์‹คํ–‰ํ•œ๋‹ค.
  • BEGIN๊ณผ END(COMMIT)์œผ๋กœ ํŠธ๋žœ์žญ์…˜์˜ range๋ฅผ ์„ค์ •ํ•œ๋‹ค.

๐ŸŸจ ์•Œ์•„๋‘๋ฉด ์œ ์šฉํ•œ SQL ๋ฌธ๋ฒ•๋“ค

ํ…Œ์ด๋ธ” ์—ฐ์‚ฐ

  • UNION : ์—ฌ๋Ÿฌ๊ฐœ์˜ ํ…Œ์ด๋ธ”์ด๋‚˜ SELECT ๊ฒฐ๊ณผ๋ฅผ ํ•˜๋‚˜์˜ ๊ฒฐ๊ณผ๋กœ ํ•ฉ์ณ์ค€๋‹ค.
    • UNION ALL์˜ ๊ฒฝ์šฐ ์ค‘๋ณต์„ ์ œ๊ฑฐํ•˜์ง€ ์•Š๊ณ  ๋ชจ๋‘ ๊ฒฐ๊ณผ๋กœ ํ•ฉ์นœ๋‹ค.
  • EXCEPT : ํ•˜๋‚˜์˜ SELECT ๊ฒฐ๊ณผ์—์„œ ๋‹ค๋ฅธ SELECT ๊ฒฐ๊ณผ๋ฅผ ๋นผ์ค€๋‹ค.
  • INTERSECT : ์—ฌ๋Ÿฌ๊ฐœ์˜ SELECT๋ฌธ์—์„œ ๊ฐ™์€ ๋ ˆ์ฝ”๋“œ๋งŒ ์ฐพ์•„์ค€๋‹ค.

NULL ์ฒ˜๋ฆฌ

  • COALESCE(A, B, C, ...) : A๋ถ€ํ„ฐ ์ฐจ๋ก€๋Œ€๋กœ NULL๊ฐ’์ด ๋‚˜์˜ค๋ฉด ๊ทธ ๊ฐ’์„ ๋ฆฌํ„ดํ•˜๊ณ  ๋ชจ๋‘ NULL์ด๋ฉด NULL ๋ฆฌํ„ด
    • NULL๊ฐ’์„ ๋‹ค๋ฅธ ๊ฐ’์œผ๋กœ ๋ฐ”๊พธ๊ณ  ์‹ถ์„ ๋•Œ ์‚ฌ์šฉ
  • NULLIF(A, B) : A์™€ B๊ฐ€ ๊ฐ™์œผ๋ฉด NULL์„ ๋ฆฌํ„ดํ•œ๋‹ค.

 

LISTAGG

๊ฒฐ๊ณผ๋ฅผ ๋ณด๋ฉด ์•Œ ์ˆ˜ ์žˆ๋“ฏ์ด ์ญ‰ ๊ฒฐ๊ณผ๋ฅผ ํ•œ ๋ฌธ์ž์—ด๋กœ ํ•ฉ์ณ์ค€๋‹ค.

  • LISTAGG(channel) ์„ LISTAGG(channel, '->') ๋กœ ๋ฐ”๊ฟ”์ฃผ๋ฉด, -> ๊ฐ€ ๋ฌธ์ž์—ด์˜ ๊ตฌ๋ถ„์ž๋กœ ์ถ”๊ฐ€๋œ๋‹ค.
  • GROUP BY์˜ Aggํ•จ์ˆ˜์ด๋ฏ€๋กœ GROUP BY๊ฐ€ ํ•„์š”ํ•˜๋‹ค.
  • LISTAGG ๋’ค์ชฝ์— ์„ ์–ธ๋œ WITHIN์€ ์ •๋ ฌ์„ ๋‹ด๋‹นํ•œ๋‹ค. WITHIN์ด ์—†์œผ๋ฉด ๋ฌด์ž‘์œ„๋กœ ๋ฌธ์ž์—ด์ด ์ƒ์„ฑ๋œ๋‹ค.

 

WINDOW

  • ex) FIRST_VALUE, LAST_VALUE, ROW_NUMBER...  > ๋ฐ‘์— ์ˆ™์ œ์ฐธ๊ณ 
  • ํ•จ์ˆ˜์ด๋ฆ„() OVER ( PARTITION BY ... ORDER BY ...) ํ˜•ํƒœ๋ฅผ ์ง€๋‹Œ ๊ฒƒ์ด ํŠน์ง•

 

1. LAG : ์ด์ „ ๋˜๋Š” ๋‹ค์Œ ํŠน์ • ํ•„๋“œ ๊ฐ’์„ ๊ฐ€์ ธ์˜ฌ ์ˆ˜ ์žˆ๋Š” ํ•จ์ˆ˜

LAG(channel, 1) OVER (PARTITION BY userId ORDER BY ts) prev_channel

ORDER BY userid, ts๊ฐ€ ๋’ค์ชฝ์— ์„ ์–ธ๋˜์–ด์žˆ๋‹ค๊ณ  ๊ฐ€์ •ํ•œ๋‹ค.

LAG(channel,1) : ๋‚ด ์•ž ํŒŒํ‹ฐ์…˜์˜ ์ฑ„๋„์„ ๊ฐ€์ ธ์˜จ๋‹ค.

PARTITION BY userId ORDER BY ts : userId๊ฐ€ ํŒŒํ‹ฐ์…˜์˜ ๊ธฐ์ค€์ด ๋˜๊ณ , ts๊ฐ€ ์ •๋ ฌ ๊ธฐ์ค€์ด ๋˜์–ด์ค€๋‹ค.

 

 

JSON Parsing Functions

JSON ํฌ๋งท์„ ํ™œ์šฉํ•˜์—ฌ Parsing ๊ฐ€๋Šฅํ•œ ํ•จ์ˆ˜

 

SELECT JSON_EXTRACT_PATH_TEXT('JSON ๋ฌธ์ž์—ด', 'f4','f6');

--> f4 ํƒœ๊ทธ์˜ f6 ์š”์†Œ๋ฅผ ์ฐพ์•„ ๊ฐ’์„ ๋ฐ˜ํ™˜ํ•ด์ค€๋‹ค.

JSON์˜ ๊ตฌ์กฐ๋ฅผ ๋ฏธ๋ฆฌ ์•Œ๊ณ ์žˆ์–ด์•ผ ํ•œ๋‹ค๋Š” ๊ฒƒ์ด ํ 

 


 

์ˆ™์ œ

๋”๋ณด๊ธฐ

1. ์‚ฌ์šฉ์ž๋ณ„๋กœ ์ฒ˜์Œ ์ฑ„๋„๊ณผ ๋งˆ์ง€๋ง‰ ์ฑ„๋„ ์•Œ์•„๋‚ด๊ธฐ

 

๋‚ดํ’€์ด

SELECT usc.userid, ROW_NUMBER() OVER (PARTITION BY usc.userid ORDER BY ts.ts) as nn
FROM raw_data.session_transaction as trans
INNER JOIN raw_data.user_session_channel as usc ON trans.sessionid = usc.sessionid
INNER JOIN raw_data.session_timestamp as ts ON trans.sessionid = ts.sessionid

์ฒ˜์Œ ์ฑ„๋„๊ณผ ๋งˆ์ง€๋ง‰ ์ฑ„๋„๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ์ „์ฒด ์ฑ„๋„์ด ๋ชจ๋‘ ์ถœ๋ ฅ๋œ๋‹ค. 

 

FIRST_VALUE, LAST_VALUE ์‚ฌ์šฉํ•˜๊ธฐ

SELECT DISTINCT
    A.userid,
    FIRST_VALUE(A.channel) over(partition by A.userid order by B.ts rows between unbounded preceding and unbounded following) AS First_Channel,
    LAST_VALUE(A.channel) over(partition by A.userid order by B.ts rows between unbounded preceding and unbounded following) AS Last_Channel
FROM raw_data.user_session_channel A
LEFT JOIN raw_data.session_timestamp B ON A.sessionid = B.sessionid;

rows between ~ following ๊ตฌ๋ฌธ์€ ์œˆ๋„์šฐ์˜ ๋ฒ”์œ„๋ฅผ ์ „์ฒด๋กœ ํ•˜๋Š” ๊ตฌ์ ˆ์ด๋‹ค.

 

CTE ๋นŒ๋”ฉ๋ธ”๋ก ์‚ฌ์šฉํ•˜๊ธฐ

WITH first AS (
    SELECT userid, ts, channel, ROW_NUMBER() OVER(PARTITION BY userid ORDER BY ts) seq
    FROM raw_data.user_session_channel usc
    JOIN raw_data.session_timestamp st ON usc.sessionid = st.sessionid
), last AS(
    SELECT userid, ts, channel, ROW_NUMBER() OVER(PARTITION BY userid ORDER BY ts DESC) seq
    FROM raw_data.user_session_channel usc
    JOIN raw_data.session_timestamp st ON usc.sessionid = st.sessionid
)

SELECT first.userid AS userid, first.channel AS first_channel, last.channel AS last_channel
FROM first
JOIN last ON first.userid = last.userid and last.seq = 1
WHERE first.seq = 1

 


 

 

2.. Gross Revenue(Refund ํฌํ•จ๋งค์ถœ)์ด ๊ฐ€์žฅ ํฐ userId 10๊ฐœ ์ฐพ๊ธฐ

 

๋‚ด ํ’€์ด(์ •๋‹ต)

SELECT usc.userid, SUM(trans.amount) as GrossRevenue
FROM raw_data.session_transaction as trans
INNER JOIN raw_data.user_session_channel as usc ON trans.sessionid = usc.sessionid
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10

 

SUM OVER ํ™œ์šฉํ•ด๋ณด๊ธฐ

SELECT
    DISTINCT usc.userid,
    SUM(amount) OVER(PARTITION BY usc.userid)
FROM raw_data.user_session_channel AS usc
JOIN raw_data.session_transaction AS revenue ON revenue.sessionid = usc.sessionid
ORDER BY 2 DESC

์œ„์ฒ˜๋Ÿผ GROUP BY๋ฅผ ์ œ๊ฑฐํ•˜๊ณ  SUM์—์„œ PARTITION BY๋ฅผ ํ™œ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค. 

ํ•˜์ง€๋งŒ ์ด๋Ÿฌ๋ฉด SUM์ด userid์˜ ์ค‘๋ณตํšŸ์ˆ˜๋งŒํผ ๋ช‡ ๋ฒˆ์”ฉ ์ถœ๋ ฅ๋˜๊ธฐ ๋•Œ๋ฌธ์— DISTINCT๋ฅผ ํ™œ์šฉํ–ˆ๋‹ค.

์‚ฌ์‹ค ์ด๋ณด๋‹ค ์œ„์— ์žˆ๋Š” ๋‹ต์ด ํ›จ์”ฌ ํšจ์œจ์ ์ด๊ณ  ๊น”๋”ํ•˜๋‹ค.

 


 

 

3. raw_data.nps ํ…Œ์ด๋ธ”์„ ๋ฐ”ํƒ•์œผ๋กœ ์›”๋ณ„ NPS ๊ณ„์‚ฐํ•˜๊ธฐ

 

๋‚ด ํ’€์ด

%%sql

SELECT
    LEFT(created_at,7) as month,
    ROUND(SUM(CASE WHEN score >= 9 THEN 1 END) / COUNT(1)::float , 3) - ROUND(SUM(CASE WHEN score <= 6 THEN 1 END) / COUNT(1)::float , 3) as NPS
FROM raw_data.nps
GROUP BY 1

 

ํ…Œ์ด๋ธ” ํ•˜๋‚˜ ๋งŒ๋“ค์–ด์„œ ๋ช…์‹œํ•˜๊ธฐ

SELECT
    month,
    ROUND((promoters-detractors)::float/total_count*100, 2) AS overall_nps
FROM (
    SELECT 
        LEFT(created_at, 7) AS month,
        COUNT(CASE WHEN score >= 9 THEN 1 END) AS promoters,
        COUNT(CASE WHEN score <= 6 THEN 1 END) AS detractors,
        COUNT(CASE WHEN score > 6 AND score < 9 THEN 1 END) AS passives,
        COUNT(1) AS total_count
    FROM raw_data.nps
    GROUP BY 1
    ORDER BY 1
)

0์œผ๋กœ ๋‚˜๋ˆŒ ๋•Œ๋ฅผ ์ œ์™ธํ•˜๊ธฐ ์œ„ํ•œ ์ฒ˜๋ฆฌ๊ฐ€ ๋˜์–ด์žˆ์ง€๋Š” ์•Š๋‹ค.