JUST DO IT!

Redshift ๊ณ ๊ธ‰ ๊ธฐ๋Šฅ -TIL230524 ๋ณธ๋ฌธ

TIL

Redshift ๊ณ ๊ธ‰ ๊ธฐ๋Šฅ -TIL230524

sunhokimDev 2023. 5. 24. 19:40

๐Ÿ“š KDT WEEK 8 DAY 3 TIL

  • Redshift ๊ถŒํ•œ๊ณผ ๋ณด์•ˆ
  • Redshift ๋ฐฑ์—…๊ณผ ๋ณต๊ตฌ
  • Redshift ๊ด€๋ จ ์„œ๋น„์Šค
    • Redshift Spectrum + ์‹ค์Šต
    • Athena
    • Redshift ML (SageMaker) + ์‹ค์Šต
  • Redshift ์ค‘์ง€ ๋ฐ ์ œ๊ฑฐ ๋ฐ ์ตœ์ ํ™”

 

๐ŸŸฅ Redshift ๊ถŒํ•œ๊ณผ ๋ณด์•ˆ

 

Redshift ์‚ฌ์šฉ์ž๋ณ„ ํ…Œ์ด๋ธ” ๊ถŒํ•œ ์„ค์ •

  • ์‚ฌ์šฉ์ž๋ณ„ ํ…Œ์ด๋ธ”๋ณ„ ๋”ฐ๋กœ๋”ฐ๋กœ ๊ถŒํ•œ์„ ์ง€์ •ํ•˜์ง€๋Š” ์•Š๊ณ , IAM ๊ทธ๋ฃน๊ณผ ์—ญํ• ์„ ํ™œ์šฉํ•œ๋‹ค.
  • ์ตœ๊ทผ์—” RBAC(Role Based Access Control)๊ฐ€ ์ผ๋ฐ˜์ ์ด๋‹ค.
  • ์—ฌ๋Ÿฌ ์—ญํ• ์˜ ์†ํ•œ ์‚ฌ์šฉ์ž๋Š” ๊ฐ ์—ญํ• ์˜ ๊ถŒํ•œ์„ ๋ชจ๋‘ ๊ฐ€์ง„๋‹ค.
  • GROUP ํ‚ค์›Œ๋“œ๋ฅผ ROLE๋กœ ๋ฐ”๊พธ์–ด๋„ ๋™์ผํ•˜๊ฒŒ ๋™์ž‘ํ•œ๋‹ค.

๋‹ค์Œ์€ Colab์—์„œ ๊ฐ„๋‹จํ•˜๊ฒŒ ์‹ค์Šตํ•œ ๊ฒฐ๊ณผ์ด๋‹ค.

๊ทธ๋ฃน์„ ๋จผ์ €๋งŒ๋“ค๊ณ , ๊ทธ๋ฃน์— ๊ถŒํ•œ์„ ๋ถ€์—ฌํ•ด์ฃผ์—ˆ๋‹ค.

CREATE GROUP ๊ทธ๋ฃน๋ช… ์œผ๋กœ ๊ทธ๋ฃน์„ ์ƒ์„ฑํ•˜๊ณ , GRANT ๋ช…๋ น์–ด๋กœ ๊ถŒํ•œ์„ ๋ถ€์—ฌํ•œ๋‹ค.

GRANT ALL ์€ ์ฝ๊ธฐ์™€ ์“ฐ๊ธฐ ๊ถŒํ•œ์„ ๋ถ€์—ฌํ•˜๊ณ , GRANT USAGE ๋Š” SELECT ๊ถŒํ•œ์„ ๋ถ€์—ฌํ•œ๋‹ค.

analytics_authors ๊ทธ๋ฃน์€ analytics, adhoc ์Šคํ‚ค๋งˆ์˜ ๋ชจ๋“  ํ…Œ์ด๋ธ” ์ฝ๊ธฐ, ์“ฐ๊ธฐ ๊ถŒํ•œ๊ณผ raw_data์˜ ๋ชจ๋“  ํ…Œ์ด๋ธ” ์ฝ๊ธฐ ๊ถŒํ•œ์„ ์–ป์—ˆ๋‹ค.

 

 

์ปฌ๋Ÿผ ๋ ˆ๋ฒจ ๋ณด์•ˆ(Column Level Security)

  • ํ•œ ํ…Œ์ด๋ธ” ๋‚ด์—์„œ ํŠน์ • ์ปฌ๋Ÿผ์— ๋Œ€ํ•ด ๊ถŒํ•œ์„ ๋ถ€์—ฌํ•˜๋Š” ๊ฒƒ
  • ๋ณดํ†ต ๊ฐœ์ธ์ •๋ณด์— ํ•ด๋‹นํ•˜๋Š” ์ปฌ๋Ÿผ์„ ๊ฐ์ถ”๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉํ•œ๋‹ค.
  • ์‚ฌ์‹ค์€ ์ด๋Ÿฌํ•œ ์ปฌ๋Ÿผ์„ ๋ณ„๋„์˜ ํ…Œ์ด๋ธ”๋กœ ๊ตฌ์„ฑํ•˜๋Š” ๊ฒƒ์ด ๋ฒ ์ŠคํŠธ

 

๋ ˆ์ฝ”๋“œ ๋ ˆ๋ฒจ ๋ณด์•ˆ (Row Level Security)

  • ํ•œ ํ…Œ์ด๋ธ” ๋‚ด์—์„œ ํŠน์ • ๋ ˆ์ฝ”๋“œ์— ๋Œ€ํ•ด ๊ถŒํ•œ์„ ๋ถ€์—ฌํ•˜๋Š” ๊ฒƒ
  • ์—ญ์‹œ ๋ณ„๋„์˜ ํ…Œ์ด๋ธ”์„ ๊ตฌ์„ฑํ•˜๋Š” ๊ฒƒ์ด ๋” ์ข‹๋‹ค.

 


 

๐ŸŸฆ Redshift ๋ฐฑ์—…๊ณผ ๋ณต๊ตฌ

 

Redshift ๋ฐ์ดํ„ฐ ๋ฐฑ์—… (๊ณ ์ •๋น„์šฉ)

  • Snapshot : ๋งˆ์ง€๋ง‰ ๋ฐฑ์—…์œผ๋กœ๋ถ€ํ„ฐ ๋ฐ”๋€ ๊ฒƒ๋“ค๋งŒ ์ €์žฅํ•œ๋‹ค.
  • Table Restore : ์Šค๋ƒ…์ƒท์„ ํ™œ์šฉํ•˜์—ฌ ํŠน์ • ์‹œ์ ์˜ ํŠน์ • ํ…Œ์ด๋ธ”์„ ๋ณต๊ตฌํ•˜๋Š” ๊ฒƒ์ด ๊ฐ€๋Šฅ
  • ์ž๋™ ๋ฐฑ์—… : ๊ฐ™์€ ์ง€์—ญ S3์— ์ตœ๋Œ€ 35์ผ๊นŒ์ง€์˜ ๋ณ€๊ฒฝ์„ ๋ฐฑ์—… ๊ฐ€๋Šฅ
  • ๋งค๋‰ด์–ผ ๋ฐฑ์—… : ์‚ฌ์šฉ์ž๊ฐ€ ์ง์ ‘ ์–ธ์ œ๋“  ํ˜„์žฌ ์ง€์ ์˜ ๋ฐฑ์—…์„ ๋งŒ๋“ค์–ด ๋ฌดํ•œํžˆ ์œ ์ง€ ๊ฐ€๋Šฅ
  • Cross-regional snapshot : ์œ ์‚ฌ์‹œ๋ฅผ ๋Œ€๋น„ํ•˜์—ฌ ๋‹ค๋ฅธ ์ง€์—ญ์˜ S3 ๋ฐฑ์—…์„ ๋‘๋Š” ๊ฒƒ

 

Redshift Serverless ๋ฐ์ดํ„ฐ ๋ฐฑ์—…

  • Snapshot ์ด์ „์— Recovery Points(๋ณต๊ตฌ ์‹œ์ )๊ฐ€ ์กด์žฌํ•ด์„œ, Recovery points๋ฅผ snapshot์œผ๋กœ ๋ฐ”๊พธ๋Š” ๊ณผ์ •์ด ํ•„์š”
  • ๊ฐ€๋ณ€ ๋น„์šฉ์˜ ๊ฒฝ์šฐ ์ปดํ“จํŒ… ์ž์›๊ณผ ์Šคํ† ๋ฆฌ์ง€๊ฐ€ ๊ณ ์ •์ ์ด์ง€ ์•Š์•„ snapshot ๋ฐ”๋กœ ํ• ๋‹น์ด ์–ด๋ ค์›€
  • ์ž๋™ ๋ฐฑ์—… : Recovery Points๊ฐ€ ์ƒ์„ฑ๋˜์–ด 24์‹œ๊ฐ„๋™์•ˆ ์œ ์ง€๋œ๋‹ค.
  • ์Šค๋ƒ…์ƒท์„ ํ™œ์šฉํ•œ ๋‹ค๋ฅธ ์ž‘์—…์€ ๊ณ ์ • ๋น„์šฉ ๋ฐฉ์‹๊ณผ ๊ฑฐ์˜ ๋™์ผํ•˜๊ฒŒ ๊ฐ€๋Šฅํ•˜๋‹ค.

 

Redshift Serverless์˜ ๋„ค์ž„์ŠคํŽ˜์ด์Šค > ๋ฐ์ดํ„ฐ ๋ฐฑ์—… ํ•ญ๋ชฉ์„ ํ™•์ธํ•ด๋ณด๋ฉด,

์Šค๋ƒ…์ƒท์€ ํ…… ๋น„์—ˆ๊ณ , Recovery Points(๋ณต๊ตฌ ์‹œ์ )๋งŒ ์ž๋™ ์ƒ์„ฑ๋œ๋‹ค.

์˜ค๋ฅธ์ชฝ ์ƒ๋‹จ์˜ ์Šค๋ƒ…์ƒท ์ƒ์„ฑ ๋ฒ„ํŠผ์„ ํ†ตํ•ด ์ง์ ‘ ์Šค๋ƒ…์ƒท์„ ๋งŒ๋“ค์–ด์•ผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.

Recovery Points(๋ณต๊ตฌ ์‹œ์ ) ํ•ญ๋ชฉ์—์„œ ํ•˜๋‚˜๋ฅผ ์„ ํƒํ•˜๊ณ , '๋ณต๊ตฌ ์‹œ์ ์—์„œ ์Šค๋ƒ…์ƒท ์ƒ์„ฑ' ๋ฒ„ํŠผ์œผ๋กœ ์Šค๋ƒ…์ƒท์„ ์ƒ์„ฑํ•  ์ˆ˜๋„ ์žˆ๋‹ค.

 


 

๐ŸŸฉ Redshift์˜ ๊ด€๋ จ ๊ธฐํƒ€ ์„œ๋น„์Šค

 

๐ŸŽˆ Redshift Spectrum

S3์˜ ํŒŒ์ผ๋“ค์„ ๋งˆ์น˜ ํ…Œ์ด๋ธ”์ฒ˜๋Ÿผ SQL๋กœ ์ฒ˜๋ฆฌ ๊ฐ€๋Šฅํ•œ ์„œ๋น„์Šค

Redshift spectrum

 

  • S3 ํŒŒ์ผ๋“ค์„ ์™ธ๋ถ€ ํ…Œ์ด๋ธ”๋กœ ์ฒ˜๋ฆฌํ•˜์—ฌ Redshift ํ…Œ์ด๋ธ”๊ณผ ์กฐ์ธ์ด ๊ฐ€๋Šฅํ•˜๋„๋ก ํ•จ
  • 1TB๋ฅผ ์Šค์บ”ํ•  ๋•Œ๋งˆ๋‹ค $5
  • S3์™€ Redshift ํด๋Ÿฌ์Šคํ„ฐ๊ฐ€ ๊ฐ™์€ region์— ์žˆ์–ด์•ผํ•œ๋‹ค.

 

Spectrum ์„œ๋น„์Šค๋ฅผ ์‚ฌ์šฉํ•  ๋•Œ ์•Œ๋ฉด ์ข‹์€ ํ‚ค์›Œ๋“œ

1. Fact ํ…Œ์ด๋ธ” : ๋ถ„์„์˜ ์ดˆ์ ์ด ๋˜๋Š” ์–‘์  ์ •๋ณด๋ฅผ ํฌํ•จํ•˜๋Š” ์ค‘์•™ ํ…Œ์ด๋ธ”

ex) ๋งค์ถœ ์ˆ˜์ต, ํŒ๋งค๋Ÿ‰, ์ด์ต ๋“ฑ ์‚ฌ์šฉ์ž์˜ ํ–‰๋™์ด ๊ธฐ๋ก๋œ ํ…Œ์ด๋ธ”

 

2. Dimension ํ…Œ์ด๋ธ” : (์‚ฌ์šฉ์ž๊ฐ€ ๊ตฌ๋งคํ•œ) ๋ฌผํ’ˆ์ด๋‚˜ ๊ณ ๊ฐ์— ๋Œ€ํ•œ ์ •๋ณด ํ…Œ์ด๋ธ”

Fact ํ…Œ์ด๋ธ”์˜ ๋ฐ์ดํ„ฐ์— ๋งฅ๋ฝ์„ ์ œ๊ณตํ•˜์—ฌ ์‚ฌ์šฉ์ž๊ฐ€ ๋‹ค์–‘ํ•œ ๋ฐฉ์‹์œผ๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ์กฐ๊ฐ๋‚ด๊ณ  ๋ถ„์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•œ๋‹ค.

๋ณดํ†ต Dimension ํ…Œ์ด๋ธ” ํฌ๊ธฐ๊ฐ€ ์ž‘๊ณ , fact ํ…Œ์ด๋ธ”์˜ foreign key๋กœ ์ฐธ์กฐ๋จ.

 

3. ์™ธ๋ถ€ ํ…Œ์ด๋ธ”(External Table)

  • DB ์—”์ง„์ด ์™ธ๋ถ€์— ์žˆ๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ๋งˆ์น˜ ๋‚ด๋ถ€ ํ…Œ์ด๋ธ”(์ฝ๊ธฐ ์ „์šฉ)์ฒ˜๋Ÿผ ์‚ฌ์šฉ
  • CSV, JSON, XML๊ณผ ๊ฐ™์€ ํŒŒ์ผ ํ˜•์‹ ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ์›๊ฒฉ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค(ODBC ๋“ฑ)์™€ ๊ฐ™์€ ๋‹ค์–‘ํ•œ ๋ฐ์ดํ„ฐ ์†Œ์Šค๋„ ๊ฐ€๋Šฅ
  • ์™ธ๋ถ€ ํ…Œ์ด๋ธ”์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ ํ›„ ๊ฒฐ๊ณผ๋ฅผ ๋‚ด๋ถ€ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค์— ์ ์žฌํ•  ์ˆ˜ ์žˆ๋‹ค.
  • ํ•˜์ง€๋งŒ ์™ธ๋ถ€ ํ…Œ์ด๋ธ”์€ ๋ณด์•ˆ ๋ฐ ์„ฑ๋Šฅ ๋ฌธ์ œ์— ๋Œ€ํ•ด ์‹ ์ค‘ํ•˜๊ฒŒ ๊ณ ๋ คํ•ด์•ผ ํ•œ๋‹ค.

 

4. AWS Glue

  • AWS์˜ Serverless ETL ์„œ๋น„์Šค
  • AWS Glue Data Catalog๋Š” ๋ฐ์ดํ„ฐ ์†Œ์Šค ๋ฐ ๋Œ€์ƒ์˜ ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ ๊ฒ€์ƒ‰ ๊ธฐ๋Šฅ์„ ์ œ๊ณต
  • ์ฃผ๋กœ S3๋‚˜ ๋‹ค๋ฅธ AWS ์„œ๋น„์Šค ์ƒ์˜ ๋ฐ์ดํ„ฐ ์†Œ์Šค๋ฅผ ๋Œ€์ƒ์œผ๋กœ ํ•˜๋ฏ€๋กœ Spectrum์ด๋‚˜ Athena์— ํ•„์š”ํ•œ ์„œ๋น„์Šค์ด๋‹ค.
  • ๋”ฐ๋ผ์„œ ๊ด€๋ จ ์„œ๋น„์Šค๋ฅผ ์ด์šฉํ•  ๋•Œ Glue ๊ถŒํ•œ์ด ์žˆ๋Š” ์—ญํ• ์ด ํ•„์š”ํ•˜๋‹ค. > Spectrum ์‹ค์Šตํ•  ๋•Œ ๋„ฃ์–ด์ค„ ์˜ˆ์ •

 

 

โฉ Redshift Spectrum ์‹ค์Šตํ•ด๋ณด๊ธฐ

 

1. AWS Glue ๊ถŒํ•œ ์ถ”๊ฐ€

 

๊ธฐ์กด์— S3FullAccess๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๋˜ ์—ญํ• ์— Glue ๊ถŒํ•œ๋„ ์ถ”๊ฐ€ํ•ด์ฃผ์—ˆ๋‹ค.

AWSGlueConsoleFullAccess๋ฅผ ์—ญํ• ์— ๋„ฃ์–ด์ค€๋‹ค.

 

 

2. S3 ๋ฒ„ํ‚ท์— ์™ธ๋ถ€ ํ…Œ์ด๋ธ”๋กœ ์‚ฌ์šฉํ•  ํŒŒ์ผ ์ถ”๊ฐ€

 

usc ํด๋”๋ฅผ ๋งŒ๋“ค์–ด ํ•ด๋‹น csv ํŒŒ์ผ์„ ๋„ฃ์–ด์ฃผ์—ˆ๋‹ค.

 

3. Colab ์‹ค์Šต

 

์™ธ๋ถ€ ์Šคํ‚ค๋งˆ ์ƒ์„ฑ

-- AWSGlueConsoleFullAccess
CREATE EXTERNAL SCHEMA external_schema
from data catalog
database 'myspectrum_db' -- myspectrum_db์— ๋งŒ๋“  ์™ธ๋ถ€ ์Šคํ‚ค๋งˆ ์ €์žฅ
iam_role 'arn:aws:iam::*****:role/redshift.read.s3' -- IAM ARL ๋„ฃ๊ธฐ
create external database if not exists; -- ํ•ด๋‹น DB๊ฐ€ ์—†์œผ๋ฉด ๋งŒ๋“ฆ

myspectrum_db๋ฅผ ์ƒ์„ฑํ•˜์—ฌ ํ•ด๋‹น DB์— external_schema๋ฅผ ํ•˜๋‚˜ ๋งŒ๋“ค์—ˆ๋‹ค.

์ค‘์š”ํ•œ ์ ์€, Glue ๊ถŒํ•œ์„ ํฌํ•จํ•œ ์—ญํ• ์„ ์ง€์ •ํ•ด์ฃผ์–ด์•ผ ํ•œ๋‹ค.

 

Dimension ์šฉ๋„๋กœ ์‚ฌ์šฉํ•  ์œ ์ € ์ •๋ณด ํ…Œ์ด๋ธ” ๋งŒ๋“ค๊ธฐ

-- Dimension ์šฉ๋„์˜ ํ…Œ์ด๋ธ” ํ•˜๋‚˜ ์ œ์ž‘, ์œ ์ € ์ •๋ณด ํ…Œ์ด๋ธ”
-- ์œ ์ €์˜ ์„ฑ๋ณ„๊ณผ ๋‚˜์ด(18~50)๋ฅผ ๋žœ๋ค์œผ๋กœ ๋ถ€์—ฌํ•˜๊ณ  user_property ํ…Œ์ด๋ธ”๋กœ ์ €์žฅํ•œ๋‹ค.
CREATE TABLE raw_data.user_property AS
SELECT
 userid,
 CASE WHEN cast (random() * 2 as int) = 0 THEN 'male' ELSE 'female' END gender,
 (CAST(random() * 50 as int)+18) age
FROM (
 SELECT DISTINCT userid
 FROM raw_data.user_session_channel
);

 

์™ธ๋ถ€ ํ…Œ์ด๋ธ”์„ ๋งŒ๋“ค์–ด ์–‘์‹์— ๋งž๊ฒŒ ๋ฐ์ดํ„ฐ ํ• ๋‹น

CREATE EXTERNAL TABLE external_schema.user_session_channel (
 userid integer ,
 sessionid varchar(32),
 channel varchar(32)
)
row format delimited -- ํ•œ ์ค„์— ํ•œ ๋ ˆ์ฝ”๋“œ
fields terminated by ',' -- ๋ฐ์ดํ„ฐ๋ฅผ ๋‚˜๋ˆ„๋Š” ๋ฐฉ์‹, CSV ํŒŒ์ผ์ด๋ฏ€๋กœ ',' ๊ฐ€ ๋“ค์–ด๊ฐ„๋‹ค.
stored as textfile
location 's3://sunho-test-bucket/usc/'; -- ํŒŒ์ผ ๊ฒฝ๋กœ

 

์™ธ๋ถ€ ํ…Œ์ด๋ธ”๊ณผ JOIN ๊ฒฐ๊ณผ

์ •์ƒ์ ์œผ๋กœ JOIN์ด ๋œ ๋ชจ์Šต

 

 


 

 

๐ŸŽˆ Athena

  • AWS์˜ Presto ์„œ๋น„์Šค๋กœ, Redshift Spectrum๊ณผ ๋น„์Šทํ•œ ๊ธฐ๋Šฅ์„ ์ œ๊ณต
  • S3์— ์žˆ๋Š” ๋ฐ์ดํ„ฐ๋“ค์„ ๊ธฐ๋ฐ˜์œผ๋กœ SQL ์ฟผ๋ฆฌ๊ธฐ๋Šฅ์„ ์ œ๊ณต
  • Redshift๋ฅผ ์‚ฌ์šฉํ•˜์ง€ ์•Š๋Š”๋‹ค๋ฉด Athena๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด ๋œ๋‹ค.

 

 

๐ŸŽˆ Redshift ML

  • SQL๋งŒ ์‚ฌ์šฉํ•ด์„œ ๋จธ์‹ ๋Ÿฌ๋‹ ๋ชจ๋ธ์„ ํ›ˆ๋ จ๊ฐ€๋Šฅํ•˜๋„๋ก ํ•˜๋Š” ์„œ๋น„์Šค
  • AWS SageMaker(์ตœ์ ํ™” ๋ชจ๋ธ ์ž๋™ ์ƒ์„ฑ)์— ์˜ํ•ด ์ง€์›๋œ๋‹ค.
  • BYOM(Bring Your Own Model)๋กœ ์ด๋ฏธ ์žˆ๋Š” ๋ชจ๋ธ ์‚ฌ์šฉ๋„ ๊ฐ€๋Šฅ

 

Redshift ML์—์„œ ์•Œ๋ฉด ์ข‹์€ ํ‚ค์›Œ๋“œ

 

1. ๋จธ์‹ ๋Ÿฌ๋‹

  • ํŠธ๋ ˆ์ด๋‹ ์…‹ : ํ•™์Šต์— ์‚ฌ์šฉ๋˜๋Š” ๋ฐ์ดํ„ฐ
  • ๋ ˆ์ด๋ธ”(Y) : ์˜ˆ์ธกํ•ด์•ผ ํ•˜๋Š” ๊ฐ’
  • Featuer(X) : ๋ ˆ์ด๋ธ”์„ ์˜ˆ์ธกํ•˜๋Š” ๋ฐ ์‚ฌ์šฉํ•˜๋Š” ํžŒํŠธ๋“ค

 

2. Amazon SageMaker

 

๋จธ์‹ ๋Ÿฌ๋‹ ๋ชจ๋ธ ๊ฐœ๋ฐœ์„ ์ฒ˜์Œ๋ถ€ํ„ฐ ๋๊นŒ์ง€ ํ•ด๊ฒฐํ•˜๋Š” AWS ์„œ๋น„์Šค(MLOps ํ”„๋ ˆ์ž„์›)

 

๋‹ค์Œ์˜ ๊ณผ์ •์„ ๋ชจ๋‘ ์ง€์›ํ•ด์ค€๋‹ค.

  • ํŠธ๋ ˆ์ด๋‹ ์…‹ ์ค€๋น„
  • ๋ชจ๋ธ ํ›ˆ๋ จ
  • ๋ชจ๋ธ ๊ฒ€์ฆ
  • ๋ชจ๋ธ ๋ฐฐํฌ ๋ฐ ๊ด€๋ฆฌ
  • SageMaker Studio๋ผ๋Š” ์›น ๊ธฐ๋ฐ˜ ํ™˜๊ฒฝ ์ œ๊ณต
  • Python Notebook์— SageMaker ๋ชจ๋“ˆ ๋„ฃ์–ด ํ›ˆ๋ จ๋„ ๊ฐ€๋Šฅํ•˜๋‹ค.

 

3. AutoPilot

SageMaker์—์„œ ์ œ๊ณต๋˜๋Š” AutoML(๋ฐ์ดํ„ฐ ์ฃผ๋ฉด ์•Œ์•„์„œ ๋ชจ๋ธ ๋งŒ๋“ฆ) ๊ธฐ๋Šฅ

 

โฉ Redshift ML ์‹ค์Šตํ•ด๋ณด๊ธฐ

 

1. S3์— ํ•™์Šต์— ์‚ฌ์šฉํ•  CSV ํŒŒ์ผ ์—…๋กœ๋“œ

S3 > sunho-test-bucket > redshift_ml > train.csv

 

2. SageMaker ๊ถŒํ•œ์„ ์œ„ํ•œ IAM Role ๋งŒ๋“ค๊ธฐ

์œ„ ์ด๋ฏธ์ง€์ฒ˜๋Ÿผ IAM ์—ญํ• ์„ ํ•˜๋‚˜ ๋งŒ๋“ค์–ด์ค€๋‹ค.

Redshift๊ฐ€ SageMaker์— Accessํ•˜๊ณ , SageMaker๋„ Redshift์— Accessํ•˜๋ฏ€๋กœ ์–‘์ชฝ์œผ๋กœ ๊ด€๊ณ„๋ฅผ ์„ค์ •ํ•ด์•ผํ•œ๋‹ค.

 

์ด ์„ค์ •์€ ํ•ด๋‹น ์—ญํ• ์˜ ์‹ ๋ขฐ ๊ด€๊ณ„ > ์‹ ๋ขฐ ์ •์ฑ… ํŽธ์ง‘ ์—์„œ ๋ฐ”๊ฟ€ ์ˆ˜ ์žˆ๋‹ค.

์™ผ์ชฝ ์ƒ๋‹จ์˜ ์‹ ๋ขฐ ์ •์ฑ… ํŽธ์ง‘ ๋ฒ„ํŠผ์„ ํด๋ฆญํ•œ๋‹ค.

 

์˜ค๋ฅธ์ชฝ ํƒญ์˜ ๋ณด์•ˆ ์ฃผ์ฒด ์ถ”๊ฐ€ ๋ฒ„ํŠผ์„ ๋ˆŒ๋Ÿฌ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ํŽธ์ง‘ํ•ด์ค€๋‹ค.

๋ณด์•ˆ ์ฃผ์ฒด๋ฅผ ์ถ”๊ฐ€ํ•˜๊ณ  ์ •์ฑ… ์—…๋ฐ์ดํŠธ ๋ฒ„ํŠผ์„ ๋ˆŒ๋Ÿฌ ์ €์žฅํ•œ๋‹ค.

 

3. Redshift ๋„ค์ž„์ŠคํŽ˜์ด์Šค์— ํ•ด๋‹น ์—ญํ•  ์ถ”๊ฐ€

Redshift ์›ํ•˜๋Š” ๋„ค์ž„์ŠคํŽ˜์ด์Šค > ๋ณด์•ˆ ๋ฐ ์•”ํ˜ธํ™” > IAM ์—ญํ•  ๊ด€๋ฆฌ

 

๋งŒ๋“ค์—ˆ๋˜ Redshift-ML ์—ญํ• ์ด ์ถ”๊ฐ€๋œ ๋ชจ์Šต

 

4. S3์˜ train.csv๋ฅผ Redshift ํ…Œ์ด๋ธ”์— COPY

ํ…Œ์ด๋ธ”์„ ์ƒˆ๋กœ ๋จผ์ € ํ•˜๋‚˜ ๋งŒ๋“  ๋’ค, COPY ์ปค๋งจ๋“œ๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค.

-- ์†์„ฑ์€ ํ•ด๋‹น csv ํŒŒ์ผ์— ๋งž๊ฒŒ ์ง€์ •
CREATE TABLE raw_data.orange_telecom_customers (
 state varchar,
 account_length integer,
 area_code integer,
 international_plan varchar,
 …
 customer_service_calls integer,
 churn varchar,
 purpose varchar
);

 

COPY raw_data.orange_telecom_customers
FROM 's3://sunho-test-bucket/redshift_ml/train.csv'
credentials 'aws_iam_role=arn:aws:iam::*****:role/redshift.read.s3' -- IAM ARN ์ž…๋ ฅ
delimiter ',' dateformat 'auto' timeformat 'auto' IGNOREHEADER 1 -- ์˜๋ฏธ๋Š” ์ €๋ฒˆ ๊ธ€ ์ฐธ๊ณ (For CSV)
removequotes;

 

5. ML ๋ชจ๋ธ ์ƒ์„ฑํ•˜๊ธฐ

CREATE MODEL ๋ช…๋ น์–ด๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ML ๋ชจ๋ธ์„ ์ƒ์„ฑํ•œ๋‹ค.

์ฐธ๊ณ ๋กœ ๋ฐ์ดํ„ฐ์˜ purpose ํ•„๋“œ ๊ฐ’์ด Train ์ด๋ฉด ํ•™์Šต์šฉ ๋ฐ์ดํ„ฐ๋กœ ์‚ฌ์šฉํ•˜๊ฒ ๋‹ค๋Š” ์˜๋ฏธ๋กœ ์ €์žฅํ–ˆ์—ˆ๋‹ค.

๋ชจ๋ธ์„ ์ƒ์„ฑํ•˜๋ฉด์„œ ์ด ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ํ•จ์ˆ˜๋„ ๊ฐ™์ด ์ƒ์„ฑํ•˜๊ฒŒ ๋œ๋‹ค.

 

CREATE MODEL orange_telecom_customers_model
FROM (
 SELECT
 state, account_length, …, customer_service_calls, churn -- ํ•ด๋‹นํ•˜๋Š” ์†์„ฑ๋“ค ๋ชจ๋‘ ์ž…๋ ฅ
 FROM raw_data.orange_telecom_customers
 WHERE purpose = 'Train'
)
TARGET churn -- ์˜ˆ์ธกํ•  ๊ฐ’ ํ•„๋“œ
FUNCTION ml_fn_orange_telecom_customers -- ๋‚˜์ค‘์— ํ•จ์ˆ˜๋กœ ์‚ฌ์šฉํ•  ๋•Œ ์‚ฌ์šฉํ•  ํ•จ์ˆ˜๋ช…
IAM_ROLE 'arn:aws:iam::*****:role/Redshift-ML' -- IAM ARN(์•„๊นŒ ๋งŒ๋“  SageMaker ML ์—ญํ• )
SETTINGS ( -- ๋ฒ„ํ‚ท ์ง€์ •
 S3_BUCKET 'sunho-test-bucket'
);

 

์ดํ›„ ๋ชจ๋ธ์ด ๋งŒ๋“ค์–ด์ง€๋Š”๋ฐ, ๋ฐ์ดํ„ฐ์˜ ํฌ๊ธฐ์— ๋”ฐ๋ผ ๋‹ค๋ฅด๊ฒ ์ง€๋งŒ ์‹œ๊ฐ„์ด ๊ฝค ์˜ค๋ž˜ ์†Œ์š”๋œ๋‹ค๊ณ  ํ•œ๋‹ค.

์•ฝ 2700๊ฐœ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ํ•™์Šตํ•˜๋Š”๋ฐ 3~4์‹œ๊ฐ„ ์ •๋„ ์†Œ์š”๋ฌ์œผ๋ฉฐ, ๋น„์šฉ์€ ์•ฝ $21๊ฐ€ ์ฒญ๊ตฌ๋œ๋‹ค๊ณ  ํ•œ๋‹ค.

๋‚˜๋Š” ๋น„์šฉ์ด ์ฒญ๊ตฌ๋˜๋Š”๊ฒŒ ๋ถ€๋‹ด์Šค๋Ÿฌ์›Œ์„œ ์ง์ ‘ ์‹คํ–‰ํ•˜์ง„ ์•Š์•˜๋‹ค..

 

6. ๋งŒ๋“ค์–ด์ง„ ๋ชจ๋ธ ์‚ฌ์šฉํ•ด๋ณด๊ธฐ

๋ชจ๋ธ์„ ์ƒ์„ฑํ•˜๋ฉฐ ๋งŒ๋“ค์—ˆ๋˜ ํ•จ์ˆ˜๋ฅผ ์ด์šฉํ•ด์„œ ์˜ˆ์ธก๊ฐ’์„ ๋„์–ด๋ณด์ž.

 

SELECT churn, -- ์ •๋‹ต์ด ๋  ๊ฒƒ์ด๋‹ค.
 ml_fn_orange_telecom_customers(
 state, account_length, area_code, international_plan, voice_mail_plan,
 number_vmail_messages, total_day_minutes, total_day_calls,
 total_day_charge, total_eve_minutes, total_eve_calls, total_eve_charge,
 total_night_minutes, total_night_calls, total_night_charge,
 total_intl_minutes, total_intl_calls, total_intl_charge,
 customer_service_calls
 ) AS "prediction" -- ml_fn_orange_telecom_customers ํ•จ์ˆ˜๋ฅผ ํ†ตํ•ด ์˜ˆ์ธก๊ฐ’์ด ๋‚˜์˜จ๋‹ค.
FROM raw_data.orange_telecom_customers
WHERE purpose = 'Test'; -- ์ด๋ฒˆ์—” Test์šฉ ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค.

 

7. ๋งŒ๋“ค์–ด์ง„ ๋ชจ๋ธ ์ œ๊ฑฐํ•˜๊ธฐ!

์ด๋Œ€๋กœ ๋ชจ๋ธ์„ ๋‚ด๋ฒ„๋ ค๋‘๋ฉด ์šฉ๋Ÿ‰๋„ ์ฐจ์ง€ํ•˜๊ณ  ๋น„์šฉ์ด ์ฒญ๊ตฌ๋  ์ˆ˜๋„ ์žˆ๋‹ค๊ณ  ํ•œ๋‹ค.

๋‹ค์Œ์˜ ๊ณผ์ •์œผ๋กœ ๋‚จ์€ ์ž”์žฌ๋“ค์„ ๊ผญ ์ฒญ์†Œํ•ด์ฃผ์ž.

 

Drop Model '๋ชจ๋ธ๋ช…'  ๋ช…๋ น์–ด๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ชจ๋ธ์„ ์‚ญ์ œํ•œ๋‹ค.

SageMaker ๋Œ€์‹œ๋ณด๋“œ์—์„œ Models, Endpoints์— ํ•ด๋‹นํ•˜๋Š” ๊ฐ’์„ ๋ชจ๋‘ ์‚ญ์ œํ•œ๋‹ค.

 

๋ชจ๋ธ์„ ๋” ์ด์ƒ ์‚ฌ์šฉํ•˜์ง€ ์•Š๋Š”๋‹ค๋ฉด ๊ผญ ์ฒญ์†Œํ•˜๋Š” ๊ฒƒ์„ ์žŠ์ง€ ์•Š๋„๋ก ๋ช…์‹ฌํ•˜์ž.

 

 


 

๐ŸŸง Redshift ์ค‘์ง€ / ์ œ๊ฑฐ ๋ฐ ์ตœ์ ํ™”ํ•˜๊ธฐ

 

Redshift(๊ณ ์ •๋น„์šฉ)์˜ ๊ฒฝ์šฐ

  • ๋Œ€์‹œ๋ณด๋“œ์—์„œ Pause์™€ Resume์„ ์ ์ ˆํžˆ ์‚ฌ์šฉํ•˜์—ฌ ์กฐ์ ˆ
  • Redshift๋ฅผ ์‚ญ์ œํ•˜๊ณ  S3์— ์Šค๋ƒ…์ƒท์„ ์ €์žฅํ•ด๋‘์—ˆ๋‹ค๊ฐ€ ๋‚˜์ค‘์— ์ƒˆ๋กœ์šด Redshift cluster๋กœ ๋งŒ๋“ค ์ˆ˜ ์žˆ์Œ
  • Redshift ๊ณ ์ • ๋น„์šฉ ์„œ๋น„์Šค๋Š” ์ฃผ๊ธฐ์ ์œผ๋กœ ๋ฒ„์ „ ์—…๊ทธ๋ ˆ์ด๋“œ๋ฅผ ์œ„ํ•ด ์ค‘๋‹จ๋œ๋‹ค. (Maintenance window)

 

Redshift Serverless์˜ ๊ฒฝ์šฐ

  • Serverless์˜ ๊ฒฝ์šฐ ๊ณ ์ •๋น„์šฉ์ฒ˜๋Ÿผ ์ค‘์ง€ํ•  ์ˆ˜ ์—†๋‹ค.
  • ์ œ๊ฑฐํ•ด์•ผํ•˜๋Š” ๊ฒฝ์šฐ ๋‹ค์Œ์˜ ๊ณผ์ •์„ ๋”ฐ๋ผ์•ผ ํ•œ๋‹ค.
    1. ๋จผ์ € ๋ชจ๋“  Workgroup๋“ค์„ ๋ชจ๋‘ ์‚ญ์ œํ•ด์•ผํ•จ
    2. ๋‹ค์Œ์œผ๋กœ ๋ชจ๋“  Namespace ์‚ญ์ œ

 

Redshift ์ตœ์ ํ™”ํ•˜๊ธฐ

Redshift์—์„œ๋Š” ๋ฐ์ดํ„ฐ๊ฐ€ ์‚ฝ์ž…, ์—…๋ฐ์ดํŠธ, ์‚ญ์ œ๋  ๋•Œ ๋ฐ์ดํ„ฐ๋“ค์ด ๋ถˆ๊ทœ์น™ํ•˜๊ฒŒ ๋ถ„์‚ฐ๋˜์–ด ์ €์žฅ๋œ๋‹ค.

์ด๋•Œ VACUUM ๋ช…๋ น์–ด๋ฅผ ํ†ตํ•ด ๋ฐ์ดํ„ฐ๋ฅผ ์ •๋ฆฌํ•˜๊ณ , ๋‚จ์€ ๋””์Šคํฌ ๊ณต๊ฐ„ ํšŒ์ˆ˜ ๋ฐ ํ•ด์ œ๋ฅผ ์ฒ˜๋ฆฌํ•ด์ค€๋‹ค.

 

VACUUM [ FULL | SORT ONLY | DELETE ONLY | REINDEX | RECLUSTER ] [table_name] [TO threshold PERCENT]
  • FULL - ํ…Œ์ด๋ธ” ์ •๋ ฌ + ์‚ญ์ œ๋œ ๊ณต๊ฐ„ ํšŒ์ˆ˜
  • SORT ONLY - ํ…Œ์ด๋ธ” ์ •๋ ฌ
  • DELETE ONLY - ์‚ญ์ œ๋œ ๊ณต๊ฐ„ ํšŒ์ˆ˜
  • REINDEX - ์ธํ„ฐ๋ฆฌ๋ธŒ ํ…Œ์ด๋ธ” ์ธ๋ฑ์Šค ์žฌ์ง€์ • + ํ…Œ์ด๋ธ” ์ •๋ ฌ + ์‚ญ์ œ๋œ ๊ณต๊ฐ„ ํšŒ์ˆ˜
  • TO threshold PERCENT : ์ง€์ •ํ•œ ์ž„๊ณ„๊ฐ’ ์ด์ƒ์œผ๋กœ ํ…Œ์ด๋ธ”์ด ์ •๋ ฌ๋˜์–ด ์žˆ๋‹ค๋ฉด ์ •๋ ฌ ๋‹จ๊ณ„๋ฅผ ์ƒ๋žตํ•œ๋‹ค.

 

๋‹ค์Œ์€ ์‚ฌ์šฉ ์˜ˆ์‹œ๋‹ค.

VACUUM ์‚ฌ์šฉ ์˜ˆ์‹œ - ์ถœ์ฒ˜) AWS document

 

VACUUM์€ ์ฃผ๊ธฐ์ ์œผ๋กœ ์‹คํ–‰์‹œ์ผœ์ฃผ์–ด Redshift๋ฅผ ์ตœ์ ํ™”ํ•ด์ฃผ๋ฉด ์ข‹์ง€๋งŒ, ๋ฆฌ์†Œ์Šค๋ฅผ ๋งŽ์ด ์žก์•„๋จน๋Š” ์ž‘์—…์ด๋‹ค.

VACUUM ์ปค๋งจ๋“œ๋ฅผ ์“ธ ๋•Œ์—๋Š” ์ผ์ด ์—†๋Š” ์‹œ๊ฐ„๋Œ€์— ์‹คํ–‰ํ•ด์ฃผ๋ฉด ์ข‹๋‹ค๊ณ  ํ•œ๋‹ค.

 

VACCUM ์ฐธ๊ณ  ๋ฌธ์„œ : https://docs.aws.amazon.com/ko_kr/redshift/latest/dg/r_VACUUM_command.html

 

VACUUM - Amazon Redshift

Amazon Redshift VACUUM ๋ช…๋ น ๊ตฌ๋ฌธ๊ณผ ๋™์ž‘์€ PostgreSQL VACUUM ์ž‘์—…๊ณผ๋Š” ํฌ๊ฒŒ ๋‹ค๋ฆ…๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด Amazon Redshift์˜ ๊ธฐ๋ณธ VACUUM ์ž‘์—…์€ VACUUM FULL๋กœ์„œ, ๋””์Šคํฌ ๊ณต๊ฐ„์„ ํšŒ์ˆ˜ํ•˜๊ณ  ๋ชจ๋“  ์—ด์„ ๋‹ค์‹œ ์ •๋ ฌํ•ฉ๋‹ˆ๋‹ค. ์ด์™€

docs.aws.amazon.com