SQL 是數(shù)據(jù)分析中極其核心且常用的取數(shù)工具。絕大多數(shù)數(shù)據(jù)分析工作都離不開 SQL 從數(shù)據(jù)庫中提取所需數(shù)據(jù)。 - 如果你想從事數(shù)據(jù)分析工作,熟練使用 SQL 進行高效、準(zhǔn)確的取數(shù)和數(shù)據(jù)準(zhǔn)備是必須掌握的基本功。
SQL數(shù)據(jù)分析取數(shù),搞定平時90%的取數(shù)工作。
一、基礎(chǔ)查詢
--查詢所有字段
SELECT*FROM table_name;
--查詢指定字段
SELECT columnl, column2 FROM table_name;
--別名(Alias)
SELECT columnl AS name, column2 AS value FROM table_name;?
二、條件過濾
--基礎(chǔ)過濾(WHERE)
SELECT * FROM table_name WHERE column1>100;
--多條件組合(AND/OR)
SELECT * FROM table_name WHERE column1 ='A' AND (column2>50 OR column3 IS NOT NULL);
--IN 操作符
SELECT * FROM table_name WHERE columnl IN ('A', 'B', 'C');
--LIKE 模糊匹配
SELECT * FROM table_name WHERE columnl LlKE '%keyword%';
--NULL值判斷
SELECT * FROM table_name WHERE columnl IS NULL;?
三、排序與分頁
--排序(ORDER BY)
SELECT * FROM table_name ORDER BY column1 DESC, column2 ASC;--降序/升序
--分頁(LIMIT&OFFSET)
SELECT * FROM table_name LIMIT 10 OFFSET 20;--跳過村20條,取10條(第3頁)
四、聚合統(tǒng)計
--常用聚合函數(shù)
SELECT
COUNT(*) AS total_rows, --計數(shù)
SUM(sales) AS total_sales, --史和
AVG(price) AS avg_price, --平均值
MAX(date) AS latest_date,--最大值
MIN(date) AS earliest_date --最小值
FROM sales table;
--按分組聚合(GROUP BY)
SELECT category, SUM(sales) As total_sales FROM sales_table;
--HAVING 過濾分組結(jié)果
SELECT category, SUM(sales) AS total_sales
FROM sales table
GROUP BY category
HAVING SUM(sales)>1000;
五、子查詢與臨時表
--子查詢(Subquery)
SELECT * FROM orders
WHERE user id IN (SELECT user_id FROM users WHERE is_vip=1);
--公共表達式(CTE,WTTH 語句)
WITH top users AS (
SELECT user id, SUM(sales) AS total_sales
FROM orders
GROUP BY user_id
ORDER BY total sales DESC
LIMIT 10
)
SELECT * FROM top_users;
六、表連接(J0IN)
--內(nèi)連接(INNER JOIN)
SELECT a.*, b.extra_info
FROM table a a
INNER J0lN table b b ON a.id =b.a_id;
--左連接(LEFT JOIN)
SELECT a.*,b.extra info
FROM table a a
LEFT JOIN table b b ON a.id = b.a_id;
--多表連接
SELECT a.name, b.order_id, c.product_name
FROM users a
LEFT J0lN orders b ON a.user id= b.user_id
INNER JOIN products c ON b.product_id=c.product_id;?
七、日期處理
--提取日期部分
SELECT
DATE(order_time) AS order_date,--提取日期(如2023-10-01)
EXTRACT(YEAR FROM order time) As year,-提取年份
EXTRACT(MONTH FROM order time) AS month--提取月份
FROM orders;
--日期計算SELECT
order time, order time+INTERVAL '7 DAY' AS next_week_date --加7天FROM orders;
--日期時間格式化
--date_format() 按指定格式返回時間(對日期時間格式化)
#將“2023-09-21 15:06:51”轉(zhuǎn)化如下格式
select date_format('2023-09-21 15:06:51','%Y-%m-%d');
select date_format('2023-09-21 15:06:51','%Y-%M-%D');
select date format('2023-09-21 15:06:51','%M-%d-%y');
select date_format('2023-09-21 15:06:51','%m/%d/%y');
select date_format('2023-09-21 15:06:51','%m/%d/%Y %H:%i :%s');
select date format('2023-09-2115:06:51','%Y年%m月%d日 %H點%i分%s秒');?
八、窗口函數(shù)(Window Functions)
--排名(RANK,ROWLNUMBER)
SELECT
user_id,
sales,
RANK() OVER (ORDER BY sales DESC) AS sales rank,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC) AS row_num
FROM sales_table;
--累計求和(SUM OVER)
SELECT
date ,
daily_sales,
SUM(daily sales) OVER (ORDER BY date) AS cumulative_sales
FROM daily_sales_table;
九、條件判斷
--適用場景:數(shù)據(jù)分類打標(biāo)
SELECT
id
CASE
WHEN score >=90 THEN 'A'
WHEN score >= 80 THEN 'B'
ELSE 'C'
END AS grade
FROM students;
十、去重計數(shù)
--適用場景:統(tǒng)計UV等去重指標(biāo)
SELECT
COUNT (DISTINCT user_id) AS uv
FROM page_views
WHERE visit date='2023-10-01';
十一、排序+ToP N
--取銷售額TOP10商品
SELECT product id, SUM(sales) AS total_sales
FROM sales_data
GROUP BY product_id
ORDER BY total_sales DESC
LIMIT 10;
--適用場景:排行榜、頭部數(shù)據(jù)分析?
十二、分類打標(biāo)
--條件分支判斷
SELECT
user_id,
CASE
WHEN age<18 THEN '未成年'
WHEN age BETWEEN 18 AND 6O THEN '成年'
ELSE '老年'
END AS age_group
FROM users;?
十三、分析函數(shù)
分析函數(shù)row_number, rank, dense_rank, cume_dist, percent_rank, ntile
1.row number()
按順序排序,排序的值不會重復(fù),總數(shù)不變:
select uid,
dt,
pv,
row_number() over (partition by uid order by pv desc) as row_number_pv
from user_pv
order by uid, pv desc;
2. rank( )
大小一樣排序的值一樣,但會占用排名的位置,總數(shù)不變;下面對用戶每天瀏覽量進行一個排名。
select uid,
dt,
pv,
rank() over (partition by uid order by pv desc) as rank_pv
from user_pv
order by uid, pv desc;
3. dense_rank()
排序值相同時重復(fù),排名并列,排名依次增加,排序相同時總數(shù)會減少;例如,如果兩行排名為3,則下一個排名為4,不同于RANK()函數(shù)返回5。下面對用戶每天瀏覽量進行一個排名:
select uid,
dt,
pv,
dense_rank() over (partition by uid order by pv desc) dense_rank_pv
from user_pv
order by uid, pv desc;
對比看下,row_number, rank, dense_rank 的運行效果:
select uid,
dt,
pv,
row_number() over (partition by uid order by pv desc) as row_number_pv,
rank() over (partition by uid order by pv desc) as rank_pv,
dense_rank() over (partition by uid order by pv desc) dense_rank_pv
from user_pv
order by uid,pv desc;
總結(jié)來說,ROW_NUMBER函數(shù)為每一行分配唯一的行號,而RANK函數(shù)和DENSE_RANK函數(shù)在處理具有相同排序值的行時有所不同。RANK函數(shù)會跳過下一個排名,而DENSE_RANK函數(shù)會緊隨其后。
選擇使用哪個函數(shù)取決于具體的需求和對重復(fù)值的處理方式。
4.cume dist()
累積分布cume_dist()函數(shù),用于計算當(dāng)前行在排序結(jié)果中的累積分布比例。
計算公式=前面的行數(shù)/窗口分區(qū)中的總行數(shù)
#4、5的合并案例
select uid,
dt,
pv,
cume_dist() over (partition by uid order by pv) cume_dist_pv
from user_pv
order by uid, pv;
5. percent_rank()
非常類似于cume_dist函數(shù)。同樣用于計算當(dāng)前行在排序結(jié)果中的累積分布比例。
計算公式=前面的行數(shù)-1/窗口分區(qū)中的總行數(shù)-1
select uid,
dt,
pv,
percent_rank() over (partition by uid order by pv) as percent_rank_uv
from user_pv
order by uid, pv;
#計算機專業(yè) #數(shù)據(jù)分析我在行 #程序員 #數(shù)據(jù)庫 #sql語句 #sql刷題 #sql學(xué)習(xí) #SQL #數(shù)據(jù)分析師 #數(shù)據(jù)分析我在行
閱讀原文:原文鏈接
該文章在 2025/8/25 21:47:10 編輯過