深入浅出数据分析 (Head First Data Analysis)
Reading note: Relational Databases
by 黑涩布朗尼

The exercises on pages 367 & 369 are done directly in R instead of Excel: Excel is too slow on these files, and cleaning the data is more flexible in R. The four data files are:

http://www.headfirstlabs.com/books/hfda/hfda_ch12_articles.csv
http://www.headfirstlabs.com/books/hfda/hfda_ch12_issues.csv
http://www.headfirstlabs.com/books/hfda/hfda_ch12_articleHitsComments.csv
http://www.headfirstlabs.com/books/hfda/hfda_ch12_sales.csv

# Load the three tables; the comment under each read shows the first row.
articles <- read.csv("http://www.headfirstlabs.com/books/hfda/hfda_ch12_articles.csv", header=TRUE)
#   articleID issueID authorID webHits
# 1         1       1        8    2019

issues <- read.csv("http://www.headfirstlabs.com/books/hfda/hfda_ch12_issues.csv", header=TRUE)
#   issueID  PubDate
# 1       1 10/24/04

articleHitsComments <- read.csv("http://www.headfirstlabs.com/books/hfda/hfda_ch12_articleHitsComments.csv", header=TRUE)
#   articleID    authorName webHits commentCount
# 1         1 Destiny Adams    2019           14

# sqldf lets us run SQL aggregations directly on the data frames.
library(sqldf)

# Count articles per issue; note the NA issueID that sneaks into the groups.
sqldf("SELECT issueID, count(articleID) AS 'Article count' FROM articles GROUP BY issueID")
#   issueID Article count
# 1      NA             0
# 2       1             7

ArticleCount <- sqldf("SELECT issueID, count(articleID) AS 'Article count' FROM articles GROUP BY issueID")

# Join the per-issue article counts onto the issue publication dates.
dispatch_analysis <- merge(issues, ArticleCount, by="issueID")

# Load sales, total the copies sold (lotSize) per issue, and join that on as well.
sales <- read.csv("http://www.headfirstlabs.com/books/hfda/hfda_ch12_sales.csv", header=TRUE)
head(sales)
salesSum <- sqldf("SELECT issueID, sum(lotSize) AS 'Total sales' FROM sales GROUP BY issueID")
dispatch_analysis <- merge(dispatch_analysis, salesSum, by="issueID")
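As a small follow-up (my own sketch, not part of the book's solution): with dispatch_analysis holding one row per issue, a base-R scatterplot gives a quick look at whether issues that ran more articles also sold more copies. The column names below are the sqldf aliases used above; if merge() renamed them in your session, adjust accordingly.

# One point per issue: articles published vs. total copies sold.
# (Sketch only; assumes the aliases 'Article count' and 'Total sales'
# from the queries above survived the merges unchanged.)
plot(dispatch_analysis[["Article count"]],
     dispatch_analysis[["Total sales"]],
     xlab = "Articles per issue",
     ylab = "Total copies sold (sum of lotSize)",
     main = "Articles vs. sales by issue")

# If the NA issueID seen in the GROUP BY output is a concern, one way to
# drop it before aggregating is:
# articles <- subset(articles, !is.na(issueID))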