Python Case Studies Based on Data Processing and Analysis


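© Nanjing University · Python Case Studies Based on Data Processing and Analysis

Zhang Li, Department of University Computer Fundamentals Education, Nanjing University
April 23, 2017, First National Symposium on Teaching the Python Language and Computing Ecosystem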

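Three tracks

A. Humanities and social sciences (basic applications)
B. Science and engineering (advanced applications)
C. Discipline-specific customization (domain applications)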

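Commonly used third-party libraries

The SciPy scientific computing ecosystem (NumPy, SciPy, Matplotlib, pandas, and others)

Requests, BeautifulSoup, the re module, NLTK (natural language processing), scikit-learn (machine learning), wordcloud (word clouds), jieba (Chinese word segmentation), …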

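Some topics

US election data analysis
Product review mining
Analysis of presidential inaugural addresses
A-share stock analysis
Scraping and analysis of book review, music, and film data
Google Scholar literature crawler
Regional air pollution data analysis
Weibo public opinion control
Airline customer value mining
News headline analysis
Baidu Tieba emoticon analysis
Difficulty analysis of English-language texts
Job demand statistics from recruitment websites
Analyzing team playing styles from match data
Trending topic discussions on Zhihu
Bilibili bullet-comment (danmaku) data analysis
Housing price data mining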

1 Humanities and social sciences

NLTK natural language processing package

NLTK corpora
Gutenberg: gutenberg
Web and chat text: webtext
Inaugural addresses: inaugural
Brown: brown
Reuters: reuters
Other languages: multilingual corpora
Custom corpora

    >>> import nltk
    >>> nltk.download()

Project Gutenberg

    >>> from nltk.corpus import gutenberg
    >>> gutenberg.fileids()
    [u"austen-emma.txt", u"austen-persuasion.txt", u"austen-sense.txt",
     u"bible-kjv.txt", u"blake-poems.txt", u"bryant-stories.txt",
     u"burgess-busterbrown.txt", u"carroll-alice.txt", u"chesterton-ball.txt",
     u"chesterton-brown.txt", u"chesterton-thursday.txt", u"edgeworth-parents.txt",
     u"melville-moby_dick.txt", u"milton-paradise.txt", u"shakespeare-caesar.txt",
     u"shakespeare-hamlet.txt", u"shakespeare-macbeth.txt", u"whitman-leaves.txt"]

Project Gutenberg: word counts for Hamlet

    >>> from nltk.corpus import gutenberg
    >>> allwords = gutenberg.words("shakespeare-hamlet.txt")
    >>> len(allwords)
    37360
    >>> len(set(allwords))
    5447
    >>> allwords.count("Hamlet")
    99
    >>> A = set(allwords)
    >>> longwords = [w for w in A if len(w) > 12]
    >>> print(sorted(longwords))

    Output:
    [u"Circumstances", u"Guildensterne", u"Incontinencie", u"Recognizances",
     u"Vnderstanding", u"determination", u"encompassement", u"entertainment",
     u"imperfections", u"indifferently", u"instrumentall", u"reconcilement",
     u"stubbornnesse", u"transformation", u"vnderstanding"]

Project Gutenberg: the 20 most frequent words in Hamlet

    # Filename: freqG20.py
    from nltk.corpus import gutenberg
    from nltk.probability import FreqDist

    allwords = gutenberg.words("shakespeare-hamlet.txt")
    # Keep alphabetic tokens only and fold them to lowercase
    fd2 = FreqDist([sx.lower() for sx in allwords if sx.isalpha()])
    print(fd2.B())      # number of distinct words
    print(fd2.N())      # total number of words counted
    fd2.tabulate(20)    # table of the 20 most frequent words
    fd2.plot(20)        # frequency plot of the same 20 words

    Output:
    4699
    30266
     the  and   to   of    i  you    a   my   it   in
     993  863  685  610  574  527  511  502  419  400

    that  ham   is  not  his this with your  but  for
     377  337  328  300  285  276  254  253  249  245
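For readers who want to see what these counts mean without installing NLTK, the sketch below reproduces them with the standard library. It is an illustration rather than the slide's code: it assumes a short sample string (`SAMPLE`, a hypothetical stand-in for the Hamlet corpus) and uses `collections.Counter`, where the number of keys plays the role of `fd2.B()`, the sum of the counts plays the role of `fd2.N()`, and `most_common(n)` lists the entries that `tabulate(n)` would show.

```python
import re
from collections import Counter

# Hypothetical sample text standing in for the Hamlet corpus.
SAMPLE = "To be, or not to be, that is the question: to die, to sleep."


def word_frequencies(text):
    """Count lowercase alphabetic tokens, similar to the isalpha() filter."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return Counter(t.lower() for t in tokens)


freq = word_frequencies(SAMPLE)
distinct = len(freq)          # plays the role of fd2.B()
total = sum(freq.values())    # plays the role of fd2.N()
top3 = freq.most_common(3)    # the first entries tabulate(3) would list
print(distinct, total, top3)
```

In NLTK 3, `FreqDist` is itself built on `Counter`, so the same `most_common` call is available on `fd2`.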

US presidential inaugural addresses

    # Filename: inaugural.py
    from nltk.corpus import inaugural
    from nltk.probability import ConditionalFreqDist

    # Condition on the file id and count word lengths,
    # keeping only addresses whose file id sorts after "1950"
    cfd = ConditionalFreqDist(
        (fileid, len(w))
        for fileid in inaugural.fileids()
        for w in inaugural.words(fileid)
        if fileid > "1950")
    print(cfd.items())
    cfd.plot()

    Output:
    [(u"1965-Johnson.txt", FreqDist({3: 355, 2: 301, 1: 256, 4: 255, 5: 138, 7: 133, 6: 127, 8: 68, 9: 45, 10: 30, ...})),
     (u"1997-Clinton.txt", FreqDist({3: 534, 2: 378, 4: 352, 1: 350, 5: 225, 6: 179, 7: 171, 8: 117, 9: 70, 10: 45, ...})),
     (u"2009-Obama.txt", FreqDist({3: 599, 2: 441, 4: 422, 1: 350, 5: 236, 6: 225, 7: 198, 8: 96, 9: 63, 10: 59, ...})),
     …
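The filter `fileid > "1950"` works because the file ids begin with a four-digit year, so an ordinary string comparison selects the later addresses. The sketch below shows the same grouping with the standard library only. The three-entry `CORPUS` dictionary and its word lists are hypothetical placeholders, not the NLTK corpus, and `word_length_distributions` is a name invented for this illustration.

```python
from collections import Counter, defaultdict

# Hypothetical miniature corpus: file id -> list of words.
CORPUS = {
    "1949-Truman.txt": ["peace", "and", "freedom"],
    "1961-Kennedy.txt": ["ask", "not", "what", "your", "country"],
    "2009-Obama.txt": ["we", "the", "people", "remain", "faithful"],
}


def word_length_distributions(corpus, after="1950"):
    """Map each selected file id to a Counter of its word lengths."""
    cfd = defaultdict(Counter)
    for fileid, words in corpus.items():
        # File ids start with the year, so a lexicographic comparison
        # keeps only the addresses that sort after the cutoff string.
        if fileid > after:
            for w in words:
                cfd[fileid][len(w)] += 1
    return dict(cfd)


cfd = word_length_distributions(CORPUS)
for fileid, dist in sorted(cfd.items()):
    print(fileid, sorted(dist.items()))
```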

Sentiment mining

Taglines of comedy films lean toward positive sentiment, while taglines of horror films lean toward negative sentiment.

    from nltk.corpus import sentiwordnet
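One simple way to compare taglines is to sum per-word polarity scores from a sentiment lexicon. The sketch below illustrates that idea only. It does not call SentiWordNet: `LEXICON` is a tiny hypothetical dictionary with invented scores, and both taglines are made up for the example.

```python
import re

# Hypothetical lexicon: word -> (positive score, negative score).
# The values are invented for illustration and are not SentiWordNet data.
LEXICON = {
    "love": (0.6, 0.0),
    "funny": (0.5, 0.0),
    "happy": (0.9, 0.0),
    "fear": (0.0, 0.6),
    "dead": (0.0, 0.8),
    "evil": (0.0, 0.9),
}


def tagline_polarity(tagline):
    """Sum positive-minus-negative scores over the words found in the lexicon."""
    words = re.findall(r"[a-z]+", tagline.lower())
    return sum(LEXICON[w][0] - LEXICON[w][1] for w in words if w in LEXICON)


# Hypothetical taglines, one per genre.
comedy = tagline_polarity("A funny story about love")
horror = tagline_polarity("Fear the evil that wakes the dead")
print(comedy, horror)
```

A positive total suggests a favorable tone and a negative total an unfavorable one. A real study would also need word-sense handling, which this sketch leaves out.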

2 Economics and management

Retrieving Dow Jones component stock data

Finding data that the page loads through JavaScript: use the browser's developer tools.

The response contains multiple records, each made up of several strings:

    "AXP", "American Express Company", "77.77"
    "BA", "The Boeing Company", "177.83"
    "CAT", "Caterpillar Inc.", "96.39"
    …

    # Filename: dji.py
    import requests

    # The query string is elided ("…") on the slide; complete it from the
    # request shown in the browser's developer tools.
    r = requests.get("http://query1.finance.yahoo.com/v7/finance/quote?formatted=true&crumb=azVqAvrYffI&lang=en-…%2CregularMarketChangePercent&corsDomain=finance.yahoo.com")
    resp = r.json()
    for stock in resp["quoteResponse"]["result"]:
        print(stock["symbol"], stock["longName"], stock["regularMarketPrice"]["fmt"])
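Because the request URL is incomplete and the endpoint may no longer respond as it did in 2017, the parsing step can be tried offline. The sketch below assumes a payload with the same nesting the loop reads (`quoteResponse`, `result`, `regularMarketPrice`, `fmt`). `SAMPLE_JSON` is a hand-written stand-in that reuses the three records shown on the slide, and `extract_quotes` is a hypothetical helper name.

```python
import json

# Hand-written stand-in for the service response; only the keys that the
# slide's loop reads are included, and the values are those on the slide.
SAMPLE_JSON = """
{"quoteResponse": {"result": [
  {"symbol": "AXP", "longName": "American Express Company",
   "regularMarketPrice": {"fmt": "77.77"}},
  {"symbol": "BA", "longName": "The Boeing Company",
   "regularMarketPrice": {"fmt": "177.83"}},
  {"symbol": "CAT", "longName": "Caterpillar Inc.",
   "regularMarketPrice": {"fmt": "96.39"}}
]}}
"""


def extract_quotes(payload):
    """Return (symbol, name, price) tuples from a JSON string of that shape."""
    resp = json.loads(payload)
    return [
        (s["symbol"], s["longName"], float(s["regularMarketPrice"]["fmt"]))
        for s in resp["quoteResponse"]["result"]
    ]


quotes = extract_quotes(SAMPLE_JSON)
for symbol, name, price in quotes:
    print(symbol, name, price)
```

Converting the formatted price to `float` is a choice made here so the values can be used in later calculations; the slide prints the strings unchanged.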

Using a more convenient API for retrieving Yahoo Finance data

    # Filename: to_excel.py
    # Note: matplotlib.finance was removed from later Matplotlib releases,
    # so this script needs a version that still ships that module.
    from datetime import date
    import pandas as pd
    from matplotlib.finance import quotes_historical_yahoo_ochl

    today = date.today()
    start = (today.year - 1, today.month, today.day)   # one year back
    quotes = quotes_historical_yahoo_ochl("IBM", start, today)
    df = pd.DataFrame(quotes)
    df.to_excel("stockIBM.xlsx", sheet_name="IBM")

wxPython GUI development

    # Filename: helloworldbtn.py
    import wx

    class Frame1(wx.Frame):
        def __init__(self, superior):
            wx.Frame.__init__(self, parent=superior, title="Hello World in wxPython")
            panel = wx.Panel(self)
            sizer = wx.BoxSizer(wx.VERTICAL)
            self.text1 = wx.TextCtrl(panel, value="Hello, World!", size=(200, 180), style=wx.TE_MULTILINE)
            sizer.Add(self.text1, 0, wx.ALIGN_TOP | wx.EXPAND)
            button = wx.Button(panel, label="Click Me")
            sizer.Add(button)
            panel.SetSizerAndFit(sizer)
            panel.Layout()
            self.Bind(wx.EVT_BUTTON, self.OnClick, button)

        def OnClick(self, event):
            # Append one more line each time the button is clicked
            self.text1.AppendText("\nHello, World!")

    # Entry point (not on the slide) so that the example can be run directly
    if __name__ == "__main__":
        app = wx.App()
        frame = Frame1(None)
        frame.Show(True)
        app.MainLoop()

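Portfolios from Dow Jones component data: how to build an efficient portfolio

1. Select the stocks
2. Compute the means and covariances of the different securities
3. Assign random initial weights to the assets
4. Compute the expected annualized portfolio return and the portfolio variance
5. Generate a large number of random portfolios by Monte Carlo simulation
6. Portfolio optimization 1: maximum Sharpe ratio
7. Portfolio optimization 2: minimum variance
8. The efficient frontier of the portfolios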
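Steps 2 to 4 can be sketched with the standard library alone. The example below is a minimal illustration under stated assumptions: the two price series in `PRICES` are hypothetical, daily log returns are used (the slides do not say which return definition was chosen), the weights are fixed at 50/50 instead of being drawn at random, and the factor of 252 trading days follows the slides' code.

```python
import math

# Hypothetical daily closing prices for two stocks (not real market data).
PRICES = {
    "AAA": [100.0, 101.0, 102.0, 101.0, 103.0],
    "BBB": [50.0, 50.5, 50.0, 51.0, 51.5],
}
TRADING_DAYS = 252  # annualization factor


def log_returns(prices):
    """Daily log returns of a price series (an assumed return definition)."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Sample covariance of two equally long return series."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def portfolio_stats(returns, weights):
    """Annualized expected return and volatility for the given weights."""
    names = list(returns)
    exp_ret = TRADING_DAYS * sum(
        w * mean(returns[n]) for w, n in zip(weights, names))
    variance = TRADING_DAYS * sum(
        wi * wj * covariance(returns[ni], returns[nj])
        for wi, ni in zip(weights, names)
        for wj, nj in zip(weights, names))
    return exp_ret, math.sqrt(variance)


returns = {name: log_returns(p) for name, p in PRICES.items()}
exp_ret, vol = portfolio_stats(returns, [0.5, 0.5])
print(round(exp_ret, 4), round(vol, 4))
```

The double sum over weights and covariances is the quadratic form that the slides' code evaluates with `np.dot(weights.T, np.dot(cov, weights))`.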

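Generating a large number of random portfolios by Monte Carlo simulation

    # Generate random portfolios by Monte Carlo simulation.
    # A method of a portfolio class; assumes "import numpy as np" and that the
    # class defines n, mean, cov, y_min, port_returns and port_variance.
    def monte_carlo(self):
        for p in range(10000):
            # Random weights normalized to sum to one
            weights = np.random.random(self.n)
            weights /= np.sum(weights)
            # Annualized expected return (252 trading days)
            ret = np.sum(self.mean * 252 * weights)
            self.port_returns.append(ret)
            if ret < self.y_min:
                self.y_min = ret
            # Annualized standard deviation of the portfolio
            self.port_variance.append(
                np.sqrt(np.dot(weights.T, np.dot(self.cov * 252, weights))))
        self.port_returns = np.array(self.port_returns)
        self.port_variance = np.array(self.port_variance)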
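The method above depends on a class that is not shown, so a self-contained sketch may help. The version below uses only the standard library and assumes hypothetical annualized inputs (`MEAN`, `COV`) for three assets and a risk-free rate of zero. It draws random weights, computes the return and volatility of each portfolio, and then picks the sampled portfolios with the highest Sharpe ratio and the lowest volatility. Picking from random samples only approximates the true optima.

```python
import math
import random

# Hypothetical annualized inputs for three assets (not real market data).
MEAN = [0.08, 0.12, 0.10]
COV = [
    [0.040, 0.006, 0.004],
    [0.006, 0.090, 0.010],
    [0.004, 0.010, 0.060],
]
RISK_FREE = 0.0  # assumed risk-free rate used in the Sharpe ratio


def random_weights(n, rng):
    """Draw n non-negative weights that sum to one."""
    raw = [rng.random() for _ in range(n)]
    total = sum(raw)
    return [x / total for x in raw]


def simulate(num_portfolios=2000, seed=0):
    """Return a list of (return, volatility, weights) for random portfolios."""
    rng = random.Random(seed)
    n = len(MEAN)
    results = []
    for _ in range(num_portfolios):
        w = random_weights(n, rng)
        ret = sum(wi * mi for wi, mi in zip(w, MEAN))
        var = sum(w[i] * COV[i][j] * w[j] for i in range(n) for j in range(n))
        results.append((ret, math.sqrt(var), w))
    return results


portfolios = simulate()
best_sharpe = max(portfolios, key=lambda p: (p[0] - RISK_FREE) / p[1])
min_vol = min(portfolios, key=lambda p: p[1])
print("max Sharpe:", round(best_sharpe[0], 4), round(best_sharpe[1], 4))
print("min volatility:", round(min_vol[0], 4), round(min_vol[1], 4))
```

Plotting volatility against return for all sampled portfolios gives the cloud of feasible portfolios, and its upper-left edge approximates the efficient frontier.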

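The efficient frontier of the portfolios

Efficient frontier: with the weights constrained to sum to one, plot the return and risk (covariance) of every feasible portfolio.
Sharpe optimum: the Sharpe ratio is the excess return earned per unit of risk.
Minimum risk: the point at which the portfolio variance, and therefore the risk, is smallest.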

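Beidahuang (600598)

Support line in an uptrend (the uptrend line)

In an uptrend, draw a line through two or more swing lows so that as many lows as possible fall on it; this is the uptrend line for that trend.

The uptrend line acts as support for the share price.

Resistance line in an uptrend

In an uptrend, drawing a line through two or more swing highs gives the rising resistance line for that trend.

The rising resistance line puts a degree of pressure on the share price.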
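The rule for the uptrend line can be turned into a small search. The sketch below is one possible reading of it, not the presenter's method: a swing low is taken to be a point lower than both neighbours, every pair of swing lows defines a candidate line, candidates that any swing low falls below are discarded, and the line touching the most swing lows wins. The series `LOWS` and the tolerance `tol` are hypothetical.

```python
# Hypothetical daily lows of a rising price series (not real data for 600598).
LOWS = [10.0, 10.6, 10.3, 11.0, 10.6, 11.4, 10.9, 11.8, 11.5, 12.0]


def swing_lows(series):
    """Indices whose value is lower than both neighbours."""
    return [i for i in range(1, len(series) - 1)
            if series[i] < series[i - 1] and series[i] < series[i + 1]]


def uptrend_line(series, tol=0.05):
    """Return (slope, intercept, touches) of the best support line, or None."""
    lows = swing_lows(series)
    best = None
    for a in range(len(lows)):
        for b in range(a + 1, len(lows)):
            i, j = lows[a], lows[b]
            slope = (series[j] - series[i]) / (j - i)
            if slope <= 0:
                continue  # a support line in an uptrend must rise
            intercept = series[i] - slope * i
            gaps = [series[k] - (slope * k + intercept) for k in lows]
            if min(gaps) < -tol:
                continue  # some swing low breaks below the candidate line
            touches = sum(1 for g in gaps if abs(g) <= tol)
            if best is None or touches > best[2]:
                best = (slope, intercept, touches)
    return best


line = uptrend_line(LOWS)
print(line)
```

The same search applied to swing highs, with the inequality reversed, would give a candidate for the rising resistance line.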

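Foshan Lighting (000541)

Hanging man and harami

Hanging man: sell signal. Harami: sell signal.

The hanging man and harami patterns that appear one after the other on the candlestick chart confirm that this trend line acts as resistance for the share price. If the price returns to the resistance line but this "possible resistance" is not confirmed by a candlestick pattern, holding the stock for further gains is the better choice.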
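Both patterns can be checked mechanically from open, high, low, and close prices. The sketch below uses common textbook descriptions as assumptions: a hanging man has a small body near the top of the range and a lower shadow at least twice the body, and a bearish harami is a rising candle followed by a candle whose body lies inside the first body. The thresholds `shadow_ratio` and `upper_limit` and all sample candles are hypothetical, and the preceding trend is not checked.

```python
from collections import namedtuple

Candle = namedtuple("Candle", "open high low close")


def is_hanging_man(c, shadow_ratio=2.0, upper_limit=0.1):
    """Small body at the top of the range with a long lower shadow.

    The thresholds are assumptions for illustration, not values from the slides.
    """
    body = abs(c.close - c.open)
    total = c.high - c.low
    if total == 0 or body == 0:
        return False
    lower = min(c.open, c.close) - c.low
    upper = c.high - max(c.open, c.close)
    return lower >= shadow_ratio * body and upper <= upper_limit * total


def is_bearish_harami(prev, cur):
    """A rising candle followed by a candle whose body sits inside its body."""
    if prev.close <= prev.open:
        return False
    return (max(cur.open, cur.close) < prev.close
            and min(cur.open, cur.close) > prev.open)


# Hypothetical candles, not actual quotes for 000541.
hanging = Candle(open=10.0, high=10.05, low=9.4, close=9.9)
big_up = Candle(open=9.0, high=10.1, low=8.9, close=10.0)
inside = Candle(open=9.8, high=9.9, low=9.5, close=9.6)
print(is_hanging_man(hanging), is_bearish_harami(big_up, inside))
```

In practice such a check would only be run on candles near the resistance line, since the slides treat the patterns as confirmation of that line.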

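3

Scraping and parsing Douban book reviews

    … charset=utf-8">
    <title>The Little Prince: short reviews</title>
    …
    </head>
    <body>
    …
    I was only 4 the first time I read it.
    Only when I truly understood it did I see why it is a "fairy tale".
    …
    </body>
    </html>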

Fetch a single page:

    r = requests.get("http://book.douban.com/subject/1084336/comments/")

Fetch multiple pages (looping over i):

    r = requests.get("https://book.douban.com/subject/1084336/comments/hot?p=" + str(i+1))

    >>> soup = BeautifulSoup(r.text, "lxml")
    >>> pattern = soup.find_all("p", "comment-content")
    >>> for comment in pattern:
    ...     print(comment.string)
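The extraction step can be tried without network access or third-party packages. The sketch below is a stand-in for the BeautifulSoup call, not a replacement the slides propose: it uses the standard library's `html.parser` on a hypothetical page fragment (`SAMPLE_HTML`) and collects the text of every `<p>` element whose class is `comment-content`, the class name used in the `find_all` call.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; the class name follows the find_all() call.
SAMPLE_HTML = """
<html><body>
<p class="comment-content">I cry every time I read it</p>
<p class="other">not a comment</p>
<p class="comment-content">The fox that longs to be tamed.</p>
</body></html>
"""


class CommentParser(HTMLParser):
    """Collect the text of every <p class="comment-content"> element."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "p" and dict(attrs).get("class") == "comment-content":
            self._inside = True
            self.comments.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.comments[-1] += data


parser = CommentParser()
parser.feed(SAMPLE_HTML)
for number, text in enumerate(parser.comments, start=1):
    print(number, text.strip())
```

Numbering the comments with `enumerate` mirrors the numbered list in the slide's output.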

    >>> pattern2 = re.compile(r"…")   # pattern elided
    >>> scores_temp = re.findall(pattern2, r.text)

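    …
    195 I cry every time I read it
    196 I have always felt it is not as good as what Andersen wrote.
    197 A fairy tale that every adult has to read
    198 France
    199 Recommended by Qiu Wei a few years ago; I finished it later in fits and starts...
    200 The fox that longs to be tamed.
    45.215053763440864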
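The last number in the output appears to be an average of the scores collected with `re.findall`, although the pattern itself is not shown. The sketch below illustrates that kind of calculation under a loudly stated assumption: that each rating appears in the page as a class name of the form `allstarNN` (for example `allstar40`). Both that class name and the fragment `SAMPLE_HTML` are hypothetical.

```python
import re

# Hypothetical fragment; the "allstarNN" class name is an assumption about
# how the page encodes star ratings, since the slide's pattern is not shown.
SAMPLE_HTML = """
<span class="user-stars allstar50 rating"></span>
<span class="user-stars allstar40 rating"></span>
<span class="user-stars allstar30 rating"></span>
<span class="user-stars allstar50 rating"></span>
"""


def average_score(html):
    """Average the numeric part of every allstarNN class found in the text."""
    scores = [int(s) for s in re.findall(r"allstar(\d+)", html)]
    return sum(scores) / len(scores) if scores else None


print(average_score(SAMPLE_HTML))
```

Returning `None` for a page without ratings avoids a division by zero when a request yields no comments.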

Scraping and parsing Douban film reviews

Douban rating: 8.6, from 238,278 ratings

114,543 short comments and 3,761 film reviews

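Offer analysis for a study-abroad forum

GTER (寄托天下) study-abroad forum: provides useful information on studying abroad and an active discussion forum. On the BBS, users can ask about visas, interviews, recalled exam questions, offers, scholarships, and top universities and programs, and can share study notes for IELTS, TOEFL, and the GRE.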
