Case study code
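<p>Hello, could you share the code and guidance files for the following two course case studies?</p><p>4.2.4 Case Study 4-1: How can Python be used to analyze the Government Work Report?</p><p>4.3.5 Case Study 4-2: How can R be used to analyze the results of the 2016 U.S. presidential election?</p><p><br ></p>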
<p>Code for analyzing the Government Work Report with Python:</p>
<p>from urllib.request import urlopen</p>
<p>from bs4 import BeautifulSoup</p>
<p>import re</p>
<p>import string</p>
<p>from collections import OrderedDict</p>
<p>import numpy as np</p>
<p>import matplotlib.pyplot as plt</p>
<p>from matplotlib import rcParams</p>
<p><br ></p>
<p># Assumption: a CJK-capable font such as SimHei is installed; without one, the Chinese labels render as empty boxes</p>
<p>rcParams['font.sans-serif'] = ['SimHei']</p>
<p><br ></p>
<p>def cleanInput(text):</p>
<p>    text = re.sub(r'\n+', " ", text)  # replace line breaks with a space</p>
<p>    text = re.sub(r'\[[0-9]*\]', "", text)  # remove bracketed numbers such as [12]</p>
<p>    text = re.sub(r'[0-9]', "", text)  # remove digits</p>
<p>    text = re.sub(r'[,,。.、!!::%%;;”“\[\]]', "", text)  # remove Chinese (and matching ASCII) punctuation</p>
<p>    text = re.sub(r' +', " ", text)  # collapse runs of spaces into one</p>
<p>    text = text.strip(string.punctuation)  # strip leading/trailing ASCII punctuation (the result must be assigned)</p>
<p>    return text</p>
<p><br ></p>
<p>def getngrams(text, n):</p>
<p>    text = cleanInput(text)</p>
<p>    output = dict()</p>
<p>    for i in range(len(text) - n + 1):</p>
<p>        newNGram = text[i:i+n]  # the n consecutive characters starting at position i</p>
<p>        if newNGram in output:</p>
<p>            output[newNGram] += 1  # seen before: increment the count</p>
<p>        else:</p>
<p>            output[newNGram] = 1  # first occurrence: start at 1</p>
<p>    return output</p>
<p><br ></p>
<p>html = urlopen("https://news.ifeng.com/a/20170305/50754278_0.shtml")</p>
<p>bsObj = BeautifulSoup(html, "lxml")</p>
<p>content = bsObj.find("div", {"id": "main_content"}).get_text()</p>
<p>ngrams = getngrams(content, 2)</p>
<p># sort the bigrams by count, most frequent first</p>
<p>ngrams = OrderedDict(sorted(ngrams.items(), key=lambda t: t[1], reverse=True))</p>
<p><br ></p>
<p>count = []</p>
<p>count_label = []</p>
<p># write every bigram to a file; the with block closes the file automatically</p>
<p>with open("2017report.txt", "w", encoding="utf-8") as datafile:</p>
<p>    for k in ngrams:</p>
<p>        print("(%s,%d)" % (k, ngrams[k]))</p>
<p>        datafile.write("(%s,%d)\n" % (k, ngrams[k]))</p>
<p>        if ngrams[k] > 30:  # plot only bigrams that occur more than 30 times</p>
<p>            count.append(ngrams[k])</p>
<p>            count_label.append(k)</p>
<p><br ></p>
<p>x = np.arange(len(count)) + 1</p>
<p>fig1 = plt.figure(1)</p>
<p>rects = plt.bar(x, count, width=0.5, align="center", yerr=0.001)</p>
<p>plt.title('2017政府工作报告词频统计')  # "Word frequency statistics for the 2017 Government Work Report"</p>
<p><br ></p>
<p>def autolabel(rects):</p>
<p>    # write each bar's count just above the bar</p>
<p>    for rect in rects:</p>
<p>        height = rect.get_height()</p>
<p>        plt.text(rect.get_x(), 1.03 * height, '%s' % int(height))</p>
<p><br ></p>
<p>autolabel(rects)</p>
<p>plt.xticks(x, count_label, rotation=90)</p>
<p>#plt.xticks(x,count_label)  # alternative: horizontal tick labels</p>
<p>plt.show()</p>
<p><br ></p>
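<p>To check the counting logic without downloading the page, the character n-gram step can be tried on a short string. This is only a minimal sketch: the name count_ngrams and the sample text are made up for illustration, and collections.Counter stands in for the hand-written dictionary used in the code above.</p>

```python
from collections import Counter


def count_ngrams(text, n):
    """Count every run of n consecutive characters in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# Hypothetical stand-in for the cleaned report text.
sample = "发展改革发展"
bigrams = count_ngrams(sample, 2)

# most_common() yields (ngram, count) pairs sorted by descending count,
# which replaces the explicit sorted(...) / OrderedDict step.
for gram, freq in bigrams.most_common():
    print("(%s,%d)" % (gram, freq))
```

<p>Counter.most_common(k) also accepts a limit, which is one way to pick the bars to plot instead of filtering on a fixed threshold.</p>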