Instructor participated

Case study code

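Echocruise posted on November 18, 2017
<p>Hello, could you please share the code and guidance files for the following two course case studies?</p><p>4.2.4 Case Study 4-1: How to analyze the Government Work Report with Python?</p><p>4.3.5 Case Study 4-2: How to analyze the 2016 US election results with R?</p>
1 reply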

    Reply #1

  • xmpresident posted on December 24, 2017
    <p>Code for analyzing the Government Work Report with Python</p>
    <p>from urllib.request import urlopen</p>
    <p>from bs4 import BeautifulSoup</p>
    <p>import re</p>
    <p>import string</p>
    <p>from collections import OrderedDict</p>
    <p>import numpy as np</p>
    <p>import matplotlib.pyplot as plt</p>
    <p>from matplotlib import rcParams</p>
    <p><br ></p>
    <p># Assumption: a CJK-capable font such as SimHei is installed; without one, Chinese tick labels render as empty boxes</p>
    <p>rcParams['font.sans-serif'] = ['SimHei']</p>
    <p><br ></p>
    <p>def cleanInput(text):</p>
    <p>&nbsp; &nbsp; text = re.sub(r'\n+', &quot; &quot;, text) # replace line breaks with a space</p>
    <p>&nbsp; &nbsp; text = re.sub(r'\[[0-9]*\]', &quot;&quot;, text) # remove numbers in square brackets</p>
    <p>&nbsp; &nbsp; text = re.sub(r'[0-9]', &quot;&quot;, text) # remove digits</p>
    <p>&nbsp; &nbsp; text = re.sub(r'[,。.、!:%;”“\[\]]', &quot;&quot;, text) # remove Chinese punctuation</p>
    <p>&nbsp; &nbsp; text = re.sub(r' +', &quot; &quot;, text) # collapse runs of spaces into one</p>
    <p>&nbsp; &nbsp; text = text.strip(string.punctuation) # strip English punctuation from both ends (the result must be assigned)</p>
    <p>&nbsp; &nbsp; return text</p>
    <p><br ></p>
    <p>def getngrams(text, n):</p>
    <p>&nbsp; &nbsp; text = cleanInput(text)</p>
    <p>&nbsp; &nbsp; output = dict()</p>
    <p>&nbsp; &nbsp; for i in range(len(text)-n+1):</p>
    <p>&nbsp; &nbsp; &nbsp; &nbsp; newNGram = text[i:i+n] # n consecutive characters starting at position i</p>
    <p>&nbsp; &nbsp; &nbsp; &nbsp; if newNGram in output:</p>
    <p>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; output[newNGram] += 1 # seen before: increment the count</p>
    <p>&nbsp; &nbsp; &nbsp; &nbsp; else:</p>
    <p>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; output[newNGram] = 1 # first occurrence: start at 1</p>
    <p>&nbsp; &nbsp; return output</p>
    <p><br ></p>
    <p>html = urlopen(&quot;https://news.ifeng.com/a/20170305/50754278_0.shtml&quot;)</p>
    <p>bsObj = BeautifulSoup(html, &quot;lxml&quot;)</p>
    <p>content = bsObj.find(&quot;div&quot;, {&quot;id&quot;: &quot;main_content&quot;}).get_text()</p>
    <p>ngrams = getngrams(content, 2)</p>
    <p># sort the 2-grams by count, most frequent first</p>
    <p>ngrams = OrderedDict(sorted(ngrams.items(), key=lambda t: t[1], reverse=True))</p>
    <p><br ></p>
    <p>datafile = open(&quot;2017report.txt&quot;, 'w+', encoding=&quot;utf-8&quot;)</p>
    <p>count = []</p>
    <p>count_label = []</p>
    <p>for k in ngrams:</p>
    <p>&nbsp; &nbsp; print(&quot;(%s,%d)&quot; % (k, ngrams[k]))</p>
    <p>&nbsp; &nbsp; datafile.write(&quot;(%s,%d)\n&quot; % (k, ngrams[k]))</p>
    <p>&nbsp; &nbsp; if ngrams[k] &gt; 30: # only plot 2-grams that occur more than 30 times</p>
    <p>&nbsp; &nbsp; &nbsp; &nbsp; count.append(ngrams[k])</p>
    <p>&nbsp; &nbsp; &nbsp; &nbsp; count_label.append(k)</p>
    <p>datafile.close()</p>
    <p><br ></p>
    <p>x = np.arange(len(count))+1</p>
    <p>fig1 = plt.figure(1)</p>
    <p>rects = plt.bar(x, count, width=0.5, align=&quot;center&quot;, yerr=0.001)</p>
    <p>plt.title('Word frequencies in the 2017 Government Work Report')</p>
    <p><br ></p>
    <p>def autolabel(rects):</p>
    <p>&nbsp; &nbsp; # write each bar's count just above the bar</p>
    <p>&nbsp; &nbsp; for rect in rects:</p>
    <p>&nbsp; &nbsp; &nbsp; &nbsp; height = rect.get_height()</p>
    <p>&nbsp; &nbsp; &nbsp; &nbsp; plt.text(rect.get_x(), 1.03*height, '%s' % int(height))</p>
    <p><br ></p>
    <p>autolabel(rects)</p>
    <p>plt.xticks(x, count_label, rotation=90)</p>
    <p>#plt.xticks(x,count_label)</p>
    <p>plt.show()</p>
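For anyone who wants to try the counting step without fetching the page, here is a minimal offline sketch. It assumes Python 3 and uses `collections.Counter` in place of the hand-written dictionary loop above. The names `clean_text` and `char_ngrams` and the sample sentence are made up for illustration and are not part of the course materials. Like the code above, it counts overlapping character pairs rather than segmented words.

```python
import re
from collections import Counter


def clean_text(text):
    """Drop bracketed numbers, digits and common punctuation, then tidy whitespace."""
    text = re.sub(r"\[[0-9]*\]", "", text)             # bracketed numbers such as [12]
    text = re.sub(r"[0-9]", "", text)                  # remaining digits
    text = re.sub(r"[,。、!:;%“”\[\],.!:;]", "", text)  # full-width and ASCII punctuation
    text = re.sub(r"\s+", " ", text)                   # collapse runs of whitespace
    return text.strip()


def char_ngrams(text, n):
    """Count overlapping character n-grams in the cleaned text."""
    text = clean_text(text)
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# Hypothetical sample sentence, used only to show the output shape.
sample = "发展经济,发展教育。"
print(char_ngrams(sample, 2).most_common(1))  # [('发展', 2)]
```

Passing the page text extracted with BeautifulSoup instead of `sample` should give counts close to those of the script above, although the punctuation set here is slightly wider, so the numbers may not match exactly.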