- Class Discussion Area
- Post Details
46
Replies
-
<p>Wow, this is the first time I've learned that requests itself affects crawler performance. I thought crawler performance was limited only by network speed and thread count QAQ. When I tested this myself, if the site itself was slow, a single requests call took slightly less time than one page refresh; crawling 100 times, plus the data processing and export afterwards, usually took about two minutes, <span style="text-decoration: line-through;" >and to reduce server load I would usually add a sleep anyway</span>.</p><p><br ></p><p>Teacher, a follow-up question on this: do the "professional crawlers" you mentioned use other libraries, or are they written directly in C?</p>
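On the "professional crawler" question: before switching languages, a large share of the gap usually comes from connection reuse and concurrency. A plain requests.get() opens a fresh TCP (and, for HTTPS, TLS) connection on every call, while requests.Session keeps connections alive and reuses them. The sketch below compares the two against a throwaway local server (the server exists only so the example does not depend on an external site); it is an illustration, not a benchmark, and the gap on a real HTTPS site is typically much larger because TLS handshakes are expensive.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep-alive, so a Session can reuse the socket

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

# Throwaway local server on a free port.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

N = 20

start = time.perf_counter()
for _ in range(N):
    requests.get(url, timeout=5)   # fresh connection every iteration
naive = time.perf_counter() - start

start = time.perf_counter()
with requests.Session() as s:
    for _ in range(N):
        s.get(url, timeout=5)      # connection pooled and reused
pooled = time.perf_counter() - start

server.shutdown()
print(f"plain requests.get: {naive:.3f}s  requests.Session: {pooled:.3f}s")
```

Beyond sessions, production crawlers typically add concurrency (threads, or async clients) so that waiting on one response does not block the others.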
-
<p>import requests</p><p>import time</p><p><br ></p><p>def pertime(url):</p><p>    try:</p><p>        r = requests.get(url, timeout=30)</p><p>        r.raise_for_status()</p><p>        r.encoding = r.apparent_encoding</p><p>        return r.text</p><p>    except Exception:</p><p>        print('request failed')</p><p><br ></p><p>if __name__ == "__main__":</p><p>    url = 'https://www.baidu.com'</p><p>    totaltime = 0</p><p>    for i in range(100):</p><p>        starttime = time.perf_counter()</p><p>        pertime(url)</p><p>        endtime = time.perf_counter()</p><p>        totaltime = totaltime + endtime - starttime</p><p>        print('Total time: {:.4f} s'.format(totaltime))</p><p><br ></p><p>Total time: 49.2819 s...</p>
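One subtlety worth flagging for everyone timing this exercise: `r.raise_for_status` written without parentheses merely references the bound method and never calls it, so 4xx/5xx responses slip through silently. A quick offline check, using a hand-built Response object as a shortcut for illustration (real code would get the Response from requests.get):

```python
import requests

resp = requests.models.Response()
resp.status_code = 404

resp.raise_for_status      # no parentheses: a no-op, nothing is raised

try:
    resp.raise_for_status()  # with parentheses: raises for 4xx/5xx
    raised = False
except requests.HTTPError:
    raised = True

print(raised)  # True
```

Without the parentheses, a timing loop happily counts error pages as successful fetches.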
-
<p>import requests</p><p>import time</p><p><br ></p><p>def getTime(url):</p><p>    try:</p><p>        r = requests.get(url, timeout=30)</p><p>        r.raise_for_status()</p><p>        r.encoding = r.apparent_encoding</p><p>        return r.text</p><p>    except Exception:</p><p>        print("request failed")</p><p><br ></p><p>if __name__ == "__main__":</p><p>    url = 'https://www.baidu.com'</p><p>    totaltime = 0</p><p>    for i in range(100):</p><p>        start = time.perf_counter()</p><p>        getTime(url)</p><p>        totaltime = totaltime + time.perf_counter() - start</p><p>    print("Total time: {:.2f}s".format(totaltime))</p><p>Total time: 5.52s</p>
-
<p>Python source:</p><p>import requests<br >import time<br >def gettime(url):<br >    try:<br >        r = requests.get(url, timeout=30)<br >        r.raise_for_status()<br >        r.encoding = r.apparent_encoding<br >        return r.text<br >    except Exception:<br >        print('request failed')<br >if __name__ == "__main__":<br >    url = 'https://www.baidu.com'<br >    totaltime = 0<br >    for i in range(100):<br >        starttime = time.perf_counter()<br >        gettime(url)<br >        totaltime = totaltime + time.perf_counter() - starttime<br >    print('Crawling the page 100 times took {:.4f} s'.format(totaltime))</p><p>Output: crawling the page 100 times took 55.9501 s</p>
-
<p>import requests</p><p><br ></p><p>def get_html_text(url):</p><p>    try:</p><p>        r = requests.get(url, timeout=30)</p><p>        r.raise_for_status()</p><p>        r.encoding = r.apparent_encoding</p><p>        return r.text</p><p>    except Exception:</p><p>        return 'request failed'</p><p><br ></p><p>url = 'https://www.baidu.com'</p><p>for number in range(100):</p><p>    print(get_html_text(url))</p><p>Took 8.9s</p><p><br ></p><p>One last question for everyone: what is the point of the line if __name__ == '__main__'?</p>
-
This seems to be a kind of format, a convention: you set something fixed, and the code written after it can reference it.
-
<p>Oh, I sort of get it. Thanks for your answer!</p>
-
<p>To put it simply: once you put this code in a .py file, you can double-click the .py file and it runs directly.</p>
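To pin down the question above: when Python executes a file directly, it sets that module's `__name__` variable to the string `"__main__"`; when the same file is imported from another script, `__name__` is the module's own name instead. So the guard makes a block run only on direct execution, never on import, which is why reusable modules put their demo or test code under it. A minimal sketch (the file name is hypothetical):

```python
# timing_utils.py (illustrative file name)

def total_seconds(samples):
    """Sum a list of per-request timings."""
    return sum(samples)

# Runs only under `python timing_utils.py`; skipped when another script
# does `import timing_utils`, because then __name__ is "timing_utils"
# rather than "__main__".
if __name__ == "__main__":
    print(total_seconds([0.5, 0.4, 0.6]))
```

Importing the module gives the caller `total_seconds` without triggering the demo print.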
-
-
<p>import time<br >import requests<br >def gethtml(url):<br >    try:<br >        r = requests.get(url, timeout=30)<br >        r.raise_for_status()<br >        r.encoding = r.apparent_encoding<br >        return r.text<br >    except Exception:<br >        return "error"<br >start = time.time()<br >for i in range(100):<br >    url = "https://www.cnki.net/"<br >    gethtml(url)<br >end = time.time() - start<br >print(end)<br ></p><p>58.22875738143921</p>
-
-
<p>import time as t</p><p>import requests as r</p><p><br ></p><p>def getHtml(url):</p><p>    try:</p><p>        nn = r.get(url)</p><p>        print(nn.status_code)</p><p>        nn.raise_for_status()</p><p>        return nn.text</p><p>    except Exception:</p><p>        print("an error occurred")</p><p><br ></p><p>url = "https://www.baidu.com"</p><p>start = t.time()</p><p>for i in range(100):</p><p>    nr = getHtml(url)</p><p>end = t.time()</p><p>print("{} s".format(end - start))</p><p>print(nr)</p><p>Ran in 13.36 s</p>
-
<p>Ran 100 times in 17.22 seconds; all 100 crawls succeeded.</p><p><code class="brush:python;toolbar:false" >import requests<br >import time<br >def getHTML(url):<br >    try:<br >        r = requests.get(url, headers=hd)<br >        return "OK"<br >    except Exception:<br >        return "crawl failed!"<br >hd = {"user-agent": "chrome/10"}<br >url = 'https://www.shanghairanking.cn/rankings/bcur/2023'<br >start_time = time.time()<br >count = 0<br >for i in range(100):<br >    info = getHTML(url)<br >    if info == 'OK':<br >        count += 1<br >print("Ran 100 times in {:.2f} s; {} crawls succeeded".format(time.time() - start_time, count))</code></p>
-
<p>import requests<br >import time<br ><br ><br >def netCra(url):<br >    try:<br >        r = requests.get(url)<br >        r.raise_for_status()<br >        return True<br >    except Exception:<br >        return False<br ><br ><br >url = "https://www.tfrerc.cn/"<br >stime = time.perf_counter()<br >re = {"True": 0, "False": 0}<br >for i in range(100):<br >    if netCra(url):<br >        re["True"] = re.get("True", 0) + 1<br >    else:<br >        re["False"] = re.get("False", 0) + 1<br >runTime = time.perf_counter() - stime<br >print(f"Succeeded {re['True']} times,\nfailed {re['False']} times")<br >print(f"Total time: {runTime:.2f} s")<br ><br ></p><p>Succeeded 100 times,</p><p>failed 0 times</p><p>Total time: 6.01 s</p>
-
-
<p>import requests<br >import time<br >def getHTMLText(url):<br >    try:<br >        r = requests.get(url)<br >        r.raise_for_status()<br >        return True<br >    except Exception:<br >        return False<br ><br >t = 0<br >f = 0<br >url = "https://www.tfrerc.cn/"<br >stime = time.perf_counter()<br >for i in range(0, 100):<br >    if getHTMLText(url):<br >        t += 1<br >    else:<br >        f += 1<br >runTime = time.perf_counter() - stime<br >print("Response time: %0.2f s" % runTime)<br >print("Succeeded " + str(t) + " times, failed " + str(f) + " times")</p><p><br ></p><p>Response time: 36.92 s</p><p>Succeeded 100 times, failed 0 times</p><p><br ></p>
-
-
<p>import requests<br >import time<br ><br ><br >def spider1(url):<br >    try:<br >        r = requests.get(url, timeout=30)<br >        r.raise_for_status()<br >        return True<br >    except Exception:<br >        return False<br ><br ><br >url1 = "https://ssr1.scrape.center/"<br >url2 = "https://www.tfrerc.cn/"<br ><br >stime = time.perf_counter()<br >re = {"True": 0, "False": 0}<br >for i in range(100):<br >    if spider1(url1):<br >        re["True"] = re.get("True", 0) + 1<br >    else:<br >        re["False"] = re.get("False", 0) + 1<br ><br >runtime = time.perf_counter() - stime<br >print(f"Succeeded {re['True']} times,\nfailed {re['False']} times")<br >print(f"Total time: {runtime:.2f} s")</p><p><br ></p><p>Output:</p><p>Succeeded 100 times,</p><p>failed 0 times</p><p>Total time: 32.82 s</p><p>Process finished with exit code 0</p>
-
<p>import requests</p><p>import time</p><p>start = time.perf_counter()</p><p>for i in range(100):</p><p>    url = "https://item.jd.com/2967929.html"</p><p>    r = requests.get(url)</p><p>    T = time.perf_counter() - start</p><p>    if i != 99:</p><p>        continue</p><p>    print("time of crawling this website for 100 times: {}".format(T))</p><p><br ></p>
-
-
<p><code class="brush:python;toolbar:false" >import requests, time<br >def getHTML(url):<br >    try:<br >        r = requests.get(f'https://{url}.com')<br >        r.raise_for_status()<br >        return True<br >    except Exception:<br >        return False<br >def recordTime(url, Count):<br >    start = time.perf_counter()<br >    c = 0<br >    while c < int(Count):<br >        if getHTML(url):<br >            c += 1<br >    totalTime = time.perf_counter() - start<br >    print(f'Crawling https://{url}.com {Count} times took {round(totalTime, 2)} s')<br >url = input('Enter the domain of the site to crawl: ')<br >Count = input('Enter the number of crawls: ')<br >print('Starting crawl....')<br >recordTime(url, Count)</code></p>
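One caveat with retry-until-success loops like the one above: since the counter only advances on a successful fetch, a site that never responds makes the loop run forever. A defensive pattern is to cap total attempts; the sketch below uses illustrative names (`crawl_n_times`, `flaky`) and an injected fetch callable so it runs without any network:

```python
def crawl_n_times(fetch, n, max_attempts=None):
    """Call fetch() until n successes or max_attempts tries.

    fetch: a zero-argument callable returning True on success.
    Returns the number of successes actually achieved.
    """
    if max_attempts is None:
        max_attempts = n * 3  # generous retry budget
    successes = attempts = 0
    while successes < n and attempts < max_attempts:
        attempts += 1
        if fetch():
            successes += 1
    return successes

# Simulated fetcher that fails on every third call.
calls = {"i": 0}
def flaky():
    calls["i"] += 1
    return calls["i"] % 3 != 0

print(crawl_n_times(flaky, 10))  # reaches 10 successes despite failures
```

With a permanently dead target, `crawl_n_times` stops after `max_attempts` tries and reports how far it got, instead of hanging.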
-
-
Through this course I learned a great deal about the subject. It greatly increased my interest in the field, broadened my interests outside class, and fostered my creativity. I gained a lot from it!
-