459 replies
-
<p>import requests<br >
import time<br >
<br >
def getHTMLText(url):<br >
    try:<br >
        r = requests.get(url, timeout=30)<br >
        r.raise_for_status()  # raise HTTPError on a 4xx/5xx status<br >
        r.encoding = r.apparent_encoding<br >
        return r.text<br >
    except:<br >
        return "An exception occurred"<br >
<br >
if __name__ == "__main__":<br >
    url = "https://www.baidu.com"<br >
    begin = time.time()<br >
    for i in range(0, 100):<br >
        getHTMLText(url)<br >
    print("{:.2f}".format(time.time() - begin))<br >
<br >
<br >
D:\Python题库\Scripts\python.exe D:/Python题库/网络爬取通用代码框架.py<br >
9.68<br >
<br >
Process finished with exit code 0<br ></p>
-
<code class="brush:python;toolbar:false" >import requests
import time

t1 = time.time()
for i in range(100):
    r = requests.get("https://www.baidu.com")
t2 = time.time()
t = t2 - t1
print(t)</code><p><img src="https://nos.netease.com/edu-image/0d658e51e5674e81956b4cc7a6d268e3.png" /></p><br />
-
<p><code class="brush:python;toolbar:false" >import requests
import time

t1 = time.time()
for i in range(5):
    r = requests.get("https://lq.fyxfw.gov.cn/display.php?id=35139")  # fetch a page that has a visit counter
t2 = time.time()
t = t2 - t1
print(t)
# Fetching a page with a visit counter lets you verify that the crawl really succeeded.
# My example is the 临泉县先锋网 site; each crawl took 3.4 seconds.</code></p>
-
<p>import requests<br >
import time<br >
<br >
# JD product page<br >
prefix_url = "https://item.jd.com/{}.html"<br >
urls = []<br >
num = 100000768781</p><p># 100 arbitrary pages (consecutive product IDs)<br >
for i in range(100):<br >
    urls.append(prefix_url.format(num))<br >
    num += 1</p><p><br ></p><p># start timing<br >
start_time = time.time()<br >
for url in urls:<br >
    r = requests.get(url)<br >
end_time = time.time()<br >
delta_time = end_time - start_time</p><p># print the elapsed time<br >
print(delta_time)</p><p><br ></p><p>D:\Python\python.exe E:/Code/python/web_crawler/test.py</p><p>36.75410223007202</p><p><br ></p><p>Process finished with exit code 0</p>
-
<p><img src="https://nos.netease.com/edu-image/34a07db1d4444c378c217e6c36380410.png" /></p><p>I crawled the 中国大学MOOC (China University MOOC) national quality courses page; it took 17.84 seconds.</p>
-
<p>import requests</p>
<p>import time</p>
<p>t1 = time.time()</p>
<p>for i in range(100):</p>
<p>    r = requests.get("https://www.baidu.com")</p>
<p>t2 = time.time()</p>
<p>t = t2 - t1</p>
<p>print(t)</p>
<p><br ></p>
-
<p>import requests</p>
<p>import time</p>
<p><br ></p>
<p>def getHtmlText(url):</p>
<p>    try:</p>
<p>        r = requests.get(url, timeout=30)</p>
<p>        r.raise_for_status()</p>
<p>        r.encoding = r.apparent_encoding</p>
<p>        return r.text</p>
<p>    except:</p>
<p>        return ''</p>
<p>if __name__ == "__main__":</p>
<p>    url = 'https://baidu.com'</p>
<p>    start = time.perf_counter()</p>
<p>    for i in range(100):</p>
<p>        getHtmlText(url)</p>
<p>    end = time.perf_counter()</p>
<p>    dur = end - start</p>
<p>    print(f'{dur = :.2f}')</p>
<p>url = '<a href="https://baidu.com" >https://baidu.com</a>'</p>
<p>Run time: dur = 9.78</p>
-
<p><img src="https://nos.netease.com/edu-image/9789858e594c43298a1dc0d2860773e8.png" /></p>
-
<p>import requests</p>
<p>import time</p>
<p><br ></p>
<p>def getHTMLText(url):</p>
<p>    try:</p>
<p>        r = requests.get(url, timeout=30)</p>
<p>        r.raise_for_status()</p>
<p>        r.encoding = r.apparent_encoding</p>
<p>        return r.text</p>
<p>    except:</p>
<p>        return "An exception occurred"</p>
<p><br ></p>
<p>if __name__ == "__main__":</p>
<p>    start = time.perf_counter()</p>
<p>    url = "https://www.qq.com"</p>
<p>    for i in range(100):</p>
<p>        r = requests.get(url)</p>
<p>    end = time.perf_counter()</p>
<p>    print("{:.2f}".format(end - start))</p>
<p><br ></p>
<p>39.11</p>
-
<img src='https://edu-image.nosdn.127.net/638935717B899E0BCEB346699819B229.jpg' />
-
<p>import requests<br >
import time<br >
<br >
<br >
def getHtmlText(url):<br >
    try:<br >
        r = requests.get(url, timeout=30)<br >
        r.raise_for_status()<br >
        r.encoding = r.apparent_encoding<br >
        return r.text<br >
    except:<br >
        return 'Crawl failed!'<br >
<br >
<br >
if __name__ == "__main__":<br >
    url = 'https://m.bilibili.com/'<br >
    start = time.perf_counter()<br >
<br >
    for i in range(100):<br >
        getHtmlText(url)<br >
<br >
    end = time.perf_counter()<br >
    dur = end - start<br >
    print(f'{dur = :.2f}')</p><p><br ></p><p>url = '<a href="https://m.bilibili.com/" >https://m.bilibili.com/</a>'</p><p><br ></p><p>Run time: 199.30</p>
-
<p><img src="https://nos.netease.com/edu-image/98d924c035ad4f1ea0e012416f8bad56.png" /><img src="https://nos.netease.com/edu-image/58e0d02dbd5b44cebfc41f681c34edcd.jpg" /></p>
-
<p>#coding=gbk<br >
from time import perf_counter<br >
import requests<br >
<br >
def getHTMLText(url):<br >
    try:<br >
        r = requests.get(url, timeout=30)<br >
        r.raise_for_status()  # raise HTTPError on a 4xx/5xx status<br >
        r.encoding = r.apparent_encoding<br >
        return r.text<br >
    except:<br >
        return "An exception occurred"<br >
<br >
if __name__ == "__main__":<br >
    url = "https://www.danda.com.cn/index/index/page_title/cate_id/3/sub_style/0.html?device=pc&renqun_youhua=165944"<br >
    start = perf_counter()<br >
    for i in range(100):<br >
        print(getHTMLText(url))<br >
<br >
    print("Time to crawl the page 100 times: {}s".format(perf_counter() - start))</p>
-
<p><code class="brush:python;toolbar:false" >import requests
import time

t = time.perf_counter()
for i in range(100):
    r = requests.get("https://www.baidu.com", timeout=10)
duri = time.perf_counter() - t
print(duri)</code>Result: 16.93397787</p>
-
import requests<br >
import time<br >
<br >
def getHTMLText(url):<br >
    try:<br >
        r = requests.get(url, timeout=30)<br >
        r.raise_for_status()<br >
        r.encoding = r.apparent_encoding<br >
        return r.text<br >
    except:<br >
        return "Error returned"<br >
<br >
if __name__ == "__main__":<br >
    start = time.perf_counter()<br >
    url = "https://www.baidu.com"<br >
<br >
    for i in range(100):<br >
        getHTMLText(url)<br >
    dur = time.perf_counter() - start<br >
    print("{:.2f}".format(dur))<br >
<br >
4.57
-
<p><code class="brush:python;toolbar:false" >import requests as rq
import time as t

def gethtml(url):
    try:
        r = rq.get(url)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r
    except:
        return 'Crawl failed'

if __name__ == '__main__':
    start = t.perf_counter()
    url = 'https://baidu.com'
    for i in range(100):
        gethtml(url)
    end = t.perf_counter()
    print('Time for 100 crawls: {:.2f} s'.format(end - start))</code>Time for 100 crawls: 10.38 s</p>
-
<p>This reads very clearly.</p>
-
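Hey classmate, could I get your contact info? I feel like I'm pretty weak at this and hope an expert like you can show me the ropes.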
-
What does the condition in the if statement mean?
-
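It means the code under it runs only when the file is executed directly as the main program (__name__ is then "__main__"), not when the file is imported as a module.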
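To make the answer above concrete, here is a minimal sketch of the guard. The helper name `describe_run_mode` is made up for illustration and is not from any post in this thread; the only real behavior it relies on is that Python sets `__name__` to `"__main__"` for the file being run and to the module's name for an imported file.

```python
def describe_run_mode(module_name):
    """Return how a module with this __name__ value is being used."""
    # Hypothetical helper, for illustration only.
    if module_name == "__main__":
        return "run directly"
    return "imported as " + module_name


# Python sets __name__ itself: "__main__" when this file is the script being
# executed, or the module's own name when another file imports it.
if __name__ == "__main__":
    print(describe_run_mode(__name__))
```

Running the file directly prints the "run directly" message; importing it from another file prints nothing, because the guarded block is skipped.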
-
<p><code class="brush:python;toolbar:false" >import requests as rq
import time as t

def gethtml(url):
    try:
        r = rq.get(url)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r
    except:
        return 'Crawl failed'

if __name__ == '__main__':
    start = t.perf_counter()
    url = 'https://baidu.com'
    for i in range(100):
        gethtml(url)
    end = t.perf_counter()
    print('Time for 100 crawls: {:.2f} s'.format(end - start))</code>Time for 100 crawls: 103.37 s</p>
-
<p><code class="brush:python;toolbar:false" >import requests as rq
import time as t

def gethtml(url):
    try:
        r = rq.get(url)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r
    except:
        return 'Crawl failed'

if __name__ == '__main__':
    start = t.perf_counter()
    url = 'https://baidu.com'
    for i in range(100):
        gethtml(url)
    end = t.perf_counter()
    print('Time for 100 crawls: {:.2f} s'.format(end - start))</code>Time for 100 crawls: 27.85 s</p>
-
<p><code class="brush:python;toolbar:false" >import requests as rq
import time as t

def gethtml(url):
    try:
        r = rq.get(url)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r
    except:
        return 'Crawl failed'

if __name__ == '__main__':
    start = t.perf_counter()
    url = 'https://www.icourse163.org'
    for i in range(100):
        gethtml(url)
    end = t.perf_counter()
    print('Time for 100 crawls: {:.2f} s'.format(end - start))</code>Time for 100 crawls: 94.74 s</p>
-
<p><img src="https://nos.netease.com/edu-image/50b9bb8fedcc452189132bf41d619013.png" /></p><p>85.53 seconds</p>
-
<p>import requests<br >
import time as t<br >
<br >
def getHTMLText(url):<br >
    try:<br >
        r = requests.get(url, timeout=30)<br >
        r.raise_for_status()  # raise HTTPError on a 4xx/5xx status<br >
        r.encoding = r.apparent_encoding  # use the encoding guessed from the content to decode the text<br >
        return r.text<br >
    except:<br >
        return "An exception occurred"<br >
<br >
if __name__ == "__main__":<br >
    url = "https://www.duba.com"<br >
    start = t.perf_counter()<br >
    for i in range(100):<br >
        getHTMLText(url)<br >
    end = t.perf_counter()<br >
    print(f"Time to crawl the page 100 times: {end-start}")</p><p></p><p><br ></p><p>12.387908399999999</p>
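Most posts in this thread wrap the same loop in `time.perf_counter()` calls and swallow exceptions inside the fetch function, so a run where requests fail can still report a fast time. As a rough sketch only (the name `time_calls` and its parameters are assumptions for illustration, not from any post), the timing loop can take the fetch function as an argument and count failures alongside the elapsed time:

```python
import time


def time_calls(fetch, url, repeat=100):
    """Call fetch(url) `repeat` times; return (total_seconds, failures).

    `fetch` is any callable that takes a URL and raises on failure.
    This helper is hypothetical and only illustrates the timing pattern.
    """
    failures = 0
    start = time.perf_counter()
    for _ in range(repeat):
        try:
            fetch(url)
        except Exception:
            # Count the failure instead of silently ignoring it.
            failures += 1
    total = time.perf_counter() - start
    return total, failures
```

With requests installed, one possible call is `time_calls(lambda u: requests.get(u, timeout=30).raise_for_status(), "https://www.baidu.com")`. A nonzero failure count suggests the reported time should not be compared directly with the other results.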