102 replies
-
<p style="line-height: 1.7;" >To write a small program that tests how fast the Requests library fetches a page, pick a URL that tolerates frequent requests and make sure the test does not trigger any anti-scraping measures. Below is a simple script using Python's Requests library that measures how long it takes to fetch the same page 100 times.</p><p style="line-height: 1.7;" ><br></p><p style="line-height: 1.7; margin-top: 14px;" >I cannot give you one specific URL (it may change over time, or may not suit every reader), so the script uses a placeholder URL (<code style="border-radius: 3px; background-color: rgba(27, 31, 35, 0.05); font-size: 12.75px; font-family: SFMono-Regular, Consolas, 'Liberation Mono', Menlo, Courier, monospace; padding: 1px 4px;" >https://example.com</code>). Replace it with a URL that you know permits frequent requests.</p>
<p>import time</p>
<p>import requests</p>
<p><br></p>
<p>def test_requests_performance(url, num_requests):</p>
<p>    start_time = time.time()</p>
<p>    for _ in range(num_requests):</p>
<p>        response = requests.get(url)</p>
<p>        response.raise_for_status()  # raises HTTPError if the response has an unsuccessful status code</p>
<p>    end_time = time.time()</p>
<p>    elapsed_time = end_time - start_time</p>
<p>    average_time = elapsed_time / num_requests</p>
<p>    print(f"Successfully fetched {num_requests} webpages in {elapsed_time:.2f} seconds.")</p>
<p>    print(f"Average time per request: {average_time:.4f} seconds.")</p>
<p><br></p>
<p># Replace with the URL you want to test</p>
<p>url = "https://example.com"  # placeholder URL; replace with a real, reachable URL</p>
<p>num_requests = 100  # number of fetches</p>
<p>test_requests_performance(url, num_requests)</p>
<p style="line-height: 1.7; margin-top: 14px;" >Running the code prints the total time needed to fetch the given URL 100 times and the average time per request. I have not run it myself (<code style="border-radius: 3px; background-color: rgba(27, 31, 35, 0.05); font-size: 12.75px; font-family: SFMono-Regular, Consolas, 'Liberation Mono', Menlo, Courier, monospace; padding: 1px 4px;" >https://example.com</code> is only a placeholder), so I cannot report an actual timing from my machine. You can run it on your own machine and check the result.</p><p style="line-height: 1.7; margin-top: 14px;" >Also, make sure you follow the target site's robots.txt file and terms of use, and do not put excessive load on the site or violate its usage policy.</p><p><br></p>
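The robots.txt advice above can be checked programmatically before starting a timing run. The following is a minimal sketch using the standard library's `urllib.robotparser`; the helper name `is_allowed` and the `SAMPLE_ROBOTS` text are made up for illustration and are not from the post. A real run would first download the target site's own robots.txt.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network access
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, used here only as an example.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    print(is_allowed(SAMPLE_ROBOTS, "https://example.com/index.html"))
    print(is_allowed(SAMPLE_ROBOTS, "https://example.com/private/data"))
```

Parsing from a string keeps the sketch runnable offline. robots.txt only states what the site asks of crawlers; the terms of use and a sensible request rate still apply.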
-
<p><img src="https://mooc-image.nosdn.127.net/a4ba1fb69bc64c87a25a72f3e0b61097.jpg" style="max-width:750px;" ></p>
<p>import requests  # import the requests library</p>
<p>import time</p>
<p>def test_requests_multiple_repetitions(url, n):</p>
<p>    print(f"Starting test: time needed to fetch the same page {n} times...")</p>
<p>    startTime = time.time()</p>
<p>    succeeded = 0  # number of requests completed so far</p>
<p>    try:</p>
<p>        for j in range(n):</p>
<p>            r = requests.get(url, timeout=30)</p>
<p>            r.raise_for_status()</p>
<p>            succeeded += 1</p>
<p>        endTime = time.time()</p>
<p>        print(f"Fetched the page {n} times successfully in {endTime-startTime} seconds")</p>
<p>        print(f"Average time per request: {(endTime-startTime)/n} seconds")</p>
<p>    except requests.RequestException:</p>
<p>        print(f"Expected {n} requests, {succeeded} succeeded, request {succeeded + 1} failed")</p>
<p>    print(f"Finished test: time needed to fetch the same page {n} times")</p>
<p><br></p>
<p>if __name__ == "__main__":</p>
<p>    url = "https://www.baidu.com"</p>
<p>    test_requests_multiple_repetitions(url, 100)</p>
-
<p>I borrowed from the post above to practice a bit. Thanks (much respect).</p>
<p>import requests</p>
<p>import time</p>
<p>def test_requests_multiple_repetitions(url):</p>
<p>    print("Starting test: time needed to fetch the same page 100 times...")</p>
<p>    startTime = time.time()</p>
<p>    i = 0</p>
<p>    num = 0  # number of successful requests</p>
<p>    try:</p>
<p>        for i in range(100):</p>
<p>            r = requests.get(url, timeout=30)</p>
<p>            r.raise_for_status()  # the call parentheses were missing, so the status was never checked</p>
<p>            r.encoding = r.apparent_encoding</p>
<p>            num += 1</p>
<p>        endTime = time.time()</p>
<p>        print(f"Fetched the page 100 times successfully in {endTime-startTime} seconds")</p>
<p>        print(f"Average time per request: {(endTime-startTime)/100} seconds")</p>
<p>    except requests.RequestException:</p>
<p>        print(f"Expected 100 requests, {num} succeeded, request {num+1} failed")</p>
<p><br></p>
<p>    print("Finished test: time needed to fetch the same page 100 times...")</p>
<p><br></p>
<p>if __name__ == "__main__":</p>
<p>##    url="https://www.baidu.com"</p>
<p>##    url="https://www.bilibili.com/"</p>
<p>    url = "https://www.csdn.net/"</p>
<p>    test_requests_multiple_repetitions(url)</p>
<p><br></p>
<p><img src="https://mooc-image.nosdn.127.net/2b12416991464abd99a6582d75c0b7c6.png" style="max-width:750px;" ></p>
-
<p>import requests</p>
<p>import time</p>
<p><br></p>
<p># URL of the target page</p>
<p>url_A = 'https://example.com'</p>
<p><br></p>
<p># Record the start time</p>
<p>start_time = time.time()</p>
<p><br></p>
<p>def fetch_page(url):</p>
<p>    try:</p>
<p>        response = requests.get(url, timeout=5)  # 5-second timeout</p>
<p>        # Make sure the request succeeded</p>
<p>        if response.status_code == 200:</p>
<p>            return response</p>
<p>        else:</p>
<p>            print(f"Request failed, status code: {response.status_code}")</p>
<p>            return None</p>
<p>    except requests.exceptions.RequestException as e:</p>
<p>        # Print the exception details</p>
<p>        print(f"Request error: {e}")</p>
<p>        return None</p>
<p><br></p>
<p># Fetch the page 100 times</p>
<p>successful_requests = 0</p>
<p>for i in range(100):</p>
<p>    response = fetch_page(url_A)</p>
<p>    if response:</p>
<p>        successful_requests += 1</p>
<p>        print(f"Fetch {i+1} succeeded")</p>
<p>    else:</p>
<p>        print(f"Fetch {i+1} failed")</p>
<p><br></p>
<p># Compute the total elapsed time</p>
<p>end_time = time.time()</p>
<p>total_time = end_time - start_time</p>
<p><br></p>
<p>print(f"Fetched the page successfully {successful_requests} times, total time: {total_time} seconds")</p>
-
<p>import requests</p>
<p>import time</p>
<p><br></p>
<p><br></p>
<p>def access_web(url):</p>
<p>    try:</p>
<p>        r = requests.get(url)</p>
<p>        r.raise_for_status()</p>
<p>        r.encoding = r.apparent_encoding</p>
<p>        print("Fetch succeeded")</p>
<p>    except:</p>
<p>        print("Fetch failed")</p>
<p><br></p>
<p>start_time = time.time()</p>
<p><br></p>
<p>url = "https://www.baidu.com"</p>
<p>for i in range(100):</p>
<p>    print(i+1, end=" ")</p>
<p>    access_web(url)</p>
<p><br></p>
<p>end_time = time.time()</p>
<p><br></p>
<p>use_time = end_time - start_time</p>
<p><br></p>
<p>print("Total time is " + str(use_time))</p>
<p><br></p>
<p>Total time is 17.05951690673828</p>
-
<p>import requests, time</p>
<p><br></p>
<p>def test_requests_performance(url, num):</p>
<p>    print("Starting test: time needed to fetch the same page {} times:".format(num))</p>
<p>    start = time.perf_counter()</p>
<p>    try:</p>
<p>        for i in range(num):</p>
<p>            r = requests.get(url, timeout=50)</p>
<p>            r.raise_for_status()</p>
<p>        t = time.perf_counter() - start</p>
<p>        print("Fetched the page {} times successfully in {:.5f} seconds".format(num, t))</p>
<p>    except requests.RequestException:</p>
<p>        print("Request failed")</p>
<p>    print("Test finished.")</p>
<p>if __name__ == "__main__":</p>
<p>    url = "https://www.icourse163.org"</p>
<p>    test_requests_performance(url, 100)</p>
<p><img src="https://mooc-image.nosdn.127.net/b3675451e24f4167ab05389107fdfd85.jpg" style="max-width:750px;" ></p>
-
<p>import requests</p>
<p>import time</p>
<p>url = 'https://www.sohu.com'</p>
<p>t = time.perf_counter()</p>
<p>for i in range(100):</p>
<p>    try:</p>
<p>        r = requests.get(url, timeout=30)</p>
<p>    except:</p>
<p>        print(f'{i} error')  # the f prefix was missing, so {i} was printed literally</p>
<p>print(f'time:{time.perf_counter()-t}')</p>
<p>time:11.644626799970865</p>
-
<p>import requests</p>
<p>import time</p>
<p>url = "https://www.baidu.com"</p>
<p>t1 = time.perf_counter()</p>
<p>Error = 0</p>
<p>for i in range(100):</p>
<p>    try:</p>
<p>        r = requests.get(url)</p>
<p>        r.raise_for_status()</p>
<p>    except requests.exceptions.RequestException as e:</p>
<p>        Error += 1</p>
<p>t2 = time.perf_counter()</p>
<p>print("Time taken: {:.2f} seconds,Failed requests: {}".format(t2-t1,Error))</p>
<p><br></p>
-
<p>import requests</p>
<p>import time</p>
<p><br></p>
<p>def web_scraper(url, count):</p>
<p>    for i in range(count):</p>
<p>        try:</p>
<p>            r = requests.get(url, timeout=30)</p>
<p>            r.raise_for_status()</p>
<p>            r.encoding = r.apparent_encoding</p>
<p>        except:</p>
<p>            print("Something went wrong on attempt {}".format(i))</p>
<p><br></p>
<p>if __name__ == "__main__":</p>
<p>    url = "https://www.google.com"</p>
<p>    # start_time = time.time()</p>
<p>    start_time = time.perf_counter()</p>
<p>    web_scraper(url, 100)</p>
<p>    # end_time = time.time()</p>
<p>    # total_time = end_time - start_time</p>
<p>    total_time = time.perf_counter() - start_time</p>
<p>    print("Total time: {} seconds".format(total_time))</p>
<p>Total time: 9.771490799961612 seconds</p>
-
<p>Use a for loop, take a timestamp from the time library before and after it, and compute the elapsed time from the difference.</p>
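The approach described above (a for loop bracketed by two timestamps) could be sketched roughly as follows. The names `time_repeated_calls` and `fetch_once` are hypothetical, chosen for this example. The sketch uses `time.perf_counter()` for the timestamps and takes the fetch step as a callable, which is a design choice of this example so the timing logic can be tried without network access.

```python
import time
from typing import Callable, Tuple


def time_repeated_calls(fetch: Callable[[], None], n: int = 100) -> Tuple[float, int]:
    """Call fetch() n times; return (elapsed seconds, number of failed calls)."""
    failures = 0
    start = time.perf_counter()  # timestamp before the loop
    for _ in range(n):
        try:
            fetch()
        except Exception:
            # Count the failure and keep going, so one bad request
            # does not end the whole measurement.
            failures += 1
    elapsed = time.perf_counter() - start  # timestamp after the loop minus start
    return elapsed, failures


def fetch_once(url: str) -> None:
    """Fetch url once with Requests and raise on a 4xx/5xx status."""
    import requests  # third-party; imported here so the timing helper works without it

    r = requests.get(url, timeout=30)
    r.raise_for_status()
```

A run against a real page would then look like `elapsed, failures = time_repeated_calls(lambda: fetch_once("https://example.com"), 100)`, where the URL is a placeholder to be replaced with a site that permits repeated requests.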
-
<p>import time</p>
<p>import requests</p>
<p><br></p>
<p>start = time.perf_counter()  # record the start time</p>
<p><br></p>
<p><br></p>
<p>def gethtmltext(url):</p>
<p>    try:</p>
<p>        r = requests.get(url, timeout=30)</p>
<p>        r.raise_for_status()  # raises HTTPError for a 4xx or 5xx status code</p>
<p>        r.encoding = r.apparent_encoding</p>
<p>        return r.text  # return the page content</p>
<p>    except:</p>
<p>        return 'An exception occurred'</p>
<p><br></p>
<p><br></p>
<p>if __name__ == "__main__":</p>
<p>    url = 'https://www.baidu.com'</p>
<p>    for i in range(100):</p>
<p>        print(gethtmltext(url))</p>
<p>end = time.perf_counter()  # record the end time</p>
<p>print('Program run time: %s Seconds' % (end - start))</p>
<p><br></p>
-
<p>import requests</p>
<p>import time</p>
<p><br></p>
<p>start_time = time.time()</p>
<p>url = "https://www.cctv.com"</p>
<p>for i in range(100):  # was range(101), which sends 101 requests rather than 100</p>
<p>    try:</p>
<p>        r = requests.get(url, timeout=30)</p>
<p>        r.raise_for_status()</p>
<p>        r.encoding = r.apparent_encoding</p>
<p>        result = r.text</p>
<p>    except:</p>
<p>        result = "An exception occurred"</p>
<p>end_time = time.time()</p>
<p>during_time = end_time - start_time</p>
<p>print(f"Time needed to fetch {url} 100 times: {during_time}")</p>
<p><img src="https://mooc-image.nosdn.127.net/2e5b3e87d46240b78deb751924b9c82b.png" style="max-width:750px;" ></p>
-
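<p>I borrowed from the earlier posts to practice. Thanks, everyone.</p>
<p>import requests</p>
<p>import time</p>
<p><br></p>
<p>start_Time = time.time()</p>
<p>url = "https://www.baidu.com"</p>
<p>for i in range(100):  # was range(101), which sends 101 requests rather than 100</p>
<p>    try:</p>
<p>        r = requests.get(url, timeout=30)</p>
<p>        r.raise_for_status()</p>
<p>        r.encoding = r.apparent_encoding</p>
<p>        result = r.text</p>
<p>    except:</p>
<p>        print("An exception occurred")</p>
<p>end_time = time.time()</p>
<p>during_time = end_time - start_Time</p>
<p>print(f"Time needed to fetch {url} 100 times: {during_time}")</p>
<p>D:\python\python.exe E:\pythonProject2\网络爬虫\检查.py </p><p>Time needed to fetch https://www.baidu.com 100 times: 8.637250661849976</p><p><br></p><p>Process finished with exit code 0</p>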