全球主机交流论坛 (Global Hosting Discussion Forum)

Title: Help wanted from scraping experts

Author: stuazt    Time: 2023-6-2 22:07
Title: Help wanted from scraping experts
https://www.visa.com.sg/cmsapi/fx/rates?amount=100&fee=0&utcConvertedDate=06/02/2023&exchangedate=06/02/2023&fromCurr=CNY&toCurr=USD

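I want to scrape exchange rates from the Visa website. The URL above gets blocked as soon as my scraper requests it. Part of the returned content: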
<h1 data-translate="block_headline">Sorry, you have been blocked</h1>
        <h2 class="cf-subheadline"><span data-translate="unable_to_access">You are unable to access</span> visa.com.sg</h2>
<p data-translate="blocked_why_detail">This website is using a security service to protect itself from online attacks. The action you just performed triggered the security solution. There are several actions that could trigger this block including submitting a certain word or phrase, a SQL command or malformed data.</p>
          <h2 data-translate="blocked_resolve_headline">What can I do to resolve this?</h2>

            <p data-translate="blocked_resolve_detail">You can email the site owner to let them know you were blocked. Please include what you were doing when this page came up and the Cloudflare Ray ID found at the bottom of this page.</p>
          <span class="cf-footer-item sm:block sm:mb-1">Cloudflare Ray ID: <strong class="font-semibold">7d1030877eac3f87</strong></span>
    <span class="cf-footer-separator sm:hidden">•</span>
    <span id="cf-footer-item-ip" class="cf-footer-item sm:block sm:mb-1">
      Your IP:
      <button type="button" id="cf-footer-ip-reveal" class="cf-footer-ip-reveal-btn">Click to reveal</button>
      <span class="hidden" id="cf-footer-ip">20.212.226.221</span>
      <span class="cf-footer-separator sm:hidden">•</span>
    </span>
    <span class="cf-footer-item sm:block sm:mb-1"><span>Performance &amp; security by</span> <a rel="noopener noreferrer"  id="brand_link" target="_blank">Cloudflare</a></span>
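When debugging this kind of block, it can help to tell the Cloudflare block page apart from the expected JSON in code, and to log the Ray ID. A minimal sketch, assuming the block page always contains the headline and footer markup quoted above; the function names are made up for illustration:

```python
import re

# Matches the footer markup quoted in the block page above.
RAY_ID_RE = re.compile(r"Cloudflare Ray ID:\s*<strong[^>]*>([0-9a-f]+)</strong>")


def is_cloudflare_block(body):
    """Return True if the response body looks like the block page quoted above."""
    return "Sorry, you have been blocked" in body and "Cloudflare" in body


def extract_ray_id(body):
    """Return the Ray ID from a block page, or None if none is found."""
    match = RAY_ID_RE.search(body)
    return match.group(1) if match else None
```

Calling `is_cloudflare_block(driver.page_source)` after each request gives a quick signal of whether a given disguise attempt changed anything.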



The original page is: https://www.visa.com.sg/support/consumer/travel-support/exchange-rate-calculator.html


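I have tested this and the IP is not the cause: opening the first URL directly in any ordinary browser returns JSON, with no need to visit the original page first. Chrome incognito mode works fine too.

I'm using Selenium + chromedriver in headless mode.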
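Author: ttyang    Time: 2023-6-2 22:39
curl doesn't work either,
which suggests the site checks headers and similar request details.
You can't let the site detect that you are using Selenium.

Try disguising the client.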
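A minimal sketch of the header idea, using only the Python standard library. The header values, the Referer choice, and the function names are assumptions for illustration, not something confirmed to work against this site; Cloudflare may look at more than headers, so this may still be blocked.

```python
from datetime import date
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://www.visa.com.sg/cmsapi/fx/rates"
# Assumption: the calculator page from the first post is a plausible Referer.
REFERER = "https://www.visa.com.sg/support/consumer/travel-support/exchange-rate-calculator.html"


def build_rates_url(amount, from_curr, to_curr, day):
    """Rebuild the query string shown in the first post for a given date."""
    d = day.strftime("%m/%d/%Y")
    params = {
        "amount": amount,
        "fee": 0,
        "utcConvertedDate": d,
        "exchangedate": d,
        "fromCurr": from_curr,
        "toCurr": to_curr,
    }
    # safe="/" keeps the slashes in the date unescaped, as in the original URL.
    return BASE_URL + "?" + urlencode(params, safe="/")


def browser_like_headers():
    """Headers resembling a desktop Chrome request (values are illustrative)."""
    return {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
        "Accept": "application/json, text/plain, */*",
        "Accept-Language": "en-US,en;q=0.9",
        "Referer": REFERER,
    }


def build_request(url):
    """Attach the browser-like headers to a GET request without sending it."""
    return Request(url, headers=browser_like_headers())


def fetch_rates(url, timeout=10):
    """Send the request and return the raw response body as text."""
    with urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `fetch_rates(build_rates_url(100, "CNY", "USD", date(2023, 6, 2)))` requests the same URL as in the first post. If the site returns an HTTP error status for blocked clients, `urlopen` raises `urllib.error.HTTPError`, which itself suggests that headers alone are not enough.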
Author: Tankie    Time: 2023-6-2 22:55
Try stealth.min.js. I've used it to get past quite a few detection checks; anything more advanced than that is beyond me.
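A rough sketch of how stealth.min.js is commonly wired into Selenium: the script is registered through the Chrome DevTools Protocol so it runs before any page script on every new document. It assumes Selenium 4 with Chrome, a local copy of stealth.min.js in the working directory (the path is an assumption), and helper names invented for this example. Whether it gets past this particular block is untested.

```python
from pathlib import Path

# CDP method that runs a script before any page script on each new document.
CDP_COMMAND = "Page.addScriptToEvaluateOnNewDocument"


def chrome_arguments(headless=True):
    """Chrome flags often used together with stealth scripts."""
    args = ["--disable-blink-features=AutomationControlled"]
    if headless:
        args.append("--headless=new")
    return args


def inject_stealth(driver, script_source):
    """Register script_source with the browser via the driver's CDP bridge."""
    return driver.execute_cdp_cmd(CDP_COMMAND, {"source": script_source})


def make_driver(stealth_path="stealth.min.js", headless=True):
    """Start Chrome and inject the stealth script before any navigation."""
    # Imported here so the helpers above can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)
    driver = webdriver.Chrome(options=options)
    # Assumption: stealth_path points to a locally saved stealth.min.js.
    inject_stealth(driver, Path(stealth_path).read_text(encoding="utf-8"))
    return driver
```

Typical use would be `driver = make_driver()`, then `driver.get(url)` with the rates URL and a look at `driver.page_source`. The injection has to happen before the first `driver.get`, since the script only applies to documents created afterwards.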



