I am using the cloudscraper Python library to obtain a JSON response from a URL.
The problem is that I have to retry the same request 2-3 times before I get the correct output. The first responses come back with a 403 HTTP status code.
Here is my code:
import json
from time import sleep

import cloudscraper

url = "https://www.endpoint.com/api/"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0",
    "Accept": "*/*",
    "Content-Type": "application/json",
}


def fetch_json():
    # Keep retrying until the response body parses as JSON.
    while True:
        r = None
        try:
            scraper = cloudscraper.create_scraper()
            r = scraper.get(url, headers=headers)
            return json.loads(r.text)
        except Exception:
            # r is still None if the request failed before any response arrived.
            print(r.status_code if r is not None else "request failed")
            sleep(2)
What can I do in order to optimize my code and prevent the 403 responses?
The cloudscraper code base hasn’t been updated in a while, and I’m unsure whether it is still supported.
Here is something that I wrote on bypassing a Cloudflare-protected site. This task takes effort, and most Python packages for it become obsolete quickly because the vendor mitigates these bypass techniques.
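In the meantime, the retry loop from the question can at least be bounded and made a little gentler on the server. The sketch below is only an illustration and does not make the 403 responses go away. The helper name `fetch_json_with_retries` and its parameters are hypothetical, and it assumes the object passed in behaves like a `requests.Session`, which is what `cloudscraper.create_scraper()` returns. Reusing one scraper across attempts, rather than creating a new one each time, keeps any cookies set by earlier responses, which may or may not help in your case.

```python
import time


def fetch_json_with_retries(session, url, headers=None, max_attempts=5,
                            base_delay=1.0, sleep=time.sleep):
    """Hypothetical helper: GET `url` through `session` and return parsed JSON.

    Retries on any non-200 status or non-JSON body, waiting
    base_delay * 2**attempt seconds between attempts.
    Connection errors raised by the session are not caught here.
    """
    last_status = None
    for attempt in range(max_attempts):
        try:
            response = session.get(url, headers=headers)
            last_status = response.status_code
            if response.status_code == 200:
                return response.json()
        except ValueError:
            # The body was not valid JSON; count it as a failed attempt.
            pass
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(
        f"no JSON after {max_attempts} attempts (last status: {last_status})"
    )
```

With the question's setup, this would be called as `fetch_json_with_retries(cloudscraper.create_scraper(), url, headers=headers)`. The `sleep` parameter is there only so the delay can be replaced when testing.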
@Lifeiscomplex Thank you for all the information. Unfortunately, cfscrape doesn’t work in my case. Selenium is a lot slower than cloudscraper, maybe because I can’t use the ‘headless’ option without getting a 403. So is there no supported library for bypassing Cloudflare?