I am trying to crawl a site. I wrote a script that works perfectly on my local system, but when I run it on an Amazon EC2 instance it throws a connection/protocol error:
A Connection error occurred. - ConnectionError(ProtocolError('Connection aborted.', error(110, 'Connection timed out')),)
When execution reaches the requests.get(url) line, it hangs for a while and I have to interrupt the process with CTRL+C (on Ubuntu).
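To avoid having to interrupt it by hand, I can pass a timeout so the hang turns into an explicit exception instead. This is just a minimal sketch; the URL here is a placeholder, not the actual site:

import requests

# Placeholder URL for illustration; the real target is the site mentioned below.
url = "http://example.com/some-page"

try:
    # 5s to establish the TCP connection, 15s to receive data, so the call
    # fails fast instead of waiting for the OS-level timeout (errno 110).
    resp = requests.get(url, timeout=(5, 15))
    print(resp.status_code)
except requests.exceptions.ConnectTimeout:
    print("Could not establish a TCP connection within 5 seconds")
except requests.exceptions.ReadTimeout:
    print("Connected, but the server sent no data within 15 seconds")

Even with this, the request on the EC2 instance never gets past the connection stage.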
I tried this script on 3 different AWS instances, and I had never run any crawling script against that site from those instances before, so I'm fairly sure the site has not blocked those particular IPs.
I have tried everything I can think of, such as setting cookies and using a session, but with no success; all of these work fine on my local system.
I want to know whether it is possible to block all remote/data-center IPs, or whether there is some way the server can detect that the request comes from a headless client or remote machine rather than an actual browser.
I was following this approach:
import json
import requests
from bs4 import BeautifulSoup

s = requests.Session()

# Browser-like headers so the request looks like ordinary Chromium traffic.
s.headers['User-Agent'] = ("Mozilla/5.0 (X11; Linux x86_64) "
                           "AppleWebKit/537.36 (KHTML, like Gecko) "
                           "Ubuntu Chromium/43.0.2357.130 Chrome/43.0.2357.130 Safari/537.36")
s.headers['Connection'] = "keep-alive"
s.headers['Accept'] = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
s.headers['Accept-Encoding'] = "gzip, deflate, sdch"
s.headers['Accept-Language'] = "en-US,en;q=0.8"

url = "http://ift.tt/1Jgk1sm"
resp = s.get(url)
This code works perfectly fine locally but not on the remote instance.
I can tell you the exact site (in the comments) if you need more information. Any help/ideas/tricks/hints would be appreciated. I have a fair amount of experience in web crawling/scraping, but I still have no clue; I have already spent a whole day on this.
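One thing I am considering, to check whether the problem is at the network level rather than in requests itself, is opening a raw TCP connection to the site's host from the EC2 instance. A rough sketch follows; example.com is a placeholder since I haven't named the site here:

import socket

# Placeholder host; the real site is the one I can share in the comments.
host = "example.com"

try:
    # If even a plain TCP connect to port 80 times out, the block is at the
    # network/firewall level, and no amount of header tweaking will help.
    sock = socket.create_connection((host, 80), timeout=10)
    print("TCP connection succeeded")
    sock.close()
except socket.timeout:
    print("TCP connection timed out (same errno 110 symptom as requests)")
except OSError as exc:
    print("TCP connection failed:", exc)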