Python Proxy Authentication
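Sometimes our crawler programs need to pass proxy authentication. This usually happens when we work behind a proxy server, and the program itself can authenticate with that proxy server.
In Python 2 the urllib2 module handles proxies. urllib2.ProxyHandler routes requests through the proxy server, and urllib2.ProxyBasicAuthHandler supplies the user name and password when the proxy asks for them with an HTTP 407 (Proxy Authentication Required) response. Note that urllib2.HTTPBasicAuthHandler is not the right class here: it answers 401 challenges from the target web server, not from the proxy.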
Code:
import urllib2
from lxml import html
# Placeholder proxy address: replace with your proxy server's host and port.
proxy_handler = urllib2.ProxyHandler({'http': 'http://proxy.example.com:8080'})
proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
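Now we register our user name and password with the auth handler. add_password() takes the realm the proxy announces, the proxy's host (and port), the user name and the password.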
Code:
# 'realm' must match the realm in the proxy's Proxy-Authenticate header.
proxy_auth_handler.add_password('realm', 'proxy.example.com:8080', 'User Name', 'Password')
opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
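For Basic authentication, what ProxyBasicAuthHandler ultimately sends, after the proxy answers 407, is a Proxy-Authorization header holding "Basic " followed by the base64 encoding of user:password. The following is a minimal Python 3 sketch of building that header by hand. The helper name proxy_basic_auth_header is hypothetical, and the credentials are the example pair from RFC 7617. Attaching the header yourself sends the credentials on the first request instead of waiting for the 407 challenge, but the request must still be routed through the proxy with a ProxyHandler.

```python
import base64
import urllib.request


def proxy_basic_auth_header(user, password):
    """Return the Proxy-Authorization value for HTTP Basic auth: 'Basic ' + base64(user:password)."""
    raw = "{}:{}".format(user, password).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# Example credentials from RFC 7617; substitute your own.
header_value = proxy_basic_auth_header("Aladdin", "open sesame")
print(header_value)  # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

# Build (but do not send) a request that carries the header from the start.
req = urllib.request.Request(
    "http://www.ragalahari.com/wallpapers.asp",
    headers={"Proxy-Authorization": header_value},
)
```

Sending credentials up front saves one round trip, at the cost of exposing them on every request rather than only when the proxy asks.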
Full Code:
import urllib2
from lxml import html

# Placeholder proxy address: replace with your proxy server's host and port.
proxy_handler = urllib2.ProxyHandler({'http': 'http://proxy.example.com:8080'})
proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
proxy_auth_handler.add_password('realm', 'proxy.example.com:8080', 'User Name', 'Password')
opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
opener.addheaders = [('User-agent', 'Mozilla/5.0')]

sock = opener.open('http://www.ragalahari.com/wallpapers.asp')
print "Debug: got the connection"
htmlPage = sock.read()
print "Debug: got the page"
sock.close()

root = html.fromstring(htmlPage)
print "Debug: collecting the wallpaper division"
# [1] picks the second matching <td>, which holds the wallpaper thumbnails.
wallPaperEle = root.xpath('body/div/table/tr/td')[1]
wallThumbList = wallPaperEle.xpath('table/tr')
print "Debug: thumbnail rows found:", len(wallThumbList)
for tr in wallThumbList:
    # Each <a> inside a cell links to one wallpaper.
    for target in tr.xpath('td/a'):
        # .get() returns None when href is missing, so no try/except is needed.
        print "Link:", target.get('href')
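urllib2 exists only in Python 2; in Python 3 the same handlers live in urllib.request. Below is a minimal sketch of the opener setup for Python 3, assuming a placeholder proxy at proxy.example.com:8080 and placeholder credentials. It swaps in HTTPPasswordMgrWithDefaultRealm so that a realm of None matches whatever realm the proxy announces. The function name build_proxy_opener is hypothetical.

```python
import urllib.request

# Placeholder values: replace with your proxy's address and your credentials.
PROXY_URL = "http://proxy.example.com:8080"
PROXY_USER = "username"
PROXY_PASSWORD = "password"


def build_proxy_opener(proxy_url, user, password):
    """Return an opener that sends http:// traffic through an authenticating proxy."""
    # Route plain-HTTP requests through the proxy.
    proxy_handler = urllib.request.ProxyHandler({"http": proxy_url})

    # Realm None acts as a wildcard, so the proxy's realm string need not be known.
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, proxy_url, user, password)

    # Answers the proxy's 407 challenge with the stored credentials.
    proxy_auth_handler = urllib.request.ProxyBasicAuthHandler(password_mgr)

    opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
    opener.addheaders = [("User-agent", "Mozilla/5.0")]
    return opener


opener = build_proxy_opener(PROXY_URL, PROXY_USER, PROXY_PASSWORD)
```

Calling opener.open('http://www.ragalahari.com/wallpapers.asp') then plays the same role as in the Python 2 program. The lxml parsing part carries over apart from using print() as a function, since html.fromstring() also accepts the bytes that read() returns.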
