Understanding URLLIB in Python Web Programming
Here we are going to learn about urllib (short for "URL library") in Python. urllib is an important part of Python's standard library for internet programming and is often used in Python networking code. Whenever you have to work with HTTP or HTTPS (the protocols behind web pages, on ports 80 and 443 by default), urllib is a natural first choice, since it handles connections, basic authentication, reading and transferring data, cookies, proxies and so on.
To use it, import the package in your Python file as shown below. Keep in mind that urllib is a package made up of several submodules, and a bare import of the package is not guaranteed to make those submodules available, so in practice you import the specific submodule you need.
import urllib
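urllib bundles several submodules that help us work with URLs: urllib.request for opening and reading URLs, urllib.parse for splitting and building them, urllib.error for the exceptions that can be raised, and urllib.robotparser for reading robots.txt files. Whenever you need to open and read URLs, import urllib.request as shown below: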
import urllib.request
As the name suggests, urllib.request lets you request data from a web server (port 80 by default for HTTP, port 443 for HTTPS). When accessing a URL with urllib.request, you can give either the domain name or the IP address of the server as part of the URL; both work. The module defines functions and classes that help developers access HTTP and HTTPS web pages.
req = urllib.request.urlopen('https://www.google.com')
The first argument to urlopen can be either a URL string or a Request object. Two optional parameters are commonly used alongside it. The data parameter carries the request body: when it is supplied, the HTTP request is sent as a POST instead of a GET, and the value must be bytes rather than a string. The timeout parameter is given in seconds and applies to blocking operations such as connection attempts. When timeout is not specified, the global default timeout setting is used.
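The effect of the data parameter can be seen without sending anything over the network. The following is a minimal sketch: the endpoint URL, the form fields, and the helper names build_post_request and fetch are assumptions made up for illustration, not part of the original example.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint and form fields, used only for illustration.
URL = "https://www.example.com/search"
FIELDS = {"q": "python urllib", "page": "1"}


def build_post_request(url, fields):
    """Return a Request whose body is the URL-encoded form fields."""
    # The data argument must be bytes, so encode the query string.
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(url, data=body)


def fetch(request, timeout=10):
    """Send the request; blocking operations give up after `timeout` seconds."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


post_request = build_post_request(URL, FIELDS)
print(post_request.get_method())  # POST, because data was supplied
print(post_request.data)
```

Nothing is sent until fetch(post_request) is called. A Request built without data reports GET from get_method().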
To read the response body, call the read() method on the object returned by urlopen:
print(req.read())
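Note that read() returns bytes, not a string, so the printed output starts with b'...'. To work with the page as text, the bytes have to be decoded. Below is a small sketch of one way to do that. The helper name read_text is hypothetical, and a data: URL (which urllib.request also understands) stands in for a real web page so that the snippet runs without a network connection.

```python
import urllib.request

# A data: URL stands in for a real web page so the sketch runs offline.
SAMPLE_URL = "data:text/html;charset=utf-8,%3Ch1%3EHello%20urllib%3C%2Fh1%3E"


def read_text(url, fallback_charset="utf-8"):
    """Open the URL and return the response body decoded to str."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # read() returns bytes
        # Prefer the charset declared in the Content-Type header, if any.
        charset = response.headers.get_content_charset() or fallback_charset
    return raw.decode(charset)


page = read_text(SAMPLE_URL)
print(type(page).__name__)
print(page)
```

The same helper should work with an http or https URL in place of SAMPLE_URL, provided the server declares a charset or the fallback matches the page's real encoding.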
Putting the import, the urlopen call and read() together, the complete script looks like this:
import urllib.request
req = urllib.request.urlopen('https://www.google.com')
print(req.read())
If you now run the script, it prints the raw contents of the page.
The output is the HTML source code of the web page, printed as a bytes object. Sometimes websites don't like other programs visiting them and accessing their data, so they inspect the User-Agent header that is sent with every request. Developers often work around this by changing the User-Agent. By default, urllib is open about what it is: the User-Agent it sends has the form Python-urllib/3.x, where 3.x is the Python version in use. Many sites accept this without complaint, but some reject it, and in that case you can send a different User-Agent by passing your own headers through a Request object.
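The User-Agent that urllib sends can be inspected and replaced without contacting any server. The sketch below is illustrative only: the URL and the custom agent string are made-up values.

```python
import urllib.request

# Hypothetical values, chosen only for illustration.
URL = "https://www.example.com/"
CUSTOM_AGENT = "my-crawler/0.1"

# The default opener carries urllib's own User-Agent header.
default_headers = dict(urllib.request.build_opener().addheaders)
print(default_headers["User-agent"])  # e.g. Python-urllib/3.12

# Headers given to Request take the place of the opener's defaults.
request = urllib.request.Request(URL, headers={"User-Agent": CUSTOM_AGENT})

# Request stores header names capitalized, so the key is "User-agent".
print(request.get_header("User-agent"))
```

Passing this request object to urllib.request.urlopen would send the custom User-Agent in place of the default one.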