Understanding urllib in Python Web Programming

Here we are going to learn about urllib (short for URL library) in Python. urllib is an important Python internet module, often used in network and internet programming. Whenever you have to work with HTTP (the protocol behind web pages, port 80 by default), urllib is a natural first choice, as it handles connections, basic authentication, reading and transferring data, cookies, proxies, and more.

To use this module, import urllib in your Python file as shown below:

import urllib

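urllib is a package made up of several submodules that help us work with URLs: urllib.request, urllib.parse, urllib.error, and urllib.robotparser. Note that import urllib on its own does not load these submodules, so import the one you need explicitly. Whenever you need to open and read URLs, import urllib.request as shown below: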

import urllib.request

As the name suggests, urllib.request lets you request data from a web server (port 80 by default for HTTP, port 443 for HTTPS). When accessing a URL with urllib.request, you can give either the domain name or the IP address of the server in the URL; both forms are accepted. The module defines functions and classes that help developers access HTTP and HTTPS web pages.

req = urllib.request.urlopen('https://www.google.com')

The first argument to the urlopen function can be a string or a Request object. Two further parameters are optional. The data parameter carries the request body: when it is supplied, the HTTP request is sent as POST instead of GET. The timeout parameter is given in seconds and applies to blocking operations such as connection attempts. When timeout is not specified, the global default timeout setting is used.
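
The optional data and timeout parameters described above can be sketched as follows. This is a minimal illustration rather than a definitive recipe: the helper names build_request and fetch, the example.com URLs, the form field, and the 10-second timeout are all assumptions made for this example.

```python
import urllib.parse
import urllib.request


def build_request(url, form_fields=None):
    """Return a Request; it becomes a POST when form_fields is given.

    build_request is a hypothetical helper written for this example.
    """
    data = None
    if form_fields is not None:
        # The POST body must be bytes, so encode the form string.
        data = urllib.parse.urlencode(form_fields).encode("utf-8")
    return urllib.request.Request(url, data=data)


def fetch(request, timeout=10):
    """Send the request and return the raw body as bytes.

    timeout is in seconds; 10 is an arbitrary value chosen for this sketch.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Placeholder URLs and fields, assumed for illustration only.
get_req = build_request("https://www.example.com/")
post_req = build_request("https://www.example.com/search", {"q": "urllib"})
print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
```

Nothing is sent over the network until fetch is called with one of these Request objects; building the request and sending it are separate steps.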

To read the response body, call the read() method as follows:

print(req.read())
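
Note that read() returns bytes rather than text, which is why the printed output starts with b'. A minimal sketch of decoding the body into a string is shown here. The name fetch_text is a hypothetical helper, and falling back to UTF-8 when the server declares no character set is an assumption of this sketch.

```python
import urllib.request


def fetch_text(url, timeout=10):
    """Fetch url and return the response body decoded to str."""
    # The with-statement closes the connection when the block ends.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Character set declared in the Content-Type header, if any.
        charset = response.headers.get_content_charset() or "utf-8"
        # errors="replace" avoids an exception on undecodable bytes.
        return response.read().decode(charset, errors="replace")
```

Calling print(fetch_text('https://www.google.com')) would then print the page source as ordinary text.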

Putting the import, the urlopen call, and the read call together, the complete program looks like this:

import urllib.request

# Open the URL and print the raw response body (returned as bytes)
req = urllib.request.urlopen('https://www.google.com')
print(req.read())

If you now execute the code, the output looks as follows:

(Figure: output file)
The output displays the source code of the web page. Sometimes websites do not like other programs visiting them and accessing their data. For this reason, developers in other programming languages often modify the User-Agent, a header field sent with each request. By default, Python tells the website that your code is using urllib and which Python version is in use (the header value looks like Python-urllib/3.x). Many sites accept this, but some reject requests that carry it, in which case the User-Agent header can be overridden.
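
If a site does reject the default identifier, one option is to set the User-Agent header yourself. The sketch below first shows the default value urllib would send and then builds a request with a custom one. The example.com URL and the agent string my-tutorial-script/0.1 are placeholders invented for this example.

```python
import urllib.request

# The default identifier urllib sends, of the form "Python-urllib/3.x".
default_opener = urllib.request.build_opener()
default_agent = dict(default_opener.addheaders)["User-agent"]
print(default_agent)

# Override it on a single request. The URL and agent string are
# placeholders chosen for this example.
request = urllib.request.Request(
    "https://www.example.com/",
    headers={"User-Agent": "my-tutorial-script/0.1"},
)

# Request stores header names capitalized, hence "User-agent" here.
print(request.get_header("User-agent"))  # my-tutorial-script/0.1
```

Passing this request object to urllib.request.urlopen would send it with the custom header in place of the default.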

 

The code above shows the simplest way of using the urllib module to access the internet. We imported the urllib.request module, opened the URL, and assigned the response to a variable. We then called the read() method. The output looks messy, but do not panic. As you advance in Python you will learn how to extract the important information from such output, for example with regular expressions. HTML, JavaScript, and CSS are all used to build a web page, which is why the content looks confusing; even so, meaningful data can be retrieved from it.
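
As a small taste of that extraction step, here is a sketch that pulls the page title and the link targets out of a page with regular expressions. The raw_page bytes are an invented stand-in for what read() returns, and extract_title and extract_links are hypothetical helper names. Regular expressions are fine for quick jobs like this, though a dedicated HTML parser such as the standard library's html.parser is generally more robust on real pages.

```python
import re

# Invented sample standing in for the bytes returned by read().
raw_page = (
    b'<html><head><title>Example Page</title></head>'
    b'<body><a href="/first">First</a> <a href="/second">Second</a></body></html>'
)


def extract_title(html):
    """Return the text inside <title>...</title>, or None if absent."""
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


def extract_links(html):
    """Return the values of all double-quoted href attributes."""
    return re.findall(r'href="([^"]*)"', html)


# Decode the bytes to text before applying str patterns.
html = raw_page.decode("utf-8")
print(extract_title(html))  # Example Page
print(extract_links(html))  # ['/first', '/second']
```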
