urllib3#
urllib3 is a package for implementing network communication.
Check the official website for more details.
import urllib3
Parse URL#
The urllib3.util.url.parse_url function takes a URL as a string and returns a special urllib3.util.url.Url instance containing the elements of the URL as separate attributes.
The following cell shows the usage of parse_url.
from urllib3.util.url import parse_url
parse_url("http://google.com/the/path/of/the/url?qury=param")
Url(scheme='http', auth=None, host='google.com', port=None, path='/the/path/of/the/url', query='qury=param', fragment=None)
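The returned Url object exposes each component of the URL as a separate attribute. The following is a minimal sketch of accessing them (attribute names follow the output above):
url = parse_url("http://google.com/the/path/of/the/url?query=param")
print(url.host)   # google.com
print(url.path)   # /the/path/of/the/url
print(url.query)  # query=param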
Retries#
The urllib3.util.Retry class implements an object that allows you to specify the retry policy.
In general, you pass this object to urllib3 interfaces that could fail; they then try to complete the operation with retries according to the rules specified in the urllib3.util.Retry instance.
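For example, the following minimal sketch passes a Retry instance through the retries keyword of PoolManager.request (the target URL and parameter values are just an illustration):
http = urllib3.PoolManager()
response = http.request(
    "GET",
    "http://google.com",
    retries=urllib3.util.Retry(total=3, status_forcelist=[500]),
)
response.status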
The following cell shows how urllib3.util.Retry generally works: each call to the increment method decreases total.
retry = urllib3.util.Retry(total=1)
retry
Retry(total=1, connect=None, read=None, redirect=None, status=None)
The following cell shows the output of the increment method; note that it returns a new Retry instance rather than modifying the existing one:
retry = retry.increment()
retry
Retry(total=0, connect=None, read=None, redirect=None, status=None)
The total attribute decreased. The next call raises a MaxRetryError exception.
try:
    retry.increment()
except Exception as e:
    print(e)
None: Max retries exceeded with url: None (Caused by ResponseError('too many error responses'))
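More precisely, the raised exception is urllib3.exceptions.MaxRetryError, so it can be caught explicitly instead of through the generic Exception. A minimal sketch:
from urllib3.exceptions import MaxRetryError

retry = urllib3.util.Retry(total=0)
try:
    retry.increment()
except MaxRetryError as e:
    print(type(e).__name__)  # MaxRetryError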
Is retry#
The is_retry method of the Retry object allows checking whether a response requires a retry.
The following cell builds a Retry instance that requires a retry for the 500 status code.
retry = urllib3.util.Retry(total=1, status_forcelist=[500])
The is_retry call with status_code=200 returns False, which means that this status code doesn’t require a retry.
retry.is_retry("GET", status_code=200)
False
The following cell shows an alternative is_retry call, indicating that the request is supposed to be retried.
retry.is_retry("GET", status_code=500)
True
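The HTTP method matters as well: only methods listed in allowed_methods are considered retryable. The following sketch (assuming the allowed_methods parameter of recent urllib3 versions) shows that a POST is not retried even for a status code from status_forcelist:
retry = urllib3.util.Retry(total=1, status_forcelist=[500], allowed_methods=["GET"])
retry.is_retry("POST", status_code=500)  # False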
Sleep#
There are methods for organising the time interval between retries:
- The sleep method stops the flow according to the retry rules.
- The get_backoff_time method returns the time for which the flow has to be stopped.
The following cell creates the retry object and invokes sleep.
retry = urllib3.util.Retry(total=2, backoff_factor=3, backoff_max=3)
%time retry.sleep()
CPU times: user 10 μs, sys: 1 μs, total: 11 μs
Wall time: 12.6 μs
The programme wasn’t stopped because it doesn’t make sense to create an interval when there have been no increment calls (no requests have been made, according to the urllib3 design).
The get_backoff_time method returns the corresponding value:
retry.get_backoff_time()
0
There is also no delay after the first increment: there is no delay between the initial request and the first retry.
retry = retry.increment()
%time retry.sleep()
CPU times: user 11 μs, sys: 0 ns, total: 11 μs
Wall time: 14.8 μs
However, there is a delay between subsequent retries.
retry = retry.increment()
%time retry.sleep()
CPU times: user 1.31 ms, sys: 32 μs, total: 1.34 ms
Wall time: 3 s
retry.get_backoff_time()
3.0
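This matches the documented backoff formula: backoff_factor * 2 ** (number of consecutive errors - 1), with no delay before the first retry and the result capped at backoff_max. The following is an illustrative sketch of the schedule, not the library’s code:
def expected_backoff(consecutive_errors, backoff_factor=3, backoff_max=3):
    # No delay before the first retry, then exponential growth capped at backoff_max.
    if consecutive_errors <= 1:
        return 0
    return min(backoff_max, backoff_factor * 2 ** (consecutive_errors - 1))

[expected_backoff(n) for n in range(4)]  # [0, 0, 3, 3]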
Connection#
There is a special object that implements a connection: urllib3.connection.HTTPConnection.
The following cell shows the usage flow of urllib3.connection.HTTPConnection.
conn = urllib3.connection.HTTPConnection("www.google.com", port=80)
conn.connect()
conn.request("GET", "/")
response = conn.getresponse()
response.status
200
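Once the response has been handled, it is good practice to consume the body and close the connection explicitly:
response.read()  # consume the body so the socket can be released
conn.close()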