Python Web Scraping Cookbook

How it works

The URL is defined in the const module by the function const.ApodEclipseImage(), which returns it as a constant string:

def ApodEclipseImage():
    return "https://apod.nasa.gov/apod/image/1709/BT5643s.jpg"

The constructor of the URLUtility class has the following implementation:

def __init__(self, url, readNow=True):
    """ Construct the object, parse the URL, and download now if specified"""
    self._url = url
    self._response = None
    self._parsed = urlparse(url)   # urlparse comes from urllib.parse
    if readNow:
        self.read()
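To see what the constructor stores in self._parsed, here is a small sketch that runs urlparse() on the same APOD URL. Taking the last path segment as a candidate local filename is an illustration added here, not something the recipe does at this point.

```python
import posixpath
from urllib.parse import urlparse

# The same URL that const.ApodEclipseImage() returns, copied so the sketch is self-contained
url = "https://apod.nasa.gov/apod/image/1709/BT5643s.jpg"
parsed = urlparse(url)

print(parsed.scheme)  # https
print(parsed.netloc)  # apod.nasa.gov
print(parsed.path)    # /apod/image/1709/BT5643s.jpg

# URL paths always use "/", so posixpath (not os.path) is the right tool here
filename = posixpath.basename(parsed.path)
print(filename)       # BT5643s.jpg
```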

The constructor stores the URL, parses it with urlparse(), and, because readNow defaults to True, immediately downloads the file with the read() method. The following is the code of the read() method:

def read(self):
    self._response = urllib.request.urlopen(self._url)
    self._data = self._response.read()
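The read() method keeps the response object around and never closes it explicitly. If only the bytes are needed, a variant that uses urlopen() as a context manager releases the connection as soon as the body has been read. The helper name read_bytes() is hypothetical, and the example uses a data: URL (handled by urllib itself) so it runs without network access.

```python
import urllib.request

def read_bytes(url):
    """Hypothetical helper: download url and return its body as bytes."""
    # The with-statement closes the response even if read() raises
    with urllib.request.urlopen(url) as response:
        return response.read()

# urllib decodes data: URLs locally, so no network access is needed here
body = read_bytes("data:text/plain;base64,SGVsbG8=")
print(body)  # b'Hello'
```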

This method uses urlopen() to get a response object, then reads the entire stream and stores the resulting bytes in the object's _data attribute. That data can then be retrieved using the data property:

@property
def data(self):
    self.ensure_response()
    return self._data
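The body of ensure_response() is not shown in this recipe. A plausible reading is that it downloads the file on first access when the object was created with readNow=False. The minimal class below is a sketch under that assumption: ensure_response() is a guess, and initializing self._data to None is an addition so the attribute always exists. A data: URL again keeps it offline.

```python
import urllib.request
from urllib.parse import urlparse

class URLUtility:
    """Minimal sketch of the recipe's class; ensure_response() is an assumption."""

    def __init__(self, url, readNow=True):
        self._url = url
        self._response = None
        self._data = None  # added so the attribute exists before read()
        self._parsed = urlparse(url)
        if readNow:
            self.read()

    def read(self):
        self._response = urllib.request.urlopen(self._url)
        self._data = self._response.read()

    def ensure_response(self):
        # Assumed behavior: download lazily the first time the data is requested
        if self._response is None:
            self.read()

    @property
    def data(self):
        self.ensure_response()
        return self._data

util = URLUtility("data:,lazy%20load", readNow=False)
print(util._response is None)  # True: nothing has been downloaded yet
print(util.data)               # b'lazy load'
```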

The code then simply reports the length of that data, which is 171014 bytes.
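One way to sanity-check a reported length is to compare it with the Content-Length header the server declares. The sketch below assumes the header may be missing (for example with chunked transfer encoding) and returns None in that case. The function name fetch_and_check() is hypothetical, and a data: URL stands in for the APOD image so the example runs offline; urllib sets Content-Length for data: URLs itself.

```python
import urllib.request

def fetch_and_check(url):
    """Hypothetical helper: return (bytes read, declared Content-Length or None)."""
    with urllib.request.urlopen(url) as response:
        body = response.read()
        declared = response.headers.get("Content-Length")  # lookup is case-insensitive
    return len(body), int(declared) if declared is not None else None

length, declared = fetch_and_check("data:,Hello%20world")
print(length, declared)  # 11 11
```

Against the real image URL, a mismatch between the two numbers would suggest a truncated download.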