Go Web Scraping Quick Start Guide

HTTP headers

Below the request line is a series of key-value pairs that provide metadata describing how the request should be handled. These metadata fields are called HTTP headers. In the simple request we made earlier, there is a single HTTP header, Host, which names the target host we are trying to reach. HTTP/1.0 did not require this header, but HTTP/1.1 does: a server must reject an HTTP/1.1 request that lacks it. Sending it tells the server which site should receive the request, which matters when several sites share one IP address.
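To see the request line and the Host header for yourself, the following minimal sketch builds a GET request with Go's net/http package and writes it out in wire format without touching the network. The helper name wireFormat is hypothetical and chosen only for this illustration; Go also adds its own default User-Agent header to the output.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// wireFormat is a hypothetical helper for this illustration. It builds a GET
// request for rawURL and returns the text that would be sent to the server.
// No network connection is made.
func wireFormat(rawURL string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, rawURL, nil)
	if err != nil {
		return "", err
	}

	// Request.Write serializes the request line, the Host header, and any
	// other headers in HTTP/1.1 wire format.
	var buf bytes.Buffer
	if err := req.Write(&buf); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := wireFormat("http://example.com/index.html")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

The Host value is taken from the URL, so you do not normally set it by hand.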

If you were to inspect the HTTP request sent by your web browser, you would see many more HTTP headers. The following is an example sent by a Google Chrome browser to the same example.com website:

GET /index.html HTTP/1.1
Host: example.com
Connection: keep-alive
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
If-None-Match: "1541025663+gzip"
If-Modified-Since: Fri, 09 Aug 2013 23:54:35 GMT

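The basics of the HTTP request are the same; however, your browser sends significantly more request headers. Several of them (Cache-Control, If-None-Match, and If-Modified-Since) control how cached copies of the page are handled, while others describe the content types, encodings, and languages the browser accepts. We will discuss some of these headers in more detail in the following chapters.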
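A scraper written in Go can attach similar headers to its own requests. The sketch below is one possible way to do so, assuming you want to copy a few of the values from the Chrome example; the function name newBrowserLikeRequest is hypothetical, and the request is only constructed, not sent.

```go
package main

import (
	"fmt"
	"net/http"
)

// newBrowserLikeRequest is a hypothetical helper for this illustration. It
// builds a GET request and copies a few header values from the Chrome example.
func newBrowserLikeRequest(rawURL string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, rawURL, nil)
	if err != nil {
		return nil, err
	}

	req.Header.Set("User-Agent", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36")
	req.Header.Set("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8")
	req.Header.Set("Accept-Language", "en-US,en;q=0.9")
	req.Header.Set("Cache-Control", "max-age=0")

	// Accept-Encoding is deliberately left unset: Go's default transport asks
	// for gzip and decompresses the body itself only when the caller has not
	// set this header.
	return req, nil
}

func main() {
	req, err := newBrowserLikeRequest("http://example.com/index.html")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, name := range []string{"User-Agent", "Accept", "Accept-Language", "Cache-Control"} {
		fmt.Printf("%s: %s\n", name, req.Header.Get(name))
	}
}
```

To actually send the request, you would pass it to http.DefaultClient.Do or to a client of your own.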

The server reads the request and processes all of the headers to decide how to respond. In the most basic scenario, the server will respond with a 200 OK status, meaning that your request succeeded, and deliver the contents of index.html.
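The following sketch illustrates, from the server's side, how a header can change the response. It is not how example.com is implemented: the handler serveIndex, the page content, and the ETag value are all assumptions made for this example, and the request is delivered in memory with the net/http/httptest package rather than over the network.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// Hypothetical page content and ETag, chosen only for this illustration.
const (
	indexHTML = "<html><body>Hello</body></html>"
	indexETag = `"demo-etag"`
)

// serveIndex reads the request headers to decide how to respond.
func serveIndex(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("ETag", indexETag)

	// If the client already holds the current version, answer
	// 304 Not Modified and send no body.
	if r.Header.Get("If-None-Match") == indexETag {
		w.WriteHeader(http.StatusNotModified)
		return
	}

	// Otherwise, the basic scenario: 200 OK plus the page contents.
	w.Header().Set("Content-Type", "text/html; charset=utf-8")
	io.WriteString(w, indexHTML)
}

// respond passes an in-memory GET /index.html request to serveIndex and
// returns the status code and body that the handler produced.
func respond(ifNoneMatch string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, "http://example.com/index.html", nil)
	if ifNoneMatch != "" {
		req.Header.Set("If-None-Match", ifNoneMatch)
	}
	rec := httptest.NewRecorder()
	serveIndex(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	status, body := respond("")
	fmt.Println(status, len(body))

	status, body = respond(indexETag)
	fmt.Println(status, len(body))
}
```

Without the If-None-Match header the handler returns 200 and the page; with a matching value it returns 304 and an empty body.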