HTTP Message Format
The HTTP specifications [RFC 1945; RFC 2616] include the definitions of the HTTP message formats. There are two types of HTTP messages, request messages and response messages, both of which are discussed below.
HTTP Request Message
Below we provide a typical HTTP request message:
GET /somedir/page.html HTTP/1.1
Host: www.someschool.edu
Connection: close
User-agent: Mozilla/5.0
Accept-language: fr
We can learn a lot by taking a close look at this simple request message. First of all, we see that the message is written in ordinary ASCII text, so that your ordinary computer-literate human being can read it. Secondly, we see that the message consists of five lines, each followed by a carriage return and a line feed. The last line is followed by an additional carriage return and line feed. Although this particular request message has five lines, a request message can have many more lines or as few as one line. The first line of an HTTP request message is called the request line; the subsequent lines are called the header lines. The request line has three fields: the method field, the URL field, and the HTTP version field. The method field can take on several different values, including GET, POST, HEAD, PUT, and DELETE. The great majority of HTTP request messages use the GET method. The GET method is used when the browser requests an object, with the requested object identified in the URL field. In this example, the browser is requesting the object /somedir/page.html. The version is self-explanatory; in this example, the browser implements version HTTP/1.1.
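To make the structure of the request line concrete, here is a minimal Python sketch that splits it into its three fields. The function name parse_request_line is our own choice for illustration, and the message used is the shortest legal form mentioned above: a request line followed by the terminating carriage return and line feed.

```python
# A minimal sketch: split the request line of an HTTP request into its fields.
CRLF = "\r\n"  # carriage return + line feed, the HTTP line terminator

# A one-line request: the request line, its CRLF, and the extra CRLF that
# ends the message.
raw_request = "GET /somedir/page.html HTTP/1.1" + CRLF + CRLF


def parse_request_line(message: str) -> tuple[str, str, str]:
    """Return (method, url, version) taken from the first line of a request."""
    request_line = message.split(CRLF, 1)[0]
    method, url, version = request_line.split(" ")
    return method, url, version


method, url, version = parse_request_line(raw_request)
print(method)   # GET
print(url)      # /somedir/page.html
print(version)  # HTTP/1.1
```

A production parser would also have to reject malformed request lines; this sketch assumes a well-formed one with exactly two spaces.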
Now let’s look at the header lines in the example. The header line Host: www.someschool.edu specifies the host on which the object resides. You might think that this header line is unnecessary, as there is already a TCP connection in place to the host. But, as we’ll see very soon, the information provided by the host header line is required by Web proxy caches.
By including the Connection: close header line, the browser is telling the server that it doesn’t want to bother with persistent connections; it wants the server to close the connection after sending the requested object.
The User-agent: header line specifies the user agent, that is, the browser type that is making the request to the server. Here the user agent is Mozilla/5.0, a Firefox browser. This header line is useful because the server can actually send different versions of the same object to different types of user agents. (Each version is addressed by the same URL.) Finally, the Accept-language: header indicates that the user prefers to receive a French version of the object, if such an object exists on the server; otherwise, the server should send its default version. The Accept-language: header is just one of many content negotiation headers available in HTTP.
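The header lines just discussed can be collected into a simple name-to-value table. The following Python sketch does this for the example request; the function name parse_headers is hypothetical, and we assume the language tag fr for the French preference described above. Header names are compared case-insensitively in HTTP, so the sketch lowercases them.

```python
# A minimal sketch: collect the header lines of a request into a dictionary.
CRLF = "\r\n"

# The example request; the two empty strings at the end of the list produce
# the CRLF after the last header line plus the additional terminating CRLF.
raw_request = CRLF.join([
    "GET /somedir/page.html HTTP/1.1",
    "Host: www.someschool.edu",
    "Connection: close",
    "User-agent: Mozilla/5.0",
    "Accept-language: fr",   # assumed tag for "prefers French"
    "",
    "",
])


def parse_headers(message: str) -> dict[str, str]:
    """Map each lowercased header name to its value."""
    head = message.split(CRLF + CRLF, 1)[0]
    header_lines = head.split(CRLF)[1:]  # skip the request line
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


headers = parse_headers(raw_request)
print(headers["host"])             # www.someschool.edu
print(headers["connection"])       # close
print(headers["accept-language"])  # fr
```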
Having looked at an example, let’s now look at the general format of a request message, as shown in figure 2.8. We see that the general format closely follows our earlier example. You may have noticed, however, that after the header lines (and the additional carriage return and line feed) there is an “entity body.” The entity body is empty with the GET method, but is used with the POST method. An HTTP client often uses the POST method when the user fills out a form – for example, when a user provides search words to a search engine. With a POST message, the user is still requesting a web page from the server, but the specific contents of the web page depend on what the user entered into the form fields. If the value of the method field is POST, then the entity body contains what the user entered into the form fields.
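As an illustration of a request that carries an entity body, the sketch below assembles a POST message for a submitted form. The path /search, the field name q, and the function name build_post_request are assumptions made for this example only; the Content-Type and Content-Length headers are what a browser would typically add so that the server knows how to read the body.

```python
# A minimal sketch: build a POST request whose entity body holds form input.
from urllib.parse import urlencode

CRLF = "\r\n"


def build_post_request(host: str, path: str, form_fields: dict[str, str]) -> bytes:
    """Return the bytes of a POST request carrying the form fields."""
    # Encode the form fields as name=value pairs joined by '&'.
    body = urlencode(form_fields).encode("ascii")
    head = CRLF.join([
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body)}",
        "Connection: close",
        "",
        "",  # blank line separating the header lines from the entity body
    ])
    return head.encode("ascii") + body


# Hypothetical host, path, and field name, for illustration only.
request = build_post_request("www.someschool.edu", "/search", {"q": "monkeys bananas"})
print(request.decode("ascii"))
```

With a GET request the same function would stop after the blank line, since the entity body is empty.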
We would be remiss if we didn’t mention that a request generated with a form does not necessarily use the POST method. Instead, HTML forms often use the GET method and include the inputted data (in the form fields) in the requested URL. For example, if a form uses the GET method, has two fields, and the inputs to the two fields are monkeys and bananas, then the URL will have the structure www.somesite.com/animalsearch?monkeys&bananas. In your day-to-day web surfing, you have probably noticed extended URLs of this sort.
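The extended URL can be assembled mechanically from the form inputs. The sketch below shows two variants: one that reproduces the structure given in the text (values only), and one that uses name=value pairs, which is how browsers actually encode form fields. The function names and the field names animal and food are hypothetical.

```python
# A minimal sketch: place form inputs in the URL of a GET request.
from urllib.parse import quote, urlencode


def url_with_values(base: str, values: list[str]) -> str:
    """Append the raw inputs after '?', separated by '&' (structure from the text)."""
    return base + "?" + "&".join(quote(value) for value in values)


def url_with_named_fields(base: str, fields: dict[str, str]) -> str:
    """Append name=value pairs, the encoding browsers use for form data."""
    return base + "?" + urlencode(fields)


base = "www.somesite.com/animalsearch"
print(url_with_values(base, ["monkeys", "bananas"]))
# www.somesite.com/animalsearch?monkeys&bananas

# 'animal' and 'food' are made-up field names for illustration.
print(url_with_named_fields(base, {"animal": "monkeys", "food": "bananas"}))
# www.somesite.com/animalsearch?animal=monkeys&food=bananas
```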
The HEAD method is similar to the GET method. When a server receives a request with the HEAD method, it responds with an HTTP message but it leaves out the requested object. Application developers often use the HEAD method for debugging. The PUT method is often used in conjunction with web publishing tools. It allows a user to upload an object to a specific path (directory) on a specific web server. The PUT method is also used by applications that need to upload objects to web servers. The DELETE method allows a user, or an application, to delete an object on a web server.
HTTP Response Message
Below we provide a typical HTTP response message. This response message could be the response to the example request message just discussed.
HTTP/1.1 200 OK
Date: Tue, 09 Aug 2011 15:44:04 GMT
Server: Apache/2.2.3 (CentOS)
Last-Modified: Tue, 09 Aug 2011 15:11:03 GMT
(data data data data ………………..)
Let’s take a careful look at this response message. It has three sections: an initial status line, six header lines, and then the entity body. The entity body is the meat of the message; it contains the requested object itself (represented by data data data ...). The status line has three fields: the protocol version field, a status code, and a corresponding status message. In this example, the status line indicates that the server is using HTTP/1.1 and that everything is OK (that is, the server has found, and is sending, the requested object).
Now let’s look at the header lines. The server uses the Connection: close header line to tell the client that it is going to close the TCP connection after sending the message. The Date: header line indicates the time and date when the HTTP response was created and sent by the server. Note that this is not the time when the object was created or last modified; it is the time when the server retrieves the object from its file system, inserts the object into the response message, and sends the response message. The Server: header line indicates that the message was generated by an Apache Web server; it is analogous to the User-agent: header line in the HTTP request message. The Last-Modified: header line indicates the time and date when the object was created or last modified. The Last-Modified: header, which we will soon cover in more detail, is critical for object caching, both in the local client and in network cache servers (also known as proxy servers). The Content-Length: header line indicates the number of bytes in the object being sent. The Content-Type: header line indicates that the object in the entity body is HTML text. (The object type is officially indicated by the Content-Type: header and not by the file extension.)
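The three sections of a response message can be separated in a few lines of Python. In the sketch below, the function name parse_response is hypothetical, and the entity body is a made-up placeholder whose length we compute for the Content-Length: header; the other header values are taken from the example and the discussion above.

```python
# A minimal sketch: split a response into status line, header lines, and body.
CRLF = b"\r\n"

body = b"<html>data data data</html>"  # placeholder object, assumed for illustration
raw_response = CRLF.join([
    b"HTTP/1.1 200 OK",
    b"Connection: close",
    b"Date: Tue, 09 Aug 2011 15:44:04 GMT",
    b"Server: Apache/2.2.3 (CentOS)",
    b"Last-Modified: Tue, 09 Aug 2011 15:11:03 GMT",
    b"Content-Length: " + str(len(body)).encode("ascii"),
    b"Content-Type: text/html",
    b"",
    b"",  # blank line that ends the header section
]) + body


def parse_response(message: bytes):
    """Return (version, status_code, phrase, headers, entity_body)."""
    head, _, entity_body = message.partition(CRLF + CRLF)
    lines = head.decode("ascii").split("\r\n")
    # The status phrase may contain spaces, so split at most twice.
    version, code, phrase = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; date values contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), phrase, headers, entity_body


version, code, phrase, headers, entity_body = parse_response(raw_response)
print(version, code, phrase)
print(headers["last-modified"])
# Content-Length should match the number of bytes in the entity body.
print(int(headers["content-length"]) == len(entity_body))
```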
Having looked at an example, let’s now examine the general format of a response message, which is shown in figure 2.9. This general format of the response message matches the previous example of a response message. Let’s say a few additional words about status codes and their phrases. The status code and associated phrase indicate the result of the request. Some common status codes and associated phrases include:
200 OK: Request succeeded and the information is returned in the response
301 Moved Permanently: Requested object has been permanently moved; the new URL is specified in the Location: header of the response message. The client software will automatically retrieve the new URL.
400 Bad Request: This is a generic error code indicating that the request could not be understood by the server
404 Not Found: The requested document does not exist on this server
505 HTTP Version Not Supported: The requested HTTP protocol version is not supported by the server
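The status codes listed above, along with their phrases, are available in Python's standard library through http.HTTPStatus. The short sketch below looks up each phrase and adds the general class of the code, which is determined by its first digit. The helper name describe and the CATEGORIES table are our own.

```python
# A minimal sketch: look up status phrases and classify codes by first digit.
from http import HTTPStatus

# The first digit of a status code gives its general class.
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code: int) -> str:
    """Return the code, its standard phrase, and its general class."""
    status = HTTPStatus(code)
    return f"{code} {status.phrase} ({CATEGORIES[code // 100]})"


for code in (200, 301, 400, 404, 505):
    print(describe(code))
```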
How would you like to see a real HTTP response message? This is highly recommended and very easy to do! First Telnet into your favourite Web server. Then type in a one-line request message for some object that is housed on the server. For example, if you have access to a command prompt, type:
telnet cis.poly.edu 80
GET /~ross/ HTTP/1.1
(Press the carriage return twice after typing the last line.) This opens a TCP connection to port 80 of the host cis.poly.edu and then sends the HTTP request message. You should see a response message that includes the base HTML file of Professor Ross’s homepage. If you’d rather just see the HTTP message lines and not receive the object itself, replace GET with HEAD. Finally, replace /~ross/ with /~banana/ and see what kind of response message you get.
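If Telnet is not available, the same experiment can be approximated with a few lines of Python using a TCP socket. This is only a sketch under some assumptions: the function names build_request and fetch are ours, and we add Host: and Connection: close header lines because HTTP/1.1 servers generally expect a Host: header and would otherwise keep the connection open. Whether a given host still answers on port 80 is also something you will have to check for yourself.

```python
# A minimal sketch: send a hand-built HTTP request over a TCP socket.
import socket


def build_request(method: str, host: str, path: str) -> bytes:
    """Assemble the request line, two header lines, and the terminating CRLF."""
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close when it is done
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str, method: str = "GET",
          port: int = 80, timeout: float = 10.0) -> bytes:
    """Open a TCP connection, send the request, and read until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(method, host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # empty read: the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


# Show the exact bytes that would be sent; no network access is needed here.
print(build_request("HEAD", "cis.poly.edu", "/~ross/").decode("ascii"))
```

Calling fetch("cis.poly.edu", "/~ross/", method="HEAD") would then return the raw status line and header lines as bytes, assuming the host is reachable.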
In this section we discussed a number of header lines that can be used within HTTP request and response messages. The HTTP specification defines many, many more header lines that can be inserted by browsers, Web servers, and network cache servers. We have covered only a small number of the totality of header lines. We’ll cover a few more below and another small number when we discuss network Web caching.
How does a browser decide which header lines to include in a request message? How does a Web server decide which header lines to include in the response message? A browser will generate header lines as a function of the browser type and version (for example, an HTTP/1.0 browser will not generate any 1.1 header lines), the user configuration of the browser (for example, preferred language), and whether the browser currently has a cached, but possibly out-of-date, version of the object. Web servers behave similarly: There are different products, versions, and configurations, all of which influence which header lines are included in response messages.