Understanding CGI better
Posted On January 20, 2008 by Sneha Latha filed under Internet
In this article the author takes a quick tour through the intricacies of CGI programming. The article also discusses the HTTP headers that are useful to a CGI programmer.
In the last article I gave a quick overview of CGI, and you should now have a fully configured web server on your PC. In this article we will cover a little of the theory behind CGI.
Remember the analogy from the last article, of writing a letter and sending it through a postman? It is a crude analogy, but CGI works along similar lines. Fundamentally, the program gets executed because the web client (your browser) interacts with the server.
But how does the web browser contact the server? It identifies the server through the URL, in the same way that a postman (or the postal department) identifies your friend’s address from the top of your letter. Similarly, a web site is identified through its URL. The browser contacts the server through a URL such as http://storm.prohosting.com/webzary/test.cgi
A URL is made up of basically three fields. You probably are familiar with at least the first two parts of a URL, and all parts are discussed in detail in the following sections. A URL has this format:
protocol://<domain name>/<requested file>
The first field of a URL is the Protocol field. The Protocol field specifies the Internet protocol that will be used to transfer the data between the client and the server. There are many valid Internet protocol schemes: FTP, WAIS, Gopher, Telnet, HTTP, and more. For CGI programming you generally use HTTP. That's why the messages passed between the client and the server are called HTTP headers. HTTP is used to designate files, programs, and directories on a remote or local server.
The Domain Name
Immediately following the protocol is a :// and then the domain name. The domain name is the machine address of your server on the Internet. This name or address is between the :// and the next forward slash (/).
Following the domain name and before the trailing forward slash is an optional :port number. If no port number is given, the default port of 80 is assumed. The UNIX server handles different services by sending messages received at different port addresses to programs registered for those ports. The default port for the HTTP daemon is 80. Other programs, such as FTP and Telnet, have different default port addresses. These system default port addresses are set in a file named services under the system directory /etc on your UNIX server.
After the domain name, a URL can contain one or more directories forming a path, which usually ends in a file. For a CGI program, the recommended file names are *.cgi or *.pl, as the case may be. If the server supports other languages that can be used for CGI programming, such as Python, then *.py is also accepted.
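The fields described above can be pulled apart with a few lines of code. The following is only a minimal sketch using Python's standard urllib.parse module; the function name describe_url and the port 8080 in the second call are made up for illustration and do not come from the article.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80  # assumed when the URL carries no explicit :port


def describe_url(url):
    """Split a URL into the fields discussed above (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "domain": parts.hostname,
        # urlsplit reports None when no :port is present in the URL
        "port": parts.port if parts.port is not None else DEFAULT_HTTP_PORT,
        "requested_file": parts.path,
    }


if __name__ == "__main__":
    print(describe_url("http://storm.prohosting.com/webzary/test.cgi"))
    # Same URL with a made-up explicit port, to show the optional :port field
    print(describe_url("http://storm.prohosting.com:8080/webzary/test.cgi"))
```

Note that the default of 80 is applied by the sketch itself, mirroring what the text says the server side assumes for HTTP.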
If the URL ends in a directory name, most servers try to return a default file from that directory. If there is an index.html file in the directory, that file is returned; index.html is the default home page name. Some servers also search for other options, such as default.asp, default.php, etc.
Because PATH_INFO and QUERY_STRING data can be added to the URL after the target filename or program, the execution of the program or returning of the file does not occur until the entire URL is parsed. Each element of the URL is parsed until the target filename, program, or directory is found. If the final element of the URL is a file, the file is returned to the client. If the final element is a program, the program is executed and the data it generates is returned to the client.
Additional data can be appended to the URL by adding a question mark to the last element instead of a forward slash. This data then is called the QUERY_STRING and also is made available as an environment variable.
QUERY_STRING data can be any valid text data. It begins after the PATH_INFO data, as shown in the following URLs, and is limited only by the size of the input buffer, which is usually 1,024 bytes:
http://raja.com/info/query?pitambara
http://aa.f901.mail.yahoo.com/ym/login?.insha=434634
The two URLs above are examples of such cases.
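Inside a CGI program, this data is read from the environment. The sketch below assumes the server has set the PATH_INFO and QUERY_STRING variables as described; the function name read_request_data is hypothetical, and the environment is passed in as a plain dictionary so the example can be tried without a web server.

```python
import os
from urllib.parse import parse_qs


def read_request_data(environ=None):
    """Return (path_info, raw_query, fields) from CGI-style variables."""
    if environ is None:
        environ = os.environ  # what a real CGI program would read
    path_info = environ.get("PATH_INFO", "")
    query = environ.get("QUERY_STRING", "")
    # parse_qs splits name=value pairs; keep_blank_values keeps names
    # that arrive without a value, such as "pitambara"
    fields = parse_qs(query, keep_blank_values=True)
    return path_info, query, fields


if __name__ == "__main__":
    # Query strings taken from the two example URLs above
    print(read_request_data({"QUERY_STRING": "pitambara"}))
    print(read_request_data({"QUERY_STRING": ".insha=434634"}))
```

The raw query string is returned alongside the parsed fields because not every query string is a set of name=value pairs.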
When a link to your CGI program is activated, the browser or client generates request headers. The server receives the request headers, which include the address to your CGI program on the server. The server translates the headers into environment variables and executes your CGI program. Your CGI program must generate the required response headers and HTML for the server to return to the browser.
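As a rough illustration of that last step, here is a minimal sketch of a CGI program written in Python. It assumes the simplest case, a single Content-Type response header followed by a blank line and then the HTML; the function name build_response and the page content are invented for the example.

```python
import sys


def build_response(body_html):
    """Return the text a CGI program writes to standard output."""
    headers = "Content-Type: text/html\n"
    # A blank line separates the response headers from the HTML body
    return headers + "\n" + body_html


if __name__ == "__main__":
    page = "<html><body><h1>Hello from CGI</h1></body></html>\n"
    # The server reads this output and returns it to the browser
    sys.stdout.write(build_response(page))
```

If the blank line after the headers is left out, the server cannot tell where the headers end, and servers typically report an error instead of returning the page.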
Fundamentally, HTTP request headers tell the server what the client is requesting and what types of response the client can accept. The server also takes all the headers sent by the client and makes them available to your CGI program in the form of environment variables.
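The translation from headers to environment variables follows a simple naming pattern under the usual CGI convention: the header name is upper-cased, hyphens become underscores, and an HTTP_ prefix is added, except for Content-Type and Content-Length, which keep their plain names. The sketch below only illustrates that naming rule; the function names and the sample header values are hypothetical, and a real server does this work for you.

```python
def header_to_env_name(header_name):
    """Map a request header name to its CGI-style variable name."""
    name = header_name.upper().replace("-", "_")
    # These two headers are exposed without the HTTP_ prefix
    if name in ("CONTENT_TYPE", "CONTENT_LENGTH"):
        return name
    return "HTTP_" + name


def headers_to_environ(headers):
    """Convert a dict of request headers into CGI-style variables."""
    return {header_to_env_name(k): v for k, v in headers.items()}


if __name__ == "__main__":
    # Sample header values made up for illustration
    sample = {"User-Agent": "ExampleBrowser/1.0", "Accept": "text/html"}
    for name, value in headers_to_environ(sample).items():
        print(name, "=", value)
```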
HTTP headers are the language your browser and server use to talk to each other. Think of each of the HTTP headers as a single message. In the client and server sense, first there are a bunch of questions (which are the request headers) and then the answers to those questions (which are the response headers).
The Method Request Header
The client sends to the server several request headers defining for the server what the client wants, how the client can accept data, how to handle the incoming request, and any data that needs to be sent with the request.
The first request header for every client-server communication is the method request header. This request header tells the server what other types of request headers to expect and how the server is expected to respond. Two types of method headers exist: the simple method request and the full method request.
The simple method request header is used only to support browsers that accept only the HTTP/0.9 protocol. Because HTTP/0.9 is no longer the standard and the full method request header duplicates the definition of the simple method request header, only its basic format is given here.
The simple method request header is made up of two parts separated by spaces: the request type, followed by the URL requested:
Request_Method URL \n
The most common request methods are Get, Post, and Head. The HTTP specification also allows for the Put, Delete, Link, and Unlink methods, along with an undefined extension method. Because you will mainly be dealing with the Get and Post methods, this article concentrates on those.
Each of the request headers identifies a URL to the server. The difference between Get and Post is the effect on how data is transferred. The Head request method affects how the requested URL is returned to the client.
The Full Method Request Header
The full method request header is the first request header sent with any client request. It is made up of three parts separated by spaces: the method type, the URL requested, and the HTTP version number:
Request_Method URL HTTP_Protocol_Version \n
· Request_Method can be any of the following method types: Get, Post, Head, Put, Delete, Link, or Unlink.
· URL is the address of the file, program, or directory you are trying to access.
· HTTP_Protocol_Version is the version number of the HTTP protocol that the client/browser can handle.
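To make the three parts concrete, here is a small sketch that splits a full method request line into its fields. The function name parse_request_line and the sample request line are assumptions for illustration; a real server performs far more validation than this.

```python
def parse_request_line(line):
    """Split 'Request_Method URL HTTP_Protocol_Version' into its parts."""
    parts = line.strip().split()
    if len(parts) != 3:
        raise ValueError("expected: Request_Method URL HTTP_Protocol_Version")
    method, url, version = parts
    return {"method": method, "url": url, "version": version}


if __name__ == "__main__":
    # Illustrative request line; method names appear in upper case on the wire
    print(parse_request_line("GET /webzary/test.cgi HTTP/1.0\n"))
```

A simple (HTTP/0.9 style) request line has only two parts, so this sketch rejects it with a ValueError rather than guessing at a version.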
The Get HTTP Header
The Get method is the default method for following links and passing data on the Internet. After you click on a link, your browser sends a Get method request header. When you click the Submit button on a form, if the Method attribute of the form is not specified, the Get method request header is used to call the CGI program named in the form's Action field, which handles the form data.
When you click on a URL, it usually is of the form
http://www.somewhere.com/filename.html.
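The following sketch shows roughly what the browser does with such a URL when you click on it: it builds a Get method request line. It assumes, as browsers typically do, that only the path (plus any query string) is sent in the request line, since the domain name has already been used to reach the server. The function name build_get_request_line and the HTTP/1.0 default are choices made for this example.

```python
from urllib.parse import urlsplit


def build_get_request_line(url, http_version="HTTP/1.0"):
    """Build the Get method request line for a clicked URL (sketch only)."""
    parts = urlsplit(url)
    target = parts.path or "/"  # an empty path means the server's root
    if parts.query:
        # QUERY_STRING data travels in the request line after a '?'
        target += "?" + parts.query
    return "GET " + target + " " + http_version


if __name__ == "__main__":
    print(build_get_request_line("http://www.somewhere.com/filename.html"))
    print(build_get_request_line("http://raja.com/info/query?pitambara"))
```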
The first header of every HTTP request/response sequence is the method request header, and the first response header will always be a Status response header. The method request header defines what the server is expected to do with any additional data and how that data might affect the URL in the method request header. The Status response header from the server reports the success or failure of the method request.
It is beyond the scope of this article to cover all types of headers in detail, so I advise you to consult the table given below.
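A Status response header has the general form of an HTTP version, a numeric status code, and a short reason phrase, for example "HTTP/1.0 200 OK". The sketch below reads such a line and reports whether it signals success, treating codes in the 200 range as success. The function name parse_status_line and the sample lines are assumptions for illustration.

```python
def parse_status_line(line):
    """Split a Status response line into version, code, and reason."""
    parts = line.strip().split(" ", 2)
    if len(parts) < 2:
        raise ValueError("expected: HTTP_Version Status_Code Reason")
    code = int(parts[1])
    reason = parts[2] if len(parts) == 3 else ""
    return {
        "version": parts[0],
        "code": code,
        "reason": reason,
        "success": 200 <= code < 300,  # 2xx codes indicate success
    }


if __name__ == "__main__":
    print(parse_status_line("HTTP/1.0 200 OK"))
    print(parse_status_line("HTTP/1.0 404 Not Found"))
```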
