CERT® Coordination Center
How To Remove Meta-characters From User-Supplied Data In CGI Scripts
Introduction
- Definition of the Problem
- Definition of "Sanitize"
- A Common But Inadvisable Approach
- A Recommended Approach
- Recommendation
- Additional Tips
Document Revision History
Please Note:
- The examples here are written in C and Perl, since these are two popular
languages that most readers will be familiar with. Developers who work in
other languages are encouraged to adapt these examples accordingly.
- The examples presented in this document are simplified to
illustrate the problem and the general solution. They are not intended
to be inserted directly into applications without modification. It is
the responsibility of the programmer and/or system administrator to
ensure that the general concepts presented here are adapted
appropriately for each application.
- Definition of the Problem
We have seen several reports, sent to us and posted to public mailing
lists, about CGI scripts that allow an attacker to execute arbitrary
commands on a WWW server under the effective user-id of the server process.
In many of these cases, the author of the script has not sufficiently
sanitized user-supplied input.
- Definition of "Sanitize"
Consider an example where a CGI script accepts user-supplied data. In
practice, this data may come from any number of sources; but for this
example, we will say that the data is taken from the environment
variable $QUERY_STRING. The manner in which the data was inserted into
the variable is not important. The important point is that the
programmer needs to gain control over the contents of the data in
$QUERY_STRING before further processing can occur. The act of gaining
this control is called "sanitizing" the data.
- A Common But Inadvisable Approach
A script writer who is aware of the need to sanitize data may decide to
remove a number of well-known meta-characters from the data and
replace them with underscores. This is a common but inadvisable
approach: it works by listing particular characters to remove.
For instance, in Perl:
#!/usr/local/bin/perl
$user_data = $ENV{'QUERY_STRING'}; # Get the data
print "$user_data\n";
$user_data =~ s/[\/ ;\[\]\<\>&\t]/_/g; # Remove bad characters. WRONG!
print "$user_data\n";
exit(0);
In C:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int
main(int argc, char *argv[], char **envp)
{
    static char bad_chars[] = "/ ;[]<>&\t";
    char * user_data; /* our pointer to the environment string */
    char * cp; /* cursor into example string */

    /* Get the data */
    user_data = getenv("QUERY_STRING");
    if (user_data == NULL) /* variable not set; nothing to sanitize */
        exit(1);
    printf("%s\n", user_data);

    /* Remove bad characters. WRONG! */
    for (cp = user_data; *(cp += strcspn(cp, bad_chars)); /* */)
        *cp = '_';
    printf("%s\n", user_data);
    exit(0);
}
In this method, the programmer determines which characters should NOT be
present in the user-supplied data and removes them. The problem with
this approach is that it requires the programmer to predict every
input that could be misused. If the user supplies input that the
programmer did not predict, the script may be used in a manner the
programmer did not intend.
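To make the weakness concrete, the following is a small sketch that reuses
the bad_chars list from the C example above. The function name
denylist_filter and the sample input are ours, chosen for illustration
only. Shell meta-characters such as the backquote, the pipe, and the
dollar sign are not on the list, so they pass through untouched.

```c
#include <string.h>

/* The deny list from the C example above. */
static const char bad_chars[] = "/ ;[]<>&\t";

/*
 * Hypothetical helper (not part of the original example): replace every
 * character of s that appears in bad_chars with an underscore, in place.
 */
void denylist_filter(char *s)
{
    char *cp;

    /* strcspn() skips the run of characters that are NOT in bad_chars. */
    for (cp = s; *(cp += strcspn(cp, bad_chars)); /* */)
        *cp = '_';
}
```

Given the input a;b|c`d`$e, only the semicolon is replaced and the result
is a_b|c`d`$e. The pipe, the backquotes, and the dollar sign all survive.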
- A Recommended Approach
A better approach is to define a list of acceptable characters and
replace any character that is NOT acceptable with an underscore. The
list of valid input values is typically a predictable, well-defined
set of manageable size. For example, consider the tcp_wrappers package
written by Wietse Venema. In the percent_x.c module, Wietse has
defined the following:
char *percent_x(...)
{
{...}
static char ok_chars[] = "1234567890!@%-_=+:,./\
abcdefghijklmnopqrstuvwxyz\
ABCDEFGHIJKLMNOPQRSTUVWXYZ";
{...}
for (cp = expansion; *(cp += strspn(cp, ok_chars)); /* */ )
*cp = '_';
{...}
The benefit of this approach is that the programmer is certain that
whatever string is returned, it contains only characters now under his
or her control.
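The loop in the excerpt is compact, so it may help to see it unrolled.
The following is a sketch of an equivalent, more explicit form. The
function name replace_unlisted and its parameters are ours, not part of
tcp_wrappers. Note that the compact loop relies on '_' being in ok_chars
in order to move past each replaced character; the sketch below steps
forward explicitly instead.

```c
#include <string.h>

/*
 * Hypothetical helper: replace every character of s that is NOT in
 * ok_chars with an underscore, in place.
 */
void replace_unlisted(char *s, const char *ok_chars)
{
    char *cp = s;

    for (;;) {
        /* strspn() returns the length of the leading run of characters
           that all appear in ok_chars; skip over that run. */
        cp += strspn(cp, ok_chars);

        if (*cp == '\0')        /* reached the end: nothing left to fix */
            break;

        /* *cp is the first character not on the list: overwrite it and
           step past it, so the loop ends even if '_' is not in ok_chars. */
        *cp++ = '_';
    }
}
```

For example, with ok_chars set to "abcd", the string "ab cd!" becomes
"ab_cd_".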
This approach contrasts with the approach we discussed earlier. In the
earlier approach, which we do not recommend, the programmer must
ensure that he or she traps all characters that are unacceptable,
leaving no margin for error. In the recommended approach, the
programmer errs on the side of caution and only needs to ensure that
acceptable characters are identified; thus the programmer can be less
concerned about what characters an attacker may try in an attempt
to bypass security checks.
Building on this philosophy, the Perl program we presented above could
be rewritten so that the data contains ONLY those characters allowed.
For example:
#!/usr/local/bin/perl
$_ = $user_data = $ENV{'QUERY_STRING'}; # Get the data
print "$user_data\n";
$OK_CHARS='-a-zA-Z0-9_.@'; # A restrictive list, which
# should be modified to match
# an appropriate RFC, for example.
s/[^$OK_CHARS]/_/go;
$user_data = $_;
print "$user_data\n";
exit(0);
Likewise, the same updated example in C:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int
main(int argc, char *argv[], char **envp)
{
    static char ok_chars[] = "abcdefghijklmnopqrstuvwxyz"
                             "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                             "1234567890_-.@";
    char * user_data; /* our pointer to the environment string */
    char * cp; /* cursor into example string */

    /* Get the data */
    user_data = getenv("QUERY_STRING");
    if (user_data == NULL) /* variable not set; nothing to sanitize */
        exit(1);
    printf("%s\n", user_data);

    /* Replace every character that is not in ok_chars */
    for (cp = user_data; *(cp += strspn(cp, ok_chars)); /* */)
        *cp = '_';
    printf("%s\n", user_data);
    exit(0);
}
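The example above rewrites the string returned by getenv in place, which
keeps it short. The C standard, however, says that a program must not
modify that string, and getenv returns NULL when the variable is not set.
The following sketch applies the same allow list to a private copy
instead. The function name sanitize_copy is ours, and this is one possible
arrangement rather than a definitive implementation.

```c
#include <stdlib.h>
#include <string.h>

static const char ok_chars[] = "abcdefghijklmnopqrstuvwxyz"
                               "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                               "1234567890_-.@";

/*
 * Hypothetical helper: return a newly allocated, sanitized copy of input,
 * or NULL if input is NULL or memory cannot be allocated.
 * The caller is responsible for calling free() on the result.
 */
char *sanitize_copy(const char *input)
{
    size_t len;
    char *copy;
    char *cp;

    if (input == NULL)
        return NULL;

    len = strlen(input);
    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, input, len + 1);   /* includes the terminating '\0' */

    /* Same loop as above; it advances because '_' is in ok_chars. */
    for (cp = copy; *(cp += strspn(cp, ok_chars)); /* */)
        *cp = '_';

    return copy;
}
```

A caller might write clean = sanitize_copy(getenv("QUERY_STRING")) and
then check clean for NULL before using it. For the input a;b|c`d`$e the
returned copy is a_b_c_d__e.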
Some questions that we have received from sites indicate the mistaken
belief that this sanitization technique only needs to be applied to
user data that is passed to the environment in which the application
is executing. This is not strictly true.
For instance, many Perl scripts accept arbitrary filenames from users.
While the script should obviously check the filename to ensure that it
represents a file that the user should have access to, the first step in
any filename processing should be sanitization (as discussed above). The
reason for this is that meta-characters (such as ">" and "|") have
special meaning in file-oriented functions in Perl.
Another example is Perl scripts that call the eval function with
user-supplied arguments. A call to eval essentially represents the
execution of a mini-program within the Perl script being executed.
Programmers are encouraged to ensure that control is maintained over the
content of the user-supplied data, with the intent of preventing the user
from executing uncontrolled instructions within that environment.
- Recommendation
We strongly encourage you to review all CGI scripts available via your
web server to ensure that any user-supplied data is sanitized using
the approach described in "A Recommended Approach" above, adapting the
example to meet whatever specification you are using (such as the
appropriate RFC).
- Additional Tips
The following comments appeared in CERT Advisory CA-1997-12,
"Vulnerability in webdist.cgi", and AUSCERT Advisory AA-97.14,
"SGI IRIX webdist.cgi Vulnerability", at
http://www.auscet.org/au.
We strongly encourage all sites to consider taking this opportunity
to examine their entire httpd configuration. In particular, all CGI
programs that are not required should be removed, and all those
remaining should be examined for possible security vulnerabilities.
It is also important to ensure that all child processes of httpd are
running as a non-privileged user. This is often a configurable option.
See the documentation for your httpd distribution for more details.
Resources relating to WWW security are available. The following may provide a useful starting point:
The World Wide Web Security FAQ:
http://www.w3.org/Security/Faq/
The following book contains useful information including sections on
secure programming techniques.
Practical Unix & Internet Security, Simson Garfinkel and
Gene Spafford, 2nd edition, O'Reilly and Associates, 1996.
Please note that the CERT/CC and
AUSCERT do not endorse the URL
that appears above. If you have any problem with the site, please
contact the site administrator.
Wall et al. discuss techniques and resources that can be used for
handling user-supplied data within Perl in this book:
Programming Perl, Larry Wall, Tom Christiansen and Randal
L. Schwartz, 2nd edition, O'Reilly and Associates, 1996.
Readers are referred to Chapter 6, pages 336 and 355-363.
Another resource that sites can consider is the CGI.pm
module. Details about this module are available from:
http://www.genome.wi.mit.edu/ftp/pub/software/WWW/cgi_docs.html
This module provides mechanisms for creating forms and other web-based
applications. Be aware, however, that it does not absolve the programmer
from the safe-coding responsibilities discussed above.
This document is available from:
http://www.pre-preview.cert.org/tech_tips/cgi_metacharacters.html
CERT/CC Contact Information
Email: cert@cert.org
Phone: +1 412-268-7090 (24-hour hotline)
Fax: +1 412-268-6989
Postal address:
CERT Coordination Center
Software Engineering Institute
Carnegie Mellon University
Pittsburgh PA 15213-3890
U.S.A.
CERT/CC personnel answer the hotline 08:00-17:00 EST(GMT-5) / EDT(GMT-4)
Monday through Friday; they are on call for emergencies during other
hours, on U.S. holidays, and on weekends.
Using encryption
We strongly urge you to encrypt sensitive information sent by
email. Our public PGP key is available from
If you prefer to use DES, please call the CERT hotline for more
information.
Getting security information
CERT publications and other security information are available from
our web site
* "CERT" and "CERT Coordination Center" are registered in the U.S. Patent and Trademark Office.
NO WARRANTY
Any material furnished by Carnegie Mellon University and the
Software Engineering Institute is furnished on an "as is"
basis. Carnegie Mellon University makes no warranties of any kind,
either expressed or implied as to any matter including, but not
limited to, warranty of fitness for a particular purpose or
merchantability, exclusivity or results obtained from use of the
material. Carnegie Mellon University does not make any warranty of any
kind with respect to freedom from patent, trademark, or copyright
infringement.
Conditions for use, disclaimers, and sponsorship information
Copyright 1998, 1999 Carnegie Mellon University.