Part 6. Files

Parsing the command line

Before we learn about files, let's take a little detour into how to parse the command line. So far in this series, we've intentionally avoided parsing the command line. When we've needed to read input, we've prompted the user. But command line utilities are much more useful if they can run unprompted, taking all of their parameters through the command line.

When you run a utility program from the command line, such as the TYPE command in DOS or the cat command in Linux, you typically specify one or more files that the program should display to the screen. For example, you might type this to display the contents of the readme.txt and copying.txt files:

TYPE readme.txt copying.txt

or, on Linux:

cat readme.txt copying.txt

Behind the scenes, the operating system executes the TYPE program and gives it an argument list of both file names.

In a C program, this argument list is passed to the main function using function arguments. The main function takes two arguments:

  1. The number of arguments on the command line
  2. The list of command line arguments, as a list of strings

Let's start with the second argument, the list of command line arguments. This is a list of strings. From the previous chapter, a list of strings is really a list of arrays, because a string is really an array of char values. So this list uses the variable type char ** to indicate a list of strings. The list counts from zero, like any array, and the "zeroth" item in the list is the name of the program itself. Any other command line arguments are array elements 1, 2, 3, and so on.

The first argument to main is the number of command line arguments. This is always an int value that starts at one, because the program name itself is counted. If you did not give any options on your command line, the number of arguments is 1.

For example, for a command line like this:

TYPE readme.txt copying.txt

…the argument count would be 3, and the char ** array (counting from zero) would be:

  0. TYPE
  1. readme.txt
  2. copying.txt

Parsing the command line options is a matter of reading each element in the array:

#include <stdio.h>

int
main(int arg_count, char **arg_strings)
{
   int i;

   for (i = 0; i < arg_count; i = i + 1) {
      /* from what we learned earlier, "char **" means a list of lists,
         or an array of arrays .. and a string is an array of chars. So
         "char **arg_strings" is an array of strings */

      /* that means we can just refer to each element in the arg_strings
         array as a string .. because it is! */
      printf("%d : %s\n", i, arg_strings[i]);
   }

   return 0;
}

You can name the main arguments whatever you like. Most programmers have adopted the standard naming convention of argc for the argument count, and argv for the argument list (the argument vector). Throughout the rest of this series, I'll use argc and argv.
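Because argv[0] is always the program name, the first real argument is argv[1], and it only exists when argc is at least 2. Here is a minimal sketch of that check. The helper name first_argument is made up for this illustration; it is not part of the C library.

```c
/* first_argument is a hypothetical helper for illustration.
   Returns the first command line argument after the program name,
   or the fallback string if the user did not give one. */
const char *
first_argument(int argc, char **argv, const char *fallback)
{
   if (argc < 2) {
      /* only argv[0], the program name, is present */
      return fallback;
   }

   return argv[1];
}
```

From main, you might call this as first_argument(argc, argv, "readme.txt") to fall back to a default file name when the command line is empty.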

Reading and writing files

For this series, I'll keep to simple topics in reading and writing files. We will assume that files are read and written sequentially, from start to end, in order. This suits most programming needs.

To read and write a file, you first need a way to track your position in the file. This is done using a file pointer. Just as pointers for arrays were indicators of locations in the computer's memory, the file pointer maintains a simple abstraction that points to the location in a file. Declare a file pointer using the FILE * variable type. (FILE is not really a C variable type, but you can safely treat it as one. Behind the scenes, C created this variable type using a type definition, which we won't go into in this series.)

To open a file for reading or writing, use the fopen function. This takes two arguments: the name of the file, and a string argument that indicates if the file should be opened for reading ("r") or writing ("w"). If fopen cannot open the file (for example, you cannot read a file that does not exist), the fopen function will return NULL. Otherwise, it returns a pointer to the file.

When you are done with the file, you should close it with the fclose function.

Here is a short demonstration program that opens a file, then immediately closes it:

#include <stdio.h>

int
main()
{
   char filename[] = "test.txt";
   FILE *fileptr;

   /* open the file */

   fileptr = fopen(filename, "r");

   if (fileptr == NULL) {
      printf("oops! could not open file: %s\n", filename);
      return 1;
   }

   printf("we have opened the file: %s\n", filename);

   /* close the file */

   fclose(fileptr);
   puts("the file is closed.");

   return 0;
}
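The demonstration above fails if test.txt does not exist yet. Opening a file with "w" creates it, so one way to make a test file is the sketch below. The helper name write_sample and the sample text are assumptions for this illustration, and the sketch uses fputs, which is described in the next section.

```c
#include <stdio.h>

/* write_sample is a hypothetical helper for illustration.
   Creates (or overwrites) the named file with one line of text.
   Returns 0 on success, or 1 if the file could not be opened. */
int
write_sample(const char *filename)
{
   FILE *fileptr;

   /* "w" creates the file, or empties it if it already exists */

   fileptr = fopen(filename, "w");

   if (fileptr == NULL) {
      return 1;
   }

   fputs("hello, file\n", fileptr);

   /* closing the file makes sure the data is written out */

   fclose(fileptr);

   return 0;
}
```

Calling write_sample("test.txt") before running the demonstration program should let the fopen with "r" succeed.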

Functions to read and write files

Once you have a pointer to a file, you can read data from it (or write data to it) using different f functions. Some of these are similar to the functions we've already learned. For example, the printf function prints formatted data to the user. To write data to a file instead of the user, use the fprintf function. The first argument is a file pointer, but otherwise the fprintf function is the same as the regular printf function.

Actually, the printf function is really a shorthand for fprintf but using a special file pointer. C keeps a few file pointers for you, ready to be used at any time:

stdin
The standard input is usually the keyboard. But note that if you use the command line to redirect (sometimes called a “pipe” such as TYPE | MYPROG) another program's output into this program, stdin will be whatever the previous command is printing out.
stdout
The standard output is usually the screen. However, if you use the command line to redirect this program's output to another program (for example, MYPROG | MORE) the stdout will go to that other program.
stderr
The standard error makes a handy way to let the user know when something has gone wrong. By default, redirecting the standard output of a program does not redirect the standard error. So if you print a warning message using stderr, the warning will get displayed to the user instead of sent to another program.
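To illustrate the difference between stdout and stderr, here is a small sketch that sends normal results to one file pointer and warnings to another. The helper name report_total and its messages are made up for this example.

```c
#include <stdio.h>

/* report_total is a hypothetical helper for illustration.
   Normal results go to "out" and problems go to "err".
   Returns 0 if the total was printed, or 1 if a warning was
   printed instead. */
int
report_total(FILE *out, FILE *err, int total)
{
   if (total < 0) {
      fprintf(err, "warning: total is negative: %d\n", total);
      return 1;
   }

   fprintf(out, "total: %d\n", total);
   return 0;
}
```

A program would typically call this as report_total(stdout, stderr, total). If the user redirects the program's output to another program, the warning should still appear on the screen.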

A few other useful functions that help you read and write files include:

fprintf(FILE *fileptr, "%d, %d, %d, …", a, b, c, …)
Prints a formatted string to a file, using a format string with % placeholders to define values.
The printf function is basically the same as fprintf(stdout, "%d, %d, %d, …", a, b, c, …)
fscanf(FILE *fileptr, "%d …", &var …)
Reads a value from a file, using a format to define what kind of value to read.
The scanf function is very similar, and is equivalent to fscanf(stdin, "%d …", &var …)
fputc(int ch, FILE *fileptr)
Writes a single character to a file.
The putchar(int ch) function is basically the same as fputc(int ch, stdout)
fgetc(FILE *fileptr)
Reads a single character from a file.
The getchar() function is essentially an alias for fgetc(stdin)
fputs(char *string, FILE *fileptr)
Writes a string to a file.
The puts(char *string) function should be familiar by now, and is similar to fputs(char *string, stdout)
fgets(char *string, int n, FILE *fileptr)
Reads a string from a file, stopping after a newline or after n-1 characters, whichever comes first. (The last spot in the array is reserved for the null character that ends the string.)
The similarly named gets(char *str) function reads a string from the user, much like fgets but without the length limit. Avoid using gets when you need to read strings. The fgets function is much safer because you can limit how much the function reads, so it doesn't go over the amount of data that your string can hold.
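One detail to keep in mind is that fgets keeps the newline at the end of the line it reads, if the line fit in the array. The sketch below wraps fgets to remove that newline. The helper name read_line is an assumption for this illustration, not a standard function.

```c
#include <stdio.h>
#include <string.h>

/* read_line is a hypothetical helper for illustration.
   Reads one line into buf (which holds "size" chars), removes the
   trailing newline if there is one, and returns the line length.
   Returns -1 at end of file or on a read error. */
int
read_line(char *buf, int size, FILE *fileptr)
{
   int length;

   if (fgets(buf, size, fileptr) == NULL) {
      return -1;
   }

   length = (int) strlen(buf);

   if (length > 0 && buf[length - 1] == '\n') {
      buf[length - 1] = '\0';
      length = length - 1;
   }

   return length;
}
```

If a line is longer than the array, fgets returns only the part that fits, and the next call picks up where the previous one stopped.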

Sample programs to read and write files

We now have all the programming knowledge to write some sample programs that read and write files!

The FreeDOS TYPE command displays one or more files that you specify on the command line. For example: TYPE readme.txt will print the contents of the readme.txt file. We can write our own TYPE program! Let's make the program easier to write by only opening and closing files in the main function, and using a function showfile to print the contents of the file pointer that we give it.

#include <stdio.h>

void
showfile(FILE * from)
{
   int ch;

   /* copy the contents of the file */

   while ((ch = fgetc(from)) != EOF) {
      putchar(ch);
   }
}

int
main(int argc, char **argv)
{
   int i;
   FILE *fileptr;

   for (i = 1; i < argc; i = i + 1) {
      fileptr = fopen(argv[i], "r");
      if (fileptr == NULL) {
         fprintf(stderr, "cannot open file: %s\n", argv[i]);
      }
      else {
         showfile(fileptr);
         fclose(fileptr);
      }
   }

   /* but what if we didn't have any arguments? then argc is 1, and the
      "for" loop didn't do anything. so check here */

   if (argc == 1) {
      showfile(stdin);
   }

   return 0;
}

But good programmers know it's best to put in a little extra effort to solve a more general problem, so you can make other programs simpler to write. In the above program, we've assumed that TYPE will always write to the standard output. Let's modify the showfile function to become a copyfile function that reads from one file pointer and writes to another file pointer. If the output is always stdout then the behavior is still the same as the original TYPE, except that we will have written a more useful function that we can use elsewhere. We can also make minor updates to copyfile to write to a file pointer instead of standard output, and we can change how we call copyfile to include the stdout file pointer.

#include <stdio.h>

void
copyfile(FILE * from, FILE * to)
{
   int ch;

   /* copy the contents of the file */

   while ((ch = fgetc(from)) != EOF) {
      fputc(ch, to);
   }
}

int
main(int argc, char **argv)
{
   int i;
   FILE *fileptr;

   for (i = 1; i < argc; i = i + 1) {
      fileptr = fopen(argv[i], "r");
      if (fileptr == NULL) {
         fprintf(stderr, "cannot open file: %s\n", argv[i]);
      }
      else {
         copyfile(fileptr, stdout);
         fclose(fileptr);
      }
   }

   /* but what if we didn't have any arguments? then argc is 1, and the
      "for" loop didn't do anything. so check here */

   if (argc == 1) {
      copyfile(stdin, stdout);
   }

   return 0;
}

The FreeDOS COPY command copies one file to another. This program becomes much easier to write if you notice that copying a file is basically the same as the TYPE command, but writing to a file instead of the standard output. We can make minor adjustments to the TYPE program to become a new COPY program. The main function opens the source file for reading, and the destination file for writing, and passes those file pointers to copyfile.

#include <stdio.h>

void
copyfile(FILE * from, FILE * to)
{
   int ch;

   /* get the contents of the file, one character at a time using
      fgetc() */

   while ((ch = fgetc(from)) != EOF) {
      fputc(ch, to);
   }
}

int
main(int argc, char **argv)
{
   FILE *src;
   FILE *dest;

   /* check usage */

   if (argc != 3) {
      fprintf(stderr, "Wrong number of arguments\n");
      fprintf(stderr, "Usage: %s {srcfile} {destfile}\n", argv[0]);
      return 1;
   }

   /* open the files */

   /* src is [1] and dest is [2] */

   src = fopen(argv[1], "r");
   if (src == NULL) {
      fprintf(stderr, "Cannot open file for reading: %s\n", argv[1]);
      return 2;
   }

   dest = fopen(argv[2], "w");
   if (dest == NULL) {
      fprintf(stderr, "Cannot open file for writing: %s\n", argv[2]);

      /* close the open src file */
      fclose(src);

      return 3;
   }

   /* files are open, copy from src to dest */

   copyfile(src, dest);

   /* close files, then quit */

   fclose(src);
   fclose(dest);

   return 0;
}

The FreeDOS FIND utility will display the lines from a series of text files that contain a particular string. Let's assume the string we want to find is the first command line argument, followed by any files that we want to examine.

#include <stdio.h>
#include <string.h>                    /* for strstr */

int
findstr_file(char *string, FILE * fileptr)
{
   char linefromfile[1000];
   char *foundstr;
   int numfound = 0;

   while ((fgets(linefromfile, 1000, fileptr)) != NULL) {
      foundstr = strstr(linefromfile, string);

      if (foundstr != NULL) {
         /* puts(linefromfile) would print an extra newline at the end */
         printf("%s", linefromfile);
         numfound = numfound + 1;
      }
   }

   return numfound;
}

int
main(int argc, char **argv)
{
   int i;
   FILE *ptr;
   char *str;

   /* check usage */

   if (argc < 2) {
      fprintf(stderr, "Wrong number of arguments\n");
      fprintf(stderr, "Usage: %s {string} {files...}\n", argv[0]);
      return 1;
   }

   /* save the string */

   str = argv[1];

   /* open each file and find the string */

   for (i = 2; i < argc; i = i + 1) {
      ptr = fopen(argv[i], "r");

      if (ptr == NULL) {
         fprintf(stderr, "Cannot read file: %s\n", argv[i]);
      }
      else {
         findstr_file(str, ptr);
         fclose(ptr);
      }
   }

   /* what if no files were given? read from stdin */

   if (argc == 2) {
      /* argv[0] = programname and argv[1] = string */
      findstr_file(str, stdin);
   }

   /* done */

   return 0;
}
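The FIND program relies on strstr from string.h, which returns a pointer to the first place the search string appears, or NULL if it does not appear at all. Here is a minimal sketch of that behavior. The helper name match_offset is made up for this illustration.

```c
#include <string.h>

/* match_offset is a hypothetical helper for illustration.
   Returns the position of "string" within "line", counting from
   zero, or -1 if "string" does not appear in "line". */
int
match_offset(const char *line, const char *string)
{
   const char *found;

   found = strstr(line, string);

   if (found == NULL) {
      return -1;
   }

   /* subtracting two pointers into the same array gives the
      distance between them */
   return (int) (found - line);
}
```

Note that strstr compares characters exactly, so a search like this is case sensitive.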

Read more data at once with buffers

The TYPE and COPY programs are easy enough to write by reading and writing one character at a time, but on real systems you will see some performance slowdowns with this method. This is especially noticeable on DOS systems with floppy disk drives. Every read of a single character carries overhead, and in the worst case the operating system has to go back to the disk. That sounds easy enough, but DOS does a lot of work behind the scenes, and this method becomes really slow when you act on one character at a time.

Instead, it is often more efficient to read a bunch of data at once, and act on the data you've read so far. You can do this using a mechanism called buffers. The idea is you read data into a buffer, so the operating system accesses the disk fewer times. You read data into a buffer using the fread function, and write data from a buffer using the fwrite function.

fread(void *buffer, size_t size, size_t n, FILE *fileptr)
Reads up to n items of data that are each size bytes long. This reads from a file and saves the data into a buffer. Returns the number of items actually read, which can be less than n at the end of the file.
fwrite(void *buffer, size_t size, size_t n, FILE *fileptr)
Writes n items of data that are each size bytes long. This writes data from a buffer to a file. Returns the number of items actually written.

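Like FILE, size_t is not a built-in C variable type; the C library creates it using a type definition. It is an unsigned integer type that is large enough to hold the size of anything in memory, so you can treat it as a large integer variable that is never negative.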
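The size and n arguments let fread and fwrite work with items larger than a single char. As a sketch, these two helpers save and load an array of int values. The names save_numbers and load_numbers are assumptions for this illustration. On DOS, a file used this way should be opened in binary mode (for example "wb" or "rb") so the bytes are not translated as text.

```c
#include <stdio.h>

/* save_numbers and load_numbers are hypothetical helpers for
   illustration. Each item is one int, so size is sizeof(int). */

/* writes n numbers to the file; returns how many were written */
size_t
save_numbers(const int *numbers, size_t n, FILE *fileptr)
{
   return fwrite(numbers, sizeof(int), n, fileptr);
}

/* reads up to n numbers from the file; returns how many were
   actually read, which is less than n if the file is shorter */
size_t
load_numbers(int *numbers, size_t n, FILE *fileptr)
{
   return fread(numbers, sizeof(int), n, fileptr);
}
```

If the file holds three numbers and you ask load_numbers for five, the return value is 3 and only the first three array elements are filled in.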

Let's update the TYPE program to read data into a larger buffer, rather than reading one character at a time. We can use a #define constant to set the buffer size, so we don't have to remember it is 200 every time we use it. Using constants like this makes it easier to update the program later; with a #define constant, you only need to change the code once.

#include <stdio.h>

/* define size of the buffer */
#define BUFFER_SIZE 200

void
copyfile(FILE * from, FILE * to)
{
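   /* char buffer[200]; */
   char buffer[BUFFER_SIZE];
   size_t buffer_length;               /* fread returns a size_t count */

   /* copy the contents of the file; stop at the end of the file, or
      if a read error occurs (feof alone would never become true) */

   while (!feof(from) && !ferror(from)) {
      buffer_length = fread(buffer, sizeof(char), BUFFER_SIZE, from);
      fwrite(buffer, sizeof(char), buffer_length, to);
   }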
}

int
main(int argc, char **argv)
{
   int i;
   FILE *fileptr;

   for (i = 1; i < argc; i = i + 1) {
      fileptr = fopen(argv[i], "r");
      if (fileptr == NULL) {
         fprintf(stderr, "cannot open file: %s\n", argv[i]);
      }
      else {
         copyfile(fileptr, stdout);
         fclose(fileptr);
      }
   }

   /* but what if we didn't have any arguments? then argc is 1, and the
      "for" loop didn't do anything. so check here */

   if (argc == 1) {
      copyfile(stdin, stdout);
   }

   return 0;
}

Note that reading a bunch of data into a buffer requires an extra step in the copyfile function. The first few times you read data from the file, you will probably fill the buffer entirely with data. But eventually, you'll reach the end of the file. When that happens, the buffer won't be completely filled. That's why you need to track how much data was read with each call to fread. You can save that into a variable.

When reading one character at a time, the fgetc function returned the value EOF when it reached the end of the file. Not so with fread. The fread function returns only a count of the items it read, and that count is smaller than you asked for both at the end of the file and after a read error. To tell the difference, you need a separate test. The feof function returns a true value once a read has reached the end of the file, and the ferror function returns a true value if a read failed.
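As an alternative to testing feof, the loop can be driven by the count that fread returns, since a count of zero means there was nothing more to read. The sketch below is one possible variation, not the only way to write it. The name copyfile_counted and the choice to return the byte count are assumptions for this illustration.

```c
#include <stdio.h>

#define BUFFER_SIZE 200

/* copyfile_counted is a hypothetical variation on copyfile.
   Returns the number of bytes copied, or -1 if a read or write
   error occurred. */
long
copyfile_counted(FILE *from, FILE *to)
{
   char buffer[BUFFER_SIZE];
   size_t buffer_length;
   long total = 0;

   /* keep going as long as fread gives us at least one char */

   while ((buffer_length = fread(buffer, sizeof(char), BUFFER_SIZE, from)) > 0) {
      if (fwrite(buffer, sizeof(char), buffer_length, to) != buffer_length) {
         return -1;
      }

      total = total + (long) buffer_length;
   }

   /* fread returned zero: find out if that was an error */

   if (ferror(from)) {
      return -1;
   }

   return total;
}
```

Checking the return value of fwrite also catches problems such as a full disk, which the simpler loop would silently ignore.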

Standard C library of functions

Here's a quick reference guide for the functions you can use with files:

Opening and closing files

fopen(char *filename, char *mode)
Opens a file and returns a file pointer. Use "r" to open a file for reading, starting at the beginning of the file. Use "w" to write to a file (an existing file will be immediately overwritten). And use "a" to append to the end of an existing file.
You can include a plus sign on any of those modes to open the file for both reading and writing at the same time, but we won't cover that in this series. For example, "r+" or "w+" or "a+".
fclose(FILE *fileptr)
Close the file.
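The sample programs in this part only use the "r" and "w" modes. As a sketch of the "a" mode, this helper adds one line to the end of a file, such as a log file. The helper name append_line is made up for this example. With "a", the file is created if it does not already exist.

```c
#include <stdio.h>

/* append_line is a hypothetical helper for illustration.
   Adds one line of text to the end of the named file.
   Returns 0 on success, or 1 if the file could not be opened. */
int
append_line(const char *filename, const char *text)
{
   FILE *fileptr;

   /* "a" keeps the existing contents and writes after them */

   fileptr = fopen(filename, "a");

   if (fileptr == NULL) {
      return 1;
   }

   fprintf(fileptr, "%s\n", text);
   fclose(fileptr);

   return 0;
}
```

Calling append_line twice on the same file leaves both lines in the file, in the order they were written. With "w" instead of "a", only the second line would remain.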

Reading from standard input, writing to standard output

printf("%d, %d, %d, …", a, b, c, …)
Prints a formatted string, using a format string with % placeholders to define values
scanf("%d …", &var …)
Scans (reads) a value from the user, using a format to define what kind of value to read
putchar(int ch)
Put a single character back to the user, usually to the screen unless the user has redirected the output of the program to a file (at the command line, when they run the program). The character is stored in an int value but you can think of it as just a single character.
getchar()
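Get a single character from the user. This is a better way to read a character than scanf. This returns an int value so that it can also return the special value EOF at the end of the input. Store the result in an int variable, as the sample programs do, and then you can treat it like a char character value.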
puts("string")
Prints a string to the user
gets(char *string)
Reads a string. Never use this function; it was removed from the C standard in 2011. Use the fgets function instead.

Reading from files, writing to files

fprintf(FILE *fileptr, "%d, %d, %d, …", a, b, c, …)
Prints a formatted string to a file, using a format string with % placeholders to define values.
fscanf(FILE *fileptr, "%d …", &var …)
Reads a value from a file, using a format to define what kind of value to read.
fputc(int ch, FILE *fileptr)
Writes a single character to a file.
fgetc(FILE *fileptr)
Reads a single character from a file.
fputs(char *string, FILE *fileptr)
Writes a string to a file.
fgets(char *string, int n, FILE *fileptr)
Reads a string from a file, stopping after a newline or after n-1 characters, whichever comes first.
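None of the sample programs in this part use fscanf, so here is a minimal sketch of it. The function reads whole numbers separated by spaces or newlines and adds them up. The helper name sum_numbers is an assumption for this illustration.

```c
#include <stdio.h>

/* sum_numbers is a hypothetical helper for illustration.
   Reads integers from the file until fscanf can no longer read
   one, and returns their sum. */
int
sum_numbers(FILE *fileptr)
{
   int value;
   int sum = 0;

   /* fscanf returns the number of values it stored; 1 means it
      read one integer for the single %d placeholder */

   while (fscanf(fileptr, "%d", &value) == 1) {
      sum = sum + value;
   }

   return sum;
}
```

The loop stops at the end of the file, and also at the first thing in the file that is not a number. Calling sum_numbers(stdin) would add up numbers typed by the user or piped in from another program.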

PRACTICE

Now that you've learned about reading and writing files, let's practice with these sample programs:

Practice program 1.

Write a program that counts the lines in the files listed on the command line. (This is very similar to the Linux wc -l command.)

Practice program 2.

Write a program to read the files listed on the command line, and print their contents to standard output. After every 24 lines, prompt the user with "<Press Enter for more>" (This is basically the same as the FreeDOS MORE command.)

The file might contain lines longer than 80 columns, which will wrap on a standard 80×25 display. That will throw off your line count, but we can ignore that in this simple example. You also don't need to worry about tabs in the input. However, a proper MORE program should account for these.

Practice program 3.

Update the above MORE program to read the files one buffer at a time, instead of one character or one line at a time.

Practice program 4.

Write a program that takes two single-letter arguments. Write to standard output what you read from standard input—except when you see the first letter, replace it with the second letter instead. (This is very similar to the Linux tr command.)

Need help? Check out the sample solutions.