Files in C – Binary & Text Files

Binary Files

Binary files are very similar to arrays of structures, except the structures are in a disk file rather than in an array in memory. Because the structures in a binary file are on disk, you can create very large collections of them (limited only by your available disk space). They are also permanent and always available. The only disadvantage is the slowness that comes from disk access time.

Binary files have two features that distinguish them from text files:

You can jump instantly to any structure in the file, which provides random access as in an array.
You can change the contents of a structure anywhere in the file at any time.

Binary files also usually have faster read and write times than text files, because a binary image of the record is stored directly from memory to disk (or vice versa). In a text file, everything has to be converted back and forth to text, and this takes time.

C supports the file-of-structures concept very cleanly. Once you open the file you can read a structure, write a structure, or seek to any structure in the file. This model relies on a file pointer. When the file is opened, the pointer points to record 0 (the first record in the file). Any read operation reads the currently pointed-to structure and moves the pointer down one structure. Any write operation writes to the currently pointed-to structure and moves the pointer down one structure. Seek moves the pointer to the requested record.

Keep in mind that C thinks of everything in the disk file as blocks of bytes read from disk into memory or written from memory onto disk. C uses a file pointer, but it can point to any byte location in the file, not only to the start of a record. You therefore have to keep track of record boundaries yourself.
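As a rough sketch of that bookkeeping, the helpers below convert between record numbers and byte positions. The names record_offset and current_record are made up for this illustration, and the three-int structure is only an assumed record layout.

```c
#include <stdio.h>

/* Assumed record layout for this sketch: three ints. */
struct rec { int x, y, z; };

/* Byte offset at which record number `index` (counting from 0) begins. */
long record_offset(long index)
{
    return index * (long)sizeof(struct rec);
}

/* Record number the file pointer currently sits on.
   Returns -1 on error or if the pointer is not on a record boundary. */
long current_record(FILE *f)
{
    long pos = ftell(f);   /* current byte position in the file */

    if (pos < 0 || pos % (long)sizeof(struct rec) != 0)
        return -1;
    return pos / (long)sizeof(struct rec);
}
```

Passing record_offset(n) to fseek with SEEK_SET would place the pointer at the start of record n, and current_record lets you check that a read or write left the pointer where you expected.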

The following program illustrates these concepts:

#include <stdio.h>

/* random record description - could be anything */
struct rec
{
    int x,y,z;
};

/* writes and then reads 10 arbitrary records
   from the file "junk". */
int main(void)
{
    int i;
    FILE *f;
    struct rec r;

    /* create the file of 10 records
       ("b" in the mode requests binary mode) */
    f=fopen("junk","wb");
    if (!f)
        return 1;
    r.y=0;   /* give the unused fields a defined value */
    r.z=0;
    for (i=1;i<=10; i++)
    {
        r.x=i;
        fwrite(&r,sizeof(struct rec),1,f);
    }
    fclose(f);

    /* read the 10 records */
    f=fopen("junk","rb");
    if (!f)
        return 1;
    for (i=1;i<=10; i++)
    {
        fread(&r,sizeof(struct rec),1,f);
        printf("%d\n",r.x);
    }
    fclose(f);
    printf("\n");

    /* use fseek to read the 10 records
       in reverse order */
    f=fopen("junk","rb");
    if (!f)
        return 1;
    for (i=9; i>=0; i--)
    {
        fseek(f,(long)sizeof(struct rec)*i,SEEK_SET);
        fread(&r,sizeof(struct rec),1,f);
        printf("%d\n",r.x);
    }
    fclose(f);
    printf("\n");

    /* use fseek to read every other record */
    f=fopen("junk","rb");
    if (!f)
        return 1;
    fseek(f,0L,SEEK_SET);
    for (i=0;i<5; i++)
    {
        fread(&r,sizeof(struct rec),1,f);
        printf("%d\n",r.x);
        /* skip over the next record */
        fseek(f,(long)sizeof(struct rec),SEEK_CUR);
    }
    fclose(f);
    printf("\n");

    /* use fseek to read 4th record,
       change it, and write it back */
    f=fopen("junk","r+b");
    if (!f)
        return 1;
    fseek(f,(long)sizeof(struct rec)*3,SEEK_SET);
    fread(&r,sizeof(struct rec),1,f);
    r.x=100;
    fseek(f,(long)sizeof(struct rec)*3,SEEK_SET);
    fwrite(&r,sizeof(struct rec),1,f);
    fclose(f);
    printf("\n");

    /* read the 10 records to ensure
       4th record was changed */
    f=fopen("junk","rb");
    if (!f)
        return 1;
    for (i=1;i<=10; i++)
    {
        fread(&r,sizeof(struct rec),1,f);
        printf("%d\n",r.x);
    }
    fclose(f);
    return 0;
}

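In this program, a structure description rec has been used, but you can use any structure description you want. The fopen and fclose functions work just as they do for text files. Adding b to the mode string (for example, wb or rb) requests binary mode, which matters on systems that translate line endings in text mode.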

The new functions here are fread, fwrite and fseek. The fread function takes four parameters:

A memory address
The number of bytes to read per block
The number of blocks to read
The file variable

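Thus, the line fread(&r,sizeof(struct rec),1,f); says to read sizeof(struct rec) bytes (12 bytes on a typical system where an int is 4 bytes) from the file f (from the current location of the file pointer) into memory address &r. One block of that size is requested. It would be just as easy to read 100 blocks from disk into an array in memory by changing 1 to 100 and passing the address of an array of 100 structures. The function returns the number of blocks it actually read, which is less than the number requested at end-of-file or on error.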
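The multi-block case might look like the following sketch. The function name load_records and its parameters are invented for this example, and the record layout is again assumed to be three ints.

```c
#include <stdio.h>

/* Assumed record layout for this sketch: three ints. */
struct rec { int x, y, z; };

/* Reads up to `max` records from the binary file `path` into `out`
   with a single fread call. Returns the number of complete records
   read, which is smaller than `max` when the file is shorter, and 0
   when the file cannot be opened. */
size_t load_records(const char *path, struct rec *out, size_t max)
{
    FILE *f = fopen(path, "rb");
    size_t n;

    if (!f)
        return 0;
    n = fread(out, sizeof(struct rec), max, f);   /* max blocks requested */
    fclose(f);
    return n;
}
```

With an array declared as struct rec all[100];, a call such as load_records("junk", all, 100) would fill as many elements as the file holds and report that count.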

The fwrite function works the same way, but moves the block of bytes from memory to the file. The fseek function moves the file pointer to a byte in the file. Generally, you move the pointer in sizeof(struct rec) increments to keep the pointer at record boundaries. You can use three options when seeking:

SEEK_SET
SEEK_CUR
SEEK_END

SEEK_SET moves the pointer x bytes down from the beginning of the file (from byte 0 in the file). SEEK_CUR moves the pointer x bytes down from the current pointer position. SEEK_END moves the pointer x bytes relative to the end of the file (so you must use a negative offset to land on existing data).
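The program above does not use SEEK_END, so here is a small sketch of how it could be applied. The helper names count_records and read_last_record are hypothetical, the record layout is assumed, and the code relies on the platform supporting SEEK_END on binary files, which common desktop systems do.

```c
#include <stdio.h>

/* Assumed record layout for this sketch: three ints. */
struct rec { int x, y, z; };

/* Number of complete records in an open binary file, or -1 on error.
   Leaves the file pointer at the end of the file. */
long count_records(FILE *f)
{
    long size;

    if (fseek(f, 0L, SEEK_END) != 0)   /* offset 0 from the end */
        return -1;
    size = ftell(f);                   /* byte position = file size */
    if (size < 0)
        return -1;
    return size / (long)sizeof(struct rec);
}

/* Reads the final record into *out by seeking one record back from
   the end. Returns 1 on success and 0 on failure. */
int read_last_record(FILE *f, struct rec *out)
{
    if (fseek(f, -(long)sizeof(struct rec), SEEK_END) != 0)
        return 0;
    return fread(out, sizeof(struct rec), 1, f) == 1;
}
```

Note the negative offset in read_last_record: the end of the file is one byte past the last record, so stepping back by one record size lands on the start of that record.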

Several different options appear in the code above. In particular, note the section where the file is opened with r+ mode. This opens the file for reading and writing, which allows records to be changed. The code seeks to a record, reads it, and changes a field; it then seeks back because the read displaced the pointer, and writes the change back.

Text Files

Text files in C are straightforward and easy to understand. All text file functions and types in C come from the stdio library.

When you need text I/O in a C program, and you need only one source for input information and one sink for output information, you can rely on stdin (standard in) and stdout (standard out). You can then use input and output redirection at the command line to move different information streams through the program. There are six different I/O commands in <stdio.h> that you can use with stdin and stdout:

printf – prints formatted output to stdout
scanf – reads formatted input from stdin
puts – prints a string to stdout
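gets – reads a string from stdin (unsafe, because it cannot limit the input length; removed from the language in C11, so prefer fgets)
putc, putchar – prints a character (putchar always writes to stdout)
getc, getchar – reads a character (getchar always reads from stdin)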

The advantage of stdin and stdout is that they are easy to use. Likewise, the ability to redirect I/O is very powerful. For example, maybe you want to create a program that reads from stdin and counts the number of characters:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char s[1000];
    int count=0;

    /* fgets stops at the end of the buffer, unlike gets,
       which cannot limit its input */
    while (fgets(s,sizeof(s),stdin))
        count += (int)strcspn(s,"\n");   /* do not count the newline */
    printf("%d\n",count);
    return 0;
}

Enter this code and run it. It waits for input from stdin, so type a few lines. When you are done, press CTRL-D to signal end-of-file (EOF). The fgets function reads one line at a time and returns NULL when it reaches end-of-file, so the while loop ends. The program then prints the character count to stdout (the screen). The code uses fgets rather than gets because gets cannot limit how much it reads and was removed from the language in C11. (Use man fgets or your compiler's documentation to learn more about the fgets function.)

Now, suppose you want to count the characters in a file. If you compiled the program to an executable named xxx, you can type the following:

xxx < filename

The program then reads the contents of the file named filename rather than input from the keyboard. You can achieve the same result using pipes:

cat < filename | xxx

You can also redirect the output to a file:

xxx < filename > out

This command places the character count produced by the program in a text file named out.

Sometimes, you need to use a text file directly. For example, you might need to open a specific file and read from or write to it. You might want to manage several streams of input or output or create a program like a text editor that can save and recall data or configuration files on command. In that case, use the text file functions in stdio:

fopen – opens a text file
fclose – closes a text file
feof – reports whether a read has reached the end-of-file marker in a file
fprintf – prints formatted output to a file
fscanf – reads formatted input from a file
fputs – prints a string to a file
fgets – reads a string from a file
fputc – prints a character to a file
fgetc – reads a character from a file
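To give a feel for working with more than one stream at a time, the sketch below copies one open file to another with fgetc and fputc. The function name copy_stream is invented for this example. It loops on the value fgetc returns rather than on feof, then uses ferror to tell a read error apart from a normal end-of-file.

```c
#include <stdio.h>

/* Copies everything from `in` to `out`, one character at a time.
   Returns the number of characters copied, or -1 if a read or
   write fails. */
long copy_stream(FILE *in, FILE *out)
{
    int c;          /* int, not char, so EOF can be told apart from data */
    long n = 0;

    while ((c = fgetc(in)) != EOF)
    {
        if (fputc(c, out) == EOF)
            return -1;
        n++;
    }
    /* fgetc returns EOF at end-of-file and on error; ferror tells which */
    return ferror(in) ? -1 : n;
}
```

Calling copy_stream(stdin, stdout) would echo standard input to standard output, and passing two pointers returned by fopen would copy one file to another.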

Text Files: Opening

You use fopen to open a file. It opens a file for a specified mode (the three most common are r, w, and a, for read, write, and append). It then returns a file pointer that you use to access the file. For example, suppose you want to open a file and write the numbers 1 to 10 in it. You could use the following code:

#include <stdio.h>
#define MAX 10

int main()
{
    FILE *f;
    int x;
    f=fopen("out","w");
    if (!f)
        return 1;
    for(x=1; x<=MAX; x++)
        fprintf(f,"%d\n",x);
    fclose(f);
    return 0;
}

The fopen statement here opens a file named out with the w mode. This is a destructive write mode, which means that if out does not exist it is created, but if it does exist it is destroyed and a new file is created in its place. The fopen command returns a pointer to the file, which is stored in the variable f. This variable is used to refer to the file. If the file cannot be opened for some reason, f will contain NULL.

The fprintf statement should look very familiar: It is just like printf but uses the file pointer as its first parameter. The fclose statement closes the file when you are done.
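The a (append) mode mentioned earlier is not shown in the program above, so here is a minimal sketch of it. The function name append_line is hypothetical. Unlike w, the a mode keeps the existing contents of the file and adds new output at the end.

```c
#include <stdio.h>

/* Appends one line of text to the file `path`, creating the file
   if it does not exist. Returns 0 on success and -1 on failure. */
int append_line(const char *path, const char *text)
{
    FILE *f = fopen(path, "a");   /* "a" never truncates the file */

    if (!f)
        return -1;
    if (fprintf(f, "%s\n", text) < 0)
    {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

Calling append_line twice on the same file name leaves both lines in the file, in the order they were written, which makes this mode a natural fit for log files.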

Text Files: Reading

To read a file, open it with r mode. In general, it is not a good idea to use fscanf for reading: Unless the file is perfectly formatted, fscanf will not handle it correctly. Instead, use fgets to read in each line and then parse out the pieces you need.
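One way to follow that advice is to read each line with fgets and then parse it with sscanf, as in the sketch below. The function name sum_numbers is made up for this illustration, and the code assumes each line is shorter than 256 characters. A malformed line is skipped instead of derailing the rest of the input.

```c
#include <stdio.h>

/* Reads `f` line by line and adds up the integer at the start of
   each line. Lines that do not begin with an integer are skipped.
   Stores the sum in *total and returns the number of lines used. */
int sum_numbers(FILE *f, long *total)
{
    char line[256];   /* assumes lines shorter than 256 characters */
    int value;
    int used = 0;

    *total = 0;
    while (fgets(line, sizeof line, f) != NULL)
    {
        /* sscanf returns the number of items it converted */
        if (sscanf(line, "%d", &value) == 1)
        {
            *total += value;
            used++;
        }
    }
    return used;
}
```

Because the parsing happens on a string already in memory, a bad line costs only that one line; the next fgets call starts cleanly at the following line.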

The following code demonstrates the process of reading a file and dumping its contents to the screen:

#include <stdio.h>

int main()
{
    FILE *f;
    char s[1000];

    f=fopen("infile","r");
    if (!f)
        return 1;
    while (fgets(s,1000,f)!=NULL)
        printf("%s",s);
    fclose(f);
    return 0;
}

The fgets function returns NULL at end-of-file. Otherwise it reads a line (up to 999 characters in this case, leaving room for the terminating null character), and the program prints it to stdout. Notice that the printf statement does not include \n in the format string, because fgets keeps the \n at the end of each line it reads. Thus, you can tell when a line is not complete: if it overflows the maximum length specified in the second parameter to fgets, the string does not end in \n.
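That check can be sketched as follows. The function name read_chunk is hypothetical. It reports whether the text fgets just returned ends with a newline, which is how a caller can spot a line that did not fit in the buffer.

```c
#include <stdio.h>
#include <string.h>

/* Reads one chunk with fgets and reports whether it ended with a
   newline. Returns 1 for a complete line, 0 for a partial chunk
   (a line longer than the buffer, or a last line with no trailing
   newline), and -1 at end-of-file or on error. */
int read_chunk(FILE *f, char *buf, int size)
{
    if (fgets(buf, size, f) == NULL)
        return -1;
    return strchr(buf, '\n') != NULL;
}
```

When read_chunk returns 0, the rest of the line is still waiting in the file, and the next call picks up where the previous one stopped.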

Source: HowStuffWorks