Reading Strings with Undefined Length in C
Note: I’ve revised this article as of July 07, 2010. The prior implementation of my function was silly from the memory-management point of view. Without further ado:
If you’re starting out in C programming, you might have some trouble with reading characters (either from the Standard Input or whatever other stream), because you might not be intimate with a few functions and particularities regarding character strings in C.
As you must know, in C, strings are nothing but arrays of characters ending in a null character. There are two ways of getting one: a fixed-size array (char str[size];) or a pointer (char *str;) to memory you allocate at run time with malloc(). If you use a fixed-size array and try to read more characters than it can hold (that’s (size-1), because the last element must ALWAYS be the null character), the extra characters get written past the end of the array, which is undefined behavior: the program may crash or quietly corrupt other data. A fixed size is acceptable in simple applications, but when you’re dealing with many strings that can get big it becomes a problem: to be safe you have to size every array for the worst case, and you’ll only be wasting that memory if the strings aren’t as big as you thought they would be.
By the way, NEVER use gets(). gets() will keep reading stuff and throwing it in the array, regardless of the size of the array you call it with, and overflows suck (the function was removed from the language altogether in C11). Use fgets() instead: you tell it the size of the array, and it never writes more than that, null character included. Read up on this.
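To make that concrete, here’s a minimal sketch of reading one line into a fixed-size array with fgets(). The helper name read_line_fixed() is made up for this illustration (it’s not a standard function), and stripping the trailing newline is my own choice, since fgets() keeps the newline whenever the line fits.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read at most size-1 characters of one line into buf.
   Returns 1 if something was read, 0 on end-of-file, read error or bad size. */
int read_line_fixed(FILE *in, char *buf, int size)
{
    if (size < 2 || fgets(buf, size, in) == NULL)
        return 0;
    /* fgets() stores the '\n' if it was reached; cut the string there. */
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

With a 6-byte array and the input "hello world", the first call stores "hello" and leaves " world" in the stream for the next read. Nothing is ever written past the end of the array.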
Back to dynamic arrays… there are two ways of using them:
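1. Asking first how many characters will be read into your string, allocating that many “slots” (plus one for the null character) with malloc(), and then using fgets() with the specified size, AFTER CHECKING that the size is <= the number of bytes you allocated (sizeof won’t tell you that: on a pointer it gives the size of the pointer, not of the block).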
2. Doing it all dynamically with malloc(), realloc() and getc(), and then throwing the null byte manually at the last position.
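As a rough sketch of the first approach, assuming the caller somehow already knows the maximum number of characters: the helper below allocates that many slots plus one and hands the same size to fgets(). The name read_known_length() is hypothetical, invented for this example.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper for approach 1: read at most n characters of one line.
   Returns a malloc()ed, null-terminated string that the caller must free(),
   or NULL if n is unusable, memory runs out, or nothing could be read. */
char *read_known_length(FILE *in, int n)
{
    char *buf;

    if (n <= 0 || n == INT_MAX)       /* n + 1 must not overflow */
        return NULL;
    buf = malloc((size_t)n + 1);      /* +1 for the null character */
    if (buf == NULL)
        return NULL;
    if (fgets(buf, n + 1, in) == NULL) {  /* stores at most n characters */
        free(buf);
        return NULL;
    }
    return buf;
}
```

The size given to fgets() is exactly the size given to malloc(), so the two can never disagree.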
The first way is obviously impractical. Besides the lack of elegance, sometimes you won’t be reading from the Standard Input and there won’t be a user on the other side of the app to tell you how many chars will be read. So we’re really just left with the second option. I created a function that does exactly that. The usage is:
char *str = dgets(FILE *source, int alloc_size);
This function will dynamically read char by char from the source stream (which could be, for example, ‘stdin’), store the characters in an auxiliary dynamic array, and then return the address of the zeroth element of that array. In other words, if you wanted to read an undefined number of characters from the keyboard, and then print them on the screen, this is what you would do:
char *str;
str = dgets(stdin, 50);
if (str) printf("%s", str);
free(str); // the array was allocated dynamically, so release it when done
Similarly, if you wanna throw the contents of a textfile (the length of which is unknown) into an array, you could do this:
//...
int main(void)
{
    char *leitura;
    FILE *arq;

    arq = fopen("textfile.txt", "r");
    if (arq == NULL) return 1; // could not open the file
    leitura = dgets(arq, 1024); // allocate 1024 bytes at a time (files are usually a few kbs big)
    fclose(arq);
    if (leitura == NULL) return 1; // dgets() reported an error
    printf("\n\n%s\n\n", leitura);
    free(leitura);
    return 0;
}
The function is simple:
/*
The function below reads the specified stream dynamically,
reallocating space for the resulting string as needed ([tamanhobase] bytes at a time).
If a read fails or memory cannot be allocated, the function returns 0.
The resulting string will be null-terminated, and the caller must free() it.
If stdin is specified, dgets() will stop
reading when it reaches a \n (or the end of input), and the \n will not
be thrown into the string. If a different stream
is specified (such as a file in the hard drive),
the resulting string will contain the whole of the
stream's contents (even if there are multiple lines
of text, for example), plus the null-byte termination.
Needs <stdio.h> and <stdlib.h>.
The function was previously called "lerdinamica()". Later on, it was
translated to English and renamed to "readdyn()". Finally, tehra
suggested that we rename it to "dgets()" (as in "dynamic gets()").
*/
char *dgets(FILE *origem, int tamanhobase)
{
    size_t i = 0, tamanho = 1; // one byte is always reserved for the null byte
    char *aux, *tmp;
    int letra; // int, not char, so that EOF can be told apart from real characters

    if (!origem || tamanhobase <= 0) return 0;
    if (!(aux = malloc(tamanho))) return 0;
    while ((letra = getc(origem)) != EOF)
    {
        if (origem == stdin && letra == '\n') break; // end of line: the \n is not stored
        if (i + 1 >= tamanho) // no room left for this letter plus the null byte
        {
            tmp = realloc(aux, tamanho + tamanhobase); // make space for [tamanhobase] more letters
            if (!tmp) { free(aux); return 0; } // allocation failed: release what we had
            aux = tmp;
            tamanho += tamanhobase;
        }
        aux[i++] = (char) letra;
    }
    if (ferror(origem)) { free(aux); return 0; } // getc() failed, not just end of stream
    aux[i] = 0;
    return aux;
}
As you can see, the function will return the address of the array, or zero if there’s an error while reading one of the characters. The array is allocated dynamically, so it’s the caller’s job to free() it when done. I’ll add more on the way the function works later.
Quick Note on Allocation Failure
Because of the way realloc() works, there is a practical way to tell whether ALL the characters made it into the array: realloc() reports failure by returning a null pointer, and it leaves the old block untouched when it does. So the trick is to store its result in a temporary pointer and check it before overwriting the old pointer. A call would fail if the heap limit was hit (or whenever else realloc() can’t find the memory). This should be a very rare phenomenon, but the check costs one comparison, and it lets the caller decide what to do when the machine runs out of memory.
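Here’s a minimal sketch of a growth step that notices a failed realloc(). It relies on realloc() returning a null pointer on failure while leaving the old block valid. The helper name grow_buffer() and its parameters are hypothetical, chosen just for this example.

```c
#include <stdlib.h>

/* Hypothetical helper: enlarge the block at *buf by step bytes.
   Returns 1 on success. On failure returns 0 and leaves *buf and *cap
   untouched, so the caller can still use or free() the old block. */
int grow_buffer(char **buf, size_t *cap, size_t step)
{
    char *tmp;

    if (step == 0 || *cap > (size_t)-1 - step)
        return 0;                      /* nothing to do, or size would overflow */
    tmp = realloc(*buf, *cap + step);  /* never assign straight to *buf */
    if (tmp == NULL)
        return 0;                      /* old block is still valid here */
    *buf = tmp;
    *cap += step;
    return 1;
}
```

Starting from a null pointer and a capacity of 0 works too, because realloc() on a null pointer behaves like malloc().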
