entab program which replaces 4 spaces with a tab

Question

I wrote an entab program which needs to replace every \$n\$ spaces with 1 tab. I picked \$n\$ to be 4, so every 4 spaces should be 1 tab. I have a feeling that my code is far too complicated than it should be, and the logic I made it with could be much simpler.

The logic of the code:

Note: \$n\$ = 4

Get line from input and put it in line array.
Whenever n spaces is found in that line kick n-1 of them from the array and the remaining space is replaced with tab.
Print the line.

#include <stdio.h> 
#define MAXLINESIZE 100
#define TABSPACES 4

int mgetline(char *s, int lim);
void kick(char *x, char val);
int countx(char *x);
void swap(char* x, int i, int j);

int main(void) {
    char line[MAXLINESIZE];
    char *linep = line;
    char *tmp = 0;
    int tab = 0;
    int i = TABSPACES - 1;
    while (mgetline(line, MAXLINESIZE) > 0) {
        while (*linep) {
            if (*linep == ' ') {
                if (tab == TABSPACES - 1) {
                    while (i > 0) {         
                        tmp = linep - i;
                        kick(tmp, ' ');
                        linep--;
                        i--;
                    }
                    *linep = '\t';
                    tab = 0;
                }
                else
                    tab++;
            }
            else 
                tab = 0;
            linep++;
        }
        printf("%s", line);
        linep = line;
    }
    return 0;
}

int mgetline(char *s, int lim)
{
    int c;
    char *t = s;

    while (--lim > 0 && (c = getchar()) != EOF && c != '\n')
        *s++ = c;
    if (c == '\n')
        *s++ = c;

    *s = '\0';

    return s - t;
}

void kick(char *x, char val) {
    char *sp = x;
    int i = 1;
    while (*x) {
        if (*x == val) {
            while (countx(sp) - i != x - sp) {
                swap(sp, x - sp, countx(sp) - i);
                i++;
            }
            *(sp + countx(sp) - 1) = 0;
            return;
        }
        x++;
    }
}

int countx(char* x) {
    int i;
    for (i = 0; x[i]; i++);
    return i;
}

void swap(char* x, int i, int j) {
    char tmp = x[i];
    x[i] = x[j];
    x[j] = tmp;
}

Quuxplusone · Answer 1 · 2016-12-31 20:16:46Z

The biggest thing you can do to improve your C style (and this also goes for any curly-brace language) is to learn to love the for loop.

foo(i);
while (bar(i)) {
    something;
    baz(i);
}

should almost always be recast as

for (foo(i); bar(i); baz(i)) {
    something;
}

Examples from your code:

char *linep = line;
while (...) {
    while (*linep) {
        // body
        linep++;
    }
    printf("%s", line);
    linep = line;
}

should be

while (...) {
    for (char *linep = line; *linep != '\0'; ++linep) {
        // body
    }
    printf("%s", line);
}

(Notice that I added an explicit comparison with '\0' for clarity — since it doesn't affect the generated code — and I changed your statement "linep increment" to the more readable "increment linep." There's another reason to use preincrement over postincrement, but it doesn't matter until you get to C++.)

Here's another example:

void kick(char *x, char val) {
    char *sp = x;
    int i = 1;
    while (*x) {
        if (*x == val) {
            while (countx(sp) - i != x - sp) {
                swap(sp, x - sp, countx(sp) - i);
                i++;
            }
            *(sp + countx(sp) - 1) = 0;
            return;
        }
        x++;
    }
}

should be

void kick(char *sp, char val) {
    for (char *x = sp; *x != '\0'; ++x) {
        if (*x == val) {
            for (int i = 1; countx(sp) - i != x - sp; ++i) {
                swap(sp, x - sp, countx(sp) - i);
            }
            sp[countx(sp) - 1] = '\0';
            return;
        }
    }
}

Let's keep working on the kick function. The next thing I notice is that you repeatedly call this function countx(sp) inside a tight loop. Surely that's a performance hit; it would make more sense to cache the value once in a local variable and then use the variable. But let's see what countx(sp) actually does.

int countx(char* x) {
    int i;
    for (i = 0; x[i]; i++);
    return i;
}

Well, that's just strlen(x) (modulo some const-incorrectness). You should write

int countx(const char *x) {
    return strlen(x);
}

and then get rid of the countx function altogether. So now we have

void kick(char *sp, char val) {
    const int n = strlen(sp);
    for (char *x = sp; *x != '\0'; ++x) {
        if (*x == val) {
            for (int i = 1; n - i != x - sp; ++i) {
                swap(sp, x - sp, n - i);
            }
            sp[n - 1] = '\0';
            return;
        }
    }
}

Next, I notice that if (*x == val), we just do a bit more processing and then return. So we don't actually need that extra bit of processing to happen inside the loop; it makes more sense to say "loop until we find a val; then break the loop; then do more processing and return." Also, there's another convenient library function for the looping part of that: it's called strchr.

So the next most productive thing you can do for your programming skill is to learn to love the standard library.

void kick(char *sp, char val) {
    const int n = strlen(sp);
    char *x = strchr(sp, val);
    if (x != NULL) {
        for (int i = 1; n - i != x - sp; ++i) {
            swap(sp, x - sp, n - i);
        }
        sp[n - 1] = '\0';
    }
}

Okay, let's look at the remaining complexity in that kick function. What's this three-argument swap routine? I'm familiar with the idea of swapping two objects... does this rotate the three of them, or something?

...Ah, no, it's just doing some unnecessary array operations. Let's write it the normal two-argument way:

void swap(char *x, char *y) {
    char tmp = *x;
    *x = *y;
    *y = tmp;
}

... swap(&sp[x - sp], &sp[n - i]); ...

Ah, wait, that call can be just

... swap(x, &sp[n - i]); ...

So now what do we have?

void kick(char *sp, char val) {
    const int n = strlen(sp);
    char *x = strchr(sp, val);
    if (x != NULL) {
        for (int i = 1; i < sp + n - x; ++i) {
            swap(x, sp[n - i]);
        }
        sp[n - 1] = '\0';
    }
}

Well, that's a lot of calls to swap. Let's see what's actually happening here... aha, I see, you're trying to bubble *x from the first position all the way to the end of the array. (But doing it in a very odd way where the value *x actually pops into the right place immediately, and then you spend the rest of the O(n) time just trying to get all the other elements back into their proper places.)

Okay, why do we want *x at the end of the array? (After all, we know its value is val, so we wouldn't even technically need to swap it.) ...Aw, really?

sp[n - 1] = '\0';

All that work, just to null it out? Okay, seems like we don't care about *x; all this loop is really trying to do is bump the remaining elements down by one position in the array. Again, there's a library function for that.

void kick(char *sp, char val) {
    const int n = strlen(sp);
    char *x = strchr(sp, val);
    if (x != NULL) {
        memmove(x, x+1, sp + n - x);
    }
}

Notice that since the sp + nth character of sp is '\0', we can just memmove it with the rest of the characters; we don't need to treat it specially.

And now we can see that n is used in only one place, as a part of the expression sp + n; and now we know about strchr, so we can reuse it and move the expensive computation inside the if so that we won't waste time computing it if it's not going to be used.

void kick(char *sp, char val) {
    char *x = strchr(sp, val);
    if (x != NULL) {
        char *end = strchr(sp, '\0');
        memmove(x, x+1, end - x);
    }
}

Now we can go back up and look for the callers of kick. Turns out there's only one — kick(tmp, ' ') — so we don't need val to be a parameter; in fact we should inline the entire three-line body of kick straight into the one place it's called.

...And of course there's much more to clean up, but at this point I'll call it a day. I hope that by reading and studying answers like these, you'll learn how to apply the same kinds of thinking to your own code. A particularly good book on the subject is Kernighan and Plauger's The Elements of Programming Style. (Don't worry that it deals in languages that may be unfamiliar to you. That's part of the point: the techniques of programming are largely language-independent.)

Oh, I suppose I should also point out some problems starting from the top and working down. First, look up strstr. Then, try your program on the input "hello\tworld\nhello world" (that's four spaces after the second "hello") and observe the trouble with it.

I vaguely recall that a working entab/detab program is one of the programs developed in Kernighan and Plauger's Software Tools, which I recommend not as highly as The Elements of Programming Style but it's still pretty good. I also have a working version here.

Nice cleanup! I will wait one day more to see if there will be more improvements and compare with this. Then i will accept this as answer if it is the best one. — Siliproksi, 1 hour ago
And you have a good point by using standard library's functions, but because i am beginner i want to learn much more by making my own functions. — Siliproksi, 1 hour ago

asked	today
viewed	30 times
active	today

current community

your communities

more stack exchange communities

entab program which replaces 4 spaces with a tab

1 Answer 1

Your Answer

Not the answer you're looking for? Browse other questions tagged c io formatting or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

entab program which replaces 4 spaces with a tab

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged c io formatting or ask your own question.

Related

Hot Network Questions