In creating this answer to a recent question, I thought I might write a fully worked example that implements a finite state machine in C.

The purpose is to implement the following function:

Given a const char *, search for a string conforming to the pattern WWWDD where WWW is one of { "Mon", "Tue", "Wed, "Thu", "Fri", "Sat", "Sun" } and DD is an integer in the range \$[1, 31]\$ (inclusive). If the pattern exists within the string, return the offset; otherwise return -1.

I implemented this state machine:finite state machine using simple table structure. This is the code, which includes a simple driver with test vectors.

#include <string.h>
#include <stdlib.h>
#include <ctype.h>

static const struct fsmNode {
    const char target;
    const struct fsmNode *next;
} fsm[] = {
    { 'T' , &fsm[6] },    // 0
    { 'M' , &fsm[9] },    // 1
    { 'S' , &fsm[11] },   // 2
    { 'W' , &fsm[12] },   // 3
    { 'F' , &fsm[14] },   // 4
    { '\0' , NULL },      // 5
    // second letter
    { 'h' , &fsm[18] },   // 6
    { 'u' , &fsm[20] },   // 7
    { '\0' , NULL },      // 8
    { 'o' , &fsm[22] },   // 9
    { '\0' , NULL },      // 10
    { 'a' , &fsm[24] },   // 11
    { 'u' , &fsm[22] },   // 12
    { '\0' , NULL },      // 13
    { 'e' , &fsm[26] },   // 14
    { '\0' , NULL },      // 15
    { 'r' , &fsm[27] },   // 16
    { '\0' , NULL },      // 17
    // third letter
    { 'u' , &fsm[30] },   // 18
    { '\0' , NULL },      // 19
    { 'e' , &fsm[30] },   // 20
    { '\0' , NULL },      // 21
    { 'n' , &fsm[30] },   // 22
    { '\0' , NULL },      // 23
    { 't' , &fsm[30] },   // 24
    { '\0' , NULL },      // 25
    { 'd' , &fsm[30] },   // 26
    { '\0' , NULL },      // 27
    { 'i' , &fsm[30] },   // 28
    { '\0' , NULL },      // 29
    // first digit
    { '0' , &fsm[36] },   // 30
    { '1' , &fsm[35] },   // 31
    { '2' , &fsm[35] },   // 32
    { '3' , &fsm[46] },   // 33
    { '\0' , NULL },      // 34
    // second digit
    { '0' , NULL },       // 35
    { '1' , NULL },       // 36
    { '2' , NULL },       // 37
    { '3' , NULL },       // 38
    { '4' , NULL },       // 39
    { '5' , NULL },       // 40
    { '6' , NULL },       // 41
    { '7' , NULL },       // 42
    { '8' , NULL },       // 43
    { '9' , NULL },       // 44
    { '\0' , NULL },      // 45
    { '0' , NULL },       // 46
    { '1' , NULL },       // 47
    { '\0' , NULL },      // 48
};

int findWeekdayDate(const char *str) {
    const struct fsmNode *state = fsm;
    const char *curr;
    for (curr = str ; *curr; ++curr) {
        for ( ; state->target && *curr != state->target; ++state) 
        { /* do nothing */ }
        if (state->next == NULL) {
            if (state->target) {
                // found a valid string
                return curr-str-4;  
            } else {
                state = fsm;
            }
        } else {
            state = state->next;
        }
    }
    return -1;
}

// Test code follows...
static const struct test_s {
    const char *string;
    int expected;
} test[] = {
    { "abcSun24Def", 3},
    { "abcMun24Def", -1},
    { "abcSun091", 3},
    { "MON31", -1},
    { "Mon31", 0},
    { NULL, -1 }    // last item
};

#include <stdio.h>

#define RED    "\033[31m"
#define GREEN  "\033[32m"
#define WHITE  "\033[39m"

int main() {
    for (size_t i=0; test[i].string != NULL; ++i) {
        int val = findWeekdayDate(test[i].string);
        printf("%s: \"%s\", expected %d, got %d\n", 
            (val == test[i].expected ? GREEN " OK" WHITE: RED "BAD" WHITE),
            test[i].string, test[i].expected, val
            );
    }
}

Comments on efficiency, style, performance, data structure choice and implementation are all of interest.

share|improve this question
1  
Because it's been asked, the software I used to create the FSM diagram is graphviz. – Edward 5 hours ago

Nice work.

  1. On findWeekdayDate() return, I'd expect a value that is enumerated like fWD_SUCCESS, fWD_FAIL, fWD_SUCCESS_for_foo_reason, fWD_FAIL_for_bar_reason, etc. Rather than magic numbers -1,0,3,.... It could end up being a simple bool.

  2. As fsm[] is const struct fsmNode, I see no value in making a field const like const char target. char target is simpler and sufficient. OTOH, if there is some value, then next should also be const as in const struct fsmNode * const next.

  3. Delay/remove #includes.

    #include <string.h>  // Needed?
    #include <stdlib.h>  // OK, for NULL
    #include <ctype.h>   // Needed?
    
  4. Style inconsistency. Recommend empty for body to match other indentations.

    // for ( ; state->target && *curr != state->target; ++state) 
    //{ /* do nothing */ }
    for ( ; state->target && *curr != state->target; ++state) {
      /* do nothing */
    }
    
  5. Use an auto formatter. The extra space betrays manual formatting as it is inconsistent with the rest of code. Life is too short for manual formatting.

    //             v
    for (curr = str ; *curr; ++curr) {
    

Suggest adding link to software used to create FSM.

share|improve this answer
    
Thanks for the review. On the first point, the return value can't be just a bool because it represents the offset within the string that the pattern was found. That's why -1 is useful as a "not found" indicator. – Edward 5 hours ago
    
I'm confused by your second point. The definition of the struct has both elements declared as const, so I'm not sure what you mean when you write "next should also be const" -- it seems to me that it already is. Can you clarify? – Edward 5 hours ago
    
@Edward The field next as const struct fsmNode *next is a pointer to const struct fsmNode, but is not itself const. Compare struct fsmNode * const next and const struct fsmNode * const next. – chux 5 hours ago
    
Let me be more clear: what would adding an additional const keyword prevent that is currently possible but undesirable? – Edward 3 hours ago
1  
Oh, I see now. You're advocating removing const (in front of target) rather than adding one. – Edward 2 hours ago

This was quite fun to read and to figure out how it works.

It took me a while to understand the control flow, which is my main point of critisism. I rearranged it a bit so that, at least in my opinion, the logic is easier to follow.

for ( ; state->target && *curr != state->target; ++state)
{ /* do nothing */ }

Here, instead of having to use a comment to "excuse" the empty body of the for loop, I would use a while loop and add a real comment explaining its purpose.

if (state->next == NULL) {
    if (state->target) {
        // found a valid string
        return curr-str-4;  
    } else {
        state = fsm;
    }
} else {
    state = state->next;
}

For this part, I would apply the following changes:

  • Consistently use implicit boolean evaluation (i.e. use !p instead of p == NULL).
  • Use continue, in order to
    • reduce nesting,
    • group each condition with its corresponding action,
    • sort the cases by increasing success (no match → partial match → full match).
  • Add comments explaining each case.

This is what comes out:

int findWeekdayDate(const char *str) {
    const struct fsmNode *state = fsm;
    const char *curr;
    for (curr = str; *curr; ++curr) {
        // try to find a match
        while (state->target && *curr != state->target) {
            ++state;
        }
        if (!state->target) {
            // no match, go back to start
            state = fsm;
            continue;
        }
        if (state->next) {
            // current character matched, but the pattern goes on
            state = state->next;
            continue;
        }
        // found a valid string
        return curr - str - 4;
    }
    return -1;
}

I hope this is still correct, at least all the provided test cases are passed.


Other remarks are:

  • I'm afraid this is quite an inflexible solution. Adding, removing or changing the patterns would probably require updating most of the next indexes (and the "line number" comments).

  • Assuming it was somewhat convenient to generate the fsm structure, it would make sense to make it a parameter of the matching function so that the function can be reused for different FSMs.

  • The magic number 4 (in curr-str-4) should be a named constant.

    • Its value should actually by 5 (because it is the length of the WWWDD pattern),
    • the result of the function should then be calculated as (curr - PATTERN_LENGTH + 1) - str in order to (1) make the awareness of the "fence post problem" apparent and (2) highlight the fact that the offset into str is calculated.
    • Hard-coding it into the function makes it also impossible to support patterns where not all matches have the same length, e.g. the full weekday names (which would already be supported by the given FSM data structure).
share|improve this answer
    
Perhaps if (state->next) { state = state->next; continue; } return curr - str - 4; --> if (!state->next) return curr - str - 4; state = state->next; ? – chux 3 hours ago
    
Then the cases are not sorted by increasing success anymore, which was one of my arguments. – mkrieger1 3 hours ago
    
Yes, the sort idea has merit - UV. // no match, go back to start threw me off a bit as the idea of going back to the beginning of the loop makes sense, but a continue is slightly incongruent with "go back to start". As it does more than go back to the loop beginning, it also does a ++curr, the iterative step between loops. More like a "next" than "go back". – chux 3 hours ago
    
It means "continue with the next character, at the start of the pattern". I named it after the "start" node in the FSM drawing. – mkrieger1 2 hours ago

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.