Count text occurrences per line

Question

I have to parse huge text files where certain lines are of interest and others are not. Within those of interest I have to count the occurrences of a certain keyword.

Assume the file is called input.txt and looks like this:

format300,format250,format300
format250,ignore,format160,format300,format300
format250,format250,format300

I want to exclude the lines containing ignore and count the occurrences of format300. How do I do that?

What I've got so far is this command which only counts ONCE PER LINE (which is not yet good enough):

cat input.txt | grep -v ignore | grep 'format300' | wc -l

Any suggestions? If possible I want to avoid using perl.

terdon · Answer 1 · 2016-04-08 14:28:24Z


This one-liner should be able to do what you want:

grep -v ignore input.txt | sed 's/format300/format300\n/g' | grep -c "format300"

Basically, you are replacing each occurrence of your keyword with the keyword itself followed by a newline character, so the keyword appears at most once on any given line of the stream. Then grep -c counts the lines that contain your keyword. (The \n in the replacement is understood by GNU sed; other sed implementations may not accept it.)

answered by MelBurslan, edited by terdon

I would replace the sed with tr ',' '\n'; otherwise you're going to count format3000s as well, not just format300s – 1_CR 12 hours ago


@1_CR, agreed, but you would also have to use grep -xc format300 instead of grep -c format300 to avoid false positives on "format3000". So the full solution is grep -v ignore input.txt | tr , '\n' | grep -xc format300 – Wildcard 9 hours ago

@Wildcard, indeed – 1_CR 9 hours ago
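
To make the point of these comments concrete, here is a minimal sketch. The file name sample.txt and its contents are made up for illustration; they are not from the question.

```shell
# Hypothetical sample: one real format300 and one format3000 on a kept line.
printf '%s\n' 'format300,format3000' 'format250,ignore,format300' > sample.txt

# Substring match after splitting on commas: format3000 is counted too.
grep -v ignore sample.txt | tr ',' '\n' | grep -c format300     # prints 2

# Whole-line match (-x): only fields that are exactly format300 count.
grep -v ignore sample.txt | tr ',' '\n' | grep -xc format300    # prints 1
```

So splitting with tr only helps once grep is also told to match the whole line.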


Carlos Campderrós · Answer 2 · 2016-04-08 16:39:08Z

You don't need the first cat; this is known as a Useless Use of Cat (UUOC).

Also very useful is grep -o, which outputs only the matching parts of each line, one match per line.

Then count the lines with wc -l:

grep -v ignore YOUR_FILE | grep -o format300 | wc -l

This prints 3 for your small sample.
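
As a quick check of the difference between counting lines and counting occurrences, here is a small sketch that recreates the sample from the question. Note that -o is a GNU/BSD grep extension rather than a POSIX option, so this assumes such a grep is available.

```shell
# Recreate the sample file from the question.
cat > input.txt <<'EOF'
format300,format250,format300
format250,ignore,format160,format300,format300
format250,format250,format300
EOF

# grep -c counts matching LINES, so this prints 2.
grep -v ignore input.txt | grep -c format300

# grep -o prints each match on its own line, so wc -l counts
# OCCURRENCES and this prints 3.
grep -v ignore input.txt | grep -o format300 | wc -l
```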

terdon · Answer 3 · 2016-04-08 14:54:29Z

A Perl way:

perl -lne '$k+=(s/format300//g) unless /ignore/; }{ print $k' input.txt

The s/format300//g replaces all occurrences of format300 with nothing and returns the number of replacements, which is a simple way of counting the occurrences. The number is then added to $k, and the whole thing only happens if the line doesn't match ignore. The }{ is Perl shorthand for "do this after you've finished reading the file", so print $k prints the total number found.
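
Since the question asks to avoid Perl, the same count-by-substitution idea can be sketched with awk instead. This is a swapped-in equivalent, not part of the answer above, and count_format300 is just a hypothetical helper name.

```shell
# Hypothetical helper: count format300 on lines that do not contain "ignore".
count_format300() {
    # gsub() returns the number of substitutions it made; "&" puts each
    # match back unchanged, so the line itself is not altered.
    # "k + 0" makes awk print 0 rather than an empty line if nothing matched.
    awk '!/ignore/ { k += gsub(/format300/, "&") } END { print k + 0 }' "$1"
}

# Usage with the sample from the question:
cat > input.txt <<'EOF'
format300,format250,format300
format250,ignore,format160,format300,format300
format250,format250,format300
EOF
count_format300 input.txt    # prints 3
```

Like the Perl version, this matches substrings, so it would also count the format300 inside format3000.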

user1598390 · Answer 4 · 2016-04-08 17:51:19Z


Input file may potentially contain partial matches that would invalidate the result, for example:

1 format300,format250,format300
2 format250,ignore,format160,format300,format300
3 format250,format250,format300
4 format999,format300000,format999
5 format999,ignore_me_not,format300

You don't want to count format300000 on line 4, and you don't want to skip line 5 just because ignore_me_not contains the substring ignore.

This would do the trick:

grep -v "\bignore\b" FILE | grep -o "\bformat300\b" | wc -l

Correct output is 4, because line 2 is ignored, line 5 is not, and line 4 doesn't contain exactly format300.

If you leave out the wc -l part, you can see exactly what is being matched.
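
A minimal sketch of that check, assuming GNU grep (where \b matches a word boundary) and assuming the leading numbers in the sample above are only line labels, not file content. The file name partial.txt is made up.

```shell
# Sample from this answer, without the leading line labels.
cat > partial.txt <<'EOF'
format300,format250,format300
format250,ignore,format160,format300,format300
format250,format250,format300
format999,format300000,format999
format999,ignore_me_not,format300
EOF

# Without wc -l, every match is printed on its own line.
# Here that is four lines, each reading "format300".
grep -v "\bignore\b" partial.txt | grep -o "\bformat300\b"
```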


Clever, but it goes beyond his actual stated requirements: "I want to exclude the lines with ignore..." Your point about format3000 et al. is perfectly valid, though. (As I commented on another answer.) – Wildcard 9 hours ago

This is the only solution that is considering the false positives, the others so far will all fail with this input. There is still a problem, though, because the word boundaries aren't quite what are needed. A character like - is not a word character, so ignore-me-not will match \bignore\b but shouldn't. – leftclickben 9 hours ago
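
One way to sidestep the word-boundary problem raised in the last comment is to compare whole comma-separated fields instead of using \b. The following is only a sketch under the assumption that fields are separated by plain commas with no surrounding whitespace; count_exact and fields.txt are hypothetical names.

```shell
# Hypothetical helper: count_exact FILE KEYWORD SKIPWORD
# Counts fields equal to KEYWORD on lines with no field equal to SKIPWORD.
count_exact() {
    awk -F, -v want="$2" -v skip="$3" '
        {
            n = 0
            for (i = 1; i <= NF; i++) {
                if ($i == skip) next   # exact field match: drop the whole line
                if ($i == want) n++    # exact field match: count it
            }
            total += n                 # reached only if the line was not dropped
        }
        END { print total + 0 }
    ' "$1"
}

# Made-up sample including the ignore-me-not case from the comment.
printf '%s\n' \
    'format300,format250,format300' \
    'format250,ignore,format300' \
    'format999,format300000' \
    'format999,ignore-me-not,format300' > fields.txt

count_exact fields.txt format300 ignore    # prints 3
```

Because the comparison is on complete fields, neither ignore-me-not nor ignore_me_not excludes a line, and format300000 is never counted.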
