Remove one-character words

Question

I'm looking for a regexp to remove one-character words. I don't mind whether the solution uses perl, awk, sed or bash built-ins.

Test case:

$ echo "a b c d e f g h ijkl m n opqrst u v" | $COMMAND

Desired output:

ijkl opqrst

What I've tried so far:

$ echo "a b c d e f g h ijkl m n opqrst u v" | sed 's/ . //g'
acegijkln opqrstv

I'm guessing that:

the a isn't removed because there is no white space before it
the c remains because once the b has been removed, there is no more whitespace before it
and so on...

Attempt #2:

$ echo "a b c d e f g h ijkl m n opqrst u v" | sed 's/\w.\w//g'
     s v

Here I don't get at all what's happening.

Any help + explanations are welcome, I want to learn.

@nicoco That's not a question, though. IMHO, this looks like a give-me-the-code post. — Biffen, Jan 17 at 9:44
@Biffen: I disagree. The OP has written a solution to their problem and is asking for help to get it working. — Borodin, Jan 17 at 10:53

sat · Accepted Answer · 2017-01-17 09:55:55Z

You need a word boundary: either \b, or \< and \>, which match the empty string at the beginning and end of a word respectively.

echo "a b c d e f g h ijkl m n opqrst u v" | sed 's/\b\w\b \?//g'

(OR)

echo "a b c d e f g h ijkl m n opqrst u v" | sed 's/\<.\> \?//g'
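
Why this works where 's/ . //g' did not: the boundary assertions are zero-width, so a match no longer has to consume the space that the next match needs. A minimal sketch, assuming GNU sed (\b, \w and \? are GNU extensions). The function name remove_short_words and the extra 's/ $//' step, which trims the space left behind when the line ends in removed words, are illustrative additions and not part of the answer:

```shell
# Assumes GNU sed: \b, \w and \? are GNU extensions to basic regexps.
remove_short_words() {
  # 1) drop every one-character word plus one optional following space
  # 2) trim a single trailing space, if one was left at the end of the line
  sed 's/\b\w\b \?//g; s/ $//'
}

echo "a b c d e f g h ijkl m n opqrst u v" | remove_short_words
```

On the test case from the question this should print ijkl opqrst with no trailing space.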

edited Jan 17 at 9:55

answered Jan 17 at 9:49

It leaves a lot of white spaces before the "long" words, but I can work with that. Thanks! – nicoco Jan 17 at 9:51

@nicoco You can use s/\b\w\b ?//g to remove the whitespace as well. – Dada Jan 17 at 9:52

Be very careful with \b: what you have will clobber things like "will-o'-the-wisp" and "Build-A-Bear". – ThisSuitIsBlackNot Jan 17 at 15:39

Or the same solution with GNU awk: awk '{gsub(/\<.\> ?/,"")}1'. – Ed Morton Jan 17 at 16:44

oliv · Answer 2 · 2017-01-17 10:35:17Z

You could simply use grep:

echo "a b c d e f g h ijkl m n opqrst u v"  | grep -o '[a-z]\{2,\}'

where the regex matches any run of at least two lowercase letters.

The -o option makes grep print only the matched parts of each line, one match per output line, instead of the entire line.

edited Jan 17 at 10:35

answered Jan 17 at 9:53

You could use grep -E so you wouldn't need those pesky backslashes. – tripleee Jan 17 at 10:43

It should be noted that this separates all matches with a newline, which is not exactly the same as the desired output as written in the question. This may or may not be a problem, depending on the circumstances. – Toby Speight Jan 17 at 10:46

In that case, pipe into | paste -sd " " – glenn jackman Jan 17 at 14:06
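
The suggestions in the comments above can be combined into one pipeline. This is a sketch, assuming a grep that supports -o (GNU and BSD grep do, but the option is not in POSIX). The function name long_words is hypothetical:

```shell
long_words() {
  # -E: extended regexps, so {2,} needs no backslashes
  # -o: print each match on its own line
  # paste -s joins those lines back into one, separated by single spaces;
  # the explicit "-" operand means standard input.
  grep -oE '[a-z]{2,}' | paste -sd ' ' -
}

echo "a b c d e f g h ijkl m n opqrst u v" | long_words
```

Because the pattern is limited to [a-z], words containing uppercase letters, digits or punctuation would be split or dropped. Adjust the bracket expression to match your own definition of a word.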

Inian · Answer 3 · 2017-01-17 10:43:48Z

Awk may not be the most efficient way to do this, but since the question is tagged awk, here is a version that uses its length() string function. It is POSIX-compliant, so portability should not be an issue.

echo "a b c d e f g h ijkl m n opqrst u v" | \
  awk '{for(i=1;i<=NF;i++) {if (length($i)>1) { printf "%s ", $i }} }'
ijkl opqrst
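
The loop above leaves a trailing space and prints no final newline. Below is a sketch of a variant that builds the line first and then prints it, using only POSIX awk features. The function name keep_long_words and the variable out are illustrative:

```shell
keep_long_words() {
  awk '{
    out = ""                              # rebuilt line for this record
    for (i = 1; i <= NF; i++)
      if (length($i) > 1)                 # keep fields longer than one character
        out = (out == "" ? $i : out " " $i)
    print out                             # print adds the newline
  }'
}

echo "a b c d e f g h ijkl m n opqrst u v" | keep_long_words
```

Note that, like the original, this collapses any run of whitespace between the kept words into a single space, because awk splits fields on whitespace by default.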

edited Jan 17 at 10:43

answered Jan 17 at 10:37

You shouldn't say awk is not the most efficient way, just that the specific awk code you posted is not the most efficient way. – Ed Morton Jan 17 at 16:43

@EdMorton: As you say, Ed! Maybe you can correct my logic or provide a more efficient way to do this. – Inian Jan 17 at 16:44

I added the awk equivalent of the accepted answer under that answer, see stackoverflow.com/a/41693834/1745001 – Ed Morton Jan 17 at 16:45

@EdMorton: Well, not everybody can answer awk questions in the same class as Ed Morton. – Inian Jan 17 at 16:46

Arunesh Singh · Answer 4 · 2017-01-17 11:01:19Z

Perl solution: just filter elements on length

echo "a b c d e f g h ijkl m n opqrst u v" | perl -lanE \
  'say join " ", grep {length($_) > 1} @F'

answered Jan 17 at 11:01

If you want to be more terse, you can omit the default variable: grep {length > 1} @F – glenn jackman Jan 17 at 14:07

glenn jackman · Answer 5 · 2017-01-17 14:09:28Z

Just for fun, another option: translate spaces to newlines and look for lines with at least 2 characters

$ echo "a b c d e f g h ijkl m n opqrst u v" | tr ' ' '\n' | grep .. | paste -sd " "
ijkl opqrst

answered Jan 17 at 14:09

ClasG · Answer 6 · 2017-01-17 11:15:37Z

I'm not familiar with the Linux tools, so this is somewhat of a guess, but I think a regex you want is

(?:\s\w\b|\b\w\s)

like

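$ echo "a b c d e f g h ijkl m n opqrst u v" | sed -E 's/(\s\w\b|\b\w\s)//g'

This replaces any single character that is either preceded by or followed by a space with nothing. sed does not understand the (?:...) non-capturing group syntax, so the command above uses a plain group with GNU sed's -E option.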

Check the regex out here at regex101.

James Brown · Answer 7 · 2017-01-17 12:44:33Z

Another in awk. A non-space ([^ ]) is considered a word. Feel free to replace it with your definition of a word.

$ awk '{while(sub(/^[^ ] | [^ ]$/,"")||sub(/ [^ ] /," "));}1'

Using sub, it replaces each [space][non-space][space] triple with a single space, and removes single characters at the beginning and end of the record together with their adjacent space. The substitutions sit in a while loop, so they repeat until there are no hits left. To test it:

$ echo "a b c d e f g h ijkl m n opqrst u v"|awk '{while(sub(/^[^ ] | [^ ]$/,"")||sub(/ [^ ] /," "));}1'
ijkl opqrst
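
The loop is needed because matches cannot overlap: once " a " has been replaced, the space in front of the next single character has already been consumed, so a single gsub pass skips every other one. This is the same effect the question observed with sed 's/ . //g'. A small sketch comparing the two approaches follows. The function names one_pass and looped and the sample input are illustrative:

```shell
# One pass: " a " is replaced first, which consumes the space before "b",
# so "b" is skipped and survives.
one_pass() { awk '{ gsub(/ [^ ] /, " ") } 1'; }

# Looping with sub until nothing changes removes all of them.
looped() { awk '{ while (sub(/^[^ ] | [^ ]$/, "") || sub(/ [^ ] /, " ")); } 1'; }

echo "xx a b c yy" | one_pass   # prints: xx b yy
echo "xx a b c yy" | looped     # prints: xx yy
```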

user3369871 · Answer 8 · 2017-01-17 10:28:15Z

echo "a b c d e f g h ijkl m n opqrst u v"  | grep -wo "\b[a-z][a-z]\+\b"

edited Jan 17 at 10:28

answered Jan 17 at 10:22

With -w you don't need the \b anchors. – tripleee Jan 17 at 10:45

Yes, -w is actually redundant; it can be removed. – user3369871 Jan 17 at 10:46

asked	7 days ago
viewed	233 times
active	7 days ago
