
I'm looking for a regexp to remove one-character words. I don't mind whether it uses perl, awk, sed or bash built-ins.

Test case:

$ echo "a b c d e f g h ijkl m n opqrst u v" | $COMMAND

Desired output:

ijkl opqrst

What I've tried so far:

$ echo "a b c d e f g h ijkl m n opqrst u v" | sed 's/ . //g'
acegijkln opqrstv

I'm guessing that:

  • the a isn't removed because there is no white space before it
  • the c remains because once the b has been removed, there is no more whitespace before it
  • and so on...
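The guess above can be checked by marking each match instead of deleting it. A small sketch (the shortened input and the bracket markers are only for illustration): with the g flag, sed resumes scanning after the end of the previous match, so a space consumed by one match is no longer available to start the next.

```shell
# Wrap every match of ' . ' in brackets instead of deleting it.
marked=$(printf '%s\n' "a b c d e" | sed 's/ . /[&]/g')
printf '%s\n' "$marked"
# prints: a[ b ]c[ d ]e
```

Only b and d are matched; a has no space before it, and the space before c was already consumed by the match on b.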

Attempt #2:

$ echo "a b c d e f g h ijkl m n opqrst u v" | sed 's/\w.\w//g'
     s v

Here I don't get at all what's happening.

Any help + explanations are welcome, I want to learn.

Possible duplicate of Learning Regular Expressions – Biffen Jan 17 at 9:41
Hum I disagree, there is a specific question in my post. – nicoco Jan 17 at 9:42
@nicoco, You can try with word boundary (\b). – sat Jan 17 at 9:44
@nicoco That's not a question, though. IMHO, this looks like a give-me-the-code post. – Biffen Jan 17 at 9:44
@Biffen: I disagree. The OP has written a solution to their problem and is asking for help to get it working. – Borodin Jan 17 at 10:53
Accepted answer (7 votes)

You have to use the word boundary \b, or \< and \>, which match the empty string at the beginning and end of a word, respectively.

echo "a b c d e f g h ijkl m n opqrst u v" | sed 's/\b\w\b \?//g'

(OR)

echo "a b c d e f g h ijkl m n opqrst u v" | sed 's/\<.\> \?//g'
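A quick way to check both commands, sketched under the assumption that sed is GNU sed (\b, \w, \<, \> and \? are GNU extensions, not POSIX). The square brackets are added only to make the leftover trailing space visible.

```shell
input="a b c d e f g h ijkl m n opqrst u v"

# Variant with \b ... \b around a single word character.
out_b=$(printf '%s\n' "$input" | sed 's/\b\w\b \?//g')

# Variant with \< ... \> around a single character.
out_angle=$(printf '%s\n' "$input" | sed 's/\<.\> \?//g')

printf '[%s]\n' "$out_b" "$out_angle"
# With GNU sed both lines read: [ijkl opqrst ]
```

The trailing space comes from "opqrst ": the space after it belongs to no single-character word, so nothing removes it.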
It leaves a lot of white spaces before the "long" words, but I can work with that. Thanks! – nicoco Jan 17 at 9:51
@nicoco You can use s/\b\w\b ?//g to remove the whitespace as well. – Dada Jan 17 at 9:52
Be very careful with \b: what you have will clobber things like "will-o'-the-wisp" and "Build-A-Bear". – ThisSuitIsBlackNot Jan 17 at 15:39
Or the same solution with GNU awk: awk '{gsub(/\<.\> ?/,"")}1'. – Ed Morton Jan 17 at 16:44
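The caveat about \b can be sidestepped by treating only spaces as delimiters. A minimal sketch using plain POSIX sed constructs (a label and the t command); the function name strip_single and the trick of padding the line with one space on each side are illustrative, not taken from the answers here, and the sketch assumes words are separated by single spaces.

```shell
# Remove every space-delimited one-character word from each input line.
strip_single() {
  sed -e 's/^/ /' -e 's/$/ /' \
      -e ':again' -e 's/ [^ ] / /' -e 't again' \
      -e 's/^ //' -e 's/ $//'
}

r1=$(printf '%s\n' "a b c d e f g h ijkl m n opqrst u v" | strip_single)
r2=$(printf '%s\n' "a will-o'-the-wisp b Build-A-Bear" | strip_single)
printf '%s\n' "$r1" "$r2"
# prints:
# ijkl opqrst
# will-o'-the-wisp Build-A-Bear
```

The loop removes one match at a time and retries, so adjacent one-character words that share a space are all caught; the padding lets the same pattern handle words at the start and end of the line.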

You could simply use grep:

echo "a b c d e f g h ijkl m n opqrst u v"  | grep -o '[a-z]\{2,\}'

where the regex matches any run of at least two lowercase letters.

The -o option makes grep print only the matched parts (and not the entire line).

You could use grep -E so you wouldn't need those pesky backslashes. – tripleee Jan 17 at 10:43
    
It should be noted that this separates all matches with a newline, which is not exactly the same as the desired output as written in the question. This may or may not be a problem, depending on the circumstances. – Toby Speight Jan 17 at 10:46
    
In that case, pipe into | paste -sd " " – glenn jackman Jan 17 at 14:06
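Putting the comments above together, here is a sketch that uses grep -E (so the braces need no backslashes) and paste to rejoin the matches on one line. Note that -o is a widely supported extension rather than POSIX, and the explicit - operand tells paste to read standard input.

```shell
input="a b c d e f g h ijkl m n opqrst u v"

# grep -o emits one match per line; paste -s joins them with single spaces.
joined=$(printf '%s\n' "$input" | grep -oE '[a-z]{2,}' | paste -sd ' ' -)
printf '%s\n' "$joined"
# prints: ijkl opqrst
```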

Awk is admittedly not the most efficient way to do this; I'm answering only because the question is tagged awk. This uses awk's length() string function, which is POSIX compliant, so there are no portability issues.

echo "a b c d e f g h ijkl m n opqrst u v" | \
  awk '{for(i=1;i<=NF;i++) {if (length($i)>1) { printf "%s ", $i }} }'
ijkl opqrst
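The loop above leaves a trailing space and never prints a newline. A minimal variation, using only POSIX awk features, collects the surviving fields and prints them once per line; the variable names out and sep are illustrative.

```shell
input="a b c d e f g h ijkl m n opqrst u v"

kept=$(printf '%s\n' "$input" | awk '{
  out = ""
  for (i = 1; i <= NF; i++) {
    if (length($i) > 1) {
      sep = (out == "") ? "" : " "   # no separator before the first kept field
      out = out sep $i
    }
  }
  print out
}')
printf '[%s]\n' "$kept"
# prints: [ijkl opqrst]
```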
You shouldn't say awk is not the most efficient way...., just that the specific awk code you posted is not the most efficient way. – Ed Morton Jan 17 at 16:43
@EdMorton: As you say, Ed! Maybe you can correct my logic or provide a more efficient way to do this. – Inian Jan 17 at 16:44
    
I added the awk equivalent of the accepted answer under that answer, see stackoverflow.com/a/41693834/1745001 – Ed Morton Jan 17 at 16:45
    
@EdMorton: Well! Not everybody can answer in the same class as Ed Morton in awk. – Inian Jan 17 at 16:46

Perl solution: just filter elements on length

echo "a b c d e f g h ijkl m n opqrst u v" | perl -lanE \
  'say join " ", grep {length($_) > 1} @F'
If you want to be more terse, you can omit the default variable: grep {length > 1} @F – glenn jackman Jan 17 at 14:07

Just for fun, another option: translate spaces to newlines and look for lines with at least 2 characters

$ echo "a b c d e f g h ijkl m n opqrst u v" | tr ' ' '\n' | grep .. | paste -sd " "
ijkl opqrst

Not being familiar with any Linux tools, this is somewhat of a guess, but I think a regex you want is

(?:\s\w\b|\b\w\s)

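like

$ echo "a b c d e f g h ijkl m n opqrst u v" | perl -pe 's/(?:\s\w\b|\b\w\s)//g'

This replaces any single word character that is either preceded or followed by whitespace, together with that whitespace, with nothing. The command uses perl rather than sed because sed does not support the (?:...) non-capturing group syntax.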

Check the regex out here at regex101.


Another in awk. A non-space ([^ ]) is considered a word. Feel free to replace it with your definition of a word.

$ awk '{while(sub(/^[^ ] | [^ ]$/,"")||sub(/ [^ ] /," "));}1'

Using sub, it replaces each [space][non-space][space] triple with a single space, and removes a single character at the beginning or end of the record together with its adjacent space. The while loop repeats this until nothing matches any more. To test it:

$ echo "a b c d e f g h ijkl m n opqrst u v"|awk '{while(sub(/^[^ ] | [^ ]$/,"")||sub(/ [^ ] /," "));}1'
ijkl opqrst
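Because this version treats any non-space character as part of a word, hyphenated words survive intact. A small sketch to illustrate; the wrapper name strip_awk and the second input are made up for the example.

```shell
# Wrap the awk one-liner from this answer so it can be reused.
strip_awk() {
  awk '{while(sub(/^[^ ] | [^ ]$/,"")||sub(/ [^ ] /," "));}1'
}

a1=$(printf '%s\n' "a b c d e f g h ijkl m n opqrst u v" | strip_awk)
a2=$(printf '%s\n' "a will-o'-the-wisp b" | strip_awk)
printf '%s\n' "$a1" "$a2"
# prints:
# ijkl opqrst
# will-o'-the-wisp
```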
echo "a b c d e f g h ijkl m n opqrst u v"  | grep -wo "\b[a-z][a-z]\+\b"
With -w you don't need the \b anchors. – tripleee Jan 17 at 10:45
    
Yes, -w is actually redundant; it can be removed. – user3369871 Jan 17 at 10:46
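To check the point made in these comments, the sketch below runs the pattern once with explicit \b anchors and once with -w alone. It assumes GNU grep, since \b, \+ and -o are extensions.

```shell
input="a b c d e f g h ijkl m n opqrst u v"

# Explicit word-boundary anchors, no -w.
with_b=$(printf '%s\n' "$input" | grep -o '\b[a-z][a-z]\+\b')

# -w only, no anchors in the pattern.
with_w=$(printf '%s\n' "$input" | grep -wo '[a-z][a-z]\+')

[ "$with_b" = "$with_w" ] && echo "same matches"
printf '%s\n' "$with_w"
```

For this input either form is enough on its own; each match is printed on its own line.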
