
I'm having difficulty writing a Perl program to extract the word following a certain word.

For example:

Today i'm not  going anywhere except to office.

I want the word after anywhere, so the output should be except.

I have tried this

my $words = "Today i'm not  going anywhere except to office.";
my $w_after = ( $words =~ /anywhere (\S+)/ );

but it seems this is wrong.

    
You can accept answers from the below updates. – ssr1012 Jan 16 at 11:22
    
@ssr1012: One may also wait a day or two to see if a better answer appears – Borodin Jan 16 at 15:28
    
@Borodin: OP said/confirms it helps for Jim Garrison answers. Hence I requested here. – ssr1012 Jan 17 at 6:50

Very close:

my ($w_after) = ($words =~ /anywhere\s+(\S+)/);
   ^        ^                       ^^^
   +--------+                        |
     Note 1                        Note 2

Note 1: in list context, =~ returns the list of captured items, so the assignment target needs to be a list. In scalar context it only returns true or false.

Note 2: \s+ allows one or more whitespace characters after anywhere.
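
To make Note 1 concrete, here is a minimal sketch that runs the same match in both contexts. The variable names $scalar_result and $list_result are just for illustration and are not from the question.

```perl
use strict;
use warnings;

my $words = "Today i'm not  going anywhere except to office.";

# Scalar context: the match returns true (1) or false, not the capture.
my $scalar_result = ( $words =~ /anywhere\s+(\S+)/ );

# List context: the match returns the list of captured groups.
my ($list_result) = ( $words =~ /anywhere\s+(\S+)/ );

print "scalar: $scalar_result\n";   # scalar: 1
print "list:   $list_result\n";     # list:   except
```

The only difference between the two assignments is the parentheses around the variable on the left-hand side.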

    
Thanks Jim..it helps!!! – Azizul Jan 16 at 6:27
    
@JimGarrison Can you explain the use of the ()? my ($w_after) = $words =~ /anywhere\s+(\S+)/; also gives the same result, so why are they needed? – mkHun Jan 16 at 7:13
@mkHun Those parentheses only make the operator precedence explicit. =~ has higher precedence than =, which is why the result is the same. – Arunesh Singh Jan 16 at 7:36

First, you have to put parentheses around the left-hand side of the = operator to force list context for the regexp evaluation. See m// and // in the perlop documentation.[1] You can also put parentheses around the =~ binding expression to improve readability, but it is not necessary because =~ has higher precedence than =.

Use the \w word character class (equivalent to the POSIX-style [:word:] class):

my ($w_after) = ($words =~ / \b anywhere \W+ (\w+) \b /x);

Note that I'm using the /x modifier, so whitespace in the regexp is ignored. Also use the \b word boundary to anchor the regexp correctly.

[1]: I write my ($w_after) just for convenience, because my ($a, $b, $c, @rest) is equivalent to (my $a, my $b, my $c, my @rest), but you can also control the scope of each variable separately, as in (my $a, our $UGLY_GLOBAL, local $_, @_).

Whilst your answer is correct, the thing the OP needs is the my ( $w_after ) so it does the assignment in a list context. I think it would be useful to spell that out. – Sobrique Jan 16 at 10:40

In Perl v5.22 and later, you can use \b{wb} to get better results for natural language. The pattern could be

/anywhere\b{wb}.+?\b{wb}(.+?\b{wb})/

"wb" stands for word break, and it will account for words that have apostrophes in them, like "I'll", that plain \b doesn't.

.+?\b{wb}

matches the shortest non-empty sequence of characters that don't have a word break in them. The first one matches the span of spaces in your sentence; and the second one matches "except". It is enclosed in parentheses, so upon completion $1 contains "except".

\b{wb} is documented most fully in perlrebackslash.
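
As a rough illustration of the apostrophe point, the sketch below compares plain \b with \b{wb} on a made-up sentence (not the one from the question) where the word after anywhere contains an apostrophe. It assumes Perl 5.22 or later; the variable names are hypothetical.

```perl
use v5.22;
use warnings;

# Hypothetical sample sentence chosen so the target word has an apostrophe.
my $text = "You can go anywhere I'll be waiting.";

# Plain \b: \w+ stops at the apostrophe, so only "I" is captured.
my ($plain) = $text =~ /anywhere\W+(\w+)\b/;

# \b{wb}: the apostrophe between letters is not a word break.
my ($wb) = $text =~ /anywhere\b{wb}.+?\b{wb}(.+?\b{wb})/;

print "plain: $plain\n";   # plain: I
print "wb:    $wb\n";      # wb:    I'll
```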


This regex should match:

my ($expect) = ($words=~m/anywhere\s+([^\s]+)\s+/);

[^\s]+ captures the word between the two runs of whitespace.

Thanks.

    
Thanks @ssr1012..it helps!!! – Azizul Jan 16 at 6:28

If you also want to take punctuation marks into account, as in:

my $words = "Today i'm not going anywhere; except to office.";

Then try this:

my ($w_after) = ($words =~ /anywhere[[:punct:]\s]+(\S+)/);
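
One thing to watch with this approach: inside a bracketed class a | is a literal pipe rather than alternation, so it is left out below, and \S+ will also pick up punctuation that trails the captured word. A small sketch, using a made-up sentence and hypothetical variable names, that compares capturing with \S+ and with \w+:

```perl
use strict;
use warnings;

# Hypothetical sentence with punctuation on both sides of the target word.
my $text = "I'm not going anywhere; home, maybe.";

# \S+ keeps the trailing comma attached to the word.
my ($with_punct) = $text =~ /anywhere[[:punct:]\s]+(\S+)/;

# \w+ stops at the first non-word character.
my ($word_only) = $text =~ /anywhere[[:punct:]\s]+(\w+)/;

print "\\S+: $with_punct\n";   # \S+: home,
print "\\w+: $word_only\n";    # \w+: home
```

Which capture is preferable depends on whether the surrounding punctuation should count as part of the word.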
