Programming Puzzles & Code Golf Stack Exchange is a question and answer site for programming puzzle enthusiasts and code golfers. It's 100% free, no registration required.

What is the most frequent word?

Given a text file, your program must trawl through it, counting the frequencies of each word, then output the most used word. Because a text file has no fixed length, and so can get very long, your code must be as short as possible.

Rules/Requirements

  • Each submission should be either a full program or a function. If it is a function, it must be runnable by only adding the function call to the bottom of the program. Anything else (e.g. headers in C) must be included.
  • There must be a free interpreter/compiler available for your language.
  • If it is possible, provide a link to a site where your program can be tested.
  • Your program must not write anything to STDERR.
  • Your program should take input from a text file or STDIN (or the closest alternative in your language).
  • Standard loopholes are forbidden.
  • Your program must be case-insensitive (tHe, The and the all contribute to the count of the).
  • If there is no most frequent word (see test case #3), your program should output nothing.
  • Your program is allowed to strip special characters. The test cases assume that you do.
  • A word is a group of alphanumeric characters and hyphens, separated from other words with a space.

Test Cases

The man walked down the road.
==> the

-----

Slowly, he ate the pie, savoring each delicious bite. He felt like he was truly happy.
==> he

-----

This sentence has no most frequent word.
==> 

-----

"That's... that's... that is just terrible!" he said.
==> thats / that's

-----

The old-fashioned man ate an old-fashioned cake.
==> old-fashioned

-----

IPv6 looks great, much better than IPv4, except for the fact that IPv6 has longer addresses.
==> IPv6

(The third test case has no output; you may choose either output for the fourth.)
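As a point of reference for the rules and test cases above, here is an ungolfed sketch in Python. It assumes that apostrophes are stripped (so "that's" is counted as "thats"), that only ASCII letters, digits and hyphens are kept, and that the result is reported in lowercase; the function name `most_frequent_word` is made up for illustration.

```python
import re
from collections import Counter


def most_frequent_word(text):
    """Return the single most frequent word, or '' when the top count is tied."""
    words = []
    for token in text.lower().split():
        # Assumption: everything except ASCII alphanumerics and hyphens is
        # stripped, so "that's..." becomes "thats".
        cleaned = re.sub(r"[^0-9a-z-]", "", token)
        if cleaned:
            words.append(cleaned)

    top = Counter(words).most_common(2)
    if not top:
        return ""  # no words at all
    if len(top) == 2 and top[0][1] == top[1][1]:
        return ""  # the two highest counts are equal: no most frequent word
    return top[0][0]
```

Because the challenge is case-insensitive, this sketch returns `ipv6` for the last test case rather than `IPv6`.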

Scoring

Programs are scored according to bytes. The usual character set is UTF-8; if you are using another, please specify.

When the challenge finishes, the program with the fewest bytes (it's called code golf) will win.
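Since the score depends on the encoding, a quick way to check a byte count is to encode the source text and measure the result. This is only a sketch; the helper name `byte_count` is hypothetical, and the encoding names are the ones Python's standard codecs use.

```python
def byte_count(source, encoding="utf-8"):
    """Number of bytes `source` occupies in the given encoding."""
    return len(source.encode(encoding))
```

For example, a character such as `é` counts as 2 bytes in UTF-8 but 1 byte in CP-1252, which is why answers in languages with their own code page state the encoding they use.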

Submissions

To make sure that your answer shows up, please start your answer with a headline, using the following Markdown template:

# Language Name, N bytes

where N is the size of your submission. If you improve your score, you can keep old scores in the headline, by striking them through. For instance:

# Ruby, <s>104</s> <s>101</s> 96 bytes

If you want to include multiple numbers in your header (e.g. because your score is the sum of two files or you want to list interpreter flag penalties separately), make sure that the actual score is the last number in the header:

# Perl, 43 + 2 (-p flag) = 45 bytes

You can also make the language name a link which will then show up in the leaderboard snippet:

# [><>](http://esolangs.org/wiki/Fish), 121 bytes
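The header convention above (language before the first comma, struck-through scores ignored, actual score last) can be sketched in Python as follows. This is a simplified illustration that works on the raw Markdown line; it is not the regex the leaderboard snippet uses, and `parse_header` is a made-up name.

```python
import re


def parse_header(line):
    """Return (language, score) from a Markdown headline, or None if no score."""
    text = line.lstrip("#").strip()
    # Drop old scores that were struck through with <s>...</s>.
    text = re.sub(r"<s>.*?</s>", "", text)
    # The language is everything before the first comma.
    language, _, rest = text.partition(",")
    numbers = re.findall(r"\d+", rest)
    if not numbers:
        return None
    # By convention the actual score is the last number in the header.
    return language.strip(), int(numbers[-1])
```

Note that the language part is returned verbatim, so a linked language name keeps its Markdown link syntax.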

Leaderboard

Here is a Stack Snippet to generate both a regular leaderboard and an overview of winners by language.

/* Configuration */

var QUESTION_ID = 79576; // Obtain this from the url
// It will be like https://XYZ.stackexchange.com/questions/QUESTION_ID/... on any question page
var ANSWER_FILTER = "!t)IWYnsLAZle2tQ3KqrVveCRJfxcRLe";
var COMMENT_FILTER = "!)Q2B_A2kjfAiU78X(md6BoYk";
var OVERRIDE_USER = 53406; // This should be the user ID of the challenge author.

/* App */

var answers = [], answers_hash, answer_ids, answer_page = 1, more_answers = true, comment_page;

function answersUrl(index) {
  return "https://api.stackexchange.com/2.2/questions/" +  QUESTION_ID + "/answers?page=" + index + "&pagesize=100&order=desc&sort=creation&site=codegolf&filter=" + ANSWER_FILTER;
}

function commentUrl(index, answers) {
  return "https://api.stackexchange.com/2.2/answers/" + answers.join(';') + "/comments?page=" + index + "&pagesize=100&order=desc&sort=creation&site=codegolf&filter=" + COMMENT_FILTER;
}

function getAnswers() {
  jQuery.ajax({
    url: answersUrl(answer_page++),
    method: "get",
    dataType: "jsonp",
    crossDomain: true,
    success: function (data) {
      answers.push.apply(answers, data.items);
      answers_hash = [];
      answer_ids = [];
      data.items.forEach(function(a) {
        a.comments = [];
        var id = +a.share_link.match(/\d+/);
        answer_ids.push(id);
        answers_hash[id] = a;
      });
      if (!data.has_more) more_answers = false;
      comment_page = 1;
      getComments();
    }
  });
}

function getComments() {
  jQuery.ajax({
    url: commentUrl(comment_page++, answer_ids),
    method: "get",
    dataType: "jsonp",
    crossDomain: true,
    success: function (data) {
      data.items.forEach(function(c) {
        if (c.owner.user_id === OVERRIDE_USER)
          answers_hash[c.post_id].comments.push(c);
      });
      if (data.has_more) getComments();
      else if (more_answers) getAnswers();
      else process();
    }
  });  
}

getAnswers();

var SCORE_REG = /<h\d>\s*([^\n,]*[^\s,]),.*?(\d+)(?=[^\n\d<>]*(?:<(?:s>[^\n<>]*<\/s>|[^\n<>]+>)[^\n\d<>]*)*<\/h\d>)/;

var OVERRIDE_REG = /^Override\s*header:\s*/i;

function getAuthorName(a) {
  return a.owner.display_name;
}

function process() {
  var valid = [];
  
  answers.forEach(function(a) {
    var body = a.body;
    a.comments.forEach(function(c) {
      if(OVERRIDE_REG.test(c.body))
        body = '<h1>' + c.body.replace(OVERRIDE_REG, '') + '</h1>';
    });
    
    var match = body.match(SCORE_REG);
    if (match)
      valid.push({
        user: getAuthorName(a),
        size: +match[2],
        language: match[1],
        link: a.share_link,
      });
    
  });
  
  valid.sort(function (a, b) {
    var aB = a.size,
        bB = b.size;
    return aB - bB
  });

  var languages = {};
  var place = 1;
  var lastSize = null;
  var lastPlace = 1;
  valid.forEach(function (a) {
    if (a.size != lastSize)
      lastPlace = place;
    lastSize = a.size;
    ++place;
    
    var answer = jQuery("#answer-template").html();
    answer = answer.replace("{{PLACE}}", lastPlace + ".")
                   .replace("{{NAME}}", a.user)
                   .replace("{{LANGUAGE}}", a.language)
                   .replace("{{SIZE}}", a.size)
                   .replace("{{LINK}}", a.link);
    answer = jQuery(answer);
    jQuery("#answers").append(answer);

    var lang = a.language;
    if (/<a/.test(lang)) lang = jQuery(lang).text();
    
    languages[lang] = languages[lang] || {lang: a.language, user: a.user, size: a.size, link: a.link};
  });

  var langs = [];
  for (var lang in languages)
    if (languages.hasOwnProperty(lang))
      langs.push(languages[lang]);

  langs.sort(function (a, b) {
    if (a.lang > b.lang) return 1;
    if (a.lang < b.lang) return -1;
    return 0;
  });

  for (var i = 0; i < langs.length; ++i)
  {
    var language = jQuery("#language-template").html();
    var lang = langs[i];
    language = language.replace("{{LANGUAGE}}", lang.lang)
                       .replace("{{NAME}}", lang.user)
                       .replace("{{SIZE}}", lang.size)
                       .replace("{{LINK}}", lang.link);
    language = jQuery(language);
    jQuery("#languages").append(language);
  }

}
body { text-align: left !important}

#answer-list {
  padding: 10px;
  width: 290px;
  float: left;
}

#language-list {
  padding: 10px;
  width: 290px;
  float: left;
}

table thead {
  font-weight: bold;
}

table td {
  padding: 5px;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<link rel="stylesheet" type="text/css" href="//cdn.sstatic.net/codegolf/all.css?v=83c949450c8b">
<div id="answer-list">
  <h2>Leaderboard</h2>
  <table class="answer-list">
    <thead>
      <tr><td></td><td>Author</td><td>Language</td><td>Size</td></tr>
    </thead>
    <tbody id="answers">

    </tbody>
  </table>
</div>
<div id="language-list">
  <h2>Winners by Language</h2>
  <table class="language-list">
    <thead>
      <tr><td>Language</td><td>User</td><td>Score</td></tr>
    </thead>
    <tbody id="languages">

    </tbody>
  </table>
</div>
<table style="display: none">
  <tbody id="answer-template">
    <tr><td>{{PLACE}}</td><td>{{NAME}}</td><td>{{LANGUAGE}}</td><td>{{SIZE}}</td><td><a href="{{LINK}}">Link</a></td></tr>
  </tbody>
</table>
<table style="display: none">
  <tbody id="language-template">
    <tr><td>{{LANGUAGE}}</td><td>{{NAME}}</td><td>{{SIZE}}</td><td><a href="{{LINK}}">Link</a></td></tr>
  </tbody>
</table>
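The ranking loop in the snippet gives tied scores the same place and then skips ahead, so two answers tied for 2nd are followed by 4th. A minimal Python sketch of that numbering, assuming the sizes are already sorted in ascending order (`assign_places` is a hypothetical name):

```python
def assign_places(sizes):
    """Map ascending byte counts to leaderboard places; ties share a place."""
    places = []
    last_size = None
    last_place = 1
    for place, size in enumerate(sizes, start=1):
        # Only move to a new place when the size differs from the previous one.
        if size != last_size:
            last_place = place
        last_size = size
        places.append(last_place)
    return places
```

For instance, sizes `[23, 25, 25, 30]` yield places `[1, 2, 2, 4]`.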

2  
@GeorgeGibson If that's the case, you might want to add a test case like One fish. Two fish. – would the output be fish or fish.? Also, can the input only have one word? – Sp3000 yesterday
6  
Borderline dupe. – Peter Taylor yesterday
3  
Is that's one word or two (that and s)? – Doorknob yesterday
2  
So, to identify words, we split at spaces, remove non-letters and convert to lowercase? – Dennis yesterday
2  
The sentences A word is a group of alphanumeric, separated by a space. and The only special character that should be included is the hyphen. directly contradict each other. I only read the first one before posting my answer. Also, this part should be included in the rule section, not hidden at the bottom. Test cases that involve digits and/or hyphens would also help. – Dennis yesterday

25 Answers

Pyth - 23 bytes

Kc@+GdrzZ)I!tJ.M/KZ{KhJ

Test Suite.

1  
The revised rules require preserving digits and hyphens. – Dennis yesterday
    
Please update to comply with the revised rules. – George Gibson 5 hours ago

Jelly, 25 bytes

ṣ⁶f€ØB;”-¤Œl©Qµ®ċЀĠṪịµẋE

Try it online! or verify all test cases.


Octave, <s>115</s> 94 bytes

[a,b,c]=unique(regexp(lower(input('')),'[A-z]*','match'));[~,~,d]=mode(c); try disp(a{d{:}})

Accounts for the case with no most frequent word by using try. In this case it outputs nothing, and "takes a break" until you catch the exception.

Saved 21(!) bytes thanks to Luis Mendo's suggestion (using the third output from mode to get the most common word).
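For readers who do not use Octave, the idea of asking for all tied modes has a close analogue in Python's standard library: `statistics.multimode` (Python 3.8+) returns every value that shares the highest count. The sketch below illustrates the tie check only and is not a translation of the Octave code; `unique_mode` is a made-up name.

```python
from statistics import multimode


def unique_mode(words):
    """Return the single most common item, or '' if several items tie."""
    modes = multimode(words)  # all values tied for the highest count
    return modes[0] if len(modes) == 1 else ""
```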


The rules have changed quite a bit since I posted my original answer. I'll look into the regex later.

1  
you beat me to it, gonna think for something else now. – Agawa001 2 days ago
    
Apply mode on c maybe? Its third output gives all tied values, if I recall correctly – Luis Mendo yesterday
    
I count 115 bytes. – Cᴏɴᴏʀ O'Bʀɪᴇɴ yesterday
    
I believe this part would cause troubles for the case of a tie: "When there are multiple values occurring equally frequently, mode returns the smallest of those values.". The function will return a single number, thus printing the first word (alphabetical). My solution depends on the multiple results from find. – Stewie Griffin yesterday
1  
@StewieGriffin [~, ~, out] = mode([1 1 2 2 1 2 3 4 5 5]) gives out = {1 2} – Luis Mendo 12 hours ago

Perl 6, 80 bytes

{$_>1&&.[0].value==.[1].value??""!!.[0].key given .lc.words.Bag.sort:{-.value}}

Let's split the answer into two parts...

given .lc.words.Bag.sort:{-.value}

given is a control statement (like if or for). In Perl 6, they're allowed as postfixes. (a if 1, or like here, foo given 3). given puts its topic (right-hand side) into the special variable $_ for its left-hand side.

The "topic" itself lowercases (lc), splits by word (words), puts the values into a Bag (a set with the number of occurrences), then sorts by value (DESC). Since sort only knows how to operate on lists, the Bag is transformed into a List of Pairs here.

$_>1&&.[0].value==.[1].value??""!!.[0].key

A simple conditional (?? !! are used in Perl 6, instead of ? :).

$_ > 1

Just checks that the list has more than one element.

.[0].value==.[1].value

Accesses to $_ can be shortened by not specifying the variable: .a is exactly like $_.a. So this is effectively "do both top elements have the same number of occurrences?" If so, we print '' (the empty string).

Otherwise, we print the top element's key (the word): .[0].key.
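The same sort-then-compare logic can be sketched in Python for comparison. This is an illustration, not a port: like `.words`, it splits on whitespace without stripping punctuation, it assumes the input contains at least one word, and the name `top_word` is hypothetical.

```python
from collections import Counter


def top_word(text):
    """Most frequent whitespace-separated word, or '' if the top two tie."""
    # Count lowercased words, then sort the (word, count) pairs by count,
    # highest first, mirroring `.lc.words.Bag.sort:{-.value}`.
    pairs = sorted(Counter(text.lower().split()).items(),
                   key=lambda kv: -kv[1])
    # More than one pair and equal top counts means no most frequent word.
    if len(pairs) > 1 and pairs[0][1] == pairs[1][1]:
        return ""
    return pairs[0][0]
```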

6  
It's like half English, half line-noise. Amazing. – cat yesterday
1  
it's funny how it's the OO-style features that look english-y :P – ven yesterday
2  
Also manages to be less readable than Perl 5 while containing more English than Perl 5. D: – cat yesterday
1  
@cat fixed it -- should be totally unreadable now – ven yesterday
4  
value??!! (i know that's a ternary operator, it's just entertaining) – cat yesterday

05AB1E, 30 bytes

Code:

lžj¨„ -«Ãð¡©Ùv®yQOˆ}®¯MQÏDg1Q×

Uses CP-1252 encoding. Try it online!.

    
hmm? – TessellatingHeckler 9 hours ago
2  
@TessellatingHeckler It only takes one line of input. Unless you repeatedly use the I command, 05AB1E will only take as much as it needs. – George Gibson 6 hours ago

Pyke, <s>26</s> 25 bytes

l1dcD}jm/D3Sei/1qIi@j@
(;

Try it here!

Or <s>23</s> 22 bytes (noncompeting; adds a node that kills the stack if false)

l1cD}jm/D3Sei/1q.Ii@j@

Try it here!

Or with punctuation, 23 bytes (I think this competes? Commit was before the edit)

l1.cD}jm/D3Sei/1q.Ii@j@

Try it here!


Pyth, 32 bytes

p?tlJeM.MhZrS@Ls++\-GUTcrz0d8ksJ

Test suite.


Ruby, <s>94</s> <s>92</s> 102 bytes

Gotta go fast. Returns the word in all uppercase, or nil if there is no most frequent word.

->s{w=s.upcase.tr(?','').scan /\w+/;q=->x{w.count x};(w-[d=w.max_by{|e|q[e]}]).all?{|e|q[e]<q[d]}?d:p}
5  
Gotta go fast? – cat yesterday
    
@cat yeah, 'cuz I was FGITW this time – Kevin Lau - not Kenny yesterday
    
It doesn't, because it wasn't in the spec when I made the answer :V – Kevin Lau - not Kenny 8 hours ago

Python 3.5, <s>142</s> <s>137</s> <s>134</s> <s>112</s> <s>117</s> <s>110</s> 127 bytes:

(+17 bytes, because apparently even if there are words more frequent than the rest, but they have the same frequency, nothing should still be returned.)

def g(u):import re;q=re.findall(r"\b['\-\w]+\b",u.lower());Q=q.count;D=[*map(Q,{*q})];return['',max(q,key=Q)][1in map(D.count,D)]

Should now satisfy all conditions. This submission assumes that at least 1 word is input.

Try It Online! (Ideone)

Also, if you want one, here is another version of my function devoid of any regular expressions at the cost of about 43 bytes, though this one is non-competitive anyways, so it does not really matter. I just put it here for the heck of it:

def g(u):import re;q=''.join([i for i in u.lower()if i in[*map(chr,range(97,123)),*"'- "]]).split();Q=q.count;D=[*map(Q,{*q})];return['',max(q,key=Q)][1in map(D.count,D)]

Try this New Version Online! (Ideone)

    
From the challenge comments "if there are two words that are more frequent than the rest, but with the same frequency", the output is 'nothing'. – RootTwo yesterday
    
@RootTwo Fixed! :) – R. Kap yesterday
    
@TessellatingHeckler Those are different words though. That's is a contraction for that is whereas thats is not really a word. – R. Kap 8 hours ago
    
@TessellatingHeckler Can you give me some proof of this comment? Because I am going through all the comments on the post and see no such comment. – R. Kap 8 hours ago

R, 115 bytes

function(s)if(sum(z<-(y=table(tolower((x=strsplit(s,"[^\\w']",,T)[[1]])[x>""])))==max(y))<2)names(which(z))else NULL

This is a function that accepts a string and returns a string if a single word appears more often than others and NULL otherwise. To call it, assign it to a variable.

Ungolfed:

f <- function(s) {
    # Create a vector of words by splitting the input on characters other
    # than word characters and apostrophes
    v <- (x <- strsplit(s, "[^\\w']", perl = TRUE)[[1]])[x > ""]

    # Count the occurrences of each lowercased word
    y <- table(tolower(v))

    # Create a logical vector such that elements of `y` which occur most
    # often are `TRUE` and the rest are `FALSE`
    z <- y == max(y)

    # If a single word occurs most often, return it, otherwise `NULL`
    if (sum(z) < 2) {
        names(which(z))
    } else {
        NULL
    }
}

PostgreSQL, 246 bytes

WITH z AS(SELECT DISTINCT*,COUNT(*)OVER(PARTITION BY t,m)c FROM i,regexp_split_to_table(translate(lower(t),'.,"''',''),E'\\s+')m)
SELECT t,CASE WHEN COUNT(*)>1 THEN '' ELSE MAX(m)END
FROM z WHERE(t,c)IN(SELECT t,MAX(c)FROM z GROUP BY t)
GROUP BY t  

Output:

[Image: query output]

Input if anyone is interested:

CREATE TABLE i(t TEXT);

INSERT INTO i(t)
VALUES ('The man walked down the road.'), ('Slowly, he ate the pie, savoring each delicious bite. He felt like he was truly happy.'),
       ('This sentence has no most frequent word.'), ('"That''s... that''s... that is just terrible!" he said. '), ('The old-fashioned man ate an old-fashioned cake.'), 
       ('IPv6 looks great, much better than IPv4, except for the fact that IPv6 has longer addresses.'), ('a   a            a b b b c');


Normally I would use MODE() WITHIN GROUP(...), which would be much shorter, but it would violate this rule:

If there is no most frequent word (see test case #3), your program should output nothing.

    
I could not get as low as you; SQL Server doesn't have a built-in split yet. However, the select part is shorter. – t-clausen.dk yesterday

Retina, 97 bytes

The rules keep changing...

T`L`l
[^-\w ]

O`[-\w]+
([-\w]+)( \1\b)*
$#2;$1
O#`[-\w;]+
.*\b(\d+);[-\w]+ \1;[-\w]+$

!`[-\w]+$

Try it online!

Test suite.

2  
    
@CᴏɴᴏʀO'Bʀɪᴇɴ Thanks, fixed. – Kenny Lau yesterday
    
And you golfed it 11 bytes ._. impressive – Cᴏɴᴏʀ O'Bʀɪᴇɴ yesterday
    
Also fails for "The old-fashioned man ate an old-fashioned cake." – t-clausen.dk yesterday
    
This doesn't look right either (expecting a to be the most common word there) – TessellatingHeckler 9 hours ago

JavaScript (ES6), 99 bytes

F=s=>(f={},w=c='',s.toLowerCase().replace(/[\w-']+/g,m=>(f[m]=o=++f[m]||1)-c?o>c?(w=m,c=o):0:w=''),w)
#input { width: 100%; }
<textarea id="input" oninput="output.innerHTML=F(this.value)"></textarea>
<div id="output"></div>


JavaScript (ES6), 155 bytes

s=>(m=new Map,s.toLowerCase().replace(/[^- 0-9A-Z]/gi,'').split(/\ +/).map(w=>m.set(w,-~m.get(w))),[[a,b],[c,d]]=[...m].sort(([a,b],[c,d])=>d-b),b==d?'':a)

Based on @Blue's Python answer.

    
Your regex replace looks like it drops numbers, and will break the IPv6 test case, is that right? – TessellatingHeckler 9 hours ago
    
@TessellatingHeckler The definition of word changed since I originally read the question, but I've updated my answer now. – Neil 4 hours ago

Python, 132 bytes

import collections as C,re
def g(s):(a,i),(b,j)=C.Counter(re.sub('[^\w\s-]','',s.lower()).split()).most_common(2);return[a,''][i==j]

The above code assumes that the input has at least two distinct words.

    
Got to love that regex, tho. – Blue yesterday

Sqlserver 2008, 250 bytes

DECLARE @ varchar(max) = 'That''s... that''s... that is just terrible!" he said.';

WITH c as(SELECT
@ p,@ x
UNION ALL
SELECT LEFT(x,k-1),STUFF(x,1,k,'')FROM
c CROSS APPLY(SELECT patindex('%[^a-z''-]%',x+'!')k)k
WHERE''<x)SELECT max(p)FROM(SELECT top 1with ties p
FROM c WHERE p>''GROUP BY p
ORDER BY count(*)DESC
)j HAVING count(*)=1

Try it online!

    
I don't like the variable approach because it is kind of cheating :) One input gives nothing or something; a set-based approach has to be longer, because you need to add an additional GROUP BY, LEFT JOIN, or PARTITION BY. Anyway, SQL Server has a built-in SPLIT function. Ungolfed demo: feel free to make it as short as possible. – lad2025 yesterday

PHP, 223 bytes

$a=array_count_values(array_map(function($s){return preg_replace('/[^A-Za-z0-9]/','',$s);},explode(' ',strtolower($argv[1]))));arsort($a);$c=count($a);$k=array_keys($a);echo($c>0?($c==1?$k[0]:($a[$k[0]]!=$a[$k[1]]?$k[0]:'')):'');

Matlab, 222 bytes

a=input('','s');t=@(x)feval(@(y)y(y>32),num2str(lower(x)-0));f=@(x)num2str(nnz(x));e=str2num(regexprep(a,'(\w+)',' ${t($1)} ${f($`)} ${f([$`,$1])}'));c=find(e==mode(e)&e==1/mode(1./e));try disp(a(e(c(1)+1):e(c(1)+2))),end
  • Toolbox is necessary to run this.

  • How it works: one of the nicest features of regex replace in MATLAB is that it can execute code on the captured tokens, calling functions from the outer environment with the tokens as arguments. Any sequence "Word_A Word_B ..." is therefore replaced by integers "A0 A1 A2 B0 B1 B2 ...", where the first integer is the numeric ASCII signature of the word, the second is its starting index, and the third is its ending index. The last two integers are never repeated anywhere in the sequence, so I took advantage of this: convert the sequence to an array, take its mode, then search for the result in that array; the starting and ending indices follow right after it.

  • Edit: and yes, the anomaly of multiple/no most frequent words is dealt with. There is always some tricky way to cope with all these random rules (I really don't like having too many rules, but rules are rules, at a penalty of 24 bytes). Since the mode function returns the smallest of the most common numbers (here, the ASCII translation of a word), I added a second condition that computes 1/mode(1./all_elements_of_array). On the reciprocals, mode picks what was originally the largest most common number instead of the smallest, and inverting recovers that number. If it differs from the raw mode output, there are several maxima and the last command throws an exception.
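The tie test described in the last bullet boils down to comparing the smallest and the largest of the most common values: they coincide exactly when the mode is unique. A Python sketch of that check, assuming a non-empty list of positive numbers (the name `has_unique_mode` is made up, and the reciprocal trick is replaced by plain min/max):

```python
from collections import Counter


def has_unique_mode(values):
    """True when exactly one value has the highest count (non-empty input)."""
    counts = Counter(values)
    best = max(counts.values())
    modes = [v for v, c in counts.items() if c == best]
    # mode(e) yields the smallest tied value and 1/mode(1./e) the largest;
    # the two agree only if there is a single most common value.
    return min(modes) == max(modes)
```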


20 bytes saved thanks to @StewieGriffin; 30 bytes added to close commonly agreed loopholes.

    
You'll have my upvote when you (or someone else) show that this actually works, both for inputs that have a most common word, and for inputs that don't. =) (I can't test it, unfortunately) – Stewie Griffin yesterday
    
@StewieGriffin I think the program misbehaves on sentences with equally frequent words; I will fix that. – Agawa001 22 hours ago

Python 2, 218 bytes

Assumes at least 2 distinct words. Getting rid of punctuation destroyed me...

import string as z
def m(s):a=[w.lower()for w in s.translate(z.maketrans('',''),z.punctuation).split()];a=sorted({w:a.count(w)for w in set(a)}.items(),key=lambda b:b[1],reverse=1);return a[0][0]if a[0][1]>a[1][1]else''
    
Does this strip ',- etc? – Tim yesterday
    
@Tim No, I did this challenge before the rules were fully fleshed. Will change. – Blue yesterday
    
Can you assign the result of sorted to a tuple rather than having to index into the array manually? – Neil yesterday
    
@Neil you mean just get the first and second items for comparison instead of the entire array? I don't know how to do that – Blue yesterday

Python, 70 bytes

import collections as c
f=lambda x:c.Counter(x.split()).most_common(1)

Takes its input like this:

f("Bird is the word")
    
Hi, and welcome to PPCG! We score code-golf challenges by the number of bytes in the answer. I went ahead and edited it for you with the correct information. – Eᴀsᴛᴇʀʟʏ Iʀᴋ 22 hours ago
    
Thanks for that – Wouldn't You Like To Know 20 hours ago
2  
Welcome to PPCG! Unfortunately, your submission does not satisfy all the requirements of this challenge as, first of all, it's NOT case insensitive. For instance, it will NOT count occurrences of the word That as occurrences of the word that since the former begins with an uppercase T and the latter begins with a lowercase t. Also, this does NOT remove all other forms of punctuation except hyphens (-) and, optionally, apostrophes (') and as a result, this would NOT work for the fourth test case given in the question. – R. Kap 18 hours ago
1  
Also, this does NOT output nothing if there is no most frequent word. For instance, using the third test case (This sentence has no most frequent word.) as an example, your function outputs [('This', 1)], when it should instead be outputting nothing. I could go on and on about more issues, so I would recommend fixing them as soon as you can. – R. Kap 18 hours ago
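The first and last problems raised in these comments are easy to reproduce. The sketch below wraps the submitted expression in a function called `naive` and adds a lowercasing variant called `normalized`; both names are made up, and neither version strips punctuation or handles ties.

```python
from collections import Counter


def naive(x):
    # Same expression as the submission: case-sensitive, no tie handling.
    return Counter(x.split()).most_common(1)


def normalized(x):
    # Lowercasing first makes "That" and "that" count as one word.
    return Counter(x.lower().split()).most_common(1)
```

With the input `"That that"`, `naive` reports `[('That', 1)]` while `normalized` reports `[('that', 2)]`.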

Lua, <s>232</s> 199 bytes

w,m,o={},0;io.read():lower():gsub("[^-%w%s]",""):gsub("[%w-]+",function(x)if not w[x]then w[x]=0 end w[x]=w[x]+1 end)for k,v in pairs(w)do if m==v then o=''end if(v>m)then m,o=v,k end end io.write(o)
    
if not w[x]then w[x]=0 end w[x]=w[x]+1 end -> w[x]=(w[x]or0)+1 – Kenny Lau 1 hour ago

Rexx, 109 bytes

pull s;g.=0;m=0;do i=1 to words(s);w=word(s,i);g.w=g.w+1;if g.w>m;then do;m=g.w;r=w;end;end;if m>1 then say r

Pretty printed...

pull s
g.=0
m=0
do i=1 to words(s)
  w=word(s,i)
  g.w=g.w + 1
  if g.w>m
  then do
    m=g.w
    r=w
  end
end
if m>1 then say r

PowerShell (v4), 117 bytes

$y,$z=@($input-replace'[^a-z0-9 \n-]'-split'\s'|group|sort Count)[-2,-1]
($y,($z,'')[$y.Count-eq$z.Count])[!!$z].Name

The first part is easy enough:

  • $input is ~= stdin
  • Regex replace irrelevant characters with nothing, keep newlines so we don't mash two words from the end of a line and the beginning of the next line into one by mistake. (Nobody else has discussed multiple lines, could golf -2 if the input is always a single line).
  • Regex split, Group by frequency (~= Python's collections.Counter), Sort to put most frequent words at the end.
  • PowerShell is case insensitive by default for everything.

Handling if there isn't a most frequent word:

  • Take the last two items [-2,-1] into $y and $z;
  • an N-item list, where N>=2, makes $y and $z the last two items
  • a 1-item list makes $y the last item and $z null
  • an Empty list makes them both null

Use the bool-as-array-index fake-ternary-operator golf (0,1)[truthyvalue], nested, to choose "", $z or $y as output, then take .Name.
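The same selection logic, including the bool-as-index trick, can be written out in Python, where `True` and `False` also index a tuple as 1 and 0. This is a sketch under assumptions: it takes an already split list of words, returns the word in lowercase, and `most_frequent` is a hypothetical name.

```python
from collections import Counter


def most_frequent(words):
    # Group case-insensitively and sort ascending by count, so the most
    # frequent groups end up last (as with `group|sort Count`).
    groups = sorted(Counter(w.lower() for w in words).items(),
                    key=lambda kv: kv[1])
    tail = groups[-2:]
    if len(tail) == 2:
        y, z = tail                # N >= 2 groups: the last two
    elif len(tail) == 1:
        y, z = tail[0], None       # one group: y is it, z is empty
    else:
        y, z = None, None          # no groups at all
    tie = z is not None and y[1] == z[1]
    # Nested fake ternary: the inner pick drops z on a tie, the outer pick
    # falls back to y when z is missing.
    chosen = (y, (z, None)[tie])[z is not None]
    return chosen[0] if chosen else ""
```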

PS D:\> "The man walked down the road."|.\test.ps1
The

PS D:\> "Slowly, he ate the pie, savoring each delicious bite. He felt like he was truly happy."|.\test.ps1
he

PS D:\> "`"That's... that's... that is just terrible!`" he said."|.\test.ps1
Thats

PS D:\> "The old-fashioned man ate an old-fashioned cake."|.\test.ps1
old-fashioned

PS D:\> "IPv6 looks great, much better than IPv4, except for the fact that IPv6 has longer addresses."|.\test.ps1
IPv6

Perl, 60 bytes

Includes +2 for -an

Also needs the -E option for say, so run like:

perl -anE 's/[^\w-]//g,$q[$a{+lc}++].="$_\n"for@F;say$q[-1]=~/^.+$/g' <<< "The old-fashioned man ate an old-fashioned cake."

Just the code part:

s/[^\w-]//g,$q[$a{+lc}++].="$_\n"for@F;($_)=$q[-1]=~/^.+$/g

Replace \n by a literal newline for the claimed score (or replace "$_\n" by $_.$\)

This may be shortened depending on what the exact definition of a "word" and "special characters" is... Here I took it to mean: "split on whitespace, then remove all characters that are not alphanumeric or hyphen"
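The core trick in the one-liner is to file each word into a bucket numbered by how many times it has been seen before; the highest bucket then holds the winners, and it must contain exactly one word. A Python sketch of that idea (an illustration, not a port: `\w` in Python also matches non-ASCII letters, and the name `most_frequent` is made up):

```python
import re


def most_frequent(text):
    seen = {}      # lowercased word -> occurrences so far
    buckets = []   # buckets[n] = words filed at their (n+1)-th occurrence
    for token in text.split():
        # Split on whitespace, then drop everything but word chars and hyphens.
        word = re.sub(r"[^\w-]", "", token)
        key = word.lower()
        n = seen.get(key, 0)
        seen[key] = n + 1
        if n == len(buckets):
            buckets.append([])
        buckets[n].append(word)
    # The last bucket holds the words with the highest count; a single entry
    # means there is a unique most frequent word.
    if buckets and len(buckets[-1]) == 1:
        return buckets[-1][0]
    return ""
```

As in the Perl version, the word is returned with the casing of the occurrence that reached the highest count.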


Perl 5, <s>96</s> 89 + 2 (-p flag) = 91 bytes

++$h{+lc}for map{/[\w'-]+/g}$_,<>;$m>$e[1]||$e[1]>$m&&(($_,$m)=@e)||($_='')while@e=each%h

Using:

> echo "The man walked down the road." | perl -p script.pl
    
Your -p flag should incur a penalty of 3 bytes. The rules are roughly: each command-line flag is +1 byte, since that is how many extra bytes you need to extend your free -e'code' style command line. So normally -p is only +1 byte. But here your code contains ', so it cannot be run from the command line without escaping. That means it cannot be combined with -e, and the - and the space before the p are extra and must be counted too. – Ton Hospel 6 mins ago
