What is the most frequent word?

Question

Given a text file, your program must trawl through it, counting the frequency of each word, and then output the most used word. Because a text file has no fixed length, and so can get very long, your code must be as short as possible.

Rules/Requirements

Each submission should be either a full program or function. If it is a function, it must be runnable by adding only the function call to the bottom of the program. Anything else (e.g. headers in C) must be included.
There must be a free interpreter/compiler available for your language.
If it is possible, provide a link to a site where your program can be tested.
Your program must not write anything to STDERR.
Your program should take input from a text file or STDIN (or the closest alternative in your language).
Standard loopholes are forbidden.
Your program must be case-insensitive (tHe, The and the all contribute to the count of the).
If there is no most frequent word (see test case #3), your program should output nothing.
Your program is allowed to strip special characters. The test cases assume that you do.
A word is a group of alphanumeric characters and hyphens, separated from other words with a space.
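
For reference, the rules above can be read as the following ungolfed sketch. It is only one interpretation, not a canonical solution: the function name `most_frequent_word` is made up for illustration, apostrophes are stripped (so the fourth test case yields `thats`), only ASCII letters and digits are treated as alphanumeric, and the result is returned in lowercase.

```python
import re
from collections import Counter


def most_frequent_word(text):
    """Return the single most frequent word, or '' if there is a tie."""
    words = []
    for token in text.lower().split():
        # Assumption: keep only ASCII letters, digits and hyphens.
        cleaned = re.sub(r"[^a-z0-9-]", "", token)
        if cleaned:
            words.append(cleaned)

    top = Counter(words).most_common(2)
    if not top:
        return ""  # no words at all
    if len(top) == 2 and top[0][1] == top[1][1]:
        return ""  # the two best counts tie, so there is no winner
    return top[0][0]


print(most_frequent_word("The man walked down the road."))
```

Comparing only the two highest counts is enough: if they differ, the first word is strictly more frequent than every other word.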

Test Cases

The man walked down the road.
==> the

-----

Slowly, he ate the pie, savoring each delicious bite. He felt like he was truly happy.
==> he

-----

This sentence has no most frequent word.
==> 

-----

"That's... that's... that is just terrible!" he said.
==> thats / that's

-----

The old-fashioned man ate an old-fashioned cake.
==> old-fashioned

-----

IPv6 looks great, much better than IPv4, except for the fact that IPv6 has longer addresses.
==> IPv6

(The third test case has no output; you may choose either output for the fourth.)

Scoring

Programs are scored according to bytes. The usual character set is UTF-8; if you are using another, please specify.

When the challenge finishes, the program with the fewest bytes will win (this is code-golf).

Submissions

To make sure that your answer shows up, please start your answer with a headline, using the following Markdown template:

# Language Name, N bytes

where N is the size of your submission. If you improve your score, you can keep old scores in the headline, by striking them through. For instance:

# Ruby, <s>104</s> <s>101</s> 96 bytes

If you want to include multiple numbers in your header (e.g. because your score is the sum of two files or you want to list interpreter flag penalties separately), make sure that the actual score is the last number in the header:

# Perl, 43 + 2 (-p flag) = 45 bytes

You can also make the language name a link which will then show up in the leaderboard snippet:

# [><>](http://esolangs.org/wiki/Fish), 121 bytes
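
As a rough illustration of why the actual score has to be the last number, here is a simplified sketch of how such a header could be parsed. It is not the leaderboard's real logic (that lives in the `SCORE_REG` expression of the snippet); `parse_header` is a hypothetical helper, and it assumes the language name contains no comma.

```python
import re


def parse_header(header):
    """Split a '# Language, N bytes' header into (language, score)."""
    text = header.lstrip("# ").strip()
    # Everything before the first comma is taken as the language name.
    language, _, rest = text.partition(",")
    # Drop struck-through old scores, then take the last remaining number.
    rest = re.sub(r"<s>.*?</s>", "", rest)
    numbers = re.findall(r"\d+", rest)
    score = int(numbers[-1]) if numbers else None
    return language.strip(), score


print(parse_header("# Perl, 43 + 2 (-p flag) = 45 bytes"))
```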

Leaderboard

Here is a Stack Snippet to generate both a regular leaderboard and an overview of winners by language.

/* Configuration */

var QUESTION_ID = 79576; // Obtain this from the url
// It will be like https://XYZ.stackexchange.com/questions/QUESTION_ID/... on any question page
var ANSWER_FILTER = "!t)IWYnsLAZle2tQ3KqrVveCRJfxcRLe";
var COMMENT_FILTER = "!)Q2B_A2kjfAiU78X(md6BoYk";
var OVERRIDE_USER = 53406; // This should be the user ID of the challenge author.

/* App */

var answers = [], answers_hash, answer_ids, answer_page = 1, more_answers = true, comment_page;

function answersUrl(index) {
  return "https://api.stackexchange.com/2.2/questions/" +  QUESTION_ID + "/answers?page=" + index + "&pagesize=100&order=desc&sort=creation&site=codegolf&filter=" + ANSWER_FILTER;
}

function commentUrl(index, answers) {
  return "https://api.stackexchange.com/2.2/answers/" + answers.join(';') + "/comments?page=" + index + "&pagesize=100&order=desc&sort=creation&site=codegolf&filter=" + COMMENT_FILTER;
}

function getAnswers() {
  jQuery.ajax({
    url: answersUrl(answer_page++),
    method: "get",
    dataType: "jsonp",
    crossDomain: true,
    success: function (data) {
      answers.push.apply(answers, data.items);
      answers_hash = [];
      answer_ids = [];
      data.items.forEach(function(a) {
        a.comments = [];
        var id = +a.share_link.match(/\d+/);
        answer_ids.push(id);
        answers_hash[id] = a;
      });
      if (!data.has_more) more_answers = false;
      comment_page = 1;
      getComments();
    }
  });
}

function getComments() {
  jQuery.ajax({
    url: commentUrl(comment_page++, answer_ids),
    method: "get",
    dataType: "jsonp",
    crossDomain: true,
    success: function (data) {
      data.items.forEach(function(c) {
        if (c.owner.user_id === OVERRIDE_USER)
          answers_hash[c.post_id].comments.push(c);
      });
      if (data.has_more) getComments();
      else if (more_answers) getAnswers();
      else process();
    }
  });  
}

getAnswers();

var SCORE_REG = /<h\d>\s*([^\n,]*[^\s,]),.*?(\d+)(?=[^\n\d<>]*(?:<(?:s>[^\n<>]*<\/s>|[^\n<>]+>)[^\n\d<>]*)*<\/h\d>)/;

var OVERRIDE_REG = /^Override\s*header:\s*/i;

function getAuthorName(a) {
  return a.owner.display_name;
}

function process() {
  var valid = [];
  
  answers.forEach(function(a) {
    var body = a.body;
    a.comments.forEach(function(c) {
      if(OVERRIDE_REG.test(c.body))
        body = '<h1>' + c.body.replace(OVERRIDE_REG, '') + '</h1>';
    });
    
    var match = body.match(SCORE_REG);
    if (match)
      valid.push({
        user: getAuthorName(a),
        size: +match[2],
        language: match[1],
        link: a.share_link,
      });
    
  });
  
  valid.sort(function (a, b) {
    var aB = a.size,
        bB = b.size;
    return aB - bB
  });

  var languages = {};
  var place = 1;
  var lastSize = null;
  var lastPlace = 1;
  valid.forEach(function (a) {
    if (a.size != lastSize)
      lastPlace = place;
    lastSize = a.size;
    ++place;
    
    var answer = jQuery("#answer-template").html();
    answer = answer.replace("{{PLACE}}", lastPlace + ".")
                   .replace("{{NAME}}", a.user)
                   .replace("{{LANGUAGE}}", a.language)
                   .replace("{{SIZE}}", a.size)
                   .replace("{{LINK}}", a.link);
    answer = jQuery(answer);
    jQuery("#answers").append(answer);

    var lang = a.language;
    if (/<a/.test(lang)) lang = jQuery(lang).text();
    
    languages[lang] = languages[lang] || {lang: a.language, user: a.user, size: a.size, link: a.link};
  });

  var langs = [];
  for (var lang in languages)
    if (languages.hasOwnProperty(lang))
      langs.push(languages[lang]);

  langs.sort(function (a, b) {
    if (a.lang > b.lang) return 1;
    if (a.lang < b.lang) return -1;
    return 0;
  });

  for (var i = 0; i < langs.length; ++i)
  {
    var language = jQuery("#language-template").html();
    var lang = langs[i];
    language = language.replace("{{LANGUAGE}}", lang.lang)
                       .replace("{{NAME}}", lang.user)
                       .replace("{{SIZE}}", lang.size)
                       .replace("{{LINK}}", lang.link);
    language = jQuery(language);
    jQuery("#languages").append(language);
  }

}

body { text-align: left !important}

#answer-list {
  padding: 10px;
  width: 290px;
  float: left;
}

#language-list {
  padding: 10px;
  width: 290px;
  float: left;
}

table thead {
  font-weight: bold;
}

table td {
  padding: 5px;
}

<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<link rel="stylesheet" type="text/css" href="//cdn.sstatic.net/codegolf/all.css?v=83c949450c8b">
<div id="answer-list">
  <h2>Leaderboard</h2>
  <table class="answer-list">
    <thead>
      <tr><td></td><td>Author</td><td>Language</td><td>Size</td></tr>
    </thead>
    <tbody id="answers">

    </tbody>
  </table>
</div>
<div id="language-list">
  <h2>Winners by Language</h2>
  <table class="language-list">
    <thead>
      <tr><td>Language</td><td>User</td><td>Score</td></tr>
    </thead>
    <tbody id="languages">

    </tbody>
  </table>
</div>
<table style="display: none">
  <tbody id="answer-template">
    <tr><td>{{PLACE}}</td><td>{{NAME}}</td><td>{{LANGUAGE}}</td><td>{{SIZE}}</td><td><a href="{{LINK}}">Link</a></td></tr>
  </tbody>
</table>
<table style="display: none">
  <tbody id="language-template">
    <tr><td>{{LANGUAGE}}</td><td>{{NAME}}</td><td>{{SIZE}}</td><td><a href="{{LINK}}">Link</a></td></tr>
  </tbody>
</table>

@GeorgeGibson If that's the case, you might want to add a test case like One fish. Two fish. – would the output be fish or fish.? Also, can the input only have one word? — Sp3000, yesterday
So, to identify words, we split at spaces, remove non-letters and convert to lowercase? — Dennis♦, yesterday
The sentences A word is a group of alphanumeric, separated by a space. and The only special character that should be included is the hyphen. directly contradict each other. I only read the first one before posting my answer. Also, this part should be included in the rule section, not hidden at the bottom. Test cases that involve digits and/or hyphens would also help. — Dennis♦, yesterday

Maltysen · Answer 1 · 2016-05-08 08:29:21Z

up vote 11 down vote

Pyth - 23 bytes

Kc@+GdrzZ)I!tJ.M/KZ{KhJ

Test Suite.

The revised rules require preserving digits and hyphens. – Dennis♦ yesterday

Please update to comply with the revised rules. – George Gibson 5 hours ago

Dennis · Answer 2 · 2016-05-09 06:27:50Z

up vote 10 down vote

Jelly, 25 bytes

ṣ⁶f€ØB;”-¤Œl©Qµ®ċÐ€ĠṪịµẋE

Try it online! or verify all test cases.

Stewie Griffin · Answer 3 · 2016-05-10 05:13:13Z

up vote 6 down vote

Octave, <s>115</s> 94 bytes

[a,b,c]=unique(regexp(lower(input('')),'[A-z]*','match'));[~,~,d]=mode(c); try disp(a{d{:}})

Accounts for the case with no most frequent word by using try. In this case it outputs nothing, and "takes a break" until you catch the exception.

Saved 21(!) bytes thanks to Luis Mendo's suggestion (using the third output from mode to get the most common word).
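
The idea of asking for every tied value rather than a single mode can be sketched in Python with `statistics.multimode` (available from Python 3.8). This is an analogy, not a translation of the Octave code; `unique_mode` is a hypothetical helper name.

```python
from statistics import multimode


def unique_mode(values):
    """Return the single most common value, or None when several values tie."""
    tied = multimode(values)  # all values sharing the highest count
    return tied[0] if len(tied) == 1 else None


print(unique_mode(["the", "man", "the"]))
print(unique_mode(["a", "b"]))
```

When more than one value is returned, there is no single most frequent word and nothing should be printed.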

(The rules have changed quite a bit since I posted my original answer. I'll look into the regex later.)

you beat me to it, gonna think for something else now. – Agawa001 2 days ago

Apply mode on c maybe? Its third output gives all tied values, if I recall correctly – Luis Mendo yesterday

I count 115 bytes. – Cᴏɴᴏʀ O'Bʀɪᴇɴ yesterday

I believe this part would cause troubles for the case of a tie: "When there are multiple values occurring equally frequently, mode returns the smallest of those values.". The function will return a single number, thus printing the first word (alphabetical). My solution depends on the multiple results from find. – Stewie Griffin yesterday

@StewieGriffin [~, ~, out] = mode([1 1 2 2 1 2 3 4 5 5]) gives out = {1 2} – Luis Mendo 12 hours ago

ven · Answer 4 · 2016-05-10 11:29:04Z

up vote 5 down vote

Perl 6, 80 bytes

{$_>1&&.[0].value==.[1].value??""!!.[0].key given .lc.words.Bag.sort:{-.value}}

Let's split the answer into two parts...

given .lc.words.Bag.sort:{-.value}

given is a control statement (like if or for). In Perl 6, such statements are allowed as postfixes (a if 1 or, as here, foo given 3). given puts its topic (the right-hand side) into the special variable $_ for its left-hand side.

The "topic" itself lowercases (lc), splits by word (words), puts the values into a Bag (a set with the number of occurrences), then sorts by value in descending order. Since sort only knows how to operate on lists, the Bag is transformed into a List of Pairs here.

$_>1&&.[0].value==.[1].value??""!!.[0].key

A simple conditional (Perl 6 uses ?? !! instead of ? :).

$_ > 1

Just checks that the list has more than one element.

.[0].value==.[1].value

Accesses to $_ can be shortened by not specifying the variable: .a is exactly like $_.a. So this is effectively "do both top elements have the same number of occurrences?" If so, we print '' (the empty string).

Otherwise, we print the top element's key (the word): .[0].key.
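
The same shape (count, sort by count in descending order, compare the top two entries) can be sketched in Python. This is only an approximation of the approach described above: `top_word` is a hypothetical name, `collections.Counter` stands in for the Bag, and at least one word of input is assumed.

```python
from collections import Counter


def top_word(text):
    # Lowercase, split on whitespace, count, then sort pairs by count (descending).
    pairs = sorted(Counter(text.lower().split()).items(), key=lambda kv: -kv[1])
    # More than one entry and the two best counts are equal: no winner.
    if len(pairs) > 1 and pairs[0][1] == pairs[1][1]:
        return ""
    return pairs[0][0]


print(top_word("The man walked down the road"))
```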

It's like half English, half line-noise. Amazing. – cat yesterday

it's funny how it's the OO-style features that look english-y :P – ven yesterday

Also manages to be less readable than Perl 5 while containing more English than Perl 5. D: – cat yesterday

@cat fixed it -- should be totally unreadable now – ven yesterday

value??!! (i know that's a ternary operator, it's just entertaining) – cat yesterday

George Gibson · Answer 5 · 2016-05-09 15:14:42Z

up vote 4 down vote

05AB1E, 30 bytes

Code:

lžj¨„ -«Ãð¡©Ùv®yQOˆ}®¯MQÏDg1Q×

Uses CP-1252 encoding. Try it online!.

Answered by Adnan 2 days ago; edited by George Gibson 20 hours ago.

hmm? – TessellatingHeckler 9 hours ago

@TessellatingHeckler It only takes one line of input. Unless you repeatedly use the I command, 05AB1E will only take as much as it needs. – George Gibson 6 hours ago

muddyfish · Answer 6 · 2016-05-08 21:15:43Z

up vote 3 down vote

Pyke, <s>26</s> 25 bytes

l1dcD}jm/D3Sei/1qIi@j@
(;

Try it here!

Or <s>23</s> 22 bytes (noncompeting: added a node that kills the stack if false)

l1cD}jm/D3Sei/1q.Ii@j@

Try it here!

Or with punctuation, 23 bytes (I think this competes? Commit was before the edit)

l1.cD}jm/D3Sei/1q.Ii@j@

Try it here!

Kenny Lau · Answer 7 · 2016-05-09 10:41:03Z

up vote 3 down vote

Pyth, 32 bytes

p?tlJeM.MhZrS@Ls++\-GUTcrz0d8ksJ

Test suite.

Kevin Lau - not Kenny · Answer 8 · 2016-05-10 03:37:25Z

up vote 3 down vote

Ruby, <s>94</s> <s>92</s> 102 bytes

Gotta go fast. Returns the word in all uppercase, or nil if there is no most frequent word.

->s{w=s.upcase.tr(?','').scan /\w+/;q=->x{w.count x};(w-[d=w.max_by{|e|q[e]}]).all?{|e|q[e]<q[d]}?d:p}

Gotta go fast? – cat yesterday

@cat yeah, 'cuz I was FGITW this time – Kevin Lau - not Kenny yesterday

It doesn't, because it wasn't in the spec when I made the answer :V – Kevin Lau - not Kenny 8 hours ago

Trang Oul · Answer 9 · 2016-05-10 11:32:29Z

up vote 3 down vote

Python 3.5, <s>142</s> <s>137</s> <s>134</s> <s>112</s> <s>117</s> <s>110</s> 127 bytes:

(+17 bytes, because apparently even if there are words more frequent than the rest, but they have the same frequency, nothing should still be returned.)

def g(u):import re;q=re.findall(r"\b['\-\w]+\b",u.lower());Q=q.count;D=[*map(Q,{*q})];return['',max(q,key=Q)][1in map(D.count,D)]

Should now satisfy all conditions. This submission assumes that at least 1 word is input.

Try It Online! (Ideone)

Also, if you want one, here is another version of my function devoid of any regular expressions at the cost of about 43 bytes, though this one is non-competitive anyways, so it does not really matter. I just put it here for the heck of it:

def g(u):import re;q=''.join([i for i in u.lower()if i in[*map(chr,range(97,123)),*"'- "]]).split();Q=q.count;D=[*map(Q,{*q})];return['',max(q,key=Q)][1in map(D.count,D)]

Try this New Version Online! (Ideone)

Answered by R. Kap yesterday; edited by Trang Oul 34 mins ago.

From the challenge comments "if there are two words that are more frequent than the rest, but with the same frequency", the output is 'nothing'. – RootTwo yesterday

@RootTwo Fixed! :) – R. Kap yesterday

@TessellatingHeckler Those are different words though. That's is a contraction for that is whereas thats is not really a word. – R. Kap 8 hours ago

@TessellatingHeckler Can you give me some proof of this comment? Because I am going through all the comments on the post and see no such comment. – R. Kap 8 hours ago

Alex A. · Answer 10 · 2016-05-08 20:52:33Z

R, 115 bytes

function(s)if(sum(z<-(y=table(tolower((x=strsplit(s,"[^\\w']",,T)[[1]])[x>""])))==max(y))<2)names(which(z))else NULL

This is a function that accepts a string and returns a string if a single word appears more often than others and NULL otherwise. To call it, assign it to a variable.

Ungolfed:

f <- function(s) {
    # Create a vector of words by splitting the input on characters other
    # than word characters and apostrophes
    x <- strsplit(s, "[^\\w']", perl = TRUE)[[1]]
    v <- x[x > ""]

    # Count the occurrences of each lowercased word
    y <- table(tolower(v))

    # Create a logical vector such that elements of `y` which occur most
    # often are `TRUE` and the rest are `FALSE`
    z <- y == max(y)

    # If a single word occurs most often, return it, otherwise `NULL`
    if (sum(z) < 2) {
        names(which(z))
    } else {
        NULL
    }
}

lad2025 · Answer 11 · 2016-05-09 10:10:39Z

PostgreSQL, 246 bytes

WITH z AS(SELECT DISTINCT*,COUNT(*)OVER(PARTITION BY t,m)c FROM i,regexp_split_to_table(translate(lower(t),'.,"''',''),E'\\s+')m)
SELECT t,CASE WHEN COUNT(*)>1 THEN '' ELSE MAX(m)END
FROM z WHERE(t,c)IN(SELECT t,MAX(c)FROM z GROUP BY t)
GROUP BY t

Output:

Input if anyone is interested:

CREATE TABLE i(t TEXT);

INSERT INTO i(t)
VALUES ('The man walked down the road.'), ('Slowly, he ate the pie, savoring each delicious bite. He felt like he was truly happy.'),
       ('This sentence has no most frequent word.'), ('"That''s... that''s... that is just terrible!" he said. '), ('The old-fashioned man ate an old-fashioned cake.'), 
       ('IPv6 looks great, much better than IPv4, except for the fact that IPv6 has longer addresses.'), ('a   a            a b b b c');

Normally I would use MODE() WITHIN GROUP(...), which would be much shorter, but it would violate:

If there is no most frequent word (see test case #3), your program should output nothing.

could not get as low as you, sqlserver doesn't have build in split yet. However the select part is shorter. — t-clausen.dk, yesterday

Kenny Lau · Answer 12 · 2016-05-09 10:36:24Z

up vote 2 down vote

Retina, 97 bytes

The rules keep changing...

T`L`l
[^-\w ]

O`[-\w]+
([-\w]+)( \1\b)*
$#2;$1
O#`[-\w;]+
.*\b(\d+);[-\w]+ \1;[-\w]+$

!`[-\w]+$

Try it online!

Test suite.

Fails for this input. – Cᴏɴᴏʀ O'Bʀɪᴇɴ yesterday

@CᴏɴᴏʀO'Bʀɪᴇɴ Thanks, fixed. – Kenny Lau yesterday

And you golfed it 11 bytes ._. impressive – Cᴏɴᴏʀ O'Bʀɪᴇɴ yesterday

Also fails for "The old-fashioned man ate an old-fashioned cake." – t-clausen.dk yesterday

This doesn't look right either (expecting a to be the most common word there) – TessellatingHeckler 9 hours ago

George Reith · Answer 13 · 2016-05-09 15:18:27Z

up vote 2 down vote

JavaScript (ES6), 99 bytes

F=s=>(f={},w=c='',s.toLowerCase().replace(/[\w-']+/g,m=>(f[m]=o=++f[m]||1)-c?o>c?(w=m,c=o):0:w=''),w)

#input { width: 100%; }

<textarea id="input" oninput="output.innerHTML=F(this.value)"></textarea>
<div id="output"></div>

Neil · Answer 14 · 2016-05-10 07:59:53Z

up vote 2 down vote

JavaScript (ES6), 155 bytes

s=>(m=new Map,s.toLowerCase().replace(/[^- 0-9A-Z]/gi,'').split(/\ +/).map(w=>m.set(w,-~m.get(w))),[[a,b],[c,d]]=[...m].sort(([a,b],[c,d])=>d-b),b==d?'':a)

Based on @Blue's Python answer.

Your regex replace looks like it drops numbers, and will break the IPv6 test case, is that right? – TessellatingHeckler 9 hours ago

@TessellatingHeckler The definition of word changed since I originally read the question, but I've updated my answer now. – Neil 4 hours ago

Trang Oul · Answer 15 · 2016-05-10 11:10:14Z

up vote 2 down vote

Python, 132 bytes

import collections as C,re
def g(s):(a,i),(b,j)=C.Counter(re.sub('[^\w\s-]','',s.lower()).split()).most_common(2);return[a,''][i==j]

Above code assumes that input has at least two words.

Answered by RootTwo yesterday; edited by Trang Oul 56 mins ago.

Got to love that regex, tho. – Blue yesterday

t-clausen.dk · Answer 16 · 2016-05-09 11:35:32Z

up vote 1 down vote

Sqlserver 2008, 250 bytes

DECLARE @ varchar(max) = 'That''s... that''s... that is just terrible!" he said.';

WITH c as(SELECT
@ p,@ x
UNION ALL
SELECT LEFT(x,k-1),STUFF(x,1,k,'')FROM
c CROSS APPLY(SELECT patindex('%[^a-z''-]%',x+'!')k)k
WHERE''<x)SELECT max(p)FROM(SELECT top 1with ties p
FROM c WHERE p>''GROUP BY p
ORDER BY count(*)DESC
)j HAVING count(*)=1

Try it online!

I don't like variable approach because it is kind of cheating :) One input -> nothing or something, with set-based approach it has to be longer, because you need to add additional GROUP BY, LEFT JOIN, or PARTITION BY Anyway SQL Server has built in SPLIT function. Ungolfed demo feel free to make it as short as possible. – lad2025 yesterday

MonkeyZeus · Answer 17 · 2016-05-09 13:02:16Z

PHP, 223 bytes

$a=array_count_values(array_map(function($s){return preg_replace('/[^A-Za-z0-9]/','',$s);},explode(' ',strtolower($argv[1]))));arsort($a);$c=count($a);$k=array_keys($a);echo($c>0?($c==1?$k[0]:($a[$k[0]]!=$a[$k[1]]?$k[0]:'')):'');

Agawa001 · Answer 18 · 2016-05-09 14:30:34Z

Matlab (222)

a=input('','s');t=@(x)feval(@(y)y(y>32),num2str(lower(x)-0));f=@(x)num2str(nnz(x));e=str2num(regexprep(a,'(\w+)',' ${t($1)} ${f($`)} ${f([$`,$1])}'));c=find(e==mode(e)&e==1/mode(1./e));try disp(a(e(c(1)+1):e(c(1)+2))),end

A toolbox is necessary to run this.

How it works: one of the nicest features of regex replacement in MATLAB is that it can execute dynamic expressions, calling functions from the outer workspace with the captured tokens as arguments. Any sequence "Word_A Word_B ..." is thus replaced by integers "A0 A1 A2 B0 B1 B2 ...", where the first integer is the numeric ASCII signature of the word, the second is its starting index and the third is its ending index. The last two integers never repeat anywhere in the sequence, so I took advantage of this: convert the string to an array, take its mode, then search for the result in that array; the starting and ending indices follow it directly.

Edit: and yes, the anomaly of multiple or no most frequent words is dealt with, at a penalty of 24 bytes. There is always some tricky way to cope with all these rules (I really don't like too many rules, but rules are rules). The mode function returns the smallest of the most common numbers, here the ASCII translation of a word. So I added another condition that computes 1/mode(1./e): applied to the reciprocals, mode picks the largest of the tied values instead of the smallest, and inverting the result recovers the original number. If it differs from the plain mode output, there are several maxima and the last command throws an exception.

20 bytes saved thanks to @StewieGriffin; 30 bytes added to address commonly agreed loopholes.
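
The reciprocal trick for detecting a tie can be sketched as follows. This is a Python analogy rather than a port of the MATLAB code: `smallest_mode` and `has_unique_mode` are hypothetical names, the input is assumed to be a list of positive integer word codes only (without the index values the answer interleaves), and `Fraction` is used so that inverting twice is exact.

```python
from collections import Counter
from fractions import Fraction


def smallest_mode(values):
    """Mimic a mode function that returns the smallest of the tied values."""
    counts = Counter(values)
    best = max(counts.values())
    return min(v for v, c in counts.items() if c == best)


def has_unique_mode(codes):
    low = smallest_mode(codes)
    # On the reciprocals the smallest tied value corresponds to the largest
    # original value, so inverting it again yields the largest mode.
    high = 1 / smallest_mode([Fraction(1, c) for c in codes])
    return low == high


print(has_unique_mode([3, 3, 5]))
print(has_unique_mode([3, 3, 5, 5, 7]))
```

If the smallest and largest modes agree, exactly one value is the most frequent.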

You'll have my upvote when you (or someone else) show that this actually works, both for inputs that have a most common word, and for inputs that don't. =) (I can't test it, unfortunately) — Stewie Griffin, yesterday
@StewieGriffin i think the programe misbehaves with sentences with equi-frequence words i will fix that — Agawa001, 22 hours ago

Trang Oul · Answer 19 · 2016-05-10 11:49:54Z

up vote 1 down vote

Python 2, 218 bytes

Assumes more than 2 words. Getting rid of punctuation destroyed me...

import string as z
def m(s):a=[w.lower()for w in s.translate(z.maketrans('',''),z.punctuation).split()];a=sorted({w:a.count(w)for w in set(a)}.items(),key=lambda b:b[1],reverse=1);return a[0][0]if a[0][1]>a[1][1]else''

Answered by Blue 2 days ago; edited by Trang Oul 17 mins ago.

Does this strip ',- etc? – Tim yesterday

@Tim No, I did this challenge before the rules were fully fleshed. Will change. – Blue yesterday

Can you assign the result of sorted to a tuple rather than having to index into the array manually? – Neil yesterday

@Neil you mean just get the first and second items for comparison instead of the entire array? I don't know how to do that – Blue yesterday

Eᴀsᴛᴇʀʟʏ Iʀᴋ · Answer 20 · 2016-05-09 13:57:08Z

up vote 0 down vote

Python, 70 bytes

import collections as c
f=lambda x:c.Counter(x.split()).most_common(1)

Takes its input like this:

f("Bird is the word")

Answered by Wouldn't You Like To Know 23 hours ago; edited by Eᴀsᴛᴇʀʟʏ Iʀᴋ 22 hours ago.

Hi, and welcome to PPCG! We score code-golf challenges by the number of bytes in the answer. I went ahead and edited it for you with the correct information. – Eᴀsᴛᴇʀʟʏ Iʀᴋ 22 hours ago

Thanks for that – Wouldn't You Like To Know 20 hours ago

Welcome to PPCG! Unfortunately, your submission does not satisfy all the requirements of this challenge as, first of all, it's NOT case insensitive. For instance, it will NOT count occurrences of the word That as occurrences of the word that since the former begins with an uppercase T and the latter begins with a lowercase t. Also, this does NOT remove all other forms of punctuation except hyphens (-) and, optionally, apostrophes (') and as a result, this would NOT work for the fourth test case given in the question. – R. Kap 18 hours ago

Also, this does NOT output nothing if there is no most frequent word. For instance, using the third test case (This sentence has no most frequent word.) as an example, your function outputs [('This', 1)], when it should instead be outputting nothing. I could go on and on about more issues, so I would recommend fixing them as soon as you can. – R. Kap 18 hours ago

TheBlab · Answer 21 · 2016-05-09 17:56:19Z

up vote 0 down vote

Lua, <s>232</s> 199 bytes

w,m,o={},0;io.read():lower():gsub("[^-%w%s]",""):gsub("[%w-]+",function(x)if not w[x]then w[x]=0 end w[x]=w[x]+1 end)for k,v in pairs(w)do if m==v then o=''end if(v>m)then m,o=v,k end end io.write(o)

if not w[x]then w[x]=0 end w[x]=w[x]+1 end -> w[x]=(w[x]or0)+1 – Kenny Lau 1 hour ago

aja · Answer 22 · 2016-05-09 22:05:25Z

Rexx, 109 bytes

pull s;g.=0;m=0;do i=1 to words(s);w=word(s,i);g.w=g.w+1;if g.w>m;then do;m=g.w;r=w;end;end;if m>1 then say r

Pretty printed...

pull s
g.=0
m=0
do i=1 to words(s)
  w=word(s,i)
  g.w=g.w + 1
  if g.w>m
  then do
    m=g.w
    r=w
  end
end
if m>1 then say r

TessellatingHeckler · Answer 23 · 2016-05-10 02:29:55Z

PowerShell (v4), 117 bytes

$y,$z=@($input-replace'[^a-z0-9 \n-]'-split'\s'|group|sort Count)[-2,-1]
($y,($z,'')[$y.Count-eq$z.Count])[!!$z].Name

The first part is easy enough:

$input is ~= stdin
Regex replace irrelevant characters with nothing, keep newlines so we don't mash two words from the end of a line and the beginning of the next line into one by mistake. (Nobody else has discussed multiple lines, could golf -2 if the input is always a single line).
Regex split, Group by frequency (~= Python's collections.Counter), Sort to put most frequent words at the end.
PowerShell is case insensitive by default for everything.

Handling if there isn't a most frequent word:

Take the last two items [-2,-1] into $y and $z;
an N-item list, where N>=2, makes $y and $z the last two items
a 1-item list makes $y the last item and $z null
an Empty list makes them both null

Use the bool-as-array-index fake-ternary-operator golf (0,1)[truthyvalue], nested, to choose "", $z or $y as output, then take .Name.
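
The steps above (group, sort by count, take the last two groups, then pick the output by indexing with a boolean) can be sketched in Python. This is an illustration under assumptions, not a port: `pick` is a hypothetical name, and unlike the PowerShell version the result is lowercased rather than keeping the original casing.

```python
import re
from collections import Counter


def pick(text):
    # Strip irrelevant characters but keep spaces, newlines and hyphens.
    cleaned = re.sub(r"[^a-z0-9 \n-]", "", text.lower())
    # Sort (word, count) pairs so the most frequent words end up last.
    groups = sorted(Counter(cleaned.split()).items(), key=lambda kv: kv[1])
    last_two = groups[-2:]
    if not last_two:
        return ""  # empty input
    if len(last_two) == 1:
        return last_two[0][0]  # only one distinct word
    (_, count_y), (word_z, count_z) = last_two
    # Bool-as-index "ternary": index 1 (the empty string) when the counts tie.
    return (word_z, "")[count_y == count_z]


print(pick("The old-fashioned man ate an old-fashioned cake."))
```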

PS D:\> "The man walked down the road."|.\test.ps1
The

PS D:\> "Slowly, he ate the pie, savoring each delicious bite. He felt like he was truly happy."|.\test.ps1
he

PS D:\> "`"That's... that's... that is just terrible!`" he said."|.\test.ps1
Thats

PS D:\> "The old-fashioned man ate an old-fashioned cake."|.\test.ps1
old-fashioned

PS D:\> "IPv6 looks great, much better than IPv4, except for the fact that IPv6 has longer addresses."|.\test.ps1
IPv6

Ton Hospel · Answer 24 · 2016-05-10 10:31:45Z

Perl, 60 bytes

Includes +2 for -an

Also needs the -E option for say, so run like:

perl -anE 's/[^\w-]//g,$q[$a{+lc}++].="$_\n"for@F;say$q[-1]=~/^.+$/g' <<< "The old-fashioned man ate an old-fashioned cake."

Just the code part:

s/[^\w-]//g,$q[$a{+lc}++].="$_\n"for@F;($_)=$q[-1]=~/^.+$/g

Replace \n by a literal newline for the claimed score (or replace "$_\n" by $_.$\)

This may be shortened depending on what the exact definition of a "word" and "special characters" is... Here I took it to mean: "split on whitespace, then remove all characters that are not alphanumeric or hyphen"
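
As I read the code, each word is appended to a bucket indexed by how many times it has been seen before, so the last bucket holds exactly the words that reached the highest count, and a word is printed only if that bucket contains a single entry. A Python sketch of that bucketing idea, with hypothetical names and with tokens that become empty after stripping skipped (an assumption on my part):

```python
import re


def most_frequent(text):
    seen = {}      # lowercased word -> number of times seen so far
    buckets = []   # buckets[n] lists the words seen for the (n + 1)-th time
    for token in text.split():
        word = re.sub(r"[^\w-]", "", token)
        if not word:
            continue  # assumption: ignore tokens made only of punctuation
        key = word.lower()
        n = seen.get(key, 0)
        seen[key] = n + 1
        if n == len(buckets):
            buckets.append([])
        buckets[n].append(word)
    if not buckets:
        return ""
    top = buckets[-1]
    # A single entry in the last bucket means one word is strictly ahead.
    return top[0] if len(top) == 1 else ""


print(most_frequent("The old-fashioned man ate an old-fashioned cake."))
```

The returned word keeps the casing of the occurrence that reached the highest count.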

Denis Ibaev · Answer 25 · 2016-05-10 11:09:09Z

up vote 0 down vote

Perl 5, <s>96</s> 89 + 2 (`-p` flag) = 91 bytes

++$h{+lc}for map{/[\w'-]+/g}$_,<>;$m>$e[1]||$e[1]>$m&&(($_,$m)=@e)||($_='')while@e=each%h

Using:

> echo "The man walked down the road." | perl -p script.pl

Your -p flag should invoke a penalty of 3 bytes. The rules are roughly: Each commandline flag is +1 byte since that is how many extra bytes you need to extend your free -e'code' style commandline. So normally -p is only +1 byte. But here your code has ' so it cannot be run simply from the commandline without escaping. So no combining with -e and the - and the space before the p are extra and must be counted too – Ton Hospel 6 mins ago

asked	2 days ago
viewed	3492 times
active	today
