Problem : Write a Java program to print the duplicate words in a given sentence. For example, if the given String is "Java and JavaScript are totally different, JavaScript follows Java", your program should print "Java" and "JavaScript", because those are the two words that appear more than once. You need to consider all cases: the given String can be null or empty, and it may or may not contain duplicate words. For simplicity, you can assume that the sentence is always in English and uses only ASCII letters and numerals, with no special characters. If you are practicing these coding problems for interviews, I also suggest you take a look at the book Cracking the Coding Interview. It contains 150 programming questions and their solutions, which is good enough to clear most beginner and intermediate programming job interviews.
Solution : In order to find duplicate words, we first need to divide the sentence into words. You can do that with the split() method of the java.lang.String class, which returns an array of words. Splitting on the regular expression "\\s+", which matches one or more whitespace characters, handles multiple spaces between words. Once we have the array of words, we can insert them into a HashSet. Since HashSet doesn't allow duplicates and its add() method returns false if the object already exists in the set, we can find all duplicate words. Just loop over the array, insert each word into the HashSet using add(), and check the return value. If add() returns false, the word is a duplicate, so record it, for example by printing it to the console. This is also one of the top 20 String-based problems from interviews. You can see that article for more coding problems based upon String.
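The splitting step has one detail worth seeing in isolation. The following is a minimal sketch, not part of the original solution: the class name SplitDemo and the helper words() are hypothetical, and the trim() call is an added assumption about how you may want to treat leading whitespace.

```java
import java.util.Arrays;

class SplitDemo {

    // Hypothetical helper: trims first, then splits on runs of whitespace,
    // so leading whitespace does not produce an empty first token.
    static String[] words(String sentence) {
        return sentence.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String sentence = "  one   two ";

        // Without trim(), the leading whitespace yields an empty first element.
        System.out.println(Arrays.toString(sentence.split("\\s+")));

        // With trim(), only the words remain.
        System.out.println(Arrays.toString(words(sentence)));
    }
}
```

An empty leading token is harmless for duplicate detection as long as it appears only once, but it is easy to overlook when you start counting words.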
One of the follow-up questions is: how do you find the number of times each duplicate word appears in the sentence? For example, in our coding problem, your solution should also print the count of both Java and JavaScript, e.g. Java : 2 and JavaScript : 2, because each of them appears twice in the sentence.
You can solve this problem by choosing another hash-based data structure, a hash table, which maintains key-value pairs. Java provides several implementations of the hash table data structure, e.g. HashMap, Hashtable and ConcurrentHashMap, but for general purposes HashMap is good enough. In short, just use a HashMap instead of a HashSet to keep a count of each word in the sentence. This is also similar to the problem of finding duplicate characters in a String. Instead of characters, you need to find duplicate words, as shown here.
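A minimal sketch of the HashMap approach might look like the following. The class name DuplicateWordCount and the method duplicateWordCounts() are hypothetical names chosen for illustration, and returning an empty map for null or blank input is an assumption, not something the problem statement prescribes.

```java
import java.util.HashMap;
import java.util.Map;

class DuplicateWordCount {

    // Hypothetical method: returns each word that occurs more than once,
    // mapped to the number of times it occurs.
    static Map<String, Integer> duplicateWordCounts(String input) {
        Map<String, Integer> counts = new HashMap<>();
        if (input == null || input.trim().isEmpty()) {
            return counts; // assumption: null or blank input has no duplicates
        }

        // Count every word; merge() inserts 1 or adds 1 to the existing count.
        for (String word : input.trim().split("\\s+")) {
            counts.merge(word, 1, Integer::sum);
        }

        // Drop the words that appear only once, leaving just the duplicates.
        counts.entrySet().removeIf(entry -> entry.getValue() < 2);
        return counts;
    }

    public static void main(String[] args) {
        String test = "Java and JavaScript are totally different, JavaScript follows Java";
        // HashMap does not guarantee iteration order, so the two entries
        // may be printed in either order.
        duplicateWordCounts(test)
                .forEach((word, count) -> System.out.println(word + " : " + count));
    }
}
```

If you want the duplicates reported in the order they first appear in the sentence, a LinkedHashMap can be swapped in for the HashMap.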
Another follow-up question related to this problem is: how do you remove duplicate words from a String in Java? This is actually the same problem as removing duplicate elements from an array. If you know how to solve that, you can easily solve this one as well. If you face any problem, see this solution.
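One possible way to remove duplicate words, sketched here under a few assumptions, is to collect the words into a LinkedHashSet, which drops repeats while keeping first-occurrence order. The names RemoveDuplicateWords and removeDuplicateWords() are hypothetical, and returning an empty String for null or blank input is a design choice made only for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class RemoveDuplicateWords {

    // Hypothetical method: keeps the first occurrence of every word and
    // joins the remaining words with a single space.
    static String removeDuplicateWords(String input) {
        if (input == null || input.trim().isEmpty()) {
            return ""; // assumption: treat null or blank input as an empty sentence
        }

        // LinkedHashSet ignores repeated words and preserves insertion order.
        Set<String> unique = new LinkedHashSet<>(Arrays.asList(input.trim().split("\\s+")));
        return String.join(" ", unique);
    }

    public static void main(String[] args) {
        String test = "Java and JavaScript are totally different, JavaScript follows Java";
        System.out.println(removeDuplicateWords(test));
        // prints: Java and JavaScript are totally different, follows
    }
}
```

Note that this also collapses any run of whitespace between words into a single space, which may or may not be what the interviewer wants.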
From the output of the program shown below, it's clear that it works as expected: it correctly reports that "two" is the only duplicate word in the given String. Nonetheless, we are going to write some unit tests to further test our solution with different input values.
Java Program to find duplicate words in String
Here is our solution to the problem of finding duplicate words in a sentence in Java. I have used HashSet to find duplicates. The time complexity of this solution is O(n) because we need to iterate over all elements in the array. You also need a buffer of the same size as the original array, hence the space complexity is also O(n), so it may not be suitable for a really long String. You need more memory to find even a single duplicate word if your String is huge.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/**
 * Java Program to demonstrate how to find duplicate words in String.
 */
public class DuplicateWordsInString {

    public static void main(String[] args) {
        String test = "This sentence contains two words, one and two";
        Set<String> duplicates = duplicateWords(test);
        System.out.println("input : " + test);
        System.out.println("output : " + duplicates);
    }

    /**
     * Method to find duplicate words in a Sentence or String
     * @param input String
     * @return set of duplicate words
     */
    public static Set<String> duplicateWords(String input) {
        if (input == null || input.isEmpty()) {
            return Collections.emptySet();
        }

        Set<String> duplicates = new HashSet<>();
        // split on one or more whitespace characters
        String[] words = input.split("\\s+");
        Set<String> set = new HashSet<>();

        for (String word : words) {
            // add() returns false if the word is already in the set
            if (!set.add(word)) {
                duplicates.add(word);
            }
        }
        return duplicates;
    }
}
```

Output :
input : This sentence contains two words, one and two
output : [two]
JUnit tests
Here is my JUnit test class for our solution. We are going to test our solution with an empty String, a null String, a String with only duplicates, a String without any duplicates and a String which contains multiple spaces between words. Each JUnit test method checks one input. If your input set is large, you can also consider using a parameterized JUnit test.

```java
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import java.util.Collections;
import java.util.Set;

import org.junit.Test;

public class DuplicateWordsInStringTest {

    @Test
    public void testWithEmptyString() {
        Set<String> output = DuplicateWordsInString.duplicateWords("");
        assertEquals(Collections.emptySet(), output);
    }

    @Test
    public void testWithNullString() {
        Set<String> output = DuplicateWordsInString.duplicateWords(null);
        assertEquals(Collections.emptySet(), output);
    }

    @Test
    public void testWithDuplicateString() {
        Set<String> output = DuplicateWordsInString.duplicateWords("one one one two two");
        assertTrue(output.contains("one"));
        assertTrue(output.contains("two"));
        assertEquals(2, output.size());
    }

    @Test
    public void testWithOutDuplicates() {
        Set<String> output = DuplicateWordsInString.duplicateWords("one two three");
        assertEquals(Collections.emptySet(), output);
    }

    @Test
    public void testWithMultipleSpaceBetweenWord() {
        // several spaces between words, plus leading and trailing whitespace
        Set<String> output = DuplicateWordsInString.duplicateWords("  one   two    three  ");
        assertEquals(Collections.emptySet(), output);
    }
}
```
That's all about how to find duplicate words in a given String in Java. We have used the HashSet data structure to solve this problem, and our solution has time and space complexity of O(n). For the curious developer: can you come up with a solution with better time or space complexity? How about a solution with time complexity on the order of O(k), where k is the number of duplicate words, or O(log N)?