One of the best and most engaging technical books I’ve ever read.
Relevant Search demystifies relevance work. Using Elasticsearch, it teaches you how to return engaging search results to your users, helping you understand and leverage the internals of Lucene-based search engines.
1. The search relevance problem
1.1. Your Goal: Gaining The Skills of A Relevance Engineer
1.2. Why is Search Relevance So Hard?
1.2.1. What's a "relevant" search result?
1.2.2. Search: There's No Silver Bullet!
1.3. Gaining insight from relevance research?
1.3.1. Information retrieval
1.3.2. Can we use Information Retrieval to solve relevance?
1.4. How do you solve relevance?
1.5. More than technology: curation, collaboration, & feedback
1.6. Summary
2. Search Under The Hood
2.1. Search 101
2.1.1. What's a Search Document?
2.1.2. Searching the content
2.1.3. Exploring content through search
2.1.4. Getting content into the search engine
2.2. Search Engine Data Structures
2.2.1. The Inverted Index
2.2.2. Other Pieces of the Inverted Index
2.3. Indexing Content: Extraction, Enrichment, Analysis, and Indexing
2.3.1. Extracting Content Into Documents
2.3.2. Enriching Documents to Clean, Augment, and Merge Data
2.3.3. Performing Analysis
2.3.4. Indexing
2.4. Document Search and Retrieval
2.4.1. Boolean Matching: AND/OR/NOT
2.4.2. Boolean Queries in Lucene-Based Search (MUST/MUST_NOT/SHOULD)
2.4.3. Positional and Phrase Matching
2.4.4. Enabling Exploration: Filtering, Facets, and Aggregations
2.4.5. Sorting, Ranked Results, and Relevance
2.5. Summary
3. Debugging your first relevance problem
3.1. Applications to Solr & Elasticsearch: Examples in Elasticsearch
3.2. Our Most Prominent Data Set: TMDB
3.3. Examples Programmed in Python
3.4. Our First Search Application
3.4.1. Our first searches of the TMDB Elasticsearch Index
3.5. Debugging Query Matching
3.5.1. Examining The Underlying Query Strategy
3.5.2. Taking Apart Query Parsing
3.5.3. Debugging Analysis To Solve Matching Issues
3.5.4. Our Query Vs The Inverted Index
3.5.5. Fixing Our Matching By Changing Analyzers
3.6. Debugging Ranking
3.6.1. Decomposing Relevance Score With Lucene's Explain
3.6.2. The Vector-Space Model, The Relevance Explain, and You!
3.6.3. Practical Caveats to the vector space model
3.6.4. Scoring matches to measure relevance
3.6.5. Computing Weights with TF*IDF
3.6.6. Lies, Damned Lies, and Similarity
3.6.7. Factoring in the Search Term’s Importance
3.6.8. Fixing Space Jam vs Alien Ranking
3.7. Solved? Our Work is Never Over!
3.8. Summary
4. Taming Tokens
4.1. Tokens as Document Features
4.1.1. The Matching Process
4.1.2. Tokens, More Than Just Words
4.2. Controlling Precision and Recall
4.2.1. Precision and Recall by Example
4.2.2. Analysis for Precision or Recall
4.2.3. Taking Recall to Extremes
4.3. Precision AND Recall: Have Your Cake and Eat It Too
4.3.1. Scoring strength of a feature in a single field
4.3.2. Scoring beyond TF*IDF: multiple search terms and multiple fields
4.4. Analysis Strategies
4.4.1. Dealing with Delimiters
4.4.2. Capturing Meaning with Synonyms
4.4.3. Modeling Specificity in Search
4.4.4. Modeling Specificity with Synonyms
4.4.5. Modeling Specificity with Paths
4.4.6. Tokenize the World!
4.4.7. Tokenizing Integers
4.4.8. Tokenizing Geographic Data
4.4.9. Tokenizing Melodies
4.5. Summary
5. Basic Multifield Search
5.1. Signals and Signal Modeling
5.1.1. What is a Signal?
5.1.2. Starting With The Source Data Model
5.1.3. Implementing a Signal
5.1.4. Programming Relevance via Data Modeling
5.2. TMDB Search, The Final Frontier!
5.2.1. Violating The Prime Directive
5.2.2. Flattening Nested Docs
5.3. Signal Modeling In Field Centric Search
5.3.1. Starting Out With Best Fields
5.3.2. Controlling Field Preference In Search Results
5.3.3. Better Best Fields With More Precise Signals?
5.3.4. Letting Losers Share The Glory: Calibrating Best Fields
5.3.5. Counting Multiple Signals using Most Fields
5.3.6. Boosting in Most-Fields
5.3.7. When Additional Matches Don't Matter
5.3.8. Does Most Fields Count The Right Signals?
5.4. Summary
6. Term-Centric Search
6.1. What is Term-Centric Search?
6.2. Why Do You Need Term-Centric Search?
6.2.1. Hunting for Albino Elephants
6.2.2. Albino Elephant in Star Trek Example
6.2.3. Signal Discordance
6.2.4. The Mechanics of Signal Discordance
6.3. Your First Term-Centric Searches
6.3.1. The Term-Centric Ranking Function
6.3.2. Running a Term-Centric Query Parser (Into The Ground)
6.3.3. Understanding Field Synchronicity
6.3.4. Field Synchronicity and Signal Modeling
6.3.5. Query Parsers and Signal Discordance
6.3.6. Tuning Term-Centric Search
6.4. Solving Signal Discordance in Term-Centric Search
6.4.1. Combining Fields into Custom All Fields
6.4.2. Solving Signal Discordance With Cross Fields
6.5. Combining Field-Centric and Term-Centric Strategies: Having Your Cake and Eating It Too
6.5.1. Grouping "Like Fields" Together
6.5.2. Limits of Like Fields
6.5.3. Combining Greedy Naïve Search and Conservative Amplifiers
6.5.4. Term-Centric vs Field-Centric and Precision vs Recall
6.5.5. Considering Filtering, Boosting, and Reranking
6.6. Summary
7. Shaping the Relevance Function
7.1. What Do We Mean By Score Shaping?
7.2. Boosting: Shaping by Promoting Results
7.2.1. Boosting: The Final Frontier
7.2.2. When Boosting, Add or Multiply? Boolean or Function Query?
7.2.3. You Chose Door A: Additive Boosting with Boolean Queries
7.2.4. You Chose Door B: Introducing Function Queries: Ranking with Math
7.2.5. Hands on with Function Queries: Simple Multiplicative Boosting
7.2.6. Boosting basics: Signals, Signals Everywhere
7.3. Filtering: Shaping by Excluding Results
7.4. Score Shaping Strategies For Satisfying Business Needs
7.4.1. Search ALL THE MOVIES!
7.4.2. Modeling Your Boosting Signals
7.4.3. Building the Ranking Function: Adding High Value Tiers
7.4.4. High Value Tier Scored with A Function Query
7.4.5. Ignoring TF x IDF
7.4.6. Capturing General-Quality Metrics
7.4.7. Achieving Users' Recency Goals
7.4.8. Combining The Function Queries
7.4.9. Putting It All Together!
7.5. Summary
8. Providing relevance feedback
8.1. Relevance Feedback at the Search Box
8.1.1. Immediate Results with Search-as-You-Type
8.1.2. Help Users Find the Best Query with Search Completion
8.1.3. Correcting Typos and Misspellings with Search Suggest
8.2. Relevance Feedback while Browsing
8.2.1. Building Faceted Browsing
8.2.2. Breadcrumb Navigation
8.2.3. Selecting Alternative Result Ordering
8.3. Relevance Feedback in the Search Results Listing
8.3.1. What Information Should be Presented in Listing Items?
8.3.2. Relevance Feedback through Snippets and Highlighting
8.3.3. Grouping together similar documents
8.3.4. Helping the User When There are no Results
8.4. Summary
9. Designing a Relevance-Focused Search Application
9.1. Yowl! The Awesome New Startup!
9.2. Gather Information and Requirements
9.2.1. Understand Users and Their Information Needs
9.2.2. Understand Business Needs
9.2.3. Identifying Required and Available Information
9.3. Design the Search Application
9.3.1. Visualize the User's Experience
9.3.2. Define and Model Signals
9.3.3. Combine and Balance Signals
9.4. Deploy, Monitor, Improve
9.4.1. Monitor
9.4.2. Identify Problems and Fix them!
9.5. Knowing When Good is Good Enough
9.6. Summary
10. The Relevance Centered Enterprise
10.1. Feedback: the bedrock of the relevance centered enterprise
10.2. Why user-focused culture before data-driven culture?
10.3. Flying relevance blind
10.4. Relevance Feedback Awakenings: Domain Experts and Expert Users
10.5. Relevance Feedback Maturing: Content Curation
10.5.1. The Role Of The Content Curator
10.5.2. The Risk Of Miscommunication With The Content Curator
10.6. Relevance Streamlined: Engineer/Curator Pairing
10.7. Relevance Accelerated: Test-Driven Relevance
10.7.1. Understanding Test-Driven Relevance
10.7.2. Using Test-Driven Relevance with User Behavioral Data
10.8. Beyond Test-Driven Relevance: Learning To Rank
10.9. Summary
11. Semantic And Personalized Search
11.1. Personalizing search based upon user profiles
11.1.1. Gathering user profile information
11.1.2. Tying profile information back to the search index
11.2. Personalizing search based upon user behavior
11.2.1. Introducing Collaborative Filtering
11.2.2. Basic collaborative filtering using co-occurrence counting
11.2.3. Tying user behavior information back to the search index
11.3. Basic methods for building concept search
11.3.1. Building concept signals
11.3.2. Augmenting content with synonyms
11.4. Building concept search using machine learning
11.4.1. The importance of phrases in concept search
11.5. The personalized search/conceptual search connection
11.6. Recommendation as a generalization of search
11.6.1. Replacing Search with Recommendation
11.7. Best wishes on your search relevance journey
11.8. Summary
Appendixes
Appendix A: Indexing directly from TMDB
A.1. Set TMDB Key & Load IPython Notebook
A.2. Setting up for the TMDB API
A.3. Crawling the TMDB API
A.4. Indexing TMDB Movies to Elasticsearch
Appendix B: Solr Reader's Companion
B.1. Chapter 4: Taming Solr's Terms
B.1.1. Summary of Solr Analysis and Mappings Features
B.1.2. Building Custom Analyzers in Solr
B.1.3. Field Mappings in Solr
B.2. Chapters 5 and 6: Multifield Search in Solr
B.2.1. Summary of Query Feature Mappings
B.2.2. Understanding Query Differences Between Solr and Elasticsearch
B.2.3. Querying Solr: The Ergonomics
B.2.4. Term Centric and Field Centric Search with The edismax Query Parser
B.2.5. All Fields & Cross-Field Search
B.3. Chapter 7: Shaping Solr’s Relevance Function
B.3.1. Summary of Boosting Feature Mappings
B.3.2. Solr's Boolean Boosting
B.3.3. Solr's Function Queries
B.3.4. Multiplicative Boosting in Solr
B.4. Chapter 8: Relevance Feedback
B.4.1. Summary of Relevance Feedback Feature Mappings
B.4.2. Solr Autocomplete: Match Phrase Prefix
B.4.3. Faceted Browsing in Solr (aka "Solr Facets" not "Elasticsearch Aggregations")
B.4.4. Field Collapsing
B.4.5. Suggest and Highlight Components
About the Technology
Users are accustomed to and expect instant, relevant search results. To achieve this, you must master the search engine. Yet for many developers, relevance ranking is mysterious or confusing.
About the book
Relevant Search demystifies the subject and shows you that a search engine is a programmable relevance framework. You'll learn how to apply Elasticsearch or Solr to your business's unique ranking problems. The book demonstrates how to program relevance and how to incorporate secondary data sources, taxonomies, text analytics, and personalization. In practice, a relevance framework requires softer skills as well, such as collaborating with stakeholders to discover the right relevance requirements for your business. By the end, you’ll be able to achieve a virtuous cycle of provable, measurable relevance improvements over a search product’s lifetime.
Will help you solve real-world search relevance problems for Lucene-based search engines.
An inspiring book revealing the essence and mechanics of relevant search.
Arms you with invaluable knowledge to temper the relevancy of search results and harness the powerful features provided by modern search engines.