MySQL Full-Text Search Interview Questions

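FULLTEXT indexes speed up word-based searches over text columns (CHAR, VARCHAR, TEXT). They support natural language and boolean searches and are usually far more efficient than LIKE '%word%' patterns on large tables. They require the InnoDB or MyISAM storage engine.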

-- Create FULLTEXT index
CREATE FULLTEXT INDEX idx_title ON articles(title);
CREATE FULLTEXT INDEX idx_title_content ON articles(title, content);

-- Alternative syntax for an existing table (use one form, not both, for the same columns)
ALTER TABLE articles ADD FULLTEXT INDEX idx_search(title, content);

-- Basic FULLTEXT search (the MATCH column list must match the column list of a FULLTEXT index)
SELECT * FROM articles WHERE MATCH(title, content) AGAINST('database' IN NATURAL LANGUAGE MODE);

-- Index information
SHOW INDEX FROM articles;
SELECT * FROM INFORMATION_SCHEMA.STATISTICS WHERE TABLE_NAME = 'articles' AND INDEX_TYPE = 'FULLTEXT';

Why it matters: Efficient text search on large text fields.

Real applications: Search on articles, blog posts, product descriptions.

Common mistakes: Using LIKE '%word%' where MATCH ... AGAINST fits, choosing an engine without FULLTEXT support, forgetting the minimum word length.
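
The statements above assume an articles table that the text never defines. A minimal sketch of such a setup, using a hypothetical table named articles_demo with made-up rows, might look like this:

```sql
-- Hypothetical demo table; all names and rows are illustrative only.
CREATE TABLE articles_demo (
    id      INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    title   VARCHAR(200) NOT NULL,
    content TEXT NOT NULL,
    FULLTEXT KEY ft_title_content (title, content)
) ENGINE = InnoDB;

INSERT INTO articles_demo (title, content) VALUES
    ('MySQL Tutorial',    'An introduction to the database server'),
    ('PostgreSQL Basics', 'Getting started with another relational system'),
    ('Index Tuning',      'Making a database answer queries faster');

-- MATCH() lists exactly the columns of the FULLTEXT index defined above.
SELECT id, title
FROM articles_demo
WHERE MATCH(title, content) AGAINST('database' IN NATURAL LANGUAGE MODE);
```

On a fresh table this should return the first and third rows, the only ones that contain the word "database".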

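NATURAL LANGUAGE MODE: The default. A word-based search whose rows come back sorted by relevance when MATCH() is used in the WHERE clause and the query has no ORDER BY. BOOLEAN MODE: Supports operators (+, -, *, double quotes, and others). Boolean-mode results are not sorted by relevance automatically; order by the MATCH() score explicitly if you need ranking.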

-- NATURAL LANGUAGE (default)
SELECT * FROM articles WHERE MATCH(title) AGAINST('mysql database');
-- Finds documents with mysql and/or database

-- With relevance ranking
SELECT id, title, MATCH(title) AGAINST('mysql database') as relevance 
FROM articles WHERE MATCH(title) AGAINST('mysql database')
ORDER BY relevance DESC;

-- BOOLEAN mode with operators
SELECT * FROM articles WHERE MATCH(title) AGAINST('+mysql -oracle' IN BOOLEAN MODE);
-- Must have mysql, exclude oracle

-- BOOLEAN operators
+word  -- Must contain
-word  -- Must not contain
word*  -- Prefix match (words starting with "word", similar to LIKE 'word%' per word)
"phrase"  -- Exact phrase
(word1 word2)  -- Grouping

-- Examples
SELECT * FROM posts WHERE MATCH(content) AGAINST('+python +machine* -java' IN BOOLEAN MODE);
SELECT * FROM articles WHERE MATCH(title, content) AGAINST('"machine learning"' IN BOOLEAN MODE);

Why it matters: Different search behaviors for different use cases.

Real applications: Simple searches (natural), advanced filters (boolean).

Common mistakes: Assuming boolean mode sorts by relevance, getting operator syntax wrong.
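
Boolean mode still produces a score; it just does not sort by it. The sketch below assumes the articles table with a FULLTEXT index on (title), as in the earlier examples, and uses made-up search terms. The proximity form (@distance) is documented for InnoDB tables only.

```sql
-- Assumes articles(title) has a FULLTEXT index; search terms are illustrative.

-- Sort boolean-mode matches by their score explicitly.
-- > raises and < lowers a word's contribution to the score.
SELECT id, title,
       MATCH(title) AGAINST('+mysql >performance <legacy' IN BOOLEAN MODE) AS score
FROM articles
WHERE MATCH(title) AGAINST('+mysql >performance <legacy' IN BOOLEAN MODE)
ORDER BY score DESC;

-- InnoDB only: proximity search, both words within 5 words of each other.
SELECT id, title
FROM articles
WHERE MATCH(title) AGAINST('"mysql performance" @5' IN BOOLEAN MODE);
```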

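Stopwords (the, a, is, etc.) are not indexed. The default minimum word length is 3 characters for InnoDB (innodb_ft_min_token_size) and 4 for MyISAM (ft_min_word_len). Both are configurable, but they are startup options: changing them requires a server restart and a rebuild of the FULLTEXT indexes.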

-- Check stopwords
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;

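-- Minimum word length (configuration)
-- innodb_ft_min_token_size (default 3)
-- innodb_ft_max_token_size (default 84)
SHOW VARIABLES LIKE 'innodb_ft_%token_size';

-- These variables are read-only at runtime, so SET GLOBAL fails.
-- Set them in the option file (for example innodb_ft_min_token_size=2 under [mysqld]),
-- restart the server, then rebuild the index:
ALTER TABLE articles DROP INDEX idx_title;
ALTER TABLE articles ADD FULLTEXT INDEX idx_title(title);
-- MyISAM: set ft_min_word_len, restart, then REPAIR TABLE articles QUICK;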

-- Common stopwords not indexed
SELECT * FROM articles WHERE MATCH(title) AGAINST('the' IN NATURAL LANGUAGE MODE);  -- No results

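-- Workaround: Include in phrase (the stopword itself is still not an indexed term; test on your engine and version)
SELECT * FROM articles WHERE MATCH(title) AGAINST('"the database"' IN BOOLEAN MODE);

-- Check what was indexed (InnoDB; 'mydb' is a placeholder schema name)
SET GLOBAL innodb_ft_aux_table = 'mydb/articles';
SELECT word, doc_count FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE ORDER BY word;
-- For a title of "The Database", only "database" is listed; "the" is ignored
-- (tokens already flushed to disk are listed in INNODB_FT_INDEX_TABLE instead)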

Why it matters: Understand index coverage and limitations.

Real applications: Search configuration, handling short terms.

Common mistakes: Expecting stopwords to be found, not knowing minimum length.
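
When the default stopword list is the problem, InnoDB can be pointed at a custom list. The following is a sketch only: the schema name mydb and the table name my_stopwords are hypothetical, and the words chosen are arbitrary. It assumes the idx_title index from the first example and the privilege to set global variables.

```sql
-- Hypothetical names: schema mydb, table my_stopwords.
-- InnoDB expects a single VARCHAR column named `value`.
CREATE TABLE mydb.my_stopwords (value VARCHAR(30)) ENGINE = InnoDB;
INSERT INTO mydb.my_stopwords (value) VALUES ('and'), ('for'), ('with');

-- Point the server at the custom list (format: 'db_name/table_name').
SET GLOBAL innodb_ft_server_stopword_table = 'mydb/my_stopwords';

-- The list is read when an index is built, so re-create the index.
ALTER TABLE articles DROP INDEX idx_title;
ALTER TABLE articles ADD FULLTEXT INDEX idx_title (title);
```

Because "the" is not in this custom list and meets the default 3-character minimum, it would be indexed after the rebuild.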

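MATCH ... AGAINST is an ordinary predicate, so it can be combined with other WHERE conditions for filtering. Check with EXPLAIN which index the optimizer picks: when the FULLTEXT index drives the query, the remaining conditions are typically applied as filters on the matched rows.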

-- FULLTEXT with WHERE
SELECT * FROM articles WHERE MATCH(title) AGAINST('mysql') 
AND created_at >= '2024-01-01' AND status = 'published';

-- Multiple conditions
SELECT id, title, category, MATCH(title, content) AGAINST('django' IN NATURAL LANGUAGE MODE) as relevance
FROM articles
WHERE category = 'Python' AND created_at >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
AND MATCH(title, content) AGAINST('django')
ORDER BY relevance DESC;

-- OR conditions on the same column: one boolean search is enough, no UNION needed
SELECT * FROM articles WHERE MATCH(title) AGAINST('mysql postgres' IN BOOLEAN MODE);
-- Consider UNION only when the branches search different columns or use different indexes

-- Exclude based on category
SELECT * FROM posts WHERE MATCH(content) AGAINST('tutorial') 
AND category NOT IN ('Spam', 'Archive')
AND status = 'active';

Why it matters: Real searches need multiple filtering criteria.

Real applications: Search engines, content filters, advanced filters.

Common mistakes: Unnecessary UNION, not using other indexes appropriately.

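Limitations: Minimum word length, stopwords, phrases matched only through indexed words, wildcards only at the end of a word (no leading or infix wildcards), configuration changes that require a restart and an index rebuild, and engine-specific behavior (InnoDB and MyISAM differ in defaults and supported syntax).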

-- Limitation 1: Minimum word length
SELECT * FROM articles WHERE MATCH(title) AGAINST('io' IN NATURAL LANGUAGE MODE);  -- 2 chars: below the default minimum, so no match

-- Limitation 2: Stopwords completely ignored
SELECT * FROM articles WHERE MATCH(title) AGAINST('is always true' IN BOOLEAN MODE);  -- "is" is ignored; with the default InnoDB stopword list only "always" and "true" are searched

-- Limitation 3: Boolean operators don't work in natural language
SELECT * FROM articles WHERE MATCH(title) AGAINST('+mysql +database' IN NATURAL LANGUAGE MODE);
-- + and - are not operators here; rows containing mysql and/or database are returned

-- Limitation 4: Wildcard only at end
SELECT * FROM articles WHERE MATCH(title) AGAINST('data*' IN BOOLEAN MODE);  -- OK
-- AGAINST('*data' IN BOOLEAN MODE);  -- Not supported: a leading wildcard is ignored or rejected, depending on the engine

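-- Limitation 5: Changes require restart and rebuild
-- innodb_ft_min_token_size is read-only at runtime: set it in the option file and restart,
-- then rebuild each FULLTEXT index (MyISAM: REPAIR TABLE articles QUICK)
ALTER TABLE articles DROP INDEX idx_title;
ALTER TABLE articles ADD FULLTEXT INDEX idx_title(title);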

-- Workaround: Use LIKE fallback
SELECT * FROM articles WHERE MATCH(title) AGAINST('sql' IN NATURAL LANGUAGE MODE) 
UNION SELECT * FROM articles WHERE title LIKE '%sql%';

Why it matters: Know when FULLTEXT isn't suitable.

Real applications: Fallback patterns, workarounds.

Common mistakes: Expecting words shorter than the minimum length to match, not rebuilding indexes after configuration changes.
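
One alternative to a LIKE fallback for short or embedded terms is MySQL's built-in ngram parser, which indexes fixed-size character sequences (ngram_token_size, default 2) instead of whole words. The sketch below uses a hypothetical table tags_demo with made-up rows; it is an illustration of the parser, not a recommendation for every workload.

```sql
-- Hypothetical demo table; names and rows are illustrative only.
CREATE TABLE tags_demo (
    id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    FULLTEXT KEY ft_name_ngram (name) WITH PARSER ngram
) ENGINE = InnoDB;

INSERT INTO tags_demo (name) VALUES ('MySQL'), ('NoSQL'), ('Redis');

-- With 2-character ngrams, a boolean-mode term is searched as a phrase of
-- its ngrams ("sq ql"), so it can match inside longer words.
SELECT id, name
FROM tags_demo
WHERE MATCH(name) AGAINST('sql' IN BOOLEAN MODE);
```

Note that the ngram parser excludes any token that contains a stopword, so with the default list, tokens containing single-letter stopwords such as "a" or "i" are dropped. Test against real data before relying on it.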

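Relevance scoring indicates match quality: higher scores mean better matches, and the score can be used in ORDER BY to rank results. InnoDB derives it from term frequency and inverse document frequency (a TF-IDF variation), so scores are relative to the current table contents rather than normalized to a fixed range.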

-- Get relevance score
SELECT id, title, MATCH(title) AGAINST('python' IN NATURAL LANGUAGE MODE) as score
FROM articles 
WHERE MATCH(title) AGAINST('python' IN NATURAL LANGUAGE MODE)
ORDER BY score DESC;

-- Multiple field relevance (needs separate FULLTEXT indexes on (title), on (content), and on (title, content))
SELECT id, title, content,
       MATCH(title) AGAINST('database') * 10 as title_score,
       MATCH(content) AGAINST('database') as content_score,
       MATCH(title) AGAINST('database') * 10 + MATCH(content) AGAINST('database') as total_score
FROM articles 
WHERE MATCH(title, content) AGAINST('database')
ORDER BY total_score DESC;

-- Relevance with other filters
SELECT id, title, MATCH(title, content) AGAINST('machine learning' IN NATURAL LANGUAGE MODE) as relevance
FROM articles
WHERE created_at >= DATE_SUB(CURDATE(), INTERVAL 90 DAY)
AND MATCH(title, content) AGAINST('machine learning')
ORDER BY relevance DESC LIMIT 10;

-- Threshold filtering (scores are not normalized, so 0.5 is an arbitrary cutoff to tune per dataset)
SELECT * FROM articles 
WHERE MATCH(title) AGAINST('django' IN NATURAL LANGUAGE MODE) > 0.5;

Why it matters: Display most relevant results first.

Real applications: Search result ranking, recommendation scoring.

Common mistakes: Not ordering by score, mixing NATURAL and BOOLEAN relevance.
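
For InnoDB, the MySQL manual describes the score of a single-word search as TF * IDF * IDF, where IDF = log10(total rows / rows containing the term). The numbers below are hypothetical and only show the arithmetic; real scores depend on the table contents.

```sql
-- Hypothetical numbers: 8 rows in the index, 6 of them contain the term,
-- and the term occurs 6 times in the row being scored.
SELECT
    LOG10(8 / 6)             AS idf,    -- log10(total rows / matching rows), about 0.125
    6 * POW(LOG10(8 / 6), 2) AS score;  -- TF * IDF * IDF, about 0.094
```

This is why a term that appears in most rows contributes little: its IDF approaches zero, and the squared IDF shrinks the score further.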

Use FULLTEXT for large text fields, word-based searches, and natural language queries. Use LIKE for short fields, simple pattern matching, and prefix searches. FULLTEXT is more efficient at scale.

-- Use FULLTEXT: Large text, word searches
SELECT * FROM articles WHERE MATCH(content) AGAINST('kubernetes deployment');  -- Fast

-- Use LIKE: Small fields, partial matching
SELECT * FROM users WHERE email LIKE 'john%';  -- Simple, indexed if prefix
SELECT * FROM products WHERE name LIKE '%phone%';  -- Cost: the leading wildcard forces a full scan

-- FULLTEXT better for phrases
SELECT * FROM docs WHERE MATCH(content) AGAINST('"machine learning algorithm"' IN BOOLEAN MODE);  -- Efficient

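-- Pattern matching (LIKE only knows % and _; character classes need REGEXP)
SELECT * FROM data WHERE value REGEXP '^[0-9]';  -- Starts with a digit

-- Performance comparison
-- FULLTEXT: inverted-index lookup; cost grows mainly with the number of matching rows
-- LIKE %pattern%: O(N) full table scan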

-- Hybrid: Use both (an OR across MATCH and LIKE can keep the optimizer from using either index; verify with EXPLAIN)
SELECT * FROM posts 
WHERE (MATCH(title) AGAINST('nodejs' IN NATURAL LANGUAGE MODE) OR title LIKE 'Node%')
AND created_at >= CURDATE() - INTERVAL 7 DAY;

Why it matters: Choose correct search method for performance.

Real applications: Content search, pattern matching.

Common mistakes: FULLTEXT on small fields, LIKE on large text columns.
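
The difference between the two access paths can be checked with EXPLAIN rather than assumed. This sketch assumes the articles table with a FULLTEXT index on (content); the expectations in the comments are typical results, not guarantees.

```sql
-- Assumes articles(content) has a FULLTEXT index.
EXPLAIN SELECT id FROM articles
WHERE MATCH(content) AGAINST('kubernetes' IN NATURAL LANGUAGE MODE);
-- Typically shows type = fulltext when the FULLTEXT index is used.

EXPLAIN SELECT id FROM articles
WHERE content LIKE '%kubernetes%';
-- Typically shows type = ALL: a leading % rules out a B-tree index lookup.
```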

Best practices: Create indexes on large text fields, use boolean mode for advanced queries, handle stopwords, monitor performance, rebuild indexes periodically, consider alternatives for complex searches.

-- ✅ Good: FULLTEXT index on larger text fields
CREATE FULLTEXT INDEX idx_long_text ON documents(title, content_body);

-- ❌ Bad: FULLTEXT on tiny fields
CREATE FULLTEXT INDEX idx_tiny ON users(username);  -- Use LIKE or prefix index

-- ✅ Good: Separate indexes for different columns
CREATE FULLTEXT INDEX idx_title ON articles(title);
CREATE FULLTEXT INDEX idx_content ON articles(content);

-- ❌ Bad: Unrelated, non-text columns (FULLTEXT accepts only CHAR, VARCHAR, and TEXT columns, so integer columns are rejected)
CREATE FULLTEXT INDEX idx_all ON articles(id, title, content, author_id);

-- ✅ Good: Check performance with EXPLAIN
EXPLAIN SELECT * FROM articles WHERE MATCH(title) AGAINST('mysql' IN NATURAL LANGUAGE MODE);

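-- ❌ Bad: Ignore stopwords issue
-- SELECT * FROM articles WHERE MATCH(content) AGAINST('is')  -- Empty result

-- Rebuild indexes periodically
SET GLOBAL innodb_optimize_fulltext_only = ON;  -- InnoDB: make OPTIMIZE TABLE process only the FULLTEXT index
OPTIMIZE TABLE articles;  -- Merges pending changes and purges deleted entries
SET GLOBAL innodb_optimize_fulltext_only = OFF;

-- Monitor index size (InnoDB stores FULLTEXT data in auxiliary fts_* tables;
-- assumes MySQL 8.0, file-per-table tablespaces, and a placeholder schema name mydb)
SELECT name, file_size FROM INFORMATION_SCHEMA.INNODB_TABLESPACES
WHERE name LIKE 'mydb/fts%';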

Why it matters: Maintain efficient search performance.

Real applications: Production search systems, large content databases.

Common mistakes: Indexes on wrong fields, not monitoring performance, stopword issues.