MySQL

MySQL String Functions Interview Questions

7 Questions

CONCAT combines strings, SUBSTRING extracts portions, LENGTH returns character count. TRIM, LTRIM, RTRIM remove whitespace. These form foundation of text manipulation.

-- Basic functions
SELECT CONCAT('Hello', ' ', 'World');  -- 'Hello World'
SELECT SUBSTRING('MySQL', 1, 3);  -- 'MyS'
SELECT LENGTH('Database');  -- 8
SELECT TRIM('  spaces  ');  -- 'spaces'

-- Practical examples
SELECT CONCAT(first_name, ' ', last_name) as full_name FROM employees;
SELECT SUBSTRING(email, 1, POSITION('@' IN email) - 1) as username FROM users;
SELECT * FROM customers WHERE LENGTH(phone) > 10;

Why it matters: String functions are essential for data formatting and validation.

Real applications: Name formatting, email extraction, data validation.

Common mistakes: 1-based indexing (not 0), not handling NULL values.

REPLACE substitutes text patterns. UPPER/LOWER convert case. Useful for normalization and case-insensitive comparisons.

-- Case and replacement functions
SELECT UPPER('mysql');  -- 'MYSQL'
SELECT LOWER('DATABASE');  -- 'database'
SELECT REPLACE('Hello World', 'World', 'MySQL');  -- 'Hello MySQL'

-- Use for data cleaning
SELECT REPLACE(phone, '-', '') as clean_phone FROM users;
SELECT UPPER(name) FROM customers;
SELECT * FROM products WHERE LOWER(category) = 'electronics';

Why it matters: Normalizing case ensures consistent searches.

Real applications: Data cleaning, search normalization.

Common mistakes: Performance impact with large datasets, case sensitivity handling.

LOCATE/INSTR find position of substring. FIND_IN_SET locates value in comma-separated list. Useful for searching within text.

-- Position finding
SELECT LOCATE('SQL', 'MySQL');  -- 3
SELECT INSTR('Database', 'base');  -- 5
SELECT FIND_IN_SET('amazon', 'apple,amazon,google');  -- 2

-- Use in queries
SELECT * FROM emails WHERE LOCATE('@gmail.com', email) > 0;
SELECT * FROM tags WHERE FIND_IN_SET('important', tag_list) > 0;

Why it matters: Efficient substring searching in queries.

Real applications: Email validation, tag searching, data parsing.

Common mistakes: FIND_IN_SET with wrong delimiter, 1-based results.

LPAD/RPAD pad strings to fixed width. REPEAT repeats string. Useful for formatting output.

-- Padding functions
SELECT LPAD('5', 3, '0');  -- '005'
SELECT RPAD('Test', 10, '.');  -- 'Test......'
SELECT REPEAT('*', 5);  -- '*****'

-- Format numbers with leading zeros
SELECT LPAD(employee_id, 5, '0') FROM employees;
-- Display separator lines
SELECT REPEAT('-', 50) as separator;

Why it matters: Clean formatting for reports and displays.

Real applications: ID formatting, visual separators.

Common mistakes: Working with numeric values, overflow with repeat.

CHAR_LENGTH counts characters (multi-byte aware), contrast with LENGTH (bytes). HEX/UNHEX convert to/from hexadecimal for binary data.

-- Character counting
SELECT LENGTH('你好');  -- 6 bytes
SELECT CHAR_LENGTH('你好');  -- 2 characters
SELECT HEX('MySQL');  -- '4D7953514C'
SELECT UNHEX('4D7953514C');  -- 'MySQL'

-- Practical uses
SELECT * FROM products WHERE CHAR_LENGTH(description) > 100;
-- Store encoded data
SELECT HEX(image_data) FROM product_images;

Why it matters: Correct character counting for multi-byte strings.

Real applications: UTF-8 text, binary data encoding.

Common mistakes: Confusing LENGTH (bytes) and CHAR_LENGTH (chars).

REGEXP matches patterns using regex. More powerful than LIKE, supports character classes, quantifiers, anchors. Performance trade-off vs simple string matching.

-- Basic regex patterns
SELECT * FROM users WHERE email REGEXP '^[A-Za-z0-9._%+-]+@gmail\\.com$';
SELECT * FROM products WHERE name REGEXP '^[A-Z]';  -- Starts with capital
SELECT * FROM phones WHERE phone REGEXP '^\\+?1?[0-9]{10}$';  -- US format

-- Character classes
[a-z] - Lowercase letters
[0-9] - Digits
[A-Za-z] - Any letter
. - Any character
^ - Start of string
$ - End of string
+ - One or more
* - Zero or more
? - Zero or one
{n} - Exactly n times

-- Examples
SELECT * FROM emails WHERE email NOT REGEXP '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+$';
SELECT * FROM names WHERE name REGEXP '^(Mr|Mrs|Ms)\\.';

Why it matters: Pattern matching for validation and searching.

Real applications: Email validation, phone formatting, data quality checks.

Common mistakes: Complex patterns hard to maintain, performance impact on large datasets.

Performance issues include: functions prevent index use, string operations are slower than numeric, REGEXP more expensive than LIKE. Optimize by moving functions to the application layer or calculating values at insert time.

-- Slow: Function prevents index use
SELECT * FROM users WHERE UPPER(email) = 'TEST@GMAIL.COM';

-- Better: Store normalized data
SELECT * FROM users WHERE email = 'test@gmail.com';

-- Slow: Function in WHERE with large dataset
SELECT * FROM orders WHERE CONCAT(year, '-', month) = '2024-1';

-- Better: Store calculated value
SELECT * FROM orders WHERE year_month = '2024-1';

-- Slow: Multiple string operations
SELECT REPLACE(UPPER(TRIM(description)), 'WORD', 'REPLACEMENT')
FROM products WHERE CHAR_LENGTH(description) > 100;

-- Better: Preprocess and cache results
UPDATE products SET normalized_desc = LOWER(TRIM(description));

Why it matters: String function overhead impacts query speed on large datasets.

Real applications: Search optimization, data normalization pipelines, caching computed values.

Common mistakes: Using functions on indexed columns in WHERE, not pre-normalizing data for frequent searches.