C# Regular Expressions
Regex Basics
Regular expressions match patterns in text. Use them for validation, searching, and text transformation.
using System.Text.RegularExpressions;
// Simple match check
bool isMatch = Regex.IsMatch("hello@example.com", @"@.*\."); // true
// Find match
Match match = Regex.Match("Order #12345", @"#(\d+)");
if (match.Success)
{
    Console.WriteLine(match.Value); // #12345
    Console.WriteLine(match.Groups[1].Value); // 12345
}
// Find all matches
MatchCollection matches = Regex.Matches("a1 b2 c3", @"\w\d");
foreach (Match m in matches)
{
    Console.WriteLine(m.Value); // a1, b2, c3
}
Common Patterns
| Pattern | Matches | Example |
|---|---|---|
| `\d` | Digit | 0-9 |
| `\w` | Word character | a-z, A-Z, 0-9, _ |
| `\s` | Whitespace | space, tab, newline |
| `.` | Any character (except newline) | |
| `^` | Start of string | |
| `$` | End of string | |
| `+` | One or more | `\d+` matches 123 |
| `*` | Zero or more | `\d*` matches the empty string or 123 |
| `?` | Zero or one | `\d?` matches the empty string or 1 |
| `{n}` | Exactly n | `\d{3}` matches 123 |
| `{n,m}` | Between n and m | `\d{2,4}` matches 12 to 1234 |
| `[abc]` | Character class | matches a, b, or c |
| `[^abc]` | Negated class | matches anything except a, b, c |
| `(...)` | Capture group | captures matched text |
| `(?:...)` | Non-capturing group | groups without capturing |
| `\|` | Alternation | `cat\|dog` matches either |
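The table entries can be checked directly with Regex.IsMatch and Regex.Match. The following is a minimal sketch; the sample strings and variable names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Quantifier with an exact count: three digits, anchored at both ends.
bool threeDigits = Regex.IsMatch("123", @"^\d{3}$");     // true
bool fourDigits = Regex.IsMatch("1234", @"^\d{3}$");     // false

// Character class vs. negated class: each returns the first matching character.
string firstVowel = Regex.Match("rhythm and blues", @"[aeiou]").Value;  // "a"
string firstNonDigit = Regex.Match("2024-01-15", @"[^0-9]").Value;      // "-"

// Alternation inside a non-capturing group: only the number is captured.
Match pet = Regex.Match("3 dogs", @"(\d+) (?:cat|dog)s?");
string count = pet.Groups[1].Value;   // "3"
int groupCount = pet.Groups.Count;    // 2: the whole match plus one capture group

Console.WriteLine($"{threeDigits} {fourDigits} {firstVowel} {firstNonDigit} {count} {groupCount}");
```

Because `(?:cat|dog)` does not capture, the number stays in group 1 however many alternatives are added later.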
Practical Examples
Validation
public static class Validators
{
    // Email (simplified)
    private static readonly Regex EmailRegex = new(
        @"^[\w\.-]+@[\w\.-]+\.\w+$",
        RegexOptions.Compiled);
    // Phone (US format)
    private static readonly Regex PhoneRegex = new(
        @"^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$",
        RegexOptions.Compiled);
    // URL
    private static readonly Regex UrlRegex = new(
        @"^https?://[\w\.-]+(?:/[\w\.-]*)*/?$",
        RegexOptions.Compiled);
    public static bool IsValidEmail(string email) => EmailRegex.IsMatch(email);
    public static bool IsValidPhone(string phone) => PhoneRegex.IsMatch(phone);
    public static bool IsValidUrl(string url) => UrlRegex.IsMatch(url);
}
Extraction
// Extract all URLs from text (htmlContent is a string defined elsewhere)
// Select on a MatchCollection needs System.Linq and .NET Core 2.0 or later
var urlPattern = new Regex(@"https?://[^\s]+");
var urls = urlPattern.Matches(htmlContent)
    .Select(m => m.Value)
    .ToList();
// Extract named groups
var logPattern = new Regex(
    @"(?<date>\d{4}-\d{2}-\d{2}) (?<level>\w+): (?<message>.+)");
var match = logPattern.Match("2024-01-15 ERROR: Connection failed");
if (match.Success)
{
    string date = match.Groups["date"].Value; // 2024-01-15
    string level = match.Groups["level"].Value; // ERROR
    string message = match.Groups["message"].Value; // Connection failed
}
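The same named-group pattern extends to multi-line input. The sketch below assumes a hypothetical three-line log in the format used above and a runtime where MatchCollection supports LINQ (.NET Core 2.0 or later); the variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical log text in the "date LEVEL: message" format.
string log = "2024-01-15 ERROR: Connection failed\n" +
             "2024-01-15 INFO: Retrying\n" +
             "2024-01-16 ERROR: Timeout";

// Multiline makes ^ and $ match at each line break;
// [^\n]+ keeps the message from running past the end of its line.
var linePattern = new Regex(
    @"^(?<date>\d{4}-\d{2}-\d{2}) (?<level>\w+): (?<message>[^\n]+)$",
    RegexOptions.Multiline);

// Keep only ERROR entries and project the named groups into tuples.
var errors = linePattern.Matches(log)
    .Where(m => m.Groups["level"].Value == "ERROR")
    .Select(m => (Date: m.Groups["date"].Value, Message: m.Groups["message"].Value))
    .ToList();

foreach (var e in errors)
    Console.WriteLine($"{e.Date}: {e.Message}");
```

Input with Windows line endings would leave a trailing `\r` in the message, so normalizing line endings first is probably worthwhile in that case.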
Replacement
// Simple replacement
string result = Regex.Replace("Hello World", @"\s+", "-");
// "Hello-World"
// Using captured groups
string masked = Regex.Replace(
    "Card: 1234-5678-9012-3456",
    @"(\d{4})-(\d{4})-(\d{4})-(\d{4})",
    "****-****-****-$4");
// "Card: ****-****-****-3456"
// Using MatchEvaluator
string doubled = Regex.Replace("prices: $10, $25, $100",
    @"\$(\d+)",
    m => $"${int.Parse(m.Groups[1].Value) * 2}");
// "prices: $20, $50, $200"
Splitting
// Split on multiple delimiters
string[] parts = Regex.Split("one,two;three four", @"[,;\s]+");
// ["one", "two", "three", "four"]
// Split keeping delimiters
string[] tokens = Regex.Split("a+b-c*d", @"([+\-*])");
// ["a", "+", "b", "-", "c", "*", "d"]
Regex Options
var options = RegexOptions.IgnoreCase // Case-insensitive
| RegexOptions.Multiline // ^ and $ match line boundaries
| RegexOptions.Singleline // . matches newlines
| RegexOptions.Compiled; // Compile for performance
var regex = new Regex(@"pattern", options);
// Inline options
var pattern = @"(?i)case insensitive"; // (?i) enables ignore case
var pattern2 = @"(?m)^line start"; // (?m) enables multiline
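The effect of each option is easiest to see by running the same pattern with and without it. A small sketch, with a made-up two-line input:

```csharp
using System;
using System.Text.RegularExpressions;

string text = "first line\nsecond line";

// Without Multiline, ^ matches only at the very start of the input.
int startsDefault = Regex.Matches(text, @"^\w+").Count;                            // 1
int startsMultiline = Regex.Matches(text, @"^\w+", RegexOptions.Multiline).Count;  // 2

// Without Singleline, . stops at \n, so the match cannot span both lines.
bool spansDefault = Regex.IsMatch(text, @"first.*second");                             // false
bool spansSingleline = Regex.IsMatch(text, @"first.*second", RegexOptions.Singleline); // true

// Inline (?i) behaves like passing RegexOptions.IgnoreCase.
bool inlineIgnoreCase = Regex.IsMatch("FIRST", @"(?i)first");                      // true

Console.WriteLine($"{startsDefault} {startsMultiline} {spansDefault} {spansSingleline} {inlineIgnoreCase}");
```

Despite the similar names, Multiline changes only the anchors and Singleline changes only the dot, so the two can be combined freely.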
Source-Generated Regex (.NET 7+)
Source Generation for Regex
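The source generator emits C# code for the pattern at build time, and that code is compiled along with the rest of the assembly. This removes the runtime cost of parsing and compiling the pattern and works under Native AOT, where RegexOptions.Compiled cannot emit IL at run time. Use it for patterns that are known at compile time, especially frequently used ones.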
public partial class Patterns
{
    [GeneratedRegex(@"^\d{3}-\d{2}-\d{4}$")]
    private static partial Regex SsnRegex();
    // Anchored so that validation rejects input that merely contains an email
    [GeneratedRegex(@"^[\w\.-]+@[\w\.-]+\.\w+$", RegexOptions.IgnoreCase)]
    private static partial Regex EmailRegex();
    public static bool IsValidSsn(string ssn) => SsnRegex().IsMatch(ssn);
    public static bool IsValidEmail(string email) => EmailRegex().IsMatch(email);
}
Benefits:
- No runtime compilation overhead
- Compile-time pattern validation
- Better performance
- AOT compatible
Performance Considerations
Compile for Reuse
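Static Regex Methods (Slower on Hot Paths)
- Regex.IsMatch(input, pattern)
- Looks the pattern up in an internal cache on every call; a cache miss re-parses the pattern
- The cache is small (Regex.CacheSize, 15 entries by default), so entries can be evicted
- Avoid in hot paths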
Compiled/Generated (Fast)
- Static readonly Regex instance
- Compiles once, reuse forever
- Or use [GeneratedRegex]
- Optimal for repeated use
// BAD for hot paths: relies on the static cache; the pattern is re-parsed after eviction
public bool ValidateWithStaticCall(string input)
{
    return Regex.IsMatch(input, @"\d+");
}
// GOOD: Reuse a compiled Regex instance
private static readonly Regex NumberRegex = new(@"\d+", RegexOptions.Compiled);
public bool Validate(string input)
{
    return NumberRegex.IsMatch(input);
}
// BEST: Source-generated (.NET 7+); the containing class must be partial
[GeneratedRegex(@"\d+")]
private static partial Regex GeneratedNumberRegex();
Set Timeout
Catastrophic Backtracking
Certain regex patterns can cause exponential time complexity when matching fails. Always set timeouts for untrusted input or complex patterns to prevent denial-of-service.
// Prevent catastrophic backtracking
var regex = new Regex(
    @"(a+)+$", // Potentially dangerous pattern
    RegexOptions.None,
    TimeSpan.FromSeconds(1));
try
{
    regex.Match("aaaaaaaaaaaaaaaaaaaaaaaaaab");
}
catch (RegexMatchTimeoutException)
{
    // Handle timeout
}
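The static methods also have overloads that take a timeout, which helps when the pattern itself arrives at run time. The sketch below wraps one in a hypothetical helper, TryIsMatch (not a framework API), that reports a timeout as null so callers can tell "no match" apart from "gave up".

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true/false for a completed match attempt, null on timeout.
static bool? TryIsMatch(string input, string pattern, TimeSpan timeout)
{
    try
    {
        // Static overload that accepts a match timeout.
        return Regex.IsMatch(input, pattern, RegexOptions.None, timeout);
    }
    catch (RegexMatchTimeoutException)
    {
        return null;
    }
}

bool? ok = TryIsMatch("Order #12345", @"#\d+", TimeSpan.FromMilliseconds(250));
Console.WriteLine(ok);   // True
```

Whether a timeout should count as a validation failure or be reported separately depends on the caller; returning null leaves that decision open.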
Avoid Catastrophic Backtracking
// BAD: Nested quantifiers can cause exponential time
var bad = new Regex(@"(a+)+b");
// GOOD: Use an atomic group (.NET has no possessive quantifier syntax)
var good = new Regex(@"(?>a+)+b"); // Atomic group
// Or restructure the pattern
var better = new Regex(@"a+b");
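On .NET 7 and later, RegexOptions.NonBacktracking (listed in the version history table) is another way to rule out exponential behavior without rewriting the pattern. A minimal sketch, assuming a .NET 7+ runtime and a made-up hostile input:

```csharp
using System;
using System.Text.RegularExpressions;

// NonBacktracking (.NET 7+) processes the input in time linear in its length.
// The trade-off: atomic groups, lookarounds, and backreferences are not supported.
var linear = new Regex(@"(a+)+b", RegexOptions.NonBacktracking);

// Many 'a' characters and no 'b': the worst case for the backtracking engine.
string hostile = new string('a', 5_000) + "c";

bool matchedHostile = linear.IsMatch(hostile);   // false
bool matchedNormal = linear.IsMatch("aaab");     // true

Console.WriteLine($"{matchedHostile} {matchedNormal}");
```

This mode tends to suit patterns that come from users or configuration, where auditing every pattern for nested quantifiers is impractical.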
Common Tasks
Parse Key-Value Pairs
var pattern = new Regex(@"(?<key>\w+)=(?<value>[^;]+)");
var input = "name=John;age=30;city=NYC";
var dict = pattern.Matches(input)
    .ToDictionary(
        m => m.Groups["key"].Value,
        m => m.Groups["value"].Value);
Clean/Normalize Text
// Remove extra whitespace
string cleaned = Regex.Replace(text, @"\s+", " ").Trim();
// Remove non-alphanumeric
string alphaOnly = Regex.Replace(text, @"[^a-zA-Z0-9]", "");
// Normalize line endings
string normalized = Regex.Replace(text, @"\r\n?|\n", Environment.NewLine);
Extract Numbers
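// Parse with the invariant culture so "." is always the decimal separator
var numbers = Regex.Matches("Price: $19.99, Qty: 5", @"\d+(?:\.\d+)?")
    .Select(m => decimal.Parse(m.Value, System.Globalization.CultureInfo.InvariantCulture))
    .ToList(); // [19.99, 5]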
Version History
| Feature | Version | Significance |
|---|---|---|
| Regex class | .NET Framework 1.0 | Core regex support |
| RegexOptions.Compiled | .NET Framework 1.0 | Performance optimization |
| Named groups | .NET Framework 1.0 | Readable captures |
| Regex timeout | .NET Framework 4.5 | Backtracking protection |
| Source generation | .NET 7 | Compile-time regex |
| Non-backtracking mode | .NET 7 | RegexOptions.NonBacktracking |
Key Takeaways
Use source-generated regex: In .NET 7+, prefer [GeneratedRegex] for compile-time validation and performance.
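Reuse Regex instances: On hot paths, create a static readonly Regex (with RegexOptions.Compiled where the startup cost pays off) instead of calling the static methods repeatedly.
Set timeouts for untrusted input: Protect against catastrophic backtracking by passing a matchTimeout (TimeSpan) to the Regex constructor or to the static methods.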
Use named groups: (?<name>...) makes patterns more readable than numbered groups.
Keep patterns simple: Complex patterns are hard to maintain. Consider multiple simple patterns or alternative parsing approaches for complex grammars.
Test edge cases: Empty strings, very long strings, and malformed input can cause unexpected behavior.
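A few of these edge cases can be pinned down with quick checks. The sketch below uses made-up inputs to show behaviors that commonly surprise: empty input, a trailing newline, non-ASCII digits, and leading delimiters.

```csharp
using System;
using System.Text.RegularExpressions;

// Empty input: * can match nothing at all, + cannot.
bool emptyPassesStar = Regex.IsMatch("", @"^\d*$");   // true
bool emptyPassesPlus = Regex.IsMatch("", @"^\d+$");   // false

// $ also matches just before a final newline; \z matches only at the true end.
bool dollarAcceptsNewline = Regex.IsMatch("12345\n", @"^\d+$");   // true
bool zRejectsNewline = Regex.IsMatch("12345\n", @"^\d+\z");       // false

// By default \d matches any Unicode decimal digit, not just 0-9.
string arabicIndic = "\u0661\u0662\u0663";                        // Arabic-Indic 1, 2, 3
bool slashDAccepts = Regex.IsMatch(arabicIndic, @"^\d+$");        // true
bool asciiClassAccepts = Regex.IsMatch(arabicIndic, @"^[0-9]+$"); // false

// Split yields an empty entry when the input starts with a delimiter.
string[] parts = Regex.Split(",a,b", ",");   // ["", "a", "b"]

Console.WriteLine($"{emptyPassesStar} {dollarAcceptsNewline} {slashDAccepts} {parts.Length}");
```

For validation, `\z` and `[0-9]` are usually the safer choices when input must be strictly ASCII digits with nothing after them.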