C# Strings and Text Processing
String Fundamentals
Strings in C# are immutable reference types: every modification creates a new string object. Use StringBuilder when building strings in loops.
string greeting = "Hello";
string modified = greeting + " World"; // Creates new string
// greeting is still "Hello"
// String interning - identical literals share memory
string a = "hello";
string b = "hello";
bool same = ReferenceEquals(a, b); // true - same object
// Runtime strings not interned by default
string c = new string(new[] { 'h', 'e', 'l', 'l', 'o' });
bool notSame = ReferenceEquals(a, c); // false
String Creation
Literals and Verbatim Strings
// Regular string - escape sequences processed
string path = "C:\\Users\\Name\\Documents";
string newline = "Line 1\nLine 2";
string tab = "Col1\tCol2";
// Verbatim string @ - escape sequences not processed
string verbatimPath = @"C:\Users\Name\Documents";
string multiLine = @"First line
Second line
Third line";
// Double quotes in verbatim
string quoted = @"She said ""Hello""";
Raw String Literals (C# 11)
// Raw strings - preserve whitespace and quotes
string json = """
{
"name": "Alice",
"age": 30
}
""";
// Number of quotes determines delimiter
string withQuotes = """"
He said """Hello"""
"""";
// With interpolation: $$ makes {{...}} the placeholder, so single braces stay literal
int age = 30;
string interpolated = $$"""
{
"name": "Alice",
"age": {{age}}
}
""";
UTF-8 String Literals (C# 11)
The u8 suffix creates UTF-8 encoded byte sequences directly, avoiding runtime encoding overhead.
// UTF-8 literal produces ReadOnlySpan<byte>
ReadOnlySpan<byte> utf8 = "Hello"u8;
// Useful for HTTP headers, protocols, and APIs expecting UTF-8
stream.Write("Content-Type: application/json\r\n"u8);
// Convert to byte array when needed
byte[] jsonBytes = """{"name":"test"}"""u8.ToArray();
// Combine with raw strings for multi-line UTF-8
ReadOnlySpan<byte> httpResponse = """
HTTP/1.1 200 OK
Content-Type: text/plain
Hello, World!
"""u8;
UTF-8 literals are particularly valuable in network programming, serialization, and any code that communicates with systems expecting UTF-8 encoding.
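To make the "no runtime encoding" point concrete, the following minimal sketch compares a u8 literal with the bytes produced by Encoding.UTF8.GetBytes for the same text. It assumes C# 11 or later on .NET 7 or later; the sample text and variable names are illustrative only.

```csharp
using System;
using System.Text;

// The u8 literal is encoded by the compiler; no encoder runs at run time.
ReadOnlySpan<byte> literal = "caf\u00E9"u8;

// The conventional route encodes the same text when this line executes.
byte[] encoded = Encoding.UTF8.GetBytes("caf\u00E9");

// Both routes yield identical bytes. U+00E9 takes two bytes in UTF-8,
// so the four-character string becomes five bytes.
bool sameBytes = literal.SequenceEqual(encoded);
int byteCount = literal.Length;

Console.WriteLine(sameBytes); // True
Console.WriteLine(byteCount); // 5
```

The literal form is only available for compile-time constant text; text known only at run time still has to go through an encoder.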
String Interpolation
string name = "Alice";
int age = 30;
// Basic interpolation
string message = $"Hello, {name}! You are {age} years old.";
// With formatting
decimal price = 19.99m;
string formatted = $"Price: {price:C2}"; // Price: $19.99 (with an en-US current culture)
DateTime date = DateTime.Now;
string dateStr = $"Date: {date:yyyy-MM-dd}";
// Alignment
string aligned = $"|{name,-10}|{age,5}|"; // |Alice     |   30|
// Expressions
string expr = $"Next year: {age + 1}";
string conditional = $"Status: {(age >= 18 ? "Adult" : "Minor")}";
// Raw interpolation for JSON (C# 11): with $$, placeholders are written {{...}}
string jsonTemplate = $$"""
{
"name": "{{name}}",
"age": {{age}}
}
""";
String Comparison
Choosing the Right String Comparison
- For identifiers, paths, keys: Use Ordinal or OrdinalIgnoreCase (fastest)
- For user-facing text: Use CurrentCulture
- For persisted data: Use InvariantCulture
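As an illustration of the first guideline, here is a small sketch that treats HTTP-style header names and a file extension as identifiers and compares them ordinally, ignoring case. The header name, file name, and variable names are hypothetical examples, not part of any particular API.

```csharp
using System;
using System.Collections.Generic;

// Header names are identifiers, not prose, so the dictionary is given an
// ordinal, case-insensitive comparer instead of a culture-aware one.
var headers = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["Content-Type"] = "application/json"
};

// The lookup succeeds regardless of the casing used by the caller.
bool found = headers.TryGetValue("content-type", out var contentType);
Console.WriteLine(found);       // True
Console.WriteLine(contentType); // application/json

// The same rule applies to one-off checks on paths and extensions.
string fileName = "REPORT.PDF";
bool isPdf = fileName.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase);
Console.WriteLine(isPdf);       // True
```

Passing the comparer to the collection once keeps every later lookup consistent, which is usually easier to maintain than normalizing case at each call site.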
Comparison Types
string a = "hello";
string b = "Hello";
// Ordinal (compares UTF-16 code units) - fastest, case-sensitive
bool ordinal = string.Equals(a, b, StringComparison.Ordinal); // false
// OrdinalIgnoreCase - fast, case-insensitive
bool ordinalIgnore = string.Equals(a, b, StringComparison.OrdinalIgnoreCase); // true
// CurrentCulture - culture-aware, case-sensitive
bool culture = string.Equals(a, b, StringComparison.CurrentCulture); // false
// InvariantCultureIgnoreCase - linguistic rules that do not vary by machine, case-insensitive
bool invariant = string.Equals(a, b, StringComparison.InvariantCultureIgnoreCase); // true
Comparison Methods
// Equality
bool equal = string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
bool opEqual = a == b; // Uses Ordinal
// Comparison (for sorting)
int result = string.Compare(a, b, StringComparison.OrdinalIgnoreCase);
// < 0: a before b, = 0: equal, > 0: a after b
// Contains, StartsWith, EndsWith
bool contains = a.Contains("ell", StringComparison.OrdinalIgnoreCase);
bool starts = a.StartsWith("he", StringComparison.OrdinalIgnoreCase);
bool ends = a.EndsWith("lo", StringComparison.OrdinalIgnoreCase);
String Manipulation
Basic Operations
string text = " Hello, World! ";
// Case conversion
string upper = text.ToUpper(); // " HELLO, WORLD! "
string lower = text.ToLower(); // " hello, world! "
string upperInvariant = text.ToUpperInvariant(); // Culture-independent
// Trimming
string trimmed = text.Trim(); // "Hello, World!"
string trimStart = text.TrimStart(); // "Hello, World! "
string trimEnd = text.TrimEnd(); // " Hello, World!"
string trimChars = "###Hello###".Trim('#'); // "Hello"
// Padding
string padLeft = "42".PadLeft(5, '0'); // "00042"
string padRight = "Hi".PadRight(5); // "Hi   "
// Substring
string sub = "Hello, World!".Substring(7, 5); // "World"
string fromIndex = "Hello, World!"[7..]; // "World!" (range)
// Replace
string replaced = text.Replace("World", "Universe");
string charReplace = text.Replace(',', ';');
Splitting and Joining
// Split
string csv = "apple,banana,cherry";
string[] parts = csv.Split(','); // ["apple", "banana", "cherry"]
// Split with options
string spaced = "a  b   c"; // repeated spaces would otherwise produce empty entries
string[] noEmpty = spaced.Split(' ', StringSplitOptions.RemoveEmptyEntries);
// ["a", "b", "c"]
// Split with multiple separators
string mixed = "a,b;c|d";
string[] mixedParts = mixed.Split(new[] { ',', ';', '|' });
// Split with limit
string[] limited = "a,b,c,d,e".Split(',', 3); // ["a", "b", "c,d,e"]
// Join
string joined = string.Join(", ", parts); // "apple, banana, cherry"
string joinedArray = string.Join("-", new[] { 1, 2, 3 }); // "1-2-3"
// Concat
string concat = string.Concat("Hello", " ", "World");
string concatArray = string.Concat(new[] { "a", "b", "c" }); // "abc"
Searching
string text = "Hello, World! Hello again!";
// Index of
int index = text.IndexOf("Hello"); // 0
int lastIndex = text.LastIndexOf("Hello"); // 14
int indexIgnoreCase = text.IndexOf("hello", StringComparison.OrdinalIgnoreCase);
// Index of any
int anyIndex = text.IndexOfAny(new[] { 'o', 'e' }); // 1
// Contains
bool contains = text.Contains("World");
bool containsIgnoreCase = text.Contains("world", StringComparison.OrdinalIgnoreCase);
StringBuilder
Use StringBuilder to build strings incrementally, especially in loops.
using System.Text;
var sb = new StringBuilder();
// Append
sb.Append("Hello");
sb.Append(' ');
sb.Append("World");
sb.AppendLine("!");
sb.AppendLine("Second line");
// Append formatted
sb.AppendFormat("Price: {0:C2}", 19.99);
sb.Append($"Date: {DateTime.Now:d}");
// Insert
sb.Insert(0, "Greeting: ");
// Replace
sb.Replace("World", "Universe");
// Remove
sb.Remove(0, 10); // Remove first 10 characters
// Get result
string result = sb.ToString();
// Clear (resets the length to 0 so the builder can be reused)
sb.Clear();
// Capacity management
var sbCapacity = new StringBuilder(initialCapacity: 1000);
sbCapacity.EnsureCapacity(2000);
// Chaining
var chained = new StringBuilder()
.Append("Hello")
.Append(' ')
.Append("World")
.ToString();
When to Use StringBuilder
// BAD - creates many intermediate strings
string result = "";
for (int i = 0; i < 1000; i++)
{
result += i.ToString(); // O(n²) - each += creates new string
}
// GOOD - efficient
var sb = new StringBuilder();
for (int i = 0; i < 1000; i++)
{
sb.Append(i); // O(n)
}
string result = sb.ToString();
// Single concatenation is fine
string simple = a + b + c; // Compiles to a single string.Concat call
// Use string.Join for collections
string joined = string.Join(",", items); // Better than loop
Formatting
Composite Formatting
// Format with placeholders
string formatted = string.Format("Hello, {0}! You are {1} years old.", name, age);
// With format specifiers
string currency = string.Format("{0:C}", 1234.56); // $1,234.56 (with an en-US current culture)
string number = string.Format("{0:N2}", 1234.5678); // 1,234.57
string percent = string.Format("{0:P1}", 0.1234); // 12.3%
string hex = string.Format("{0:X}", 255); // FF
string date = string.Format("{0:yyyy-MM-dd}", DateTime.Now);
// Alignment
string left = string.Format("{0,-10}", "Hi"); // "Hi        "
string right = string.Format("{0,10}", "Hi"); // "        Hi"
Custom Format Strings
// Numeric
double value = 1234.5678;
value.ToString("F2"); // "1234.57" - fixed point
value.ToString("N2"); // "1,234.57" - number with separators
value.ToString("E2"); // "1.23E+003" - scientific
value.ToString("0.00"); // "1234.57" - custom
value.ToString("#,##0.00"); // "1,234.57" - custom with grouping
// Date/Time
DateTime dt = DateTime.Now;
dt.ToString("yyyy-MM-dd"); // "2024-01-15"
dt.ToString("HH:mm:ss"); // "14:30:45"
dt.ToString("MMMM dd, yyyy"); // "January 15, 2024"
dt.ToString("dddd"); // "Monday"
dt.ToString("o"); // ISO 8601
// TimeSpan
TimeSpan ts = TimeSpan.FromHours(2.5);
ts.ToString(@"hh\:mm\:ss"); // "02:30:00"
IFormattable
public class Temperature : IFormattable
{
public double Celsius { get; }
public Temperature(double celsius) => Celsius = celsius;
public string ToString(string? format, IFormatProvider? formatProvider)
{
return format?.ToUpperInvariant() switch
{
"C" => $"{Celsius:F1}°C",
"F" => $"{Celsius * 9 / 5 + 32:F1}°F",
"K" => $"{Celsius + 273.15:F1}K",
_ => $"{Celsius:F1}°C"
};
}
}
var temp = new Temperature(25);
Console.WriteLine($"{temp:C}"); // "25.0°C"
Console.WriteLine($"{temp:F}"); // "77.0°F"
Parsing
Basic Parsing
using System.Globalization; // for CultureInfo below
// Parse - throws on failure
int number = int.Parse("42");
double d = double.Parse("3.14");
DateTime date = DateTime.Parse("2024-01-15");
// TryParse - safe, returns bool
if (int.TryParse(input, out int result))
{
Console.WriteLine($"Parsed: {result}");
}
else
{
Console.WriteLine("Invalid input");
}
// With format provider
decimal price = decimal.Parse("1,234.56", CultureInfo.InvariantCulture);
// Exact date parsing
DateTime exact = DateTime.ParseExact(
"15/01/2024",
"dd/MM/yyyy",
CultureInfo.InvariantCulture);
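The culture-aware Parse and ParseExact calls above throw on bad input. A minimal sketch of the non-throwing counterparts, assuming the input text is produced in an invariant (culture-neutral) format; the sample inputs and variable names are illustrative:

```csharp
using System;
using System.Globalization;

// Number with a thousands separator, parsed with explicit style and culture
// so the result does not depend on the machine's regional settings.
bool priceOk = decimal.TryParse(
    "1,234.56",
    NumberStyles.Number,
    CultureInfo.InvariantCulture,
    out decimal parsedPrice);

// Date in one exact, known layout.
bool dateOk = DateTime.TryParseExact(
    "15/01/2024",
    "dd/MM/yyyy",
    CultureInfo.InvariantCulture,
    DateTimeStyles.None,
    out DateTime parsedDate);

// Invalid input returns false instead of throwing FormatException.
bool badDateOk = DateTime.TryParseExact(
    "2024-13-45",
    "yyyy-MM-dd",
    CultureInfo.InvariantCulture,
    DateTimeStyles.None,
    out _);

Console.WriteLine(priceOk && dateOk);                                 // True
Console.WriteLine(parsedPrice.ToString(CultureInfo.InvariantCulture)); // 1234.56
Console.WriteLine(parsedDate.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)); // 2024-01-15
Console.WriteLine(badDateOk);                                         // False
```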
Span-Based Parsing (High Performance)
ReadOnlySpan<char> input = "42".AsSpan();
// No allocation parsing
if (int.TryParse(input, out int value))
{
Console.WriteLine(value);
}
// Parse from middle of string without substring
string text = "Value: 42 units";
ReadOnlySpan<char> numberSpan = text.AsSpan(7, 2);
int parsed = int.Parse(numberSpan);
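The same idea extends to delimited input. The sketch below sums comma-separated integers by slicing a span rather than calling Split, so no intermediate strings or arrays are created. SumCsv is a hypothetical helper written for this example, and skipping fields that fail to parse is an assumption about the desired behavior.

```csharp
using System;

// Hypothetical helper: sums comma-separated integers without allocating.
static int SumCsv(ReadOnlySpan<char> line)
{
    int total = 0;
    while (!line.IsEmpty)
    {
        int comma = line.IndexOf(',');
        // Take the text up to the next comma, or the rest of the line.
        ReadOnlySpan<char> field = comma >= 0 ? line[..comma] : line;
        // Trim returns another span over the same memory.
        if (int.TryParse(field.Trim(), out int number))
        {
            total += number;
        }
        // Advance past the comma, or finish when none is left.
        line = comma >= 0 ? line[(comma + 1)..] : ReadOnlySpan<char>.Empty;
    }
    return total;
}

int sum = SumCsv("10, 20,30");
Console.WriteLine(sum); // 60
```

A string argument converts implicitly to ReadOnlySpan<char>, so callers do not need to call AsSpan themselves.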
High-Performance String Operations
Span-Based String Manipulation
string text = "Hello, World!";
ReadOnlySpan<char> span = text.AsSpan();
// Slice without allocation
ReadOnlySpan<char> hello = span[..5]; // "Hello"
ReadOnlySpan<char> world = span[7..12]; // "World"
// Searching
int index = span.IndexOf(',');
int lastIndex = span.LastIndexOf('o');
// Comparison
bool equals = span.SequenceEqual("Hello, World!");
bool startsWith = span.StartsWith("Hello");
// Trimming (returns span, no allocation)
ReadOnlySpan<char> trimmed = " hello ".AsSpan().Trim();
string.Create
Create strings efficiently with a buffer.
// Create string with exact length, populate via span
string result = string.Create(10, 42, (chars, state) =>
{
state.TryFormat(chars, out int written);
chars[written..].Fill('0');
});
// More complex example
string formatted = string.Create(20, (name: "Alice", age: 30), (chars, state) =>
{
int pos = 0;
state.name.AsSpan().CopyTo(chars);
pos += state.name.Length;
chars[pos++] = ':';
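state.age.TryFormat(chars[pos..], out int written);
pos += written;
chars[pos..].Fill(' '); // pad the remainder; unwritten chars would otherwise stay '\0'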
});
SearchValues (.NET 8)
Optimized searching for multiple values.
using System.Buffers;
// Create once, reuse for multiple searches
private static readonly SearchValues<char> Vowels =
SearchValues.Create("aeiouAEIOU");
public int CountVowels(ReadOnlySpan<char> text)
{
int count = 0;
int index;
while ((index = text.IndexOfAny(Vowels)) >= 0)
{
count++;
text = text[(index + 1)..];
}
return count;
}
CompositeFormat (.NET 8)
Pre-parsed format strings for repeated use.
using System.Text;
private static readonly CompositeFormat LogFormat =
CompositeFormat.Parse("[{0:HH:mm:ss}] {1}: {2}");
public void Log(string level, string message)
{
string formatted = string.Format(null, LogFormat, DateTime.Now, level, message);
Console.WriteLine(formatted);
}
Regular Expressions
Basic Patterns
using System.Text.RegularExpressions;
string text = "Contact: john@example.com or jane@test.org";
// Simple match
bool hasEmail = Regex.IsMatch(text, @"\w+@\w+\.\w+");
// Find match
Match match = Regex.Match(text, @"\w+@\w+\.\w+");
if (match.Success)
{
Console.WriteLine(match.Value); // john@example.com
}
// Find all matches
MatchCollection matches = Regex.Matches(text, @"\w+@\w+\.\w+");
foreach (Match m in matches)
{
Console.WriteLine(m.Value);
}
// Replace
string replaced = Regex.Replace(text, @"\w+@\w+\.\w+", "[EMAIL]");
Compiled and Source-Generated Regex
// Compiled regex - faster for repeated use
private static readonly Regex EmailRegex = new(
@"\w+@\w+\.\w+",
RegexOptions.Compiled);
// Source-generated regex (C# 11 / .NET 7) - best performance
[GeneratedRegex(@"\w+@\w+\.\w+", RegexOptions.IgnoreCase)]
private static partial Regex EmailRegexGenerated();
// Usage
bool hasEmail = EmailRegexGenerated().IsMatch(text);
Groups and Captures
string input = "John Doe (john@example.com)";
var pattern = @"(?<name>\w+ \w+) \((?<email>\w+@\w+\.\w+)\)";
Match match = Regex.Match(input, pattern);
if (match.Success)
{
string name = match.Groups["name"].Value; // "John Doe"
string email = match.Groups["email"].Value; // "john@example.com"
}
String Interning
// Manual interning for frequently used runtime strings
string interned = string.Intern(computedString);
// Check if interned (returns null when the string is not in the intern pool)
string? existing = string.IsInterned(someString);
// Use cases:
// - Large number of duplicate strings
// - Strings used as dictionary keys repeatedly
// - Configuration values accessed frequently
// Caution: interned strings live for app lifetime
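A short sketch of what interning changes, assuming two equal strings that were built at run time. The variable names are illustrative.

```csharp
using System;

// Two equal strings built at run time are separate objects.
string first = new string(new[] { 'k', 'e', 'y' });
string second = new string(new[] { 'k', 'e', 'y' });
bool distinctBefore = !ReferenceEquals(first, second);

// Interning maps equal content to one shared instance in the intern pool.
string firstInterned = string.Intern(first);
string secondInterned = string.Intern(second);
bool sharedAfter = ReferenceEquals(firstInterned, secondInterned);

Console.WriteLine(distinctBefore); // True
Console.WriteLine(sharedAfter);    // True
```

Because pooled strings are not collected while the process runs, this only pays off when the set of distinct values is small and the number of duplicates is large.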
Version History
| Feature | Version | Significance |
|---|---|---|
| String interpolation | C# 6.0 | $ prefix |
| Span<T> | C# 7.2 | Zero-allocation slicing |
| Index/Range | C# 8.0 | String slicing syntax |
| Interpolated verbatim | C# 8.0 | $@"…" or @$"…" (prefixes in either order) |
| string.Create | .NET Core 2.1 | Buffer-based creation |
| Raw string literals | C# 11 | """ syntax |
| UTF-8 string literals | C# 11 | "text"u8 for direct UTF-8 bytes |
| Source-generated regex | C# 11 | Compile-time regex |
| SearchValues | .NET 8 | Optimized multi-value search |
Key Takeaways
Strings are immutable: Every modification creates a new string. Use StringBuilder for building strings in loops.
Use the right comparison: Ordinal for identifiers and paths, CurrentCulture for user-facing text.
Prefer Span for parsing: Use ReadOnlySpan<char> to avoid allocations when processing substrings.
string.Join over concatenation loops: More efficient and readable.
Compiled regex for hot paths: Use RegexOptions.Compiled or source-generated regex for performance.
Raw strings for embedded content: Use """ for JSON, SQL, or other content with quotes.