6.6 Cookbook Regular Expressions

To round out this overview of regular expressions in C# applications, here is a set of useful expressions that have proven themselves in other environments.^[1]

^[1] These expressions were taken from the Perl Cookbook by Tom Christiansen and Nathan Torkington (O'Reilly), and updated for the C# environment by Brad Merrill of Microsoft.

Matching roman numerals:

string p1 = "^m*(d?c{0,3}|c[dm])"
  + "(l?x{0,3}|x[lc])(v?i{0,3}|i[vx])$";
string t1 = "vii";
Match m1 = Regex.Match(t1, p1);
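The Match returned above still has to be tested before it is used. The following is a minimal sketch of one way to wrap the pattern in a yes/no check. The class and method names (RomanCheck, IsRoman) are hypothetical and do not come from the original recipe. The pattern is written in lowercase, so the sketch assumes RegexOptions.IgnoreCase is wanted to accept uppercase numerals as well. Because every part of the pattern is optional, it also matches the empty string, which the sketch rejects explicitly.

```csharp
using System;
using System.Text.RegularExpressions;

public class RomanCheck
{
    // Same pattern as p1 above.
    const string Pattern = "^m*(d?c{0,3}|c[dm])"
        + "(l?x{0,3}|x[lc])(v?i{0,3}|i[vx])$";

    // Hypothetical helper: true when text looks like a Roman numeral.
    public static bool IsRoman(string text)
    {
        // Every part of the pattern is optional, so "" would match; reject it here.
        if (text.Length == 0) return false;
        // IgnoreCase is an assumption, added so "VII" is accepted as well as "vii".
        return Regex.IsMatch(text, Pattern, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(IsRoman("vii"));     // True
        Console.WriteLine(IsRoman("MCMXCIV")); // True
        Console.WriteLine(IsRoman("iiii"));    // False
    }
}
```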

Swapping first two words:

string t2 = "the quick brown fox";
string p2 = @"(\S+)(\s+)(\S+)";
Regex x2 = new Regex(p2);
string r2 = x2.Replace(t2, "$3$2$1", 1);

Matching "keyword = value" patterns:

string t3 = "myval = 3";
string p3 = @"(\w+)\s*=\s*(.*?)\s*$"; // lazy .*? keeps trailing whitespace out of the value
Match m3 = Regex.Match(t3, p3);

Matching lines of at least 80 characters:

string t4 = "********************"
  + "******************************"
  + "******************************";
string p4 = ".{80,}";
Match m4 = Regex.Match(t4, p4);

Extracting date/time values (MM/DD/YY HH:MM:SS):

string t5 = "01/01/01 16:10:01";
string p5 =
  @"(\d+)/(\d+)/(\d+) (\d+):(\d+):(\d+)";
Match m5 = Regex.Match(t5, p5);
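Each parenthesized subexpression in the pattern becomes a numbered group on the Match, starting at 1 (group 0 is the whole match). The following sketch shows one way to pull the six fields out as integers. The class and method names (DateParts, Extract) are hypothetical, and returning null for a failed match is a choice made here for brevity.

```csharp
using System;
using System.Text.RegularExpressions;

public class DateParts
{
    // Hypothetical helper: returns month, day, year, hour, minute, second,
    // or null when the text does not contain a date/time value.
    public static int[] Extract(string text)
    {
        Match m = Regex.Match(text,
            @"(\d+)/(\d+)/(\d+) (\d+):(\d+):(\d+)");
        if (!m.Success) return null;

        int[] parts = new int[6];
        for (int i = 0; i < 6; i++)
            parts[i] = int.Parse(m.Groups[i + 1].Value); // groups are 1-based
        return parts;
    }

    public static void Main()
    {
        int[] p = Extract("01/01/01 16:10:01");
        Console.WriteLine("hour={0} minute={1} second={2}", p[3], p[4], p[5]);
    }
}
```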

Changing directories (for Windows):

string t6 =
  @"C:\Documents and Settings\user1\Desktop\";
string r6 = Regex.Replace(t6,
  @"\\user1\\",
  @"\user2\");

Expanding (%nn) hex escapes:

string t7 = "%41"; // capital A
string p7 = "%([0-9A-Fa-f][0-9A-Fa-f])";
// uses a MatchEvaluator delegate
string r7 = Regex.Replace(t7, p7,
  HexConvert);
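The recipe above refers to HexConvert without showing it. The following is a sketch of what such a method might look like, not the original authors' implementation: a MatchEvaluator receives each Match and returns the text that should replace it. The wrapper names (HexEscapes, Expand) are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public class HexEscapes
{
    // Assumed body for HexConvert; the source only names the method.
    // Called once per match; the return value replaces the matched text.
    public static string HexConvert(Match m)
    {
        // Group 1 holds the two hex digits that follow the '%'.
        int code = int.Parse(m.Groups[1].Value, NumberStyles.HexNumber);
        return ((char)code).ToString();
    }

    // Hypothetical wrapper around the Regex.Replace call shown above.
    public static string Expand(string text)
    {
        return Regex.Replace(text, "%([0-9A-Fa-f][0-9A-Fa-f])",
            new MatchEvaluator(HexConvert));
    }

    public static void Main()
    {
        Console.WriteLine(Expand("%41%42C")); // prints "ABC"
    }
}
```

A '%' that is not followed by two hex digits does not match the pattern and is left as it is.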

Deleting C comments (imperfectly):

string t8 = @"
/*
 * this is an old C-style comment block
 */
";
string p8 = @"
  /\*  # match the opening delimiter
  .*? # match a minimal number of characters
  \*/ # match the closing delimiter
";
string r8 = Regex.Replace(t8, p8, "", RegexOptions.Singleline
             | RegexOptions.IgnorePatternWhitespace);

Removing leading and trailing whitespace:

string t9a = "   leading";
string p9a = @"^\s+";
string r9a = Regex.Replace(t9a, p9a, "");
  
string t9b = "trailing  ";
string p9b = @"\s+$";
string r9b = Regex.Replace(t9b, p9b, "");
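The two patterns above can also be joined with alternation so that one Replace call trims both ends. This is a small sketch of that variation. The class and method names (Trimmer, TrimBoth) are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public class Trimmer
{
    // Removes leading and trailing whitespace in a single pass.
    public static string TrimBoth(string text)
    {
        // ^\s+ matches at the start, \s+$ at the end; either is replaced by "".
        return Regex.Replace(text, @"^\s+|\s+$", "");
    }

    public static void Main()
    {
        Console.WriteLine("[" + TrimBoth("   both ends  ") + "]"); // prints "[both ends]"
    }
}
```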

Turning "\" followed by "n" into a real newline:

string t10 = @"\ntest\n";
string r10 = Regex.Replace(t10, @"\\n", "\n");

Detecting IP addresses:

string t11 = "55.54.53.52";
string p11 = "^" +
  @"([01]?\d\d?|2[0-4]\d|25[0-5])\." + // \d\d? also accepts one-digit octets
  @"([01]?\d\d?|2[0-4]\d|25[0-5])\." +
  @"([01]?\d\d?|2[0-4]\d|25[0-5])\." +
  @"([01]?\d\d?|2[0-4]\d|25[0-5])" +
  "$";
Match m11 = Regex.Match(t11, p11);

Removing leading path from filename:

string t12 = @"c:\file.txt";
string p12 = @"^.*\\";
string r12 = Regex.Replace(t12, p12, "");

Joining lines in multiline strings:

string t13 = @"this is 
a split line";
string p13 = @"\s*\r?\n\s*";
string r13 = Regex.Replace(t13, p13, " ");

Extracting all numbers from a string:

string t14 = @"
test 1
test 2.3
test 47
";
string p14 = @"(\d+\.?\d*|\.\d+)";
MatchCollection mc14 = Regex.Matches(t14, p14);
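Regex.Matches returns every non-overlapping match, and the resulting MatchCollection can be indexed or enumerated. The following sketch copies the matched text into a string array. The class and method names (Numbers, FindAll) are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public class Numbers
{
    // Hypothetical helper: returns the text of every number found.
    public static string[] FindAll(string text)
    {
        MatchCollection mc = Regex.Matches(text, @"(\d+\.?\d*|\.\d+)");
        string[] found = new string[mc.Count];
        for (int i = 0; i < mc.Count; i++)
            found[i] = mc[i].Value; // Value is the full matched text
        return found;
    }

    public static void Main()
    {
        // Prints 1, 2.3 and 47 on separate lines.
        foreach (string s in FindAll("test 1\ntest 2.3\ntest 47\n"))
            Console.WriteLine(s);
    }
}
```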

Finding all caps words:

string t15 = "This IS a Test OF ALL Caps";
string p15 = @"(\b[^\Wa-z0-9_]+\b)";
MatchCollection mc15 = Regex.Matches(t15, p15);

Finding all lowercase words:

string t16 = "This is A Test of lowercase";
string p16 = @"(\b[^\WA-Z0-9_]+\b)";
MatchCollection mc16 = Regex.Matches(t16, p16);

Finding all initial caps words:

string t17 = "This is A Test of Initial Caps";
string p17 = @"(\b[^\Wa-z0-9_][^\WA-Z0-9_]*\b)";
MatchCollection mc17 = Regex.Matches(t17, p17);

Finding links in simple HTML:

string t18 = @"
<html>
<a href=""http://windows.oreilly.com/news/first.htm"">first tag text</a>
<a href=""http://windows.oreilly.com/news/next.htm"">next tag text</a>
</html>
";
string p18 = @"<A[^>]*?HREF\s*=\s*[""']?"
  + @"([^'"" >]+?)[ '""]?>";
MatchCollection mc18 = Regex.Matches(t18, p18, RegexOptions.IgnoreCase
          | RegexOptions.Singleline);
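The URL itself is captured by the first group of each match, not by the match as a whole. A minimal sketch of collecting those URLs follows. The class and method names (LinkFinder, FindLinks) are hypothetical, and like the pattern itself it is only meant for simple, well-formed HTML.

```csharp
using System;
using System.Text.RegularExpressions;

public class LinkFinder
{
    // Hypothetical helper: returns the HREF value of each anchor tag found.
    public static string[] FindLinks(string html)
    {
        string pattern = @"<A[^>]*?HREF\s*=\s*[""']?"
            + @"([^'"" >]+?)[ '""]?>";
        MatchCollection mc = Regex.Matches(html, pattern,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        string[] links = new string[mc.Count];
        for (int i = 0; i < mc.Count; i++)
            links[i] = mc[i].Groups[1].Value; // group 1 is the URL
        return links;
    }

    public static void Main()
    {
        string html = @"
<a href=""http://windows.oreilly.com/news/first.htm"">first tag text</a>
<a href=""http://windows.oreilly.com/news/next.htm"">next tag text</a>
";
        foreach (string link in FindLinks(html))
            Console.WriteLine(link);
    }
}
```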

Finding middle initials:

string t19 = "Hanley A. Strappman";
string p19 = @"^\S+\s+(\S)\S*\s+\S";
Match m19 = Regex.Match(t19, p19);

Changing inch marks to quotation marks:

string t20 = @"2' 2"" ";
string p20 = "\"([^\"]*)";
string r20 = Regex.Replace(t20, p20, "``$1''");
