Effective PowerShell Item 9: Regular Expressions – One of the Power Tools in PowerShell

Windows PowerShell is based on the .NET Framework.  That is, it is built using the .NET Framework and it exposes the .NET Framework to the user.  One very nice feature of the .NET Framework is the Regex class in the System.Text.RegularExpressions namespace.  It is a very capable regular expression engine.  PowerShell uses this regular expression engine in a number of scenarios:

  • -match operator
  • -notmatch operator
  • Select-String -Pattern parameter
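
As a quick illustration of the three scenarios listed above, here is a minimal sketch.  The sample strings and the $lines, $errors and $hits variables are made up for this example; they are not from any real log.

```powershell
# Sample data (hypothetical, for illustration only)
$lines = 'error: disk full', 'info: backup started', 'error: access denied'

# -match returns $true when the pattern is found in a single string
'error: disk full' -match '^error:'          # True

# -notmatch is simply the negation
'info: backup started' -notmatch '^error:'   # True

# Applied to an array, -match acts as a filter and returns the matching elements
$errors = $lines -match '^error:'

# Select-String accepts pipeline input and emits one MatchInfo object per matching line
$hits = $lines | Select-String -Pattern 'disk|access'
$hits | foreach { $_.Line }
```

Note that these operators are case-insensitive by default; -cmatch and -cnotmatch are the case-sensitive variants.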

Obviously, to get the most out of these operators and the Select-String cmdlet, it helps to have a good grasp of regular expressions.  PowerShell provides a help topic named "about_Regular_Expression" that you can view like so:

PS C:\> help about_reg*

This topic provides a nice quick reference on the various metacharacters in a regular expression, but you are not going to learn a great deal from it about creating powerful regular expressions.  To learn how to get the most out of regular expressions and hence PowerShell, I highly recommend Jeffrey Friedl’s book Mastering Regular Expressions.  Right now on the Amazon site it has 117 reviews and its rating is 4 1/2 stars out of 5.

There is a shortcoming in PowerShell’s support for regular expressions that you need to know about.  Most other scripting languages offer a regular expression syntax for finding all matches in a string.  For example, in Perl I could do this:

$_ = "paul xjohny xgeorgey xringoy stu pete brian";  # PERL script
($first, $second, $third) = /x(.+?)y/g;

Unfortunately the Select-String cmdlet doesn’t have this feature – yet.  So for now you can work around this limitation by using the System.Text.RegularExpressions.Regex class directly.  Fortunately you don’t have to type that long class name because PowerShell has a type alias: [regex].  Very convenient!

PS C:\> $str = "paul xjohny xgeorgey xringoy stu pete brian"
PS C:\> $first,$second,$third = ([regex]'x(.+?)y').Matches($str) | foreach {$_.Groups[1].Value}
PS C:\> $first
john
PS C:\> $second
george
PS C:\> $third
ringo
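
For contrast, here is a small sketch of what the -match operator gives you on the same string.  It stops at the first match and records it in the automatic $Matches variable, which is why the Regex class is needed to get all three names.  The $all variable is just a name chosen for this example.

```powershell
$str = "paul xjohny xgeorgey xringoy stu pete brian"

# -match stops at the first match and records it in the automatic variable $Matches
if ($str -match 'x(.+?)y') {
    $Matches[0]   # whole match: xjohny
    $Matches[1]   # first capture group: john
}

# The Matches method on the Regex object returns every match in the string
$all = ([regex]'x(.+?)y').Matches($str) | foreach { $_.Groups[1].Value }
$all.Count        # 3
```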

One thing to watch out for is when your regular expression is written to search across line boundaries.  For instance, if you use Get-Content to grab the contents of a file to apply the regular expression against, keep in mind that Get-Content streams the file one line at a time.  For regular expressions that operate across lines you will need to apply the regex to the file contents represented as a single string.  In that case, I would do this:

PS C:\> $regex = [regex]'(?<CMultilineComment>/\*[^*]*\*+(?:[^/*][^*]*\*+)*/)'
PS C:\> Get-Content foo.c | Join-String -Newline |
           foreach {$regex.Matches($_)} |
           foreach {$_.Groups["CMultilineComment"].Value}

Note the use of the PowerShell Community Extensions cmdlet "Join-String" which takes the individual strings output by Get-Content and creates a single string separated by newline characters.  Also note that this example shows the usage of a named capture: CMultilineComment. 
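
If the PowerShell Community Extensions are not installed, the same joining step can be done with the .NET String.Join method.  The following is a minimal sketch under that assumption; the temporary file name regex-demo.c and its contents are invented for the example.

```powershell
# Write a small C file to a temporary location (sample content, for illustration only)
$path = Join-Path ([IO.Path]::GetTempPath()) 'regex-demo.c'
Set-Content -Path $path -Value @(
    '/* first',
    '   comment */',
    'int x; /* second */'
)

$regex = [regex]'(?<CMultilineComment>/\*[^*]*\*+(?:[^/*][^*]*\*+)*/)'

# Get-Content emits one string per line; String.Join glues them back into one string
$text = [string]::Join("`n", (Get-Content $path))

# The first comment spans two lines, so it is only found in the joined string
$comments = $regex.Matches($text) | foreach { $_.Groups['CMultilineComment'].Value }
$comments.Count   # 2

Remove-Item $path
```

On PowerShell 3.0 and later, Get-Content -Raw returns the whole file as a single string, which makes the joining step unnecessary.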

Now it would be even better if Select-String supported a "MatchAll" parameter that found all string matches in the specified file or string.  That said, this example does show that when PowerShell is missing a feature, the access that it provides to the .NET Framework is a great escape hatch! 
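
For readers on PowerShell 2.0 or later: Select-String there has an -AllMatches switch that covers this case.  A minimal sketch, reusing the sample string from earlier ($info and $names are names chosen for this example):

```powershell
$str = "paul xjohny xgeorgey xringoy stu pete brian"

# -AllMatches makes Select-String record every match on the line, not just the first
$info = $str | Select-String -Pattern 'x(.+?)y' -AllMatches

# Each entry in the Matches property is a regular .NET Match object
$names = $info.Matches | foreach { $_.Groups[1].Value }
$names
```

The Regex class remains the option to reach for on PowerShell 1.0, or when you need finer control over the match objects.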

If I have one beef with regular expressions it is that there are a number of engines and their support for various features and metacharacters varies.  I’m especially annoyed that Visual Studio’s regular expression find & replace doesn’t use the .NET regular expression engine.  I constantly have to switch mental contexts when moving between the two.  Oh well, as long as you stay within PowerShell I think you will find that a good grasp of regular expressions will help you be more productive.


2 Responses to Effective PowerShell Item 9: Regular Expressions – One of the Power Tools in PowerShell

  1. kze says:

    Hi Keith, I have been looking for regex information for powershell for the past couple days and I found this article very helpful, is there any book or resources I can find more about regex class, operator, best practice…etc in powershell? Thanks!
    -kze
