Effective PowerShell Item 13: Comparing Arrays in Windows PowerShell

PowerShell has a lot of useful operators, such as -contains, which tests whether an array contains a particular element.  But as far as I can tell, PowerShell doesn’t seem to provide an easy way to test whether the contents of two arrays are equal.  This is often quite handy, and I was a bit surprised by this apparent omission. 

I came upon this need to compare arrays while answering a question on the microsoft.public.windows.powershell newsgroup.  The poster wanted to find UTF-8 encoded files by inspecting their BOM, or byte order mark.  One relatively straightforward approach to this is:

PS> $preamble = [System.Text.Encoding]::UTF8.GetPreamble()
PS> $preamble | foreach {"0x{0:X2}" -f $_}
0xEF
0xBB
0xBF
PS> $fileHeader = Get-Content Utf8File.txt -Enc byte -Total 3
PS> $fileheader | foreach {"0x{0:X2}" -f $_}
0xEF
0xBB
0xBF

While it is easy enough to visually inspect this and see that we have a match, visual inspection doesn’t work in a script.  :-)  You could also test each individual element, which isn’t bad for a three-element array, but when you hit, say, 10 elements that approach starts to look tedious. 
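To make the element-by-element option concrete, here is a minimal sketch. The names $expected, $actual and $isMatch are placeholders of mine, and the "file" bytes are hard-coded rather than read from disk so that the snippet runs on its own.

```powershell
# Hypothetical stand-ins: $expected is the UTF-8 preamble, $actual mimics
# the first three bytes read from a file.
$expected = [System.Text.Encoding]::UTF8.GetPreamble()
$actual   = [byte[]](0xEF, 0xBB, 0xBF)

# One comparison per element; every extra element needs another clause.
$isMatch = ($actual.Length -eq $expected.Length) -and
           ($actual[0] -eq $expected[0]) -and
           ($actual[1] -eq $expected[1]) -and
           ($actual[2] -eq $expected[2])
$isMatch
```

This is fine for three bytes, but the expression grows with the array, which is why a general comparison is worth having.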

You might think that we could just compare these two arrays directly like so:

PS> $preamble -eq $fileHeader | Get-TypeName # Get-TypeName is from the PowerShell Community Extensions
WARNING: Get-TypeName did not receive any input. The input may be an empty collection. You can either 
prepend the collection expression with the comma operator e.g. ",$collection | gtn" or you can pass the
variable or expression to Get-TypeName as an argument e.g. "gtn $collection".
PS> $preamble -eq 0xbb
187

But comparing arrays via the -eq operator doesn’t actually compare the contents of the two arrays.  As you can see above, it produces no output.  When the left-hand side of the -eq operator is an array, PowerShell returns the elements of that array that match the value specified on the right-hand side (shown above where I test for -eq to 0xbb).
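The filtering behavior is easier to see with a small array of integers. This is only an illustrative sketch; $values, $hits and $none are names I made up for it.

```powershell
$values = 1, 2, 3, 2

# Scalar on the right: -eq acts as a filter and returns the matching elements.
$hits = @($values -eq 2)
$hits.Count    # 2, because two elements equal 2

# Array on the right: no single element equals the whole array,
# so the filter returns nothing.
$none = @($values -eq $values)
$none.Count    # 0
```

Wrapping each result in @() guarantees an array, so .Count is safe to read even when nothing matched.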

OK so it looks like we need to roll our own mechanism to compare arrays.  Here is one way:

function AreArraysEqual($a1, $a2) {
    # Both arguments must be arrays
    if ($a1 -isnot [array] -or $a2 -isnot [array]) {
        throw "Both inputs must be an array"
    }
    # Arrays with a different number of dimensions can't be equal
    if ($a1.Rank -ne $a2.Rank) {
        return $false
    }
    # The same instance is trivially equal to itself
    if ([System.Object]::ReferenceEquals($a1, $a2)) {
        return $true
    }
    # Every dimension must have the same length
    for ($r = 0; $r -lt $a1.Rank; $r++) {
        if ($a1.GetLength($r) -ne $a2.GetLength($r)) {
            return $false
        }
    }

    $enum1 = $a1.GetEnumerator()
    $enum2 = $a2.GetEnumerator()

    # Walk both arrays in lockstep and stop at the first mismatch
    while ($enum1.MoveNext() -and $enum2.MoveNext()) {
        if ($enum1.Current -ne $enum2.Current) {
            return $false
        }
    }
    return $true
}

And it works as expected:

PS> AreArraysEqual $preamble $fileHeader
True

However, there turns out to be a way to do this within PowerShell, but it isn’t exactly obvious.  At least it wasn’t to me – at first. 

PS> @(Compare-Object $preamble $fileHeader -sync 0).Length -eq 0
True

Good old Compare-Object will compare the arrays, and if there are no differences it won’t output anything.  If we wrap the output of Compare-Object in an array subexpression @(), we always get an array with zero or more elements.  A simple comparison of its length to 0 confirms that there was no output, hence the arrays are equal. 

[Updated: 5/12/2008 - need to use -SyncWindow 0 to get correct result - thanks Arnoud and Roman]  Let me elaborate more on this updated information.  As Roman points out in the comments on this post, Compare-Object compares two objects to see if they have the same set of elements.  Normally it does not care if the elements are in the same sequence in each object (each array in this case).  For example:

PS> $a1 = 1,1,2
PS> $a2 = 1,2,1
PS> @(Compare-Object $a1 $a2).length -eq 0
True

Obviously that isn’t what we want when comparing arrays for equality.  Fortunately, as Arnoud points out, we can use the SyncWindow parameter with a value of 0 to get Compare-Object to "force sequence equality", as Arnoud succinctly phrases it.
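Here is the same pair of arrays run both ways, as a small sketch of the difference. The variable names $sameAsSet, $diff and $sameAsSequence are mine; the only cmdlet feature used is the -SyncWindow parameter described above.

```powershell
$a1 = 1, 1, 2
$a2 = 1, 2, 1

# Default behavior: both arrays hold the same elements, so no differences are reported.
$sameAsSet = @(Compare-Object $a1 $a2).Count -eq 0

# -SyncWindow 0: elements are only matched against the same position,
# so the reordered elements show up as differences.
$diff = @(Compare-Object $a1 $a2 -SyncWindow 0)
$sameAsSequence = $diff.Count -eq 0

$sameAsSet         # True
$sameAsSequence    # False

# Each difference object carries the value and which side it came from.
$diff | Select-Object InputObject, SideIndicator
```

Looking at the SideIndicator column of the last command is a handy way to see which elements Compare-Object considered out of place.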

How about the performance of these two approaches?

PS> $a1 = 1..10000
PS> $a2 = 1..10000
PS> (Measure-Command { AreArraysEqual $a1 $a2 }).TotalSeconds
1.236252
PS> (Measure-Command { @(Compare-Object $a1 $a2 -sync 0).Length -eq 0 }).TotalSeconds
0.3259954

Compare-Object beats out my PowerShell function by a good margin, which isn’t too surprising[1].  After all, one is compiled code and the other is interpreted script.  So there you have it.  If you need a quick way to compare two arrays, just remember that arrays are objects too and that is what Compare-Object does best – compare two objects.

[1] – Except for comparing against the same array where my function is two orders of magnitude faster.  It seems that the Compare-Object cmdlet could benefit from a quick System.Object.ReferenceEquals check.  :-)  Admittedly this is a bit of a corner case scenario.
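If that corner case matters to you, the two ideas can be combined in a thin wrapper. This is only a sketch: Test-ArrayEqual is a hypothetical name of mine, not a built-in cmdlet, and it assumes both arguments are non-null arrays.

```powershell
# Hypothetical helper: reference check first, then Compare-Object.
function Test-ArrayEqual($Left, $Right) {
    # Same instance: nothing to compare.
    if ([System.Object]::ReferenceEquals($Left, $Right)) {
        return $true
    }
    # Otherwise fall back to the position-sensitive Compare-Object check.
    return @(Compare-Object $Left $Right -SyncWindow 0).Count -eq 0
}

$a = 1..1000
$b = 1..1000

Test-ArrayEqual $a $a           # True, via the reference check
Test-ArrayEqual $a $b           # True, via Compare-Object
Test-ArrayEqual $a (1000..1)    # False, same elements in a different order
```

The reference check costs almost nothing, so it doesn’t slow down the general case.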


6 Responses to Effective PowerShell Item 13: Comparing Arrays in Windows PowerShell

  1. Roman says:

    Mind this: Compare-Object compares "sets", not "sequences", so that these arrays are equal as sets, though they are not equal as arrays:
    $a1 = @(1, 2, 3)
    $a2 = @(3, 1, 2)
    Compare-Object $a1 $a2
    (Or am I missing something? If not, then perhaps it would be nice to have an option in Compare-Object to force "sequence" equality).

    Thanks,
    Roman Kuzmin

  2. Arnoud says:

    To Roman’s comments:
     
    You could force "sequence equality" by setting SyncWindow to 0:
    Compare-Object $a1 $a2 -SyncWindow 0
     
    On the other hand, if you wanted to compare the arrays as sets, regardless of sequence, you may need to sort the input objects (setting the SyncWindow to a large value would be much slower):
     
    $a1 = 1..10000
    $a2 = 10000..1
    (Compare-Object ($a1 | Sort) ($a2 | Sort)) -eq $null
     
    Regards,
    Arnoud
     

  5. Roman says:

    Hi Arnoud,
     
    -SyncWindow 0, you say? That’s exactly what I was looking for!
     

    Thanks,
    Roman Kuzmin

  6. Keith says:

    Arnoud and Roman, thanks for the correction!
