Tail-Content – Better Performance for Grabbing Last Lines From Large (ASCII) Log Files

Necessity is the mother of invention, or in the case of Windows PowerShell, a new script.  I have a set of 23 large (28 MB) log files on a remote machine, and I need to verify that the last line of each one is identical.  My first “naive” approach was to do this:

PS> get-content \\server\share\logs\*26_03.csv  | select -last 1

Unfortunately, that took so long that I killed it and set out to create a script that would “efficiently” tail a file.  My log files were ASCII encoded, which made the task much easier.  The bottom line is that .NET has a FileStream object that lets you start at the end of a file and work backwards.  The Get-Content approach above requires PowerShell to read every single line of a log file with ~1,000,000 lines in it, and over the network to boot.

With FileStream you can read from the end of the file backwards, and because my files are ASCII (one byte per character), that is very easy to do.  So I created a Tail-Content.ps1 script that you can download.  Note that it doesn’t work on Unicode encoded files and it doesn’t do active tailing.  However, it is very fast for large files.  There are a few interesting parts of the code to examine.  First, if you want to handle paths the way PowerShell does, the snippet below shows how to set up your parameters to handle wildcard expansion and literal paths.  This requires version 2 or later of PowerShell:

    [CmdletBinding(DefaultParameterSetName="Path")]
    param(
        [Parameter(Mandatory=$true, 
                   Position=0, 
                   ParameterSetName="Path", 
                   ValueFromPipeline=$true, 
                   ValueFromPipelineByPropertyName=$true)]
        [string[]]
        $Path,
        
        [Alias("PSPath")]
        [Parameter(Mandatory=$true, 
                   Position=0, 
                   ParameterSetName="LiteralPath", 
                   ValueFromPipelineByPropertyName=$true)]
        [string[]]
        $LiteralPath,
        
        <elided>
    )

Note that the default parameter set is “Path” and that the Path parameter accepts pipeline input both by value and by property name.  This means that raw strings piped in will work as paths, assuming they actually contain valid paths.  Also note that both parameters are of type string array.  The LiteralPath parameter is defined in a different, mutually exclusive parameter set named “LiteralPath”, and it binds to pipeline input only by property name.  It is important that the LiteralPath parameter is also decorated with the Alias attribute “PSPath”.  This way the output of Get-ChildItem (FileInfo) gets bound by property name to the LiteralPath parameter, because PSPath is an alias for that parameter: there is no Path property on FileInfo, but there is a PSPath property.  Remember that PowerShell extends the FileInfo type by adding the PSPath NoteProperty.
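To see the binding behavior described above in isolation, here is a minimal sketch.  Show-BoundPath is a hypothetical function made up for this illustration (it is not part of Tail-Content.ps1); it declares the same two parameters and simply reports which parameter set PowerShell selected and what value was bound.

```powershell
# Hypothetical demo function (not part of Tail-Content.ps1): it reports which
# parameter set was chosen and which value(s) were bound for each pipeline object.
function Show-BoundPath
{
    [CmdletBinding(DefaultParameterSetName="Path")]
    param(
        [Parameter(Mandatory=$true,
                   Position=0,
                   ParameterSetName="Path",
                   ValueFromPipeline=$true,
                   ValueFromPipelineByPropertyName=$true)]
        [string[]]
        $Path,

        [Alias("PSPath")]
        [Parameter(Mandatory=$true,
                   Position=0,
                   ParameterSetName="LiteralPath",
                   ValueFromPipelineByPropertyName=$true)]
        [string[]]
        $LiteralPath
    )

    Process
    {
        # Pick whichever parameter belongs to the active parameter set.
        $bound = if ($psCmdlet.ParameterSetName -eq "Path") { $Path } else { $LiteralPath }
        "{0}: {1}" -f $psCmdlet.ParameterSetName, ($bound -join ', ')
    }
}

# Example calls (commented out so that loading the sketch has no side effects):
#   'C:\logs\*.csv' | Show-BoundPath     # a raw string binds by value to -Path
#   Get-ChildItem C:\logs | Show-BoundPath   # FileInfo binds via its PSPath property to -LiteralPath
```

Piping a raw string should report the “Path” parameter set, while piping Get-ChildItem output should report “LiteralPath” along with the provider-qualified PSPath value.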

That sets up the parameters.  Now here is what you need to do in your Process block to handle Path parameters, which could specify paths with wildcards in them:

   1: Process 
   2: {
   3:     if ($psCmdlet.ParameterSetName -eq "Path")
   4:     {
   5:         # In the non-literal case we may need to resolve a wildcarded path
   6:         $resolvedPaths = @()
   7:         foreach ($apath in $Path) 
   8:         {
   9:             $resolvedPaths += @(Resolve-Path $apath | Foreach { $_.Path })
  10:         }
  11:     }
  12:     else 
  13:     {
  14:         $resolvedPaths = $LiteralPath
  15:     }
  16:             
  17:     foreach ($rpath in $resolvedPaths) 
  18:     {
  19:         $PathIntrinsics = $ExecutionContext.SessionState.Path
  20:         
  21:         if ($PathIntrinsics.IsProviderQualified($rpath))
  22:         {
  23:             $rpath = $PathIntrinsics.GetUnresolvedProviderPathFromPSPath($rpath)
  24:         }
  25:         
  26:         Write-Verbose "<cmdlet-name> processing $rpath"
  27:  
  28:         #process file here
  29:     }
  30: }

On line 3 I test which parameter set is being used.  If it is the Path parameter set, we need to resolve the specified paths because they may contain wildcards.  I do that on line 9 using Resolve-Path.  Then on line 17 we iterate through each path and process it.  One other detail that you may or may not need to worry about is that $rpath at this point may contain a provider-qualified path, e.g. Microsoft.PowerShell.Core\FileSystem::C:\foo.txt.  These work fine within PowerShell; however, if you pass such a path to a .NET object, it won’t be recognized as a valid path.  So on line 21 I check whether we have a provider-qualified path, and if so I get the raw path using $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath, as shown on line 23.  The rest of the script just does low-level byte reads from the end of the file.
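For readers who want a feel for what “low-level byte reads from the end of the file” can look like, here is a minimal sketch.  It is not the code from Tail-Content.ps1: the function name Get-AsciiTail and its parameters are made up for this illustration.  It assumes an ASCII file, and it assumes that the path passed in is already a full, non-provider-qualified file system path, such as the $rpath produced above.

```powershell
# Hypothetical helper (not the actual Tail-Content.ps1 code): returns the last
# $Count lines of an ASCII file by reading fixed-size chunks backwards from the end.
function Get-AsciiTail
{
    param(
        [Parameter(Mandatory=$true)]
        [string] $FilePath,      # assumed to be a full, non-provider-qualified path
        [int] $Count = 1,
        [int] $ChunkSize = 4096
    )

    $stream = [System.IO.File]::Open($FilePath, [System.IO.FileMode]::Open,
                                     [System.IO.FileAccess]::Read, [System.IO.FileShare]::ReadWrite)
    try
    {
        $position = $stream.Length
        $buffer = New-Object byte[] $ChunkSize
        $text = ''

        while ($position -gt 0)
        {
            # Step back one chunk (or to the start of the file) and read it.
            $toRead = [int][Math]::Min([long]$ChunkSize, $position)
            $position -= $toRead
            [void]$stream.Seek($position, [System.IO.SeekOrigin]::Begin)
            $read = 0
            while ($read -lt $toRead)
            {
                $n = $stream.Read($buffer, $read, $toRead - $read)
                if ($n -le 0) { break }
                $read += $n
            }

            # ASCII is one byte per character, so a chunk boundary never splits a character.
            $text = [System.Text.Encoding]::ASCII.GetString($buffer, 0, $read) + $text

            # Ignore the trailing line terminator(s), then stop once enough
            # newlines have been seen to guarantee $Count complete lines.
            $trimmed = $text.TrimEnd([char[]]"`r`n")
            $newlines = @($trimmed -split "`n").Count - 1
            if ($newlines -ge $Count) { break }
        }

        if ($text.Length -eq 0) { return }   # empty file: emit nothing

        @($text.TrimEnd([char[]]"`r`n") -split "`r?`n") | Select-Object -Last $Count
    }
    finally
    {
        $stream.Close()
    }
}
```

Something like Get-AsciiTail -FilePath $rpath -Count 1 would then return just the final line.  Because the sketch trims trailing line terminators, blank lines at the very end of the file are ignored, and the repeated string concatenation is only reasonable when the tail being collected is small.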

I went back and measured my original approach of using { Get-Content \\server\share\logs\*_26_03.csv | Select -Last 1 }, and it took ~13 minutes.  Using my Tail-Content script, it took < 1 second.  That is a speedup of about 1789x!

psmdtag:dotnet: FileStream
psmdtag:script: Tail-Content
psmdtag:sample: Advanced Function


3 Responses to Tail-Content – Better Performance for Grabbing Last Lines From Large (ASCII) Log Files

  1. Garan Keeler says:

    Worked perfectly. Thank you very much!

  2. Ben says:

    Very handy, thanks. Using this to parse the end of very large robocopy log files. One change I made — I modified line 183 to use Write-Output instead of Write-Host, so I can pipe the lines into an array and extract what I need programmatically, rather than just displaying it on the screen.

    • Daniel says:

      Ben,

      Since PowerShell 3.0 came out in September 2012, you can use the “Tail” parameter, or its alias “Last” directly on the Get-Content cmdlet.

      In my testing, on a 955MB log file of around 4.5M lines, Get-Content -Tail took 2.5 milliseconds, and the pipe to Select-Object -Last took 1 minute, 28 seconds.

      Daniel
