Tail-Content – Better Performance for Grabbing Last Lines From Large (ASCII) Log Files

Necessity is the mother of invention, or in the case of Windows PowerShell, a new script.  I have a set of 23 large (28 MB) log files on a remote machine, and I need to verify that the last line of each one is identical.  My first “naive” approach was to do this:

PS> get-content \\server\share\logs\*26_03.csv  | select -last 1

Yeah, unfortunately that took so long that I killed it and set out to create a script that would “efficiently” tail a file.  My log files were ASCII encoded, which made the task much easier.  Bottom line is that there is a FileStream object in .NET that allows you to start at the end of a file and work backwards.  The approach above using Get-Content requires that PowerShell read every single line from a log file with ~1,000,000 lines in it, and over the network to boot.

With FileStream you can read from the end of the file backwards, and because my files are ASCII, where each byte is one character, that is very easy to do.  So I created a Tail-Content.ps1 script that you can download.  Note that it doesn’t work on Unicode-encoded files and it doesn’t do active tailing.  However, it is very fast for large files.  There are a few interesting parts of the code to examine.  First, if you want to handle paths like PowerShell does, the snippets below show you how to set up your parameters to handle wildcard expansion and literal paths.  This requires version 2 or later of PowerShell:

    [CmdletBinding(DefaultParameterSetName="Path")]
    param(
        [Parameter(Mandatory=$true, 
                   Position=0, 
                   ParameterSetName="Path", 
                   ValueFromPipeline=$true, 
                   ValueFromPipelineByPropertyName=$true)]
        [string[]]
        $Path,
        
        [Alias("PSPath")]
        [Parameter(Mandatory=$true, 
                   Position=0, 
                   ParameterSetName="LiteralPath", 
                   ValueFromPipelineByPropertyName=$true)]
        [string[]]
        $LiteralPath,
        
        <elided>
    )

Note that the default parameter set is “Path” and the Path parameter accepts pipeline input both by value and by property name.  This means that raw strings will work as paths, assuming they actually contain valid paths.  Also note that both parameters are of type string array.  The LiteralPath parameter is defined in a different, mutually exclusive parameter set named “LiteralPath” and it binds to pipeline input only by property name.  It is important that we also decorate the LiteralPath parameter with the Alias attribute “PSPath”.  This way the output of Get-ChildItem (FileInfo) gets bound by property name to the LiteralPath parameter, because PSPath is an alias for that parameter.  This happens because there is no Path property on FileInfo but there is a PSPath property.  Remember that PowerShell extends the FileInfo type by adding the PSPath NoteProperty.
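To see this binding behavior in isolation, here is a minimal sketch.  The function name Show-PathBinding is made up for this illustration and is not part of Tail-Content.ps1; it declares the same two parameters and simply reports which parameter set PowerShell picked for each input object.

```powershell
# Hypothetical helper for illustration only; not part of Tail-Content.ps1
function Show-PathBinding {
    [CmdletBinding(DefaultParameterSetName = "Path")]
    param(
        [Parameter(Mandatory = $true, Position = 0, ParameterSetName = "Path",
                   ValueFromPipeline = $true, ValueFromPipelineByPropertyName = $true)]
        [string[]]
        $Path,

        [Alias("PSPath")]
        [Parameter(Mandatory = $true, Position = 0, ParameterSetName = "LiteralPath",
                   ValueFromPipelineByPropertyName = $true)]
        [string[]]
        $LiteralPath
    )

    process {
        # Emit the name of the parameter set chosen for the current input object
        $PSCmdlet.ParameterSetName
    }
}

# A raw string binds by value to -Path (the path does not need to exist here,
# because this sketch never resolves it)
"C:\logs\*.csv" | Show-PathBinding

# A FileSystem item has no Path property but does have PSPath,
# so it binds by property name to -LiteralPath
Get-Item $PSHOME | Show-PathBinding
```

The first call should report the Path set and the second the LiteralPath set, which is the split the Process block relies on.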

That sets up the parameters.  Now here is what you need to do in your Process block to handle Path parameters, which could specify paths with wildcards in them:

   1: Process 
   2: {
   3:     if ($psCmdlet.ParameterSetName -eq "Path")
   4:     {
   5:         # In the non-literal case we may need to resolve a wildcarded path
   6:         $resolvedPaths = @()
   7:         foreach ($apath in $Path) 
   8:         {
   9:             $resolvedPaths += @(Resolve-Path $apath | Foreach { $_.Path })
  10:         }
  11:     }
  12:     else 
  13:     {
  14:         $resolvedPaths = $LiteralPath
  15:     }
  16:             
  17:     foreach ($rpath in $resolvedPaths) 
  18:     {
  19:         $PathIntrinsics = $ExecutionContext.SessionState.Path
  20:         
  21:         if ($PathIntrinsics.IsProviderQualified($rpath))
  22:         {
  23:             $rpath = $PathIntrinsics.GetUnresolvedProviderPathFromPSPath($rpath)
  24:         }
  25:         
  26:         Write-Verbose "<cmdlet-name> processing $rpath"
  27:  
  28:         #process file here
  29:     }
  30: }

On line 3 I test which ParameterSet is being used.  If it is the Path parameter set then we need to resolve the paths specified because they may contain wildcards.  I do that on line 9 using Resolve-Path.  Then on line 17 we iterate through each path and process it.  One other detail that you may or may not need to worry about is that $rpath at this point may contain a provider-qualified path, e.g. Microsoft.PowerShell.Core\FileSystem::C:\foo.txt.  These work fine within PowerShell; however, if you pass such a path to a .NET object, it won’t be recognized as a valid path.  So on line 21 I check to see if we have a provider-qualified path, and if so I get the raw path using $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath as shown on line 23.  The rest of this script just does low-level byte reads from the end of the file.
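To give a feel for what reading backwards from the end of a file looks like, here is a minimal sketch.  It is not the code from Tail-Content.ps1; the function name Get-LastLineAscii is invented for this example, it returns only the last line, and it assumes an ASCII file and a full file system path (the .NET current directory is not necessarily the PowerShell location).

```powershell
# Hypothetical sketch, not the actual Tail-Content.ps1 implementation
function Get-LastLineAscii {
    param([string]$FilePath)   # assumed to be a full, non-provider-qualified path

    $stream = [System.IO.File]::Open($FilePath, [System.IO.FileMode]::Open,
                                     [System.IO.FileAccess]::Read,
                                     [System.IO.FileShare]::ReadWrite)
    try {
        $pos = $stream.Length

        # Step back over any trailing CR (13) / LF (10) bytes
        while ($pos -gt 0) {
            $stream.Position = $pos - 1
            $b = $stream.ReadByte()
            if ($b -ne 10 -and $b -ne 13) { break }
            $pos--
        }
        $end = $pos

        # Walk backwards until the LF that ends the previous line (or file start)
        while ($pos -gt 0) {
            $stream.Position = $pos - 1
            if ($stream.ReadByte() -eq 10) { break }
            $pos--
        }

        $count = [int]($end - $pos)
        if ($count -eq 0) { return "" }

        # Read just the bytes of the last line and decode them as ASCII
        $buffer = New-Object byte[] $count
        $stream.Position = $pos
        $read = 0
        while ($read -lt $count) {
            $n = $stream.Read($buffer, $read, $count - $read)
            if ($n -le 0) { break }
            $read += $n
        }
        [System.Text.Encoding]::ASCII.GetString($buffer, 0, $read)
    }
    finally {
        $stream.Dispose()
    }
}
```

Only the bytes of the final line are ever read, which is why this style of approach does not care how large the rest of the file is.  Seeking one byte at a time keeps the sketch short; reading in larger chunks from the end would likely be the better choice over a network share.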

I went back and measured my original approach of using { Get-Content \\server\share\logs\*_26_03.csv | Select -Last 1 } and it took ~13 minutes.  Using my Tail-Content script, it took < 1 second.  That is a speedup of about 1789x!



6 Responses to Tail-Content – Better Performance for Grabbing Last Lines From Large (ASCII) Log Files

  1. Garan Keeler says:

    Worked perfectly. Thank you very much!

  2. Ben says:

    Very handy, thanks. Using this to parse the end of very large robocopy log files. One change I made — I modified line 183 to use Write-Output instead of Write-Host, so I can pipe the lines into an array and extract what I need programmatically, rather than just displaying it on the screen.

    • Daniel says:

      Ben,

      Since PowerShell 3.0 came out in September 2012, you can use the “Tail” parameter, or its alias “Last” directly on the Get-Content cmdlet.

      In my testing, on a 955MB log file of around 4.5M lines, Get-Content -Tail took 2.5 milliseconds, and the pipe to Select-Object -Last took 1 minute, 28 seconds.

      Daniel

  3. Ryan says:

    Hey Keith – I’m at a shop that has some locked-down 2008 R2 servers that do not have PowerShell 3.0, so your script helped us a ton because we had to parse log files over 10GB. I made one tweak for our purposes – rather than using ‘Write-Host’ to display the lines that are at the tail of the file I used ‘Write-Output’ so that I could pipe the data to Set-Content to make a separate file.

    Thanks again.

  4. Sudeep says:

    I am unable to use tail-content along with GC, appreciate if you could help in this. Basically I would like to select specific value updated latest from log file which keeps changing every min and where my log file size is > 40mb.
    Below is the PS command i am trying to use but not working. Kindly help. Thanks.

    $a=(( gc -100 (Join-Path $location $fileName1) tail-content 100 | ? { $_.Contains( ‘epm,’ ) } |Select-Object -last 1) -split ‘ ‘)[12]

    • rkeithhill says:

      Get-Content now supports tailing natively. Just use: Get-Content (Join-Path $location $filename) -Tail 100 -Wait to tail the file and have it update. If you don’t need to wait for changes to the file, just get rid of the -Wait parameter.
