Cyber Defense

PowerShell Script To Search Log Files With Regular Expressions

If you are consolidating all of the logs from your servers, firewalls, IDS sensors and other devices into text files on a protected server, how are you going to search all that data? Or if you have textual output from analysis or scanning tools, how can you extract just the lines which match at least one of the regular expression patterns from your set of patterns?

The PowerShell script (Search-TextLog.ps1) uses a text file of regular expression patterns (signatures.txt) to search a textual log file or, in fact, any text file, as long as each entry to be searched fits on a single line. Each line from the log is compared against every regex in the patterns file. Each pattern in the patterns file must be accompanied by a description of what a match to that pattern indicates. When the search completes, a summary report shows the count of matches for each pattern. The output is an array of objects with three properties on which you can filter: Count, Description, and the regex Pattern itself.
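To make the idea concrete, here is a minimal sketch of that matching loop. It is not the code of Search-TextLog.ps1 itself; the sample patterns, descriptions, and log lines are hypothetical stand-ins, and the [pscustomobject] syntax assumes PowerShell 3.0 or later.

```powershell
# Hypothetical stand-ins for the contents of signatures.txt and a log file.
$patterns = @(
    [pscustomobject]@{ Count = 0; Description = 'Possible command shell access'; Pattern = 'cmd\.exe' },
    [pscustomobject]@{ Count = 0; Description = 'Possible directory traversal';  Pattern = '\.\./' }
)
$logLines = @(
    'GET /scripts/cmd.exe?/c+dir 404',
    'GET /index.html 200',
    'GET /../boot.ini 403'
)

# Compare every log line against every pattern and tally the matches.
foreach ($line in $logLines) {
    foreach ($p in $patterns) {
        if ($line -match $p.Pattern) { $p.Count++ }
    }
}

# The summary is just an array of objects, so it can be filtered and sorted.
$summary = $patterns | Select-Object Count, Description, Pattern
$summary | Where-Object { $_.Count -gt 0 } | Sort-Object Count -Descending
```

Because the result is a set of objects rather than text, the usual pipeline cmdlets (Where-Object, Sort-Object, Export-Csv, and so on) can work directly on the Count, Description, and Pattern properties.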

Get the script from the SEC505 scripts zip file on the Downloads page of this blog (look in the Day6-PowerShell folder inside the zip).


Alternatively, if you use the -ShowMatchedLines switch with the script, the summary is not shown. Instead, every line from the log which matched at least one pattern is output (and each line is output only once, no matter how many other patterns it also matched).
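The "output once" behavior can be sketched as follows. This is an illustration of the technique under assumed sample data, not the script's actual implementation: the inner loop stops at the first pattern that matches, so a line is emitted at most once.

```powershell
# Hypothetical patterns and log lines, for illustration only.
$patterns = @('cmd\.exe', '\.\./')
$logLines = @(
    'GET /../../winnt/system32/cmd.exe 404',   # matches both patterns
    'GET /index.html 200',                     # matches neither
    'GET /../boot.ini 403'                     # matches the second pattern
)

$matchedLines = foreach ($line in $logLines) {
    foreach ($pattern in $patterns) {
        if ($line -match $pattern) {
            $line    # emit the line once...
            break    # ...and skip the remaining patterns for this line
        }
    }
}
$matchedLines
```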


The attached zip with the script includes a sample log file to search (iis.log) and a sample patterns file (signatures.txt) with which to search it. You can also get it from the Downloads page. (Please note that the sample patterns and their descriptions are just for illustration; please don't get vexed about their details. They are not meant for a production HIDS, only to convey the idea.)

The bad thing about the script, though, is its slow performance in comparison to grep. There's no way around it either: the price of all the object-orientation in PowerShell is slower performance. Use the script instead of grep when you want to further pipe or manipulate the match objects, when the performance difference is negligible, or when grep can't be used for some odd reason. Also, take care to write clean regex patterns which don't tax the CPU unnecessarily; this is good practice in any case, and it can make a big difference in processing time.

The Patterns File (signatures.txt)

In real life you would have a different patterns file for each type of log you wanted to search, e.g., syslog, web, ftp, smtp, firewall, etc. You will look for different things in different types of files/logs; hence, you'll have different regular expressions for each. The format of each line in the patterns file must be "<regexpattern> tab(s) <description>". Blank lines and lines which begin with hashmarks (#) or semicolons (;) are ignored. If you are going to use the -ShowMatchedLines switch a lot, you can optimize your searches by putting the patterns which will match the most entries near the top of the patterns file, since a line needs only one match to be output.
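As a rough sketch of how a file in this format could be parsed (again, not the script's own code), the following skips blank and comment lines and splits each remaining line at the first run of tabs. The sample entries are hypothetical, and silently skipping malformed lines is an assumption made for the example.

```powershell
# Hypothetical patterns-file content; a real file would be read with Get-Content.
$tab = [char]9
$fileLines = @(
    '# Sample web signatures (illustrative only)',
    '',
    "cmd\.exe${tab}Possible command shell access",
    '; another comment style',
    "\.\./${tab}${tab}Possible directory traversal"
)

$signatures = foreach ($entry in $fileLines) {
    # Skip blank lines and comment lines beginning with # or ;
    if ($entry -match '^\s*$' -or $entry -match '^[#;]') { continue }

    # Split at the first run of tabs: regex on the left, description on the right.
    $parts = $entry -split '\t+', 2
    if ($parts.Count -lt 2) { continue }   # assumption: skip lines with no description

    [pscustomobject]@{
        Count       = 0
        Description = $parts[1].Trim()
        Pattern     = $parts[0]
    }
}
$signatures
```

Splitting with a limit of 2 keeps any later tabs inside the description rather than discarding part of it.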

1 Comment

Posted July 28, 2009 at 2:58 PM

PatrickMc

Useful article. Thanks.
In addition to PowerShell, we (I am a system admin) use biterscripting for parsing and reporting on web server logs. I started with the sample script posted at http://www.biterscripting.com/SS_WebLogParser.html and modified it to meet my requirements.
Patrick
