Every once in a while I find myself in a conversation about scanning for viruses from code (yes, my life is that exciting). The scenario often goes like this:
A middle-manager, having recently learned about viruses from his son’s copy of Wired magazine, realizes you’re saving user-uploaded files to your web server, and asks if you’re performing virus scans on the uploaded files. You panic, mumble something about how it’s “in the works,” and rush off to look for an open source virus scanning component.
You frantically search Google for “virus scan from [language of your choice]” but the results are dismal. You try 5 or 6 other searches and they all yield the same result: people like yourself asking this same question on forum after forum, with no helpful answers.
A while ago I went down this rabbit hole (sans the middle-manager) trying to scan for viruses from ASP.NET / C#. After working on it for a few days I arrived at the following conclusions:
Symantec and McAfee hose you
One of these companies doesn’t have any kind of API. The other has a command line tool that gives you the false confidence that you can call it from code, but 7 hours later you’ll realize that although it runs fine on the command line, executing it from code inexplicably produces no output at all. In other words, you can run it, but you’ll have to guess what the results were. (I forget which company does which, but spend 2 or 3 hours on their websites and you’ll find the answer.)
There are a bunch of virus scanning companies you’ve never heard of
Seriously. I had no idea there are so many companies in this space.
Sophos has a good command line interface
After pulling my hair out with Symantec/McAfee for a couple days it was a snap to get a call working from .NET to Sophos’ command line interface. In a couple hours I had a fully functional virus scanner. Hooray!
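The general shape of calling a command-line scanner from code is to launch it as a child process and capture everything it prints. Below is a minimal sketch of that in Python rather than C#. It is not the original .NET code, which no longer exists. The scanner path and any options are placeholders, so no real Sophos flags are assumed, and `run_cli_scanner` / `ScanResult` are illustrative names only.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass


@dataclass
class ScanResult:
    """Everything a single command-line scan reported."""
    exit_code: int
    stdout: str
    stderr: str


def run_cli_scanner(scanner_cmd: list[str], file_path: str, timeout: float = 120.0) -> ScanResult:
    """Run a command-line scanner on file_path and capture its output.

    scanner_cmd is the executable plus any options, e.g. ["C:/path/to/scanner.exe"].
    The path and options are placeholders; use whatever your vendor documents.
    """
    completed = subprocess.run(
        [*scanner_cmd, file_path],
        capture_output=True,  # read stdout *and* stderr instead of letting them vanish
        text=True,            # decode the output to str
        timeout=timeout,      # raises subprocess.TimeoutExpired if the scanner hangs
        check=False,          # a non-zero exit code is a result here, not an error
    )
    return ScanResult(completed.returncode, completed.stdout, completed.stderr)
```

Interpreting `exit_code` is up to the product. Scanners typically signal a detection with a non-zero code, but the exact values differ by vendor, so check its documentation rather than hard-coding a guess.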
Sophos’ server license is expensive
Sophos runs about $100 for a license, but as soon as you install it on a server they charge for a server license, which is around $1000. Ouch.
It’s likely this is all a waste of your time
If you have one or more real-time anti-virus scanning engines running on your server, an infected file will disappear (quarantined by your a/v software) as soon as you save it to the file system. And since you can’t scan a file without first saving it to the file system anyway, the real-time engine already gets a look at every upload, so none of this is really necessary. The quick and dirty approach is to save the file to the file system and check whether it still exists. If it does, no virus. If it’s gone, virus.
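Here is a minimal sketch of the save-and-check approach in Python. It assumes a real-time scanner is watching the upload directory. The `wait_seconds` delay is a guess at how long the engine needs, not a documented value for any product. It also treats a file that still exists but can no longer be opened as quarantined. `save_and_check_survives` is a hypothetical name.

```python
import os
import time


def save_and_check_survives(data: bytes, path: str, wait_seconds: float = 2.0) -> bool:
    """Save an upload, give real-time a/v a moment, then see if the file is still usable.

    Returns True if the file is still there and readable (presumed clean),
    False if it was removed or locked (presumed quarantined).
    Assumes a real-time scanner is watching this directory; wait_seconds is a
    guess, not a documented value for any product.
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before starting the wait
    time.sleep(wait_seconds)
    try:
        with open(path, "rb"):
            return True  # still there and openable: nothing was detected
    except OSError:  # FileNotFoundError (deleted) or PermissionError (locked)
        return False
```

To confirm your server's real-time scanner actually behaves this way, try it with the EICAR test file, a harmless string that anti-virus products detect on purpose.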
Of course, this assumes your real-time a/v software is actually running. Whether the method is good enough depends on your risk tolerance: for 1% of the work you get 99.5% of the security. If you (or your middle-manager) need that last 0.5%, see below.
For the last 0.5%, try MetaScan
After a few days of working through this issue I found a product that essentially wraps the major virus scanning engines and provides you with an API. It’s not cheap (pricing is here), but it’s a better approach than cobbling something together on your own, and it lets you scan the same file with multiple engines.
I haven’t actually used it, so I can’t recommend it per se, but it’s currently the only game in town, so it’s worth a look. Note: this is not a sponsored post and I receive no kickbacks from MetaScan.
A reader wrote in and said the following. I have not tested this (this code was deleted years ago), but I wanted to update the post in case someone else runs into this issue in the future:
In your attempts to try every working sample online, you probably failed to realize that the issue wasn’t with the code, but rather with the process model running the command line executable. If the process model was running under the context of a user account that did not have permission to run the executable, it would result in the mysterious “no output”. Making sure the account on your application pool was a user with access to run the command line tool should have resolved that.
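If you hit the same "no output" symptom, one way to test the reader's explanation is to log three things: which account your code runs as, whether launching the tool raises an error, and whether the tool writes to stderr instead of stdout. The Python sketch below is a hypothetical diagnostic. `diagnose_cli_launch` is an illustrative name, not part of any library. Under IIS you would need to run the equivalent from inside the application pool, since that identity is the one that matters.

```python
from __future__ import annotations

import getpass
import os
import subprocess


def diagnose_cli_launch(exe_path: str, args: list[str], timeout: float = 60.0) -> str:
    """Build a plain-text report on launching exe_path from the current process."""
    try:
        user = getpass.getuser()
    except Exception:  # getuser() can fail when no user env vars or passwd entry exist
        user = "<unknown>"
    report = [
        f"running as: {user}",
        f"exists: {os.path.exists(exe_path)}",
        # Rough check only: on Windows os.access looks at file attributes,
        # not ACLs, so the launch attempt below is the real test.
        f"os.access X_OK: {os.access(exe_path, os.X_OK)}",
    ]
    try:
        completed = subprocess.run(
            [exe_path, *args], capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.SubprocessError) as exc:
        # e.g. FileNotFoundError, PermissionError, or TimeoutExpired
        report.append(f"launch failed: {exc!r}")
        return "\n".join(report)
    report.append(f"exit code: {completed.returncode}")
    report.append(f"stdout chars: {len(completed.stdout)}")
    report.append(f"stderr: {completed.stderr.strip()[:200]!r}")
    return "\n".join(report)
```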