Bayesian Filter

A Bayesian filter is a computer program that uses Bayesian logic, also called Bayesian analysis, to evaluate the header and content of an email message and determine whether it constitutes spam (unsolicited email, the electronic equivalent of hard-copy bulk mail or junk mail). A Bayesian filter is best used along with anti-virus programs.

A Bayesian filter works with the probabilities of specific words appearing in the header or content of an email. Certain words, such as Viagra and refinance, indicate a high probability that the email is spam. The filter does not start out knowing how strongly any word indicates spam; users must first mark messages as spam manually. Once a word has appeared often enough in messages marked as spam, the Bayesian filter “learns” to associate that word with spam using likelihood functions. It does the same with many other words and phrases. Over time, the Bayesian filter becomes more and more effective at identifying spam for a particular user.

When the computed probability that a message is spam reaches a certain threshold, such as 95 percent, the email is identified as spam and often moved to a junk folder (or sometimes even deleted automatically). The user can periodically review the junk folder and decide whether or not to delete its contents. Alternatively, some anti-spam programs move the message to a quarantine location where users can view the email and review the software’s decision.
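The learning and threshold steps described above can be sketched in Python. This is a minimal illustration, not the method of any particular product: the class name BayesianFilter, its method names, the add-one smoothing, the assumption of equal prior odds for spam and legitimate mail, and the sample messages are all assumptions made for the example.

```python
import math
import re
from collections import Counter


class BayesianFilter:
    """Toy word-based Bayesian spam filter (illustrative sketch only)."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold    # e.g. 95 percent, as in the text
        self.spam_words = Counter()   # number of spam messages containing each word
        self.ham_words = Counter()    # number of legitimate messages containing each word
        self.spam_count = 0
        self.ham_count = 0

    @staticmethod
    def tokenize(text):
        # Lower-case the text and keep each distinct word once per message.
        return set(re.findall(r"[a-z']+", text.lower()))

    def train(self, text, is_spam):
        # The user's manual verdict is what teaches the filter.
        words = self.tokenize(text)
        if is_spam:
            self.spam_words.update(words)
            self.spam_count += 1
        else:
            self.ham_words.update(words)
            self.ham_count += 1

    def word_spam_probability(self, word):
        # Add-one smoothing keeps the result strictly between 0 and 1,
        # so unseen words never force a verdict on their own.
        p_word_given_spam = (self.spam_words[word] + 1) / (self.spam_count + 2)
        p_word_given_ham = (self.ham_words[word] + 1) / (self.ham_count + 2)
        return p_word_given_spam / (p_word_given_spam + p_word_given_ham)

    def spam_probability(self, text):
        # Combine the per-word probabilities by summing log-odds,
        # treating the words as independent pieces of evidence.
        log_odds = 0.0
        for word in self.tokenize(text):
            p = self.word_spam_probability(word)
            log_odds += math.log(p) - math.log(1 - p)
        log_odds = max(-700.0, min(700.0, log_odds))  # avoid overflow in exp
        return 1 / (1 + math.exp(-log_odds))

    def is_spam(self, text):
        return self.spam_probability(text) >= self.threshold


# Hypothetical training data, marked by the user.
spam_filter = BayesianFilter(threshold=0.95)
for message in ["Buy Viagra now", "Refinance your mortgage now",
                "Cheap Viagra refinance offer"]:
    spam_filter.train(message, is_spam=True)
for message in ["Lunch meeting tomorrow", "Project report attached",
                "See you at the meeting"]:
    spam_filter.train(message, is_spam=False)

print(spam_filter.spam_probability("Cheap Viagra refinance offer"))
print(spam_filter.is_spam("Project meeting tomorrow"))
```

With so little training data the scores are crude, which mirrors the point in the text: the filter only becomes effective after the user has marked enough messages.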

After the initial “training,” the filter can be refined whenever a wrong judgment is found, reducing false positives (legitimate email marked as spam) and false negatives (spam that gets through). This allows the software’s Bayesian filter to adapt to the constantly evolving nature of spam.
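One way such a correction might work is to move a misjudged message’s word counts from the wrong class to the right one. The sketch below assumes a simple dictionary-based model; the function names new_model, learn, relabel and spam_score, and the sample messages, are hypothetical and chosen only for illustration.

```python
from collections import Counter


def new_model():
    # Per-class word counts plus the number of messages seen in each class.
    return {"spam": Counter(), "ham": Counter(), "n_spam": 0, "n_ham": 0}


def tokens(text):
    # Each distinct lower-cased word counts once per message.
    return set(text.lower().split())


def learn(model, text, label, sign=1):
    # sign=+1 adds a message to a class; sign=-1 removes it again.
    model[label].update({word: sign for word in tokens(text)})
    model["n_" + label] += sign


def relabel(model, text, wrong_label, right_label):
    # Correct a wrong judgment: undo the old label, apply the new one.
    learn(model, text, wrong_label, sign=-1)
    learn(model, text, right_label, sign=1)


def spam_score(model, text):
    # Multiply per-word likelihood ratios (with add-one smoothing),
    # then convert the resulting odds to a probability.
    odds = 1.0
    for word in tokens(text):
        p_spam = (model["spam"][word] + 1) / (model["n_spam"] + 2)
        p_ham = (model["ham"][word] + 1) / (model["n_ham"] + 2)
        odds *= p_spam / p_ham
    return odds / (1 + odds)


model = new_model()
learn(model, "win a free prize", "spam")
learn(model, "meeting notes attached", "ham")
learn(model, "free webinar for the team", "spam")  # a false positive

before = spam_score(model, "free webinar for the team")
relabel(model, "free webinar for the team", "spam", "ham")  # user correction
after = spam_score(model, "free webinar for the team")
print(before, after)
```

After the correction, the same message scores lower, and so will future messages that share its vocabulary.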

Some spam filters also use heuristics along with the Bayesian filter. Pre-defined rules are set up by the user to further increase the accuracy of identifying email as spam. These rules may count the number of occurrences of a given word, eliminate or ignore neutral words like “the,” “a” or “some,” or identify sequences of words such as “Viagra is good for,” as opposed to applying a likelihood function to each of the four words individually.
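The three kinds of rule mentioned above (occurrence counts, neutral-word removal and word sequences) can be illustrated as a feature-extraction step that runs before the Bayesian scoring. This is only a sketch: the names NEUTRAL_WORDS and extract_features, and the default phrase length of four words, are assumptions made for the example.

```python
import re
from collections import Counter

# Neutral words to ignore as single-word features (the examples from the text).
NEUTRAL_WORDS = frozenset({"the", "a", "some"})


def extract_features(text, phrase_len=4):
    """Return a Counter of single words and fixed-length word sequences."""
    words = re.findall(r"[a-z0-9']+", text.lower())

    # Rule 1: drop neutral words from the single-word features.
    singles = [w for w in words if w not in NEUTRAL_WORDS]

    # Rule 2: treat each run of phrase_len consecutive words as one feature,
    # so "viagra is good for" is scored as a unit rather than as four words.
    phrases = [" ".join(words[i:i + phrase_len])
               for i in range(len(words) - phrase_len + 1)]

    # Rule 3: keep the number of occurrences of every feature.
    return Counter(singles + phrases)


features = extract_features("Viagra is good for the heart")
print(features)
```

A filter could then learn a likelihood for each feature, whether it is a single word or a whole phrase, in the same way it does for individual words.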

Spammers may use a technique called Bayesian poisoning to degrade the effectiveness of spam filters that rely on Bayesian filtering. Techniques include injecting legitimate text from news or literary sources, using random innocuous words infrequently found in spam, or even replacing text with pictures.
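The effect of padding a message with innocuous words can be shown with a small calculation. The per-word spam probabilities below are made-up values standing in for what a trained filter might have learned, and the function name combined_spam_probability is hypothetical; the combining rule treats words as independent evidence and sums their log-odds.

```python
import math

# Hypothetical per-word spam probabilities a filter might have learned.
WORD_SPAM_PROB = {
    "viagra": 0.99,
    "refinance": 0.95,
    "weather": 0.10,
    "chapter": 0.05,
    "senator": 0.10,
    "novel": 0.05,
}


def combined_spam_probability(words, unknown=0.5):
    # Sum the log-odds of each word; unknown words are neutral (0.5).
    log_odds = 0.0
    for word in words:
        p = WORD_SPAM_PROB.get(word, unknown)
        log_odds += math.log(p) - math.log(1 - p)
    return 1 / (1 + math.exp(-log_odds))


plain = ["viagra", "refinance"]
poisoned = plain + ["weather", "chapter", "senator", "novel"]

print(combined_spam_probability(plain))     # well above a 0.95 threshold
print(combined_spam_probability(poisoned))  # pulled far below the threshold
```

In this toy setup, four innocuous words are enough to outweigh two strongly spam-associated ones, which is why poisoned messages can slip past a filter that scores every word.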

However, many email clients block the display of pictures for security reasons, so image-based spam may reach fewer recipients.

Bayesian logic is not limited to email: it can be used to classify any sort of data, and medicine, science and engineering have all found uses for it. Interestingly, scientific researchers have speculated that even the human brain may use Bayesian methods to classify stimuli and determine specific response behaviors.
