Friday, July 11, 2008

Tutorials and Reviews on Boosting

Cited from: http://www.cs.ucsd.edu/~aarvey/boosting_papers.html

Tutorials and Reviews on Boosting

  • A short introduction to boosting by Freund and Schapire, 1999. An introduction to the theory and application of boosting.
  • AdaBoost in action by Freund. An applet that shows how AdaBoost behaves during the training phase.
  • Schapire's "boosting approach to machine learning".
  • Ron Meir and Gunnar Ratsch's introduction to boosting and leveraging.

Reading Lists

I've highlighted the most important papers, even if they are earlier in the progression.

The AdaBoost Approach

  • A decision-theoretic generalization of on-line learning and an application to boosting. Freund and Schapire 1995/7. You have to look at this paper, but you don't have to read it (a minimal code sketch of the algorithm it introduces follows this list).
  • Improved Boosting Algorithms Using Confidence-rated Predictions. Schapire and Singer 1999. Sections 1-4 are an absolute must read. Very concise and packed with useful interpretations.
  • The boosting approach to machine learning: An overview. Schapire 2002. Section 6. Supplement with the original Friedman, Hastie, Tibshirani paper if desired. Describes an alternative, gentler loss function (a.k.a. potential, potential loss, cost, etc.).
  • Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods. Schapire et al. 1998. Sections 1,5,6. Understanding the idea is more important than the actual proofs.
  • Boosting Algorithms as Gradient Descent. Mason et al. 2000. Similar in spirit to the view of Friedman, Hastie, and Tibshirani 1998. Sections 1 and 2 develop the AnyBoost framework, which is a helpful generalization of AdaBoost.
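
For a concrete picture of the algorithm these papers build on, here is a minimal sketch of discrete AdaBoost with decision stumps. It is not code from any of the papers above: labels are assumed to be in {-1, +1}, and the helper names (train_stump, adaboost, predict) and the numpy-only style are my own illustrative choices.

    # Minimal discrete AdaBoost with decision stumps (numpy only).
    # alpha = 0.5 * ln((1 - err) / err); weights follow the exponential loss.
    import numpy as np

    def train_stump(X, y, w):
        # Pick the (feature, threshold, sign) stump with lowest weighted error.
        best = None
        for j in range(X.shape[1]):
            for thresh in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = np.where(X[:, j] <= thresh, sign, -sign)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thresh, sign)
        return best

    def adaboost(X, y, rounds=50):
        w = np.full(len(y), 1.0 / len(y))          # uniform initial distribution
        ensemble = []
        for _ in range(rounds):
            err, j, thresh, sign = train_stump(X, y, w)
            err = np.clip(err, 1e-10, 1 - 1e-10)
            alpha = 0.5 * np.log((1 - err) / err)
            pred = np.where(X[:, j] <= thresh, sign, -sign)
            w *= np.exp(-alpha * y * pred)         # exponential-loss reweighting
            w /= w.sum()                           # renormalize to a distribution
            ensemble.append((alpha, j, thresh, sign))
        return ensemble

    def predict(ensemble, X):
        score = np.zeros(len(X))
        for alpha, j, thresh, sign in ensemble:
            score += alpha * np.where(X[:, j] <= thresh, sign, -sign)
        return np.sign(score)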

The Boost By Majority Approach

  • Boosting a weak learning algorithm by majority. Freund 1990/5. The first part of section 2 (before 2.1) describes the "boost by majority game".
  • Improved Boosting Algorithms Using Confidence-rated Predictions. Schapire and Singer 1999. Sections 1-4 are an absolute must read. Very concise and packed with useful interpretations.
  • An adaptive version of the boost by majority algorithm. Freund 1999/2000. IMHO, the biggest jump in boosting since Schapire's original algorithm. There are two main parts to the paper: 1) infinitely many iterations with infinitely small movements, and 2) setting alpha by satisfying two constraints: a) the expected value of $h(x)y$ is 0 w.r.t. the new weighting, and b) a condition on the average difference in potential. If you understand these two ideas and their motivations (from Schapire and Singer 99 and Freund 1990/95), the algorithm falls out naturally.
  • Drifting games. Schapire 2000. A very nice generalization of the boost by majority paradigm to the more "natural" space. This interpretation is also useful for understanding BrownBoost.
  • Continuous Drifting games. Freund and Opper 2002. An extension of Schapire's drifting games to continuous domains. Both BrownBoost and NormalBoost are natural consequences of this work.

The Bregman Divergences, Geometry, and Optimization Approach

  • Boosting as Entropy Projection. Kivinen and Warmuth 1999. An interesting and very different approach that minimizes convex divergence measures.
  • Logistic Regression, AdaBoost and Bregman Distances. Collins, Schapire, Singer 2000. Similar to Kivinen and Warmuth; however, it provides a few very practical results for converting AdaBoost to LogitBoost using an alternative loss function. Also ties in well with some of the statistical approaches to boosting.
  • Linear Programming Boosting via Column Generation. Demiriz, Bennett, and Shawe-Taylor 2002. Introduces LPBoost and the general relationship between linear programming and boosting (the soft-margin LP is sketched after this list). A very complete discussion, showing how LPBoost can be used in many of the classic statistical settings.
  • Totally Corrective Boosting Algorithms that Maximize the Margin. Warmuth, Liao, and Ratsch 2006. Puts LPBoost and the geometric view (the Bregman divergences of Kivinen and Warmuth) together. Instead of minimizing the correlation between only the most recent hypothesis and the next weighting of the training set, TotalBoost minimizes the correlations between all previous hypotheses and the next weighting. This leads to sparser sets of hypotheses and a very nifty iteration bound.
  • Boosting Algorithms for Maximizing the Soft Margin. Warmuth, Glocer, and Ratsch 2007. Simple moral: TotalBoost overfits, SoftBoost is noise resistant. SoftBoost comes with an iteration bound and achieves noise resistance using slack variables, as in the SVM literature.
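
For reference, the soft-margin LPBoost formulation is (as I recall it; check Demiriz, Bennett, and Shawe-Taylor for their exact notation) a linear program over the hypothesis weights $a_j$, the margin $\rho$, and per-example slacks $\xi_i$:

\[
\max_{a,\,\rho,\,\xi}\;\; \rho - D\sum_{i=1}^{m}\xi_i
\quad\text{s.t.}\quad
y_i \sum_{j} a_j h_j(x_i) \ge \rho - \xi_i,\qquad
\sum_{j} a_j = 1,\quad a_j \ge 0,\quad \xi_i \ge 0.
\]

Column generation then adds one hypothesis (one column) per round: the weak learner returns the hypothesis whose dual constraint is most violated, i.e. the one maximizing the edge $\sum_i u_i y_i h(x_i)$ under the current dual weights $u_i$ over the examples.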

Statistical Approaches to Boosting

  • Additive Logistic Regression: a Statistical View of Boosting. Friedman, Hastie, Tibshirani 1999. An interesting paper that casts boosting into the classic log-likelihood framework that underlies much of statistics. The resulting algorithmic contribution, LogitBoost, can be implemented in an AdaBoost framework (LogLossBoost in JBoost) using an alternative weighting (see the weighting sketch after this list). Be sure to read the rejoinders, as some of them gave me a chuckle.
  • Friedman has a paper on $\epsilon$-boosting which sounds very interesting, though I have yet to read it.
  • Tong Zhang has a lot of very good papers. I'll list the highlights when I finish reading them... this may take a while.
  • The question of consistency was a big deal in the statistics community and was approached by a large variety of people. They showed variants of boosting were consistent, but never AdaBoost. I believe Peter Bartlett gets credit for finally showing that AdaBoost is consistent.
  • Evidence Contrary to the Statistical View of Boosting. Mease and Wyner 2008. This paper is extremely important to read if you plan on doing research in boosting for practical purposes. It is not bulletproof (I'm sure someone will be able to find issues with it), but it is a well-thought-out and well-executed study that uses empirical evidence instead of theory to drive boosting research. Some of the details of the paper may be proven slightly incorrect, but I believe that the overall idea will stand up to scrutiny.
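
To make the "alternative weighting" above concrete, here is a tiny sketch (my own notation, not taken from the paper) of how the per-example weights differ between the exponential loss used by AdaBoost and the logistic loss behind LogitBoost/LogLossBoost, given the current ensemble score F(x) and labels y in {-1, +1}:

    # Per-example weights under the two losses, up to normalization.
    import numpy as np

    def exp_loss_weights(F, y):
        w = np.exp(-y * F)                  # unbounded: confident mistakes dominate
        return w / w.sum()

    def log_loss_weights(F, y):
        w = 1.0 / (1.0 + np.exp(y * F))     # bounded by 1: gentler on outliers
        return w / w.sum()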

Misc Background Reading

  • The strength of weak learnability. Schapire 1990. Cool algorithm, cool analysis, cool write up. Unfortunately rendered useless by majority vote methods.
  • Any of the tutorials. There are a lot of really good boosting tutorials that cover the whole spectrum of applied to theoretical.
  • The alternating decision tree learning algorithm. Freund and Mason 1999. An interesting way to use boosting with decision trees/stumps.
  • Any of the empirical comparisons of different voting methods.
  • The ideas of bagging and arcing are frequently mentioned in boosting papers. These are ideas of the extremely well-respected statistician Leo Breiman. While they haven't caught on in the same way that boosting has (see "Boosting the Margin..." for an explanation), they are interesting ideas, and bagging (along with bootstrapping) is widely used for variance reduction of any estimator (a minimal bagging sketch follows this list).
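
Since bagging comes up so often in these papers, here is a minimal sketch of the idea. The names fit_fn and predict_fn are placeholders for an arbitrary base learner; nothing here is tied to a particular library beyond numpy.

    # Bagging: average a base learner over bootstrap resamples to reduce variance.
    import numpy as np

    def bagged_predict(X_train, y_train, X_test, fit_fn, predict_fn, n_bags=25, seed=0):
        rng = np.random.default_rng(seed)
        n = len(y_train)
        preds = []
        for _ in range(n_bags):
            idx = rng.integers(0, n, size=n)   # bootstrap sample, with replacement
            model = fit_fn(X_train[idx], y_train[idx])
            preds.append(predict_fn(model, X_test))
        return np.mean(preds, axis=0)          # averaging the resampled fits reduces variance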

Boosting Software

In a non-accidental ordering:
  1. JBoost. Powerful, extensible, highly tunable, efficient, and easy to integrate into your own software. A bit of a learning curve, but well worth it.
  2. BoosTexter (an implementation is also available in JBoost). Schapire's code for the BoosTexter text-categorization system.
  3. GBM for R by Greg Ridgeway. A powerful and standardized module used mostly by statisticians.
  4. WEKA. User friendly.
