Kaggle

Kaggle is the world’s largest community of data scientists. They compete with each other to solve complex data science problems, and the top competitors are invited to work on the most interesting and sensitive business problems from some of the world’s biggest companies through Masters competitions.

Kaggle provides cutting-edge data science results to companies of all sizes. We have a proven track record of solving real-world problems across a diverse array of industries, including life sciences, financial services, energy, information technology, and retail.

Gain the confidence to go forward and compete for some serious $$$$$$.

Frequently Asked Questions - Titanic: Machine Learning from Disaster | Kaggle
Further Reading / Watching - Titanic: Machine Learning from Disaster | Kaggle

Readings

  1. Getting Started with Pandas – Predicting SAT Scores for New York City Schools | no free hunch

The historical data has been split into two groups, a ‘training set’ and a ‘test set’. For the training set, we provide the outcome (the ‘ground truth’) for each passenger. You will use this set to build a model that generates predictions for the test set.
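The train/test workflow above can be sketched in plain Python. This is a hypothetical toy baseline, not a method from the competition or the linked tutorials. It learns the majority outcome for each value of a single feature on the training set, then applies that rule to the test set. The column names `Sex` and `Survived` and the rows below are illustrative assumptions. Real code would load the competition's CSV files instead.

```python
from collections import defaultdict


def fit_majority_by_feature(train_rows, feature, target="Survived"):
    """Learn the most common outcome (0 or 1) for each value of `feature`.

    Only the training set, which carries the ground-truth `target`, is used.
    Ties go to 0 (deceased).
    """
    counts = defaultdict(lambda: [0, 0])  # feature value -> [n_deceased, n_survived]
    for row in train_rows:
        counts[row[feature]][row[target]] += 1
    return {value: int(n_surv > n_dead) for value, (n_dead, n_surv) in counts.items()}


def predict(model, test_rows, feature, default=0):
    """Predict 0/1 for each test passenger; unseen feature values fall back to `default`."""
    return [model.get(row[feature], default) for row in test_rows]


# Hypothetical toy rows, not real competition data.
train = [
    {"Sex": "female", "Survived": 1},
    {"Sex": "female", "Survived": 1},
    {"Sex": "female", "Survived": 0},
    {"Sex": "male", "Survived": 0},
    {"Sex": "male", "Survived": 0},
    {"Sex": "male", "Survived": 1},
]
test = [{"Sex": "male"}, {"Sex": "female"}]

model = fit_majority_by_feature(train, "Sex")
print(predict(model, test, "Sex"))  # [0, 1]
```

The test rows carry no `Survived` field, mirroring how the test set comes without ground truth.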

For each passenger in the test set, you must predict whether or not they survived the sinking (0 for deceased, 1 for survived). Your score is the percentage of passengers you predict correctly.
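The scoring rule is plain accuracy expressed as a percentage. The sketch below assumes predictions and ground truth are equal-length lists of 0/1 labels. The function name `accuracy_percent` is hypothetical.

```python
def accuracy_percent(predictions, truth):
    """Percentage of passengers whose 0/1 survival label is predicted correctly."""
    if len(predictions) != len(truth):
        raise ValueError("predictions and truth must have the same length")
    if not truth:
        raise ValueError("cannot score an empty set")
    correct = sum(p == t for p, t in zip(predictions, truth))
    return 100.0 * correct / len(truth)


print(accuracy_percent([1, 0, 1, 1], [1, 0, 0, 1]))  # 75.0
```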

The Kaggle leaderboard has a public and a private component. 50% of your predictions for the test set have been randomly assigned to the public leaderboard (the same 50% for all users). Your score on this public portion is what appears on the leaderboard. At the end of the contest, we will reveal your score on the private 50% of the data, which will determine the final winner. This method prevents users from ‘overfitting’ to the leaderboard.
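The sketch below shows how a fixed 50/50 public/private partition behaves. It assumes a seeded shuffle stands in for Kaggle's random assignment, since the actual mechanism isn't described here. The seed and function names are hypothetical. Every submission is scored on the same public indices, so a model tuned to the public score can still do worse on the hidden private half.

```python
import random


def split_public_private(n_test, seed=0):
    """Randomly assign half of the test indices to the public leaderboard.

    A fixed seed yields the same partition for every user, matching the
    'same 50% for all users' rule. With an odd n_test the private part
    gets the extra row (an assumption of this sketch).
    """
    indices = list(range(n_test))
    random.Random(seed).shuffle(indices)
    half = n_test // 2
    return sorted(indices[:half]), sorted(indices[half:])


def score_on(indices, predictions, truth):
    """Accuracy (%) restricted to the given subset of test rows."""
    correct = sum(predictions[i] == truth[i] for i in indices)
    return 100.0 * correct / len(indices)


# Hypothetical labels and predictions for eight test passengers.
truth = [1, 0, 0, 1, 1, 0, 1, 0]
preds = [1, 0, 1, 1, 0, 0, 1, 1]
public, private = split_public_private(len(truth), seed=42)
print("public:", score_on(public, preds, truth))    # shown during the contest
print("private:", score_on(private, preds, truth))  # revealed at the end
```

With equal halves, the mean of the public and private scores equals the accuracy on the whole test set. The two halves can still differ noticeably, which is why the final ranking uses the unseen half.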

Tutorials

  1. http://nbviewer.jupyter.org/github/agconti/kaggle-titanic/blob/master/Titanic.ipynb
  2. http://nbviewer.jupyter.org/github/jrjohansson/scientific-python-lectures/blob/master/Lecture-4-Matplotlib.ipynb
  3. http://nbviewer.jupyter.org/github/agconti/BlueBook/blob/master/BlueBook.ipynb
  4. https://github.com/wehrley/wehrley.github.io/blob/master/SOUPTONUTS.md

Notes

Let \((x_1, x_2, \dots, x_n)\) be an independent and identically distributed sample drawn from some distribution with an unknown density \(f\). We are interested in estimating the shape of this function \(f\). Its kernel density estimator is

\[\hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^n K_h (x - x_i) = \frac{1}{nh} \sum_{i=1}^n K\Big(\frac{x-x_i}{h}\Big), \]

Kernel density estimation of 100 normally distributed random numbers using different smoothing bandwidths.

where \(K(\cdot)\) is the kernel (a non-negative function that integrates to one and has mean zero) and \(h > 0\) is a smoothing parameter called the bandwidth. A kernel with subscript \(h\) is called the scaled kernel and is defined as \(K_h(x) = \frac{1}{h} K\big(\frac{x}{h}\big)\). Intuitively, one wants to choose \(h\) as small as the data allow. However, there is always a trade-off between the bias of the estimator and its variance: a smaller \(h\) reduces the bias but increases the variance.
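The estimator translates directly into code. The sketch below assumes a Gaussian kernel, one common choice of \(K\); the definition above allows any non-negative kernel that integrates to one with mean zero. It evaluates both forms of \(\hat{f}_h\) to show they agree. The names `gaussian_kernel`, `scaled_kernel`, `kde` and `kde_scaled_form` are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard normal density: non-negative, integrates to one, mean zero."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def scaled_kernel(kernel, h):
    """Return the scaled kernel K_h(x) = (1/h) * K(x/h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    return lambda x: kernel(x / h) / h


def kde(x, sample, h, kernel=gaussian_kernel):
    """Second form: f_hat_h(x) = 1/(n h) * sum_i K((x - x_i) / h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    if not sample:
        raise ValueError("sample must be non-empty")
    return sum(kernel((x - xi) / h) for xi in sample) / (len(sample) * h)


def kde_scaled_form(x, sample, h, kernel=gaussian_kernel):
    """First form: f_hat_h(x) = 1/n * sum_i K_h(x - x_i)."""
    k_h = scaled_kernel(kernel, h)
    return sum(k_h(x - xi) for xi in sample) / len(sample)


sample = [-1.0, 0.0, 2.0]  # hypothetical data
for h in (0.25, 1.0):
    # A smaller h gives a spikier curve (less bias, more variance).
    print(h, kde(0.0, sample, h), kde_scaled_form(0.0, sample, h))
```

For a single point at 0, \(\hat{f}_h(0) = \frac{1}{h\sqrt{2\pi}}\), so halving \(h\) doubles the peak height. This is the "spikier" behaviour the bias/variance trade-off refers to.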

Comparison of the histogram (left) and kernel density estimate (right) constructed using the same data. The 6 individual kernels are the red dashed curves, the kernel density estimate the blue curves. The data points are the rug plot on the horizontal axis.
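The comparison in the caption above can be reproduced numerically. The sketch uses six hypothetical data points, not necessarily those in the figure. It also assumes a bin width of 2 for the histogram and a Gaussian kernel with bandwidth 1.5. It shows that the KDE is the sum of the individual per-point terms \(K_h(x - x_i)/n\), while the histogram is flat within each bin.

```python
import math


def gaussian_kernel(u):
    """Standard normal density used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def histogram_density(x, sample, bin_width, origin=0.0):
    """Height of the density-normalised histogram bar containing x."""
    bin_index = math.floor((x - origin) / bin_width)
    lo = origin + bin_index * bin_width
    count = sum(lo <= xi < lo + bin_width for xi in sample)
    return count / (len(sample) * bin_width)


def kernel_contributions(x, sample, h):
    """Per-point terms K_h(x - x_i) / n (the individual kernels) evaluated at x."""
    n = len(sample)
    return [gaussian_kernel((x - xi) / h) / (n * h) for xi in sample]


sample = [-2.0, -1.5, -0.5, 2.0, 5.0, 6.0]  # hypothetical points
h, width = 1.5, 2.0
for x in (-1.0, 0.0, 1.0):
    parts = kernel_contributions(x, sample, h)
    print(f"x={x:+.1f}  histogram={histogram_density(x, sample, width):.3f}  "
          f"kde={sum(parts):.3f}")
```

Both estimates integrate to one. The histogram jumps at the (arbitrary) bin edges, whereas the sum of kernels varies smoothly with \(x\).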

TODO

  1. hrojas / Learn Pandas — Bitbucket
  2. Learn Pandas 01
  3. Learn Pandas 02
  4. Learn Pandas 03
  5. Learn Pandas 04
  6. Learn Pandas 05
  7. Learn Pandas 06
  8. Learn Pandas 07
  9. Learn Pandas 08
  10. Learn Pandas 09
  11. Learn Pandas 10
  12. Learn Pandas 11
  13. Pandas Bootcamp

Refs and see also

  1. Kernel density estimation - Wikipedia, the free encyclopedia
