Free books available online
- Applied Data Science, Columbia University, Ian Langmore and Daniel Krasner
- Understanding Machine Learning: From Theory to Algorithms, Shai Shalev-Shwartz and Shai Ben-David
- A Course in Machine Learning, Hal Daumé III (draft)
- Deep Learning, Yoshua Bengio (draft), 2016
- Neural Networks and Deep Learning, Michael Nielsen, 2017
- Intermediate Python, Muhammad Yasoob Ullah Khalid, 2015
- Think Bayes, Allen B. Downey, Green Tea Press, 2012
- The Elements of Statistical Learning, Hastie, Tibshirani, Friedman, Springer, 2011
- An Introduction to Statistical Learning, James, Witten, Hastie, Tibshirani, Springer, 2013
- Bayesian Reasoning and Machine Learning, David Barber, Cambridge University Press, 2012
- Introduction to Information Retrieval, Manning, Raghavan and Schütze, Cambridge University Press, 2008
- Mining of Massive Datasets, Rajaraman and Ullman, Cambridge University Press, 2011
- Information Theory, Inference and Learning Algorithms, David MacKay, Cambridge University Press, 2007
- Introduction to Machine Learning, Alex Smola (full draft, very good)
- Text Processing in Python, David Mertz, Addison Wesley, 2003
- A whole collection of nice open-access AI books from Intech.
- Interactive Data Visualization for the Web (O'Reilly), a good book on the D3 JavaScript library, based on an earlier set of online tutorials
Datasets
- Amazon Customer Reviews Dataset
- The Billion Prices project
- Public Datasets on AWS
- UCI Machine Learning Repository
- Berkeley Earth
- Awesome public datasets - a compilation by Xiaming Chen
- ICEWS - Integrated Crisis Early Warning System
- Phoenix Dataset - a near real-time event dataset
- NY City Motor Vehicle Collisions data
- Big Data: 35 Brilliant and Free Data Sources for 2016
- Stanford Large Network Dataset Collection
- Smart* Data Set for Sustainability
- NREL Wind Data
- GroupLens Research Data Sets (recommendation systems, etc.)
- Datasets for network analysis
- Global Database of Events, Language and Tone (Featured in SBP'14)
- NOAA/NGDC - Earth Observation Group - Defense Meteorological Satellite Program, Boulder
- The Corpus of Historical American English (COHA)
- Project Tycho - Public Health Data for Science and Policy Making
- AMiner Citation Network Dataset
- KEEL-dataset repository
- Machine Learning data set repository
- City-sized portions of OpenStreetMap, served weekly ;-) (Note to self: for Basemap, you want the "IMPOSM SHP" format.)
Links
- Tutorial for setting up Hadoop 2.x (most existing tutorials seem to target version 1.x or earlier!)
- Teaching materials for machine learning: notes, slides, homework material, etc.
- GloVe: Global Vectors for Word Representation





 
      