Last updated: 2021-07-02 20:09:52
Cover
Copyright Information
Credits
Preface
Part 1. Module 1
Chapter 1. Getting Started with Predictive Modelling
Introducing predictive modelling
Applications and examples of predictive modelling
Python and its packages – download and installation
Python and its packages for predictive modelling
IDEs for Python
Summary
Chapter 2. Data Cleaning
Reading the data – variations and examples
Various methods of importing data in Python
The read_csv method
Use cases of the read_csv method
Case 2 – reading a dataset using the open method of Python
Case 3 – reading data from a URL
Case 4 – miscellaneous cases
Basics – summary, dimensions, and structure
Handling missing values
Creating dummy variables
Visualizing a dataset by basic plotting
Chapter 3. Data Wrangling
Subsetting a dataset
Generating random numbers and their usage
Grouping the data – aggregation, filtering, and transformation
Random sampling – splitting a dataset into training and testing datasets
Concatenating and appending data
Merging/joining datasets
Chapter 4. Statistical Concepts for Predictive Modelling
Random sampling and the central limit theorem
Hypothesis testing
Chi-square tests
Correlation
Chapter 5. Linear Regression with Python
Understanding the maths behind linear regression
Making sense of result parameters
Implementing linear regression with Python
Model validation
Handling other issues in linear regression
Chapter 6. Logistic Regression with Python
Linear regression versus logistic regression
Understanding the math behind logistic regression
Implementing logistic regression with Python
Model validation and evaluation
Chapter 7. Clustering with Python
Introduction to clustering – what, why, and how?
Mathematics behind clustering
Implementing clustering using Python
Fine-tuning the clustering
Chapter 8. Trees and Random Forests with Python
Introducing decision trees
Understanding the mathematics behind decision trees
Implementing a decision tree with scikit-learn
Understanding and implementing regression trees
Understanding and implementing random forests
Chapter 9. Best Practices for Predictive Modelling
Best practices for coding
Best practices for data handling
Best practices for algorithms
Best practices for statistics
Best practices for business contexts
Appendix A. A List of Links
Part 2. Module 2
Chapter 1. From Data to Decisions – Getting Started with Analytic Applications
Designing an advanced analytic solution
Case study: sentiment analysis of social media feeds
Case study: targeted e-mail campaigns
Chapter 2. Exploratory Data Analysis and Visualization in Python
Exploring categorical and numerical data in IPython
Time series analysis
Working with geospatial data
Introduction to PySpark
Chapter 3. Finding Patterns in the Noise – Clustering and Unsupervised Learning
Similarity and distance metrics
Affinity propagation – automatically choosing cluster numbers
k-medoids
Agglomerative clustering
Streaming clustering in Spark
Chapter 4. Connecting the Dots with Models – Regression Methods
Linear regression
Tree methods
Scaling out with PySpark – predicting year of song release
Chapter 5. Putting Data in its Place – Classification Methods and Analysis
Logistic regression