Machine Learning with Spark (Spark机器学习)


Author: (UK) Pentreath (彭特里思)
Publisher: Southeast University Press (东南大学出版社)
ISBN: 9787564160913

Publication date: 2015-11
Binding: Paperback
Trim size: Other



List price: ￥68.00


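About the Author
Pentreath (彭特里思). If you are a Scala, Java, or Python developer with an interest in machine learning and data analysis, and you are keen to learn how to apply common machine learning techniques at scale using the Spark framework, this book is written for you. A basic understanding of Spark is helpful but not required.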

Table of Contents
Preface

Chapter 1: Getting Up and Running with Spark
- Installing and setting up Spark locally
- Spark clusters
- The Spark programming model
- SparkContext and SparkConf
- The Spark shell
- Resilient Distributed Datasets
- Creating RDDs
- Spark operations
- Caching RDDs
- Broadcast variables and accumulators
- The first step to a Spark program in Scala
- The first step to a Spark program in Java
- The first step to a Spark program in Python
- Getting Spark running on Amazon EC2
- Launching an EC2 Spark cluster
- Summary

Chapter 2: Designing a Machine Learning System
- Introducing MovieStream
- Business use cases for a machine learning system
- Personalization
- Targeted marketing and customer segmentation
- Predictive modeling and analytics
- Types of machine learning models
- The components of a data-driven machine learning system
- Data ingestion and storage
- Data cleansing and transformation
- Model training and testing loop
- Model deployment and integration
- Model monitoring and feedback
- Batch versus real time
- An architecture for a machine learning system
- Practical exercise
- Summary

Chapter 3: Obtaining, Processing, and Preparing Data with Spark
- Accessing publicly available datasets
- The MovieLens 100k dataset
- Exploring and visualizing your data
- Exploring the user dataset
- Exploring the movie dataset
- Exploring the rating dataset
- Processing and transforming your data
- Filling in bad or missing data
- Extracting useful features from your data
- Numerical features
- Categorical features
- Derived features
- Transforming timestamps into categorical features
- Text features
- Simple text feature extraction
- Normalizing features
- Using MLlib for feature normalization
- Using packages for feature extraction
- Summary

Chapter 4: Building a Recommendation Engine with Spark
- Types of recommendation models
- Content-based filtering
- Collaborative filtering
- Matrix factorization
- Extracting the right features from your data
- Extracting features from the MovieLens 100k dataset
- Training the recommendation model
- Training a model on the MovieLens 100k dataset
- Training a model using implicit feedback data
- Using the recommendation model
- User recommendations
- Generating movie recommendations from the MovieLens 100k dataset
- Item recommendations
- Generating similar movies for the MovieLens 100k dataset
- Evaluating the performance of recommendation models
- Mean Squared Error
- Mean average precision at K
- Using MLlib's built-in evaluation functions
- RMSE and MSE
- MAP
- Summary

Chapter 5: Building a Classification Model with Spark
- Types of classification models
- Linear models
- Logistic regression
- Linear support vector machines
- The naïve Bayes model
- Decision trees
- Extracting the right features from your data
- Extracting features from the Kaggle/StumbleUpon evergreen classification dataset
- Training classification models
- Training a classification model on the Kaggle/StumbleUpon evergreen classification dataset
- Using classification models
- Generating predictions for the Kaggle/StumbleUpon evergreen classification dataset
- Evaluating the performance of classification models
- Accuracy and prediction error
- Precision and recall
- ROC curve and AUC
- Improving model performance and tuning parameters
- Feature standardization
- Additional features
- Using the correct form of data
- Tuning model parameters
- Linear models
- Decision trees
- The naïve Bayes model
- Cross-validation
- Summary

Chapter 6: Building a Regression Model with Spark
- Types of regression models
- Least squares regression
- Decision trees for regression
- Extracting the right features from your data
- Extracting features from the bike sharing dataset
- Creating feature vectors for the linear model
- Creating feature vectors for the decision tree
- Training and using regression models
- Training a regression model on the bike sharing dataset
- Evaluating the performance of regression models
- Mean Squared Error and Root Mean Squared Error
- Mean Absolute Error
- Root Mean Squared Log Error
- The R-squared coefficient
- Computing performance metrics on the bike sharing dataset
- Linear model
- Decision tree
- Improving model performance and tuning parameters
- Transforming the target variable
- Impact of training on log-transformed targets
- Tuning model parameters
- Creating training and testing sets to evaluate parameters
- The impact of parameter settings for linear models
- The impact of parameter settings for the decision tree
- Summary

Chapter 7: Building a Clustering Model with Spark
- Types of clustering models
- K-means clustering
- Initialization methods
- Variants
- Mixture models
- Hierarchical clustering
- Extracting the right features from your data
- Extracting features from the MovieLens dataset
- Extracting movie genre labels
- Training the recommendation model
- Normalization
- Training a clustering model
- Training a clustering model on the MovieLens dataset
- Making predictions using a clustering model
- Interpreting cluster predictions on the MovieLens dataset
- Interpreting the movie clusters
- Evaluating the performance of clustering models
- Internal evaluation metrics
- External evaluation metrics
- Computing performance metrics on the MovieLens dataset
- Tuning parameters for clustering models
- Selecting K through cross-validation
- Summary

Chapter 8: Dimensionality Reduction with Spark
- Types of dimensionality reduction
- Principal Components Analysis
- Singular Value Decomposition
- Relationship with matrix factorization
- Clustering as dimensionality reduction
- Extracting the right features from your data
- Extracting features from the LFW dataset
- Exploring the face data
- Visualizing the face data
- Extracting facial images as vectors
- Normalization
- Training a dimensionality reduction model
- Running PCA on the LFW dataset
- Visualizing the Eigenfaces
- Interpreting the Eigenfaces
- Using a dimensionality reduction model
- Projecting data using PCA on the LFW dataset
- The relationship between PCA and SVD
- Evaluating dimensionality reduction models
- Evaluating k for SVD on the LFW dataset
- Summary

Chapter 9: Advanced Text Processing with Spark
- What's so special about text data?
- Extracting the right features from your data
- Term weighting schemes
- Feature hashing
- Extracting the TF-IDF features from the 20 Newsgroups dataset
- Exploring the 20 Newsgroups data
- Applying basic tokenization
- Improving our tokenization
- Removing stop words
- Excluding terms based on frequency
- A note about stemming
- Training a TF-IDF model
- Analyzing the TF-IDF weightings
- Using a TF-IDF model
- Document similarity with the 20 Newsgroups dataset and TF-IDF features
- Training a text classifier on the 20 Newsgroups dataset using TF-IDF
- Evaluating the impact of text processing
- Comparing raw features with processed TF-IDF features on the 20 Newsgroups dataset
- Word2Vec models
- Word2Vec on the 20 Newsgroups dataset
- Summary

Chapter 10: Real-time Machine Learning with Spark Streaming
- Online learning
- Stream processing
- An introduction to Spark Streaming
- Input sources
- Transformations
- Actions
- Window operators
- Caching and fault tolerance with Spark Streaming
- Creating a Spark Streaming application
- The producer application
- Creating a basic streaming application
- Streaming analytics
- Stateful streaming
- Online learning with Spark Streaming
- Streaming regression
- A simple streaming regression program
- Creating a streaming data producer
- Creating a streaming regression model
- Streaming K-means
- Online model evaluation
- Comparing model performance with Spark Streaming
- Summary

Index

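Content Summary
Apache Spark is a distributed computing framework built from the ground up and optimized for low-latency tasks and in-memory data storage. It is one of the few parallel computing frameworks that combines speed, scalability, in-memory processing, and fault tolerance with ease of programming and a flexible, expressive, and powerful API design.
Pentreath's Machine Learning with Spark (reprint edition, in English) guides you through the basics of the Spark API for loading and processing data, and shows how to prepare suitable input data for a variety of machine learning models. Detailed examples and real-world use cases help you learn common machine learning models, including recommender systems, classification, regression, clustering, and dimensionality reduction. You will also encounter advanced topics such as large-scale text processing, along with methods for online machine learning.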