Preface
1 Introduction
1.1 What Is Data Mining?
1.2 Motivating Challenges
1.3 The Origins of Data Mining
1.4 Data Mining Tasks
1.5 Scope and Organization of the Book
1.6 Bibliographic Notes
1.7 Exercises
2 Data
2.1 Types of Data
2.1.1 Attributes and Measurement
2.1.2 Types of Data Sets
2.2 Data Quality
2.2.1 Measurement and Data Collection Issues
2.2.2 Issues Related to Applications
2.3 Data Preprocessing
2.3.1 Aggregation
2.3.2 Sampling
2.3.3 Dimensionality Reduction
2.3.4 Feature Subset Selection
2.3.5 Feature Creation
2.3.6 Discretization and Binarization
2.3.7 Variable Transformation
2.4 Measures of Similarity and Dissimilarity
2.4.1 Basics
2.4.2 Similarity and Dissimilarity between Simple Attributes
2.4.3 Dissimilarities between Data Objects
2.4.4 Similarities between Data Objects
2.4.5 Examples of Proximity Measures
2.4.6 Issues in Proximity Calculation
2.4.7 Selecting the Right Proximity Measure
2.5 Bibliographic Notes
2.6 Exercises
3 Exploring Data
3.1 The Iris Data Set
3.2 Summary Statistics
3.2.1 Frequencies and the Mode
3.2.2 Percentiles
3.2.3 Measures of Location: Mean and Median
3.2.4 Measures of Spread: Range and Variance
3.2.5 Multivariate Summary Statistics
3.2.6 Other Ways to Summarize the Data
3.3 Visualization
3.3.1 Motivations for Visualization
3.3.2 General Concepts
3.3.3 Techniques
3.3.4 Visualizing Higher-Dimensional Data
3.3.5 Do's and Don'ts
3.4 OLAP and Multidimensional Data Analysis
3.4.1 Representing Iris Data as a Multidimensional Array
3.4.2 Multidimensional Data: The General Case
3.4.3 Analyzing Multidimensional Data
3.4.4 Final Comments on Multidimensional Data Analysis
3.5 Bibliographic Notes
3.6 Exercises
4 Classification: Basic Concepts, Decision Trees, and Model Evaluation
4.1 Preliminaries
4.2 General Approach to Solving a Classification Problem
4.3 Decision Tree Induction
4.3.1 How a Decision Tree Works
4.3.2 How to Build a Decision Tree
4.3.3 Methods for Expressing Attribute Test Conditions
4.3.4 Measures for Selecting the Best Split
4.3.5 Algorithm for Decision Tree Induction
4.3.6 An Example: Web Robot Detection
4.3.7 Characteristics of Decision Tree Induction
4.4 Model Overfitting
4.4.1 Overfitting Due to Presence of Noise
4.4.2 Overfitting Due to Lack of Representative Samples
4.4.3 Overfitting and the Multiple Comparison Procedure
4.4.4 Estimation of Generalization Errors
4.4.5 Handling Overfitting in Decision Tree Induction
4.5 Evaluating the Performance of a Classifier
4.5.1 Holdout Method
4.5.2 Random Subsampling
4.5.3 Cross-Validation
4.5.4 Bootstrap
4.6 Methods for Comparing Classifiers
4.6.1 Estimating a Confidence Interval for Accuracy
4.6.2 Comparing the Performance of Two Models
4.6.3 Comparing the Performance of Two Classifiers
4.7 Bibliographic Notes
4.8 Exercises
5 Classification: Alternative Techniques
6 Association Analysis: Basic Concepts and Algorithms