Course Description

Most of the core business data in an enterprise are stored in relational databases that support Structure Query Language (SQL). When data scientists perform tasks such as data cleanse, validation, manipulation and feature variable calculation using SQL within database environments, they can achieve important advantages such as more compact code, easier deployment and higher security in comparison to moving the data outside of the database to a separate analytics environment.

In this course, students will learn practical Oracle SQL skills to solve problems such as:
  • Data Validation
  • Data Summary
  • Missing Value Handling
  • Detect and Remove Duplicates
  • Binning Variable Based on Equal Frequency
  • Build Good Variables for Predictive Models or Business Rules, e.g., RFM Analysis, Time Elapse Since Last Purchase, Number of Transactions in Last 3 Days, Moving Average Purchase Amount in Last 7 Days
  • Random Sampling
  • Gain Chart
  • Histogram
  • Using View to Organize Process Flows

There are totally about 4 hours and 30 minutes video presentations. SQL scripts that create data sets and perform the Data Science tasks are provided. Slides are included so that students can easily find the topics that they interested in.

This course requires that students already have basic knowledge about SQL.





Ph.D.

Jay Zhou

The instructor, Dr. Jiang Zhou , has nearly two decades of experience building predictive analytics solutions across industries including telecommunication, banking and insurance. These solutions have resulted in over $200 million savings for clients. Dr. Zhou has founded/co-founded 3 data analytics companies, i.e., Business Data Miners (http://businessdataminers.com/), ZSAnalytics (http://zsanalytics.com/) and 慧信 (Smart Credit). He is the Chief Analytics Officer of ZSAnalytics and the Chief Technology Officer of 慧信. Previously, he was a senior modeling engineer at Nestor, Inc., the chief statistician at Lightbridge, Inc., a vice president at Citizens Bank and a consulting member of technical staff at Oracle Inc. He has made presentations about data analytics at American Bankers Association(ABA) conferences, Predictive Analytics World ( "Building In-Database Predictive Scoring Model: Check Fraud Detection Case Study" at Predictive Analytics World), universities and other events. Dr. Zhou has been involved in 3 head to head competitions to build the best predictive models, i.e., a customer credit risk model for a top 3 cell phone company, a bank card fraud detection model for a top 15 bank, and a direct sales model for a marketing company. Dr. Zhou's models have won all 3 competitions.In addition to his technical skills, he is, as one of his clients put it, "a great trainer, and a good presenter of theoretical data mining concepts so that they can be understood by most". He is the author of the blog www.deep-data-mining.com that is regarded as one of the best on data analytics and data mining.Dr. Zhou is the primary author of an award-wining scientific paper, "Using genetic learning artificial neural networks for spatial decision making in GIS" (Nov., 1996, PE & RS). His scientific paper titled "A wavelet transform method to merge Landsat TM and SPOT panchromatic data" (1998,International Journal of Remote Sensing) is one of the most cited in the field.Dr. Zhou obtained a B.S. from Zhejiang University, M.S. from Beijing University, and a Ph.D. from the University of Connecticut.

Course curriculum

  • 1

    Introduction

    • Structure of the Course Materials

    • Video-About the Instructor

    • Video-Content of the Course

    • Slides- About the Instructor

    • Slides- Content of the Course

  • 2

    About the SQL Scripts and Testing Data Used in the Course

    • SQL Script Files -Video Introduction

    • SQL Script Files

  • 3

    Why SQL for Data Science?

    • Video- What is SQL and Why SQL for Data Science Part 1

    • Video- What is SQL and Why SQL for Data Science Part 2

    • Video- What is SQL and Why SQL for Data Science Part 3

    • Video- What is SQL and Why SQL for Data Science Part 4

    • Slides- What is SQL and Why SQL for Data Science?

  • 4

    Typical Tasks for Data Science Project

    • Video- Typical Tasks for Data Science Project Part 1

    • Video- Typical Tasks for Data Science Project Part 2

    • Slides- Typical Tasks for Data Science Project

  • 5

    SQL for Data Validation and Understanding

    • Video- SQL for Data Validation and Understanding Part 1

    • Video- SQL for Data Validation and Understanding Part 2

    • Video- SQL for Data Validation and Understanding Part 3

    • Video- SQL for Data Validation and Understanding Part 3b

    • Video- SQL for Data Validation and Understanding Part 4

    • Video- SQL for Data Validation and Understanding Part 5

    • Video- SQL for Data Validation and Understanding Part 6

    • Video- SQL for Data Validation and Understanding Part 7

    • Video- SQL for Data Validation and Understanding Part 8

    • Video- SQL for Data Validation and Understanding Part 9

    • Video- SQL for Data Validation and Understanding Part 10

    • Video- SQL for Data Validation and Understanding Part 11

    • Video- SQL for Data Validation and Understanding Part 12

    • Video- SQL for Data Validation and Understanding Part 13

    • Video- SQL for Data Validation and Understanding Part 14

    • Slides- About the Data

    • Slides- Count the Number of Records

    • Slides- Calculate Minimum, Maximum, Mean, Median and Number of NULL Values

    • Slides- Calculate the Number of Unique Values

    • Slides- Calculate Total and Percent Amount

    • Slides- Calculate the Histogram for a Variable

    • Slides- Generate Unique Rank for a Variable

    • Slides- Find the Most Frequent Value

    • Slides- Calculate Standard Deviation and Variance

    • Slides- Calculate Correlation Coefficient

    • Slides- Calculate Cumulative Distribution

    • Slides- Calculate Model Score Gain Chart

  • 6

    SQL for Data Cleansing and Preparation

    • Video- SQL for Data Cleansing and Preparation Part 1

    • Video- SQL for Data Cleansing and Preparation Part 2

    • Video- SQL for Data Cleansing and Preparation Part 3

    • Video- SQL for Data Cleansing and Preparation Part 4

    • Video- SQL for Data Cleansing and Preparation Part 5

    • Video- SQL for Data Cleansing and Preparation Part 6

    • Video- SQL for Data Cleansing and Preparation Part 7

    • Video- SQL for Data Cleansing and Preparation Part 8

    • Video- SQL for Data Cleansing and Preparation Part 9

    • Video- SQL for Data Cleansing and Preparation Part 10

    • Video- SQL for Data Cleansing and Preparation Part 11

    • Video- SQL for Data Cleansing and Preparation Part 12

    • Video- SQL for Data Cleansing and Preparation Part 13

    • Video- SQL for Data Cleansing and Preparation Part 14

    • Video- SQL for Data Cleansing and Preparation Part 15

    • Slides- Detect and Remove Duplicates

    • Slides- Handling Missing Values

    • Slides- Binning Variable (user-defined bin boundaries, equal length and equal frequence)

    • Slide- Fix Issues With Character Strings

    • Slides- Mask Sensitive Data

    • Slides -Keep the Most Recent Transactions for Each Account

    • Slides- Random Sampling

  • 7

    Feature Variable Calculation

    • Video- Feature Variable Calculation Part 1

    • Video- Feature Variable Calculation Part 2

    • Video- Feature Variable Calculation Part 3

    • Video- Feature Variable Calculation Part 4

    • Video- Feature Variable Calculation Part 5

    • Video- Feature Variable Calculation Part 6

    • Video- Feature Variable Calculation Part 7

    • Video- Feature Variable Calculation Part 8

    • Video- Feature Variable Calculation Part 9

    • Slides- Feature Variable Calculation- Recency

    • Slides- Feature Variable Calculation- Rolling Summary

    • Slides- Feature Variable Calculation- Process Flow

    • Slides- Feature Variable Calculation- More on Random Sampling

  • 8

    Summaries and Highlights

    • Video- Summaries and Highlights

    • Slides- Summaries and Highlights

  • 9

    Next Steps

    • Build and Deploy Predictive Models Inside Database

Pricing options

Explain how different pricing options might be valuable to different segments of your audience.