HDP Academic Analyst: Data Science | IT Training & Certification | Info Trek

HDP Academic Analyst: Data Science

Duration: 3 days

All of our private classes are customized to your organization's needs.

WHAT YOU WILL LEARN

This course introduces students to the processes and practice of data science, including machine learning and natural language processing. The tools and programming languages covered include Python, IPython, Mahout, Pig, NumPy, pandas, SciPy, scikit-learn, the Natural Language Toolkit (NLTK), and Spark MLlib.

AUDIENCE

This course is intended for computer science and data analytics students who need to apply data science and machine learning on Hadoop.

PREREQUISITES

Students must have experience with at least one programming or scripting language, knowledge of statistics and/or mathematics, and a basic understanding of big data and Hadoop principles.

CERTIFICATION

Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop.

COURSE OBJECTIVES

Upon completion of this course, participants should be able to:

  • Recognize use cases for data science
  • Describe the architecture of Hadoop and YARN
  • Describe the differences between supervised and unsupervised learning
  • List the six machine learning tasks
  • Use Mahout to run a machine learning algorithm on Hadoop
  • Use Pig to transform and prepare data on Hadoop
  • Write a Python script
  • Use NumPy to analyze big data
  • Use the data structure classes in the pandas library
  • Write a Python script that invokes SciPy machine learning
  • Describe options for running Python code on a Hadoop cluster
  • Write a Pig user-defined function (UDF) in Python
  • Use Pig streaming on Hadoop with a Python script
  • Write a Python script that invokes scikit-learn
  • Use the k-nearest neighbor algorithm to predict values
  • Run a machine learning algorithm on a distributed data set
  • Describe use cases for Natural Language Processing (NLP)
  • Perform sentence segmentation on a large body of text
  • Perform part-of-speech tagging
  • Use the Natural Language Toolkit (NLTK)
  • Describe the components of a Spark application
  • Write a Spark application in Python
  • Run machine learning algorithms using Spark MLlib
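The k-nearest neighbor objective above can be illustrated with a minimal pure-Python sketch. It is not the course's lab code; it assumes numeric feature tuples, Euclidean distance, and prediction by averaging the targets of the k closest training points (the regression form of k-NN). The function names `euclidean` and `knn_predict` and the sample data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train: List[Tuple[Point, float]], query: Point, k: int = 3) -> float:
    """Predict a numeric value for `query` as the mean target of its k nearest training points."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training examples by distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    # Regression variant: average the neighbors' target values.
    return sum(target for _, target in nearest) / k


if __name__ == "__main__":
    # Hypothetical training data: (features, target value) pairs.
    training_data = [
        ((1.0, 1.0), 10.0),
        ((1.5, 2.0), 12.0),
        ((3.0, 4.0), 20.0),
        ((5.0, 7.0), 30.0),
        ((3.5, 5.0), 22.0),
    ]
    print(knn_predict(training_data, (3.2, 4.5), k=2))  # prints 21.0
```

With `k=2`, the query `(3.2, 4.5)` is closest to the points with targets 20.0 and 22.0, so the prediction is their mean, 21.0. Library implementations such as scikit-learn add features like distance weighting and spatial indexes for large data sets.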


MODULES

Setting Up a Development Environment
Using HDFS Commands
Using Mahout for Machine Learning
Getting Started with Pig
Exploring Data with Pig
Using the IPython Notebook
Data Analysis with Python
Interpolating Data Points
Define a Pig UDF in Python
Streaming Python with Pig
K-Nearest Neighbor and K-Means Clustering
K-Means Clustering
Using NLTK for Natural Language Processing
Classifying Text Using Naive Bayes
Spark Programming and Spark MLlib
Running Data Science Algorithms Using Spark MLlib
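The k-means clustering module can be illustrated with a minimal pure-Python sketch of Lloyd's algorithm, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. This is not the course's lab code: the helper names and sample data are hypothetical, and seeding the centroids with the first k points is a simplifying assumption (scikit-learn's `KMeans`, for example, defaults to k-means++ initialization).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def k_means(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate assignment and update steps until centroids stop moving."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Naive deterministic initialization: the first k points become the centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [min(range(k), key=lambda c: _distance(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lbl in zip(points, labels) if lbl == c]
            # An empty cluster keeps its previous centroid.
            new_centroids.append(_mean(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    data = [(1.0, 1.0), (10.0, 10.0), (1.5, 1.0), (10.0, 11.0), (1.0, 2.0), (11.0, 10.0)]
    centroids, labels = k_means(data, k=2)
    print(labels)  # prints [0, 1, 0, 1, 0, 1]
```

The sample data forms two well-separated groups, so the algorithm converges after the second pass with the three points near (1, 1) in cluster 0 and the three near (10, 10) in cluster 1. Distributed implementations such as Spark MLlib follow the same assign-and-update loop but spread the assignment step across the cluster.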