3 Days
HDP Academic Analyst: Data Science
WHAT YOU WILL LEARN
This course is designed for students who want to become familiar with the
processes and practice of data science, including machine learning and natural
language processing. It covers tools and programming languages (Python,
IPython, Mahout, Pig, NumPy, pandas, SciPy, scikit-learn), the Natural Language
Toolkit (NLTK), and Spark MLlib.
AUDIENCE
This course is intended for computer science and data analytics students who
need to apply data science and machine learning on Hadoop.
PREREQUISITES
Students must have experience with at least one programming or scripting
language, knowledge of statistics and/or mathematics, and a basic
understanding of big data and Hadoop principles.
CERTIFICATION
Hortonworks offers a comprehensive
certification program that identifies you as an expert in Apache Hadoop.
COURSE OBJECTIVES
Upon completion of this program,
participants should be able to:
- Recognize use cases for data science
- Describe the architecture of Hadoop and YARN
- Describe the differences between supervised and unsupervised learning
- List the six machine learning tasks
- Use Mahout to run a machine learning algorithm on Hadoop
- Use Pig to transform and prepare data on Hadoop
- Write a Python script
- Use NumPy to analyze big data
- Use the data structure classes in the pandas library
- Write a Python script that invokes SciPy machine learning
- Describe options for running Python code on a Hadoop cluster
- Write a Pig User-Defined Function in Python
- Use Pig streaming on Hadoop with a Python script
- Write a Python script that invokes scikit-learn
- Use the k-nearest neighbor algorithm to predict values
- Run a machine learning algorithm on a distributed data set
- Describe use cases for Natural Language Processing (NLP)
- Perform sentence segmentation on a large body of text
- Perform part-of-speech tagging
- Use the Natural Language Toolkit (NLTK)
- Describe the components of a Spark application
- Write a Spark application in Python
- Run machine learning algorithms using Spark MLlib