23-01-2018 13:05:03
Participants will learn to solve real-world problems. After the course, they will be able to carry out tasks involving Big Data analysis in their organization, from designing a “Big Data” architecture to analyzing company-specific data.

Choose time and place

When?

5 Feb 2018, 09:00-16:00
6 Feb 2018, 09:00-16:00
7 Feb 2018, 09:00-16:00
Registration deadline
1 Feb 2018, 23:59
Event number
324924

Where?

Tivoli Hotel & Congress Center
Arni Magnussons Gade 2
1577 København V

Price

Participant, not a member of IDA
DKK 13,500 excl. VAT
Member
DKK 11,500 excl. VAT

When?

11 Apr 2018, 09:00-16:00
12 Apr 2018, 09:00-16:00
13 Apr 2018, 09:00-16:00
Registration deadline
6 Apr 2018, 23:59
Event number
324926

Where?

Scandic Hotel Aarhus City
Østergade 10
8000 Aarhus C

Price

Participant, not a member of IDA
DKK 13,500 excl. VAT
Member
DKK 11,500 excl. VAT

When?

14 May 2018, 09:00-16:00
15 May 2018, 09:00-16:00
16 May 2018, 09:00-16:00
Registration deadline
11 May 2018, 23:59
Event number
324925

Where?

Tivoli Hotel & Congress Center
Arni Magnussons Gade 2
1577 København V

Price

Participant, not a member of IDA
DKK 13,500 excl. VAT
Member
DKK 11,500 excl. VAT

WHY PARTICIPATE IN THIS COURSE?

Knowing data structures and algorithms helps us write efficient applications, programming languages give us expressive power, and design patterns often answer our common problems. All of that is still not enough when dealing with ever-growing amounts of data. Even medium-sized datasets can cause trouble when our analytics become complex. This course teaches you how to process and analyze data effectively in parallel using open source tools like Hadoop and Spark. It will help you choose the right tools and develop your own applications that make effective use of large unstructured data to gain a competitive edge.

WHAT THE 3-DAY COURSE EMPHASIZES

  • Participants will obtain practical, hands-on experience with open source tools and frameworks for analyzing big data
  • Participants will gain an understanding of methods for data analysis in distributed environments, such as the cloud or a Hadoop cluster
  • Participants will learn how to solve common real-world problems where fast processing of large amounts of data is crucial, such as web crawlers and recommendation engines
  • After the course, participants will be able to carry out tasks involving big data analysis in their organization, from designing a “big data” architecture to analyzing company-specific data

PREREQUISITES

This 3-day hands-on training course is best suited for software engineers, developers, and data analysts.

No prior knowledge of Hadoop or Spark is required. In this instructor-led course, participants will go through hands-on sessions with planned exercises.

Participants are expected to bring their own laptop (with at least 8 GB of RAM) to the class; everything else needed for the course is provided.

COURSE CONTENT

Day 1: OSS Tools & Frameworks for Big Data 

  • Challenges in big data processing
  • Distributed storage & computing in Apache Hadoop & Apache Spark
  • MapReduce paradigms; Transformations and Actions
  • Methods for effective data processing vs. traditional database systems and ETL
  • Cloud computing; processing on a preinstalled Hadoop cluster
  • Ingesting data from external sources and relational databases
  • Big Data ecosystem

Day 2: Data Processing, Analysis & Visualization with R and SQL

  • Interactive data analysis in R, Spark SQL
  • Environments & tools for data processing & analysis, Interactive Notebooks
  • Visualization techniques for big data: plots, trends, geospatial, multidimensional data
  • From analysis to production: DataFrames & closures, intro to functional programming

Day 3: Introduction to Machine Learning

  • Practical clustering, finding patterns in data, outlier detection
  • Feature selection for data analysis: correlation, principal components, distance measures, bias
  • Classification & regression models for real-world problems: recommendations, error detection
  • Training and running ML models using Spark MLlib
  • Performance tuning, debugging and optimizing our distributed processing

INSTRUCTOR

Vladimir Smida is a Big Data Engineer with a background in computer science, artificial intelligence, and machine learning. Over the last 10 years, Vladimir has architected and developed enterprise production systems on Hadoop and Spark, in-memory real-time trading agents, and NoSQL solutions. He has worked as a Software Architect and Data Scientist Consultant for the biggest IT companies in the world. Since 2014, Vladimir has led training courses in Big Data and Data Science. Whether dealing with predictive models or technology choices, he always seeks the best, trend-aware solution to the given task.

THE PRICE INCLUDES:

Before the course

  • Short questionnaire about what you expect from the course

During the course

  • 3 course days, 09:00-16:00, with breakfast from 08:30
  • 1 instructor
  • All meals included
  • Course material

After the course

  • Course certificate

Please note that your registration for participation is binding.
