Module 1) Introduction to data integration and Cloud Data Fusion
- Understand the need for data integration.
- List the situations where data integration can help businesses.
- List the available data integration platforms and tools.
- Identify the challenges with data integration.
- Understand the use of Cloud Data Fusion as a data integration platform.
- Create a Cloud Data Fusion instance.
- Become familiar with the core framework and major components of Cloud Data Fusion.
Module 2) Building Pipelines
- Understand Cloud Data Fusion architecture.
- Define what a data pipeline is.
- Understand the DAG representation of a data pipeline.
- Learn to use Pipeline Studio and its components.
- Design a simple pipeline using Pipeline Studio.
- Deploy and execute a pipeline.
Module 3) Designing complex pipelines
- Perform branching, merging, and join operations.
- Execute a pipeline with runtime arguments using macros.
- Work with error handlers.
- Run pre- and post-execution tasks using actions and notifications.
- Schedule pipelines for execution.
- Import and export existing pipelines.
Module 4) Pipeline execution environment
- Understand the composition of an execution environment.
- Configure your pipeline’s execution environment, logging, and metrics.
- Understand concepts such as compute profiles and provisioners.
- Create a compute profile.
- Create pipeline alerts.
- Monitor a pipeline during execution.
Module 5) Building Transformations and Preparing Data with Wrangler
- Understand the use of Wrangler and its main components.
- Transform data using the Wrangler UI.
- Transform data using directives/CLI methods.
- Create and use user-defined directives.
Module 6) Connectors and streaming pipelines
- Connectors
- DLP
- Reference architecture for streaming applications
- Building streaming pipelines
Module 7) Metadata and data lineage
- List types of metadata.
- Differentiate between business, technical, and operational metadata.
- Understand what data lineage is and why maintaining it is important.
- Differentiate between metadata and data lineage.