iiDrak Data Pipeline

A modern Data Lakehouse solution: an open, unified data processing platform for both data lakes and data warehouses.

Overview

The iiDrak distributed data pipeline platform combines the Raft consensus protocol with a true no-code/low-code experience. Its visual canvas and extensible plugin architecture let teams build enterprise-grade data pipelines without writing code, while still allowing custom components to be added when needed.

Key Features

No-Code Visual Pipeline Builder

Plugin-Based Architecture

Component Types

  1. Source Connectors
  2. Transformations
  3. Destinations

Building Pipelines

Visual Pipeline Creation

  1. Component Selection
  2. Pipeline Configuration
  3. Testing and Validation

Example: Building a Data Warehouse Pipeline

Visual Steps:

  1. Add Source
  2. Add Transformations
  3. Add Destination

Plugin Development

Component Plugin Architecture

plugin/
├── manifest.json        # Plugin metadata and dependencies
├── icon.svg             # Component icon for canvas
├── config-schema.json   # Configuration UI definition
└── src/                 # Custom logic
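To catch packaging mistakes before a plugin is uploaded, a plugin folder can be checked against the layout above. The sketch below is a hypothetical helper (`check_plugin_layout` is not part of iiDrak). It only verifies that the four entries exist and that the two JSON files parse, because the fields expected inside `manifest.json` and `config-schema.json` are not documented here.

```python
import json
from pathlib import Path

# Entries expected in a plugin folder, taken from the layout above.
REQUIRED_FILES = ["manifest.json", "icon.svg", "config-schema.json"]
REQUIRED_DIRS = ["src"]
JSON_FILES = ["manifest.json", "config-schema.json"]


def check_plugin_layout(plugin_dir):
    """Return a list of problems found in plugin_dir; an empty list means OK.

    Hypothetical helper: it checks presence and JSON syntax only, not the
    contents of the manifest or schema (their fields are not assumed here).
    """
    root = Path(plugin_dir)
    problems = []
    for name in REQUIRED_FILES:
        if not (root / name).is_file():
            problems.append(f"missing file: {name}")
    for name in REQUIRED_DIRS:
        if not (root / name).is_dir():
            problems.append(f"missing directory: {name}/")
    for name in JSON_FILES:
        path = root / name
        if path.is_file():
            try:
                json.loads(path.read_text(encoding="utf-8"))
            except json.JSONDecodeError as exc:
                problems.append(f"invalid JSON in {name}: {exc.msg}")
    return problems
```

For example, `check_plugin_layout("plugin")` returns an empty list for a complete folder and one message per missing or malformed entry otherwise.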

Plugin Capabilities

Cost-Effectiveness

Development Savings

Operational Benefits

Resource Optimization

Competitive Advantages

Ease of Use

Extensibility

Enterprise Features

Use Cases

Real-time Analytics Pipeline

  1. Visual Configuration

Multi-Source ETL

Canvas Setup

Future-Proof Architecture

Scalability

Integration

Security

Build a basic ETL pipeline in minutes

Prerequisites:

  1. Configure an ABFS storage connector. It is used to stage the live event data.
  2. Create a Lakehouse.

Let's create a simple streaming data pipeline that captures website analytics events. To begin, create a sample analytics table:

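create table awscheck.analytics.webanalytics (
    user_id string,
    browseragent string,
    timestamp bigint,     -- event time in epoch milliseconds, as in the sample payload
    url string,
    event_name string,    -- e.g. click, scroll
    event_value string    -- e.g. a button ID
)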

This table captures basic information about each event: the user's browser, the event that was performed (such as a click or scroll), and the event value (for example, a button ID).
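It can help to mirror this schema in client code so malformed events are rejected before they are sent. A minimal sketch, assuming events are posted as flat JSON objects whose keys match the column names; `WebAnalyticsEvent` is a hypothetical name, and `timestamp` is treated as epoch milliseconds to match the `bigint` column and the sample payload later in this guide.

```python
import json
import time
from dataclasses import asdict, dataclass, field


def _now_millis():
    # Current time as epoch milliseconds, matching the bigint `timestamp` column.
    return time.time_ns() // 1_000_000


@dataclass
class WebAnalyticsEvent:
    """One row of awscheck.analytics.webanalytics (hypothetical client-side type)."""
    user_id: str
    browseragent: str
    url: str
    event_name: str
    event_value: str
    timestamp: int = field(default_factory=_now_millis)

    def __post_init__(self):
        # Reject values that would not fit the column types (string / bigint).
        for name in ("user_id", "browseragent", "url", "event_name", "event_value"):
            if not isinstance(getattr(self, name), str):
                raise TypeError(f"{name} must be a string")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(self.timestamp, bool) or not isinstance(self.timestamp, int):
            raise TypeError("timestamp must be an integer (epoch milliseconds)")

    def to_json(self):
        # Keys are the column names, so the body lines up with the table.
        return json.dumps(asdict(self))
```

Keeping `timestamp` as the last field lets it default to the current time while the other columns remain required.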

Next, create the data pipeline by following these steps:

Step 1. Go to Data Pipeline and click + Data Pipeline.

Step 2. Enter a name for the pipeline.


Step 3. In the left sidebar, under the Triggers section, drag and drop the HTTP Server component onto the canvas.

Step 4. Double-click the component and enter the URL path (for example, "/api/v1/events"), the port number (for example, 9090), POST as the method, and application/json as the content type. Then click Save.

Step 5. Drag and drop the Azure Blob Storage component from the Data Source section, and select the configured ABFS connector from the drop-down list.

Step 6. Finally, drag and drop the Iceberg component from the Data Sink section and select the catalog, namespace, and table (in this case, awscheck -> analytics -> webanalytics).
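While the pipeline is still being built, client code can be exercised against a local stand-in for the HTTP Server trigger configured in Step 4. This sketch is not how iiDrak implements the trigger. It uses only Python's standard library to accept POST requests with an application/json body on /api/v1/events and port 9090 (the example values from Step 4), and simply keeps the decoded events in memory; `make_server` and `received_events` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Example values from Step 4; adjust them to match your trigger configuration.
EVENT_PATH = "/api/v1/events"
PORT = 9090

received_events = []  # decoded JSON bodies, kept in memory for inspection


class EventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the whole body first so the connection is left in a clean state.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        if self.path != EVENT_PATH:
            self.send_error(404, "unknown path")
            return
        if not self.headers.get("Content-Type", "").startswith("application/json"):
            self.send_error(415, "expected application/json")
            return
        try:
            event = json.loads(body)
        except json.JSONDecodeError:
            self.send_error(400, "invalid JSON")
            return
        received_events.append(event)
        self.send_response(202)  # accepted; this stub forwards nothing downstream
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_server(port=PORT):
    # Binds to localhost only; this is a test double, not a production endpoint.
    return ThreadingHTTPServer(("127.0.0.1", port), EventHandler)
```

To try it, run `make_server().serve_forever()` and send events to http://127.0.0.1:9090/api/v1/events; each accepted body is appended to `received_events`.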

After completing the steps above, you're ready to start the pipeline. Once it is running, events can be posted to its REST endpoint as follows:

URL: http://<GATEWAY_ENDPOINT>/api/v1/events

Method: POST

Headers: {'Content-Type': 'application/json'}

Payload:

{
    "user_id": "19834",
    "browseragent": "Safari",
    "timestamp": 1729852377000,
    "url": "https://mywebsite.tracking.com/signup",
    "event_name": "click",
    "event_value": "Signup Button"
}
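The same request can also be sent from code. A minimal sketch using only Python's standard library; `<GATEWAY_ENDPOINT>` is kept as a placeholder exactly as in the URL above, and `build_event_request` / `post_event` are hypothetical helper names. The sample event reuses the payload fields but stamps the current time in epoch milliseconds.

```python
import json
import time
import urllib.request


def build_event_request(endpoint, event):
    """Build a POST request that carries `event` as a JSON body.

    `endpoint` is the full URL, e.g. http://<GATEWAY_ENDPOINT>/api/v1/events
    with the placeholder replaced by your gateway address.
    """
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_event(endpoint, event, timeout=10):
    """Send the event and return the HTTP status code (raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(build_event_request(endpoint, event), timeout=timeout) as resp:
        return resp.status


sample_event = {
    "user_id": "19834",
    "browseragent": "Safari",
    "timestamp": time.time_ns() // 1_000_000,  # current time, epoch milliseconds
    "url": "https://mywebsite.tracking.com/signup",
    "event_name": "click",
    "event_value": "Signup Button",
}

# Replace the placeholder with your gateway address before calling:
# post_event("http://<GATEWAY_ENDPOINT>/api/v1/events", sample_event)
```

Separating request construction from sending makes it easy to inspect the exact body and headers before any network call is made.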