iiDrak Data Pipeline

A modern Data Lakehouse solution with an open, unified data processing platform for data lakes and data warehouses.

Overview

The iiDrak distributed data pipeline platform combines the Raft consensus protocol with a no-code/low-code experience. Its visual canvas and extensible plugin architecture let teams build enterprise-grade data pipelines without writing code, while keeping the flexibility to add custom components as needed.

Key Features

No-Code Visual Pipeline Builder

Intuitive Canvas Interface

Plugin-Based Architecture

Expandable Component Library

Component Types

Source Connectors
Transformations
Destinations

Building Pipelines

Visual Pipeline Creation

Component Selection
Pipeline Configuration
Testing and Validation

Example: Building a Data Warehouse Pipeline

Visual Steps:

Add Source
Add Transformations
Add Destination

Plugin Development

Component Plugin Architecture

plugin/
├── manifest.json       # Plugin metadata and dependencies
├── icon.svg            # Component icon for canvas
├── config-schema.json  # Configuration UI definition
└── src/                # Custom logic
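
The layout above can be checked mechanically before a plugin is packaged. The following is a minimal sketch that assumes only the four entries listed in the tree; the helper name `missing_plugin_entries` is hypothetical and is not part of any iiDrak SDK.

```python
from pathlib import Path

# Entries taken from the plugin layout above; directories end with "/".
REQUIRED_ENTRIES = ("manifest.json", "icon.svg", "config-schema.json", "src/")


def missing_plugin_entries(plugin_dir):
    """Return the required entries that are absent from plugin_dir.

    Hypothetical helper for illustration only.
    """
    root = Path(plugin_dir)
    missing = []
    for entry in REQUIRED_ENTRIES:
        path = root / entry.rstrip("/")
        # Directories must exist as directories, files as regular files.
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing
```

An empty list means the directory has the expected shape; the contents of each file are not inspected.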

Plugin Capabilities

Custom UI components
Proprietary protocols
Complex transformations
Custom validation rules
Specialized connectors

Cost-Effectiveness

Development Savings

No-Code Solution

Operational Benefits

Visual Management

Resource Optimization

Smart Execution

Competitive Advantages

Ease of Use

True no-code experience
Visual debugging and testing
Interactive documentation
Built-in best practices

Extensibility

Open plugin architecture
Community marketplace
Custom component development
Flexible deployment options

Enterprise Features

Role-based access control
Audit logging
Pipeline versioning
Environment management

Use Cases

Real-time Analytics Pipeline

Visual Configuration

Multi-Source ETL

Canvas Setup

Add multiple source connectors
Configure visual joins and aggregations
Set up incremental loading
Define error handling visually

Future-Proof Architecture

Scalability

Add nodes through Admin UI
Visual cluster monitoring
Automated workload distribution
Built-in performance optimization

Integration

Extensive connector library
Custom connector development
API-first architecture
Webhook support

Security

Visual access control management
Encrypted configuration storage
Audit trail visualization
Compliance reporting

Build basic ETL pipeline in minutes

Prerequisites:

ABFS storage configured as part of a connector. It is used to stage the live event data.
Create a Lakehouse

Let's create a simple streaming data pipeline that captures events from website analytics. To begin, create a sample analytics table:

create table awscheck.analytics.webanalytics (
    user_id string,
    browseragent string,
    timestamp bigint,
    url string,
    event_name string,
    event_value string
)

The table above captures a few basic pieces of information: the user's browser, the event that was performed (such as click or scroll), and the event value (for example, a button ID).
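
Before sending events, it can help to check them against the table's columns on the client side. The sketch below copies the column names from the table definition; mapping `string` to `str` and `bigint` to `int`, treating every column as required, and rejecting unknown fields are all assumptions of this example, not documented iiDrak behavior. `validate_event` is a hypothetical helper.

```python
# Column names copied from the webanalytics table definition.
# The SQL-to-Python type mapping is an assumption of this sketch.
WEBANALYTICS_COLUMNS = {
    "user_id": str,
    "browseragent": str,
    "timestamp": int,  # bigint
    "url": str,
    "event_name": str,
    "event_value": str,
}


def validate_event(event):
    """Return a list of problems found in an event dict (empty if none)."""
    problems = []
    for column, expected in WEBANALYTICS_COLUMNS.items():
        if column not in event:
            problems.append(f"missing field: {column}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif not isinstance(event[column], expected) or isinstance(event[column], bool):
            problems.append(f"{column}: expected {expected.__name__}")
    for column in event:
        if column not in WEBANALYTICS_COLUMNS:
            problems.append(f"unexpected field: {column}")
    return problems
```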

Now let's create a simple data pipeline by following the steps below:

Step 1. Go to Data Pipeline -> + Data Pipeline

Step 2. Enter the pipeline name

Step 3. From the left bar, under the Triggers section, drag and drop the HTTP Server component.

Step 4. Double-click the component and enter the URL path (for example, "/api/v1/events"), the port number (for example, 9090), the method POST, and the content type application/json, then click Save.

Step 5. Drag and drop the Azure Blob Storage component from the Data Source section and select the configured ABFS connector from the drop-down.

Step 6. Finally, drag and drop the Iceberg component from the Data Sink section and select the catalog, namespace, and table (awscheck -> analytics -> webanalytics in this case).

After performing the above steps, you're now ready to start the pipeline. Once the pipeline is running, events can be posted using REST APIs as shown below:

URL: http://<GATEWAY_ENDPOINT>/api/v1/events

Method: POST

Headers: {'Content-Type': 'application/json'}

Payload:

{
    "user_id": "19834",
    "browseragent": "Safari",
    "timestamp": 1729852377000,
    "url": "https://mywebsite.tracking.com/signup",
    "event_name": "click",
    "event_value": "Signup Button"
}
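
The request above can also be sent from a script. The following is a minimal sketch using only the Python standard library; it assumes the gateway is reachable over plain HTTP as in the URL shown, and the function names `build_event_request` and `post_event` are illustrative. What status code or response body the HTTP Server component returns is not stated here, so the sketch simply returns whatever status comes back.

```python
import json
import urllib.request


def build_event_request(gateway_endpoint, event):
    """Build a POST request for the /api/v1/events path configured in Step 4."""
    url = f"http://{gateway_endpoint}/api/v1/events"
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_event(gateway_endpoint, event, timeout=10):
    """Send the event and return the HTTP status code of the response."""
    request = build_event_request(gateway_endpoint, event)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

To use it, call `post_event` with your gateway host in place of `<GATEWAY_ENDPOINT>` and a dict shaped like the payload above. Building the request separately from sending it keeps the request easy to inspect without a running pipeline.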

Transforming petabytes to insights