iiDrak Data Pipeline
A modern data lakehouse solution: an open, unified data processing platform for data lakes and data warehouses.
Overview
The iiDrak distributed data pipeline platform combines the Raft consensus protocol with a true no-code/low-code experience. Its intuitive visual canvas and extensible plugin architecture let teams build enterprise-grade data pipelines without writing code, while keeping the flexibility to add custom components as needed.
Key Features
No-Code Visual Pipeline Builder
- Intuitive Canvas Interface
Plugin-Based Architecture
- Expandable Component Library
Component Types
- Source Connectors
- Transformations
- Destinations
Building Pipelines
Visual Pipeline Creation
- Component Selection
- Pipeline Configuration
- Testing and Validation
Example: Building a Data Warehouse Pipeline
Visual Steps:
- Add Source
- Add Transformations
- Add Destination
Plugin Development
Component Plugin Architecture
plugin/
├── manifest.json # Plugin metadata and dependencies
├── icon.svg # Component icon for canvas
├── config-schema.json # Configuration UI definition
└── src/ # custom logic
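As a minimal sketch, the layout above can be checked before a plugin is packaged. The helper below is hypothetical and not part of iiDrak; it assumes all four entries are mandatory, which this page does not state explicitly.

```python
from pathlib import Path

# Entries taken from the plugin layout shown above.
# Treating every entry as required is an assumption of this sketch.
REQUIRED_FILES = ("manifest.json", "icon.svg", "config-schema.json")
REQUIRED_DIRS = ("src",)


def missing_plugin_entries(plugin_dir):
    """Return the required files/directories that are absent from plugin_dir."""
    root = Path(plugin_dir)
    # Files must exist as regular files.
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # Directories are reported with a trailing slash to tell them apart.
    missing += [name + "/" for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing
```

Calling `missing_plugin_entries("plugin")` returns an empty list when the directory matches the layout, and the names of any missing entries otherwise.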
Plugin Capabilities
- Custom UI components
- Proprietary protocols
- Complex transformations
- Custom validation rules
- Specialized connectors
Cost-Effectiveness
Development Savings
- No-Code Solution
Operational Benefits
- Visual Management
Resource Optimization
- Smart Execution
Competitive Advantages
Ease of Use
- True no-code experience
- Visual debugging and testing
- Interactive documentation
- Built-in best practices
Extensibility
- Open plugin architecture
- Community marketplace
- Custom component development
- Flexible deployment options
Enterprise Features
- Role-based access control
- Audit logging
- Pipeline versioning
- Environment management
Use Cases
Real-time Analytics Pipeline
- Visual Configuration
Multi-Source ETL
Canvas Setup
- Add multiple source connectors
- Configure visual joins and aggregations
- Set up incremental loading
- Define error handling visually
Future-Proof Architecture
Scalability
- Add nodes through Admin UI
- Visual cluster monitoring
- Automated workload distribution
- Built-in performance optimization
Integration
- Extensive connector library
- Custom connector development
- API-first architecture
- Webhook support
Security
- Visual access control management
- Encrypted configuration storage
- Audit trail visualization
- Compliance reporting
Build a basic ETL pipeline in minutes
Prerequisites:
- ABFS storage configured as a connector. It is used to stage the live event data.
- A Lakehouse has been created.
Let's create a simple streaming data pipeline to capture events from website analytics. To begin, create a sample analytics table:
create table awscheck.analytics.webanalytics (
  user_id string,
  browseragent string,
  timestamp bigint,
  url string,
  event_name string,
  event_value string
)
The table above captures a few basic fields: the user's browser, the event that was performed (such as click or scroll), and the event value (for example, a button ID).
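To catch malformed events before they reach the pipeline, a client could check each payload against the table's columns. The following sketch is illustrative only: `WEBANALYTICS_COLUMNS` and `validate_event` are hypothetical names, and the mapping from SQL types to Python types (string to `str`, bigint to `int`) is an assumption.

```python
# Column names and SQL types copied from the webanalytics table above.
# The SQL-to-Python type mapping is an assumption made for this sketch.
WEBANALYTICS_COLUMNS = {
    "user_id": str,
    "browseragent": str,
    "timestamp": int,  # bigint
    "url": str,
    "event_name": str,
    "event_value": str,
}


def validate_event(event):
    """Return a list of problems; an empty list means the event fits the table."""
    problems = []
    for column, expected in WEBANALYTICS_COLUMNS.items():
        if column not in event:
            problems.append(f"missing field: {column}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(event[column], bool) or not isinstance(event[column], expected):
            problems.append(f"{column}: expected {expected.__name__}")
    # Report fields that have no matching column.
    for extra in sorted(set(event) - set(WEBANALYTICS_COLUMNS)):
        problems.append(f"unexpected field: {extra}")
    return problems
```

A payload passes when `validate_event(payload)` returns an empty list; how the pipeline itself treats missing or extra fields is not covered by this sketch.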
Now create a simple data pipeline by following the steps below:
Step 1. Go to Data Pipeline -> **+ Data Pipeline**.
Step 2. Enter the pipeline name.
Step 3. From the left bar, under the **Triggers** section, drag and drop the HTTP Server component.
Step 4. Double-click the component and enter the URL path (for example, "/api/v1/events"), the port number (for example, 9090), the method (POST), and the content type (application/json), then click Save.
Step 5. Drag and drop the Azure Blob Storage component from the Data Source section and select the configured ABFS connector from the drop-down.
Step 6. Finally, drag and drop the Iceberg component from the Data Sink section and select the catalog, namespace, and table (awscheck -> analytics -> webanalytics in this case).
After completing these steps, you are ready to start the pipeline. Once the pipeline is running, events can be posted through the REST API as shown below:
URL: http://<GATEWAY_ENDPOINT>/api/v1/events
Method: POST
Headers: {'Content-Type': 'application/json'}
Payload:
{
  "user_id": "19834",
  "browseragent": "Safari",
  "timestamp": 1729852377000,
  "url": "https://mywebsite.tracking.com/signup",
  "event_name": "click",
  "event_value": "Signup Button"
}
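The same request can be sent from a script. The sketch below uses only the Python standard library; `build_event_request` and `post_event` are hypothetical helper names, and the gateway host must be supplied by the caller. The shape of the pipeline's response is not documented here, so only the HTTP status code is returned.

```python
import json
import urllib.request


def build_event_request(gateway_endpoint, event):
    """Build a POST request for /api/v1/events with a JSON body."""
    url = f"http://{gateway_endpoint}/api/v1/events"
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_event(gateway_endpoint, event, timeout=10):
    """Send the event and return the HTTP status code of the response."""
    request = build_event_request(gateway_endpoint, event)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, `post_event(gateway, payload)` sends the payload shown above, where `gateway` is the host (and port, if any) that replaces `<GATEWAY_ENDPOINT>`.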