iiDrak Data Platform Setup
A modern data lakehouse solution: an open, unified data processing platform for data lakes and data warehouses.
Get Started: iiDrak Data Platform Setup
This article helps you get started with the iiDrak Data Platform. The steps are:
- Configure connectors
- Create your lakehouse
- Create Tables
- Load and query raw data
Prerequisites
- Pre-created S3 or ABFS storage for a cloud setup, or shared NFS for an on-premise setup
- Access to the data storage and permission to configure resources
Configure connectors
Connectors let you create and store reusable connection components. They can be shared across lakehouse creation, data pipelines, AI Studio, and so on. To create a connector:
- Navigate to Settings -> Connectors Configuration tab and click the [+ Configuration] button.
- Select Source: choose the type of connector. The connection properties are displayed automatically based on the selected source. This walkthrough uses an S3 connection as the example.
- Enter a connector name (lhstorage1 in this example).
- Enter the Access Key, Secret Key, Region, and Bucket Name, then click Create.
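As a rough illustration of what the S3 connector form collects, the fields listed above can be modeled as a small record with a completeness check. This is only a sketch: the `S3Connector` class, its field names, and the `missing_fields` helper are hypothetical and are not part of any iiDrak API, and the values shown are placeholders rather than real credentials.

```python
from dataclasses import dataclass, fields


@dataclass
class S3Connector:
    """Hypothetical model of the S3 connector form; not an iiDrak API."""
    name: str
    access_key: str
    secret_key: str
    region: str
    bucket_name: str


def missing_fields(connector: S3Connector) -> list[str]:
    """Return the names of fields that are empty or whitespace-only."""
    return [
        f.name
        for f in fields(connector)
        if not getattr(connector, f.name).strip()
    ]


# Placeholder values for illustration only.
connector = S3Connector(
    name="lhstorage1",
    access_key="<ACCESS_KEY>",
    secret_key="<SECRET_KEY>",
    region="",  # left blank on purpose to show the check
    bucket_name="<BUCKET_NAME>",
)
print(missing_fields(connector))  # ['region']
```

A check like this is handy before clicking Create, since all four connection properties plus the connector name are needed for the form to be complete.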
Create your lakehouse
Lakehouse setup is the first step in setting up your environment. With just a few clicks you should have a working database and tables up and running. The steps are:
- Enter a readable name for the lakehouse.
- Select storage location: where the data is stored and accessed from. It can be S3, ABFS, or shared storage. Select S3 for this example.
- S3: select the configured connector (lhstorage1) from the drop-down.
- Providers: Based on the deployment configuration, the executors can be selected from Azure, AWS, or GCP.
- Executors: This option lets you create new executors or use existing clusters. For a shared cluster, an approval notification is sent to the creator of the cluster. If the cluster was created by the same user who is trying to share it, no further action is needed. Once the owner of the cluster approves, the resource can be shared across multiple Lakehouse clusters. It is recommended to share a single cluster across at most 5 Lakehouse clusters.
- Once dedicated cluster is selected, user will be prompted to select
- Click Create.
Create Tables
This example highlights the flexibility of the platform and the ease with which data and executors from different cloud providers can work together. We use an S3 bucket as the storage path while using executors (a Spark cluster) from Azure.
Once the lakehouse is online, simply navigate to SQL -> SQL Lab and execute a CREATE TABLE query.
CREATE TABLE awscheck.hybrid.demotable (
    name string,
    id int
);
Refresh the Catalogs and you will see the newly created table (along with its namespace).
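The table name in the query has three dot-separated parts. As a small sketch, it can be split as follows, assuming the parts are catalog, namespace, and table, in that order. The source only confirms that a namespace is created with the table; reading the first part as the catalog is an assumption, and `parse_table_name` is a hypothetical helper, not a platform function.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    catalog: str    # assumed: the first part identifies the catalog
    namespace: str  # created automatically along with the table
    table: str


def parse_table_name(qualified: str) -> TableName:
    """Split 'catalog.namespace.table' into its three parts."""
    parts = qualified.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.namespace.table, got {qualified!r}")
    return TableName(*parts)


name = parse_table_name("awscheck.hybrid.demotable")
print(name.namespace, name.table)  # hybrid demotable
```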
Load and query raw data
Raw data such as CSV or JSON stored in object storage can also be queried directly from the SQL editor. The local object browser (upcoming feature)
INSERT OR IGNORE INTO awscheck.aws.catalog_test
SELECT data, id
FROM json."s3a://iceberg-s3test/*.json";
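The source does not spell out how `INSERT OR IGNORE` behaves on the platform. In SQLite, the same statement skips any row that would violate a uniqueness constraint instead of failing. The sketch below uses Python's built-in `sqlite3` module to show that skip-on-conflict behavior with a few in-memory JSON records. Whether iiDrak follows the same semantics is an assumption, and the sample records and the primary key on `id` are invented for illustration.

```python
import json
import sqlite3

# Invented sample records standing in for the *.json files in object storage.
raw_lines = [
    '{"id": 1, "data": "alpha"}',
    '{"id": 2, "data": "beta"}',
    '{"id": 1, "data": "duplicate of id 1"}',
]

conn = sqlite3.connect(":memory:")
# A primary key on id is assumed here so that duplicates can conflict.
conn.execute("CREATE TABLE catalog_test (data TEXT, id INTEGER PRIMARY KEY)")

records = [json.loads(line) for line in raw_lines]
# Rows whose id already exists are silently skipped rather than raising.
conn.executemany(
    "INSERT OR IGNORE INTO catalog_test (data, id) VALUES (:data, :id)",
    records,
)
conn.commit()

rows = conn.execute("SELECT id, data FROM catalog_test ORDER BY id").fetchall()
print(rows)  # [(1, 'alpha'), (2, 'beta')]
```

The third record is dropped because its `id` already exists, so rerunning a load of this kind would not create duplicate rows.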