Nexaris - Wa-Sul AI Studio

A modern Data Lakehouse solution: an open, unified data processing platform for data lakes and data warehouses.

Introduction

Wa-Sul AI Studio is a module of iiDrak that offers AI/ML tools for creating and deploying machine learning models, running large language model (LLM) and retrieval-augmented generation (RAG) tasks, and performing vision-based tasks such as optical character recognition (OCR). It lets users retrieve and process data, build custom machine learning workflows, and extract information from various data sources. Its salient features are:

  1. Component-based architecture
  2. Plug-and-play interface
  3. Internal generation of optimised code for execution
  4. Downloadable Jupyter notebook code for local development and enhancement
  5. Real-time testing
  6. Continuous enhancements and improvements to stay up to date with current trends

ML Experiment - Machine Learning (ML) Model Creation and Deployment

Overview

AI Studio allows users to train and deploy machine learning models for various applications such as regression, classification, and clustering. The platform provides an easy-to-use interface for selecting data sources, transforming datasets, and choosing the right machine learning algorithms.

Key Features

Steps for ML Model Creation

  1. Create AI Studio: Clicking on “+ AI STUDIO” will open a panel where you can input the required information. Once completed, a studio will be created, and you will be redirected to a blank canvas.
  2. Select Data Source: After creating the studio, users can add a data source. There are three options available: external database, query, and warehouse. Users can simply drag and drop their selected data source onto the canvas.

By selecting the external database option, you will be prompted to enter details such as the database fields, connection credentials, and the type of database. If you choose the query option, you will need to write your SQL query. For the warehouse option, you can select from available warehouses.

If you don't have connection credentials, navigate to Settings -> Connectors Configurations, and click the Configuration button to add your connection credentials.

You can choose a data source from the available options and enter the required credentials in this section.
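As an illustration of the query option, the sketch below runs a SQL query and returns the rows as dictionaries. An in-memory SQLite database stands in for a configured connection, and the `orders` table, its columns, and the `run_query` helper are hypothetical; the connectors and SQL dialect that AI Studio actually uses are not shown here.

```python
import sqlite3


def run_query(conn, sql):
    """Run a SQL query and return rows as a list of dicts keyed by column name."""
    cursor = conn.execute(sql)
    columns = [d[0] for d in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# In-memory SQLite stands in for an external database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "north", 120.0), (2, "south", 80.5), (3, "north", 42.0)],
)

# The kind of query a user might type into the query data source.
rows = run_query(
    conn,
    "SELECT region, SUM(amount) AS total FROM orders "
    "GROUP BY region ORDER BY region",
)
```

Returning rows keyed by column name keeps the result easy to hand to a later transformation step.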

  3. Data Transformation: Customize the data by selecting relevant columns, splitting columns, dropping columns, merging columns, handling outliers, and performing other transformations as needed. Users can drag and drop this component onto the canvas to transform the data from their selected data source.
  4. Algorithm Selection: Once data preparation is complete, choose the machine learning algorithm that best fits the use case. Upcoming releases will focus on recommending the best ML model and cleanup/transformation techniques to the user for better accuracy.
  5. Run AI Studio Flow: Clicking the Run button executes the flow and trains the AI model. Each component on the canvas can also be run independently. To view logs for a specific component, click the success or failure icon associated with it.
  6. Model Deployment: Access your trained model through the Experiments section to review its performance and deploy it for production use.
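The transformation and run steps above can be pictured as a chain of components, each of which reports its own status and log. The following is a minimal sketch under that assumption, not AI Studio's internal code; the `Component` class, the `run_flow` function, and the three transformation helpers are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]


def select_columns(names):
    """Keep only the listed columns."""
    return lambda rows: [{k: r[k] for k in names} for r in rows]


def merge_columns(a, b, into, sep=" "):
    """Replace columns a and b with a single joined column."""
    def step(rows):
        out = []
        for r in rows:
            merged = {k: v for k, v in r.items() if k not in (a, b)}
            merged[into] = f"{r[a]}{sep}{r[b]}"
            out.append(merged)
        return out
    return step


def clip_outliers(column, low, high):
    """Handle outliers by clamping values into [low, high]."""
    return lambda rows: [{**r, column: min(max(r[column], low), high)} for r in rows]


@dataclass
class Component:
    name: str
    run: Callable[[Rows], Rows]
    status: str = "pending"
    log: str = ""


def run_flow(components, rows):
    """Run components in order, recording a status and log line for each."""
    for c in components:
        try:
            rows = c.run(rows)
            c.status, c.log = "success", f"{len(rows)} rows out"
        except Exception as exc:
            c.status, c.log = "failure", repr(exc)
            break  # later components stay "pending"
    return rows


data = [
    {"first": "Ada", "last": "Lovelace", "age": 36, "note": "x"},
    {"first": "Alan", "last": "Turing", "age": 410, "note": "y"},
]
flow = [
    Component("select", select_columns(["first", "last", "age"])),
    Component("merge", merge_columns("first", "last", into="name")),
    Component("clip", clip_outliers("age", 0, 120)),
]
result = run_flow(flow, data)
```

Because every component keeps its own `status` and `log`, a failed step can be inspected on its own, which mirrors clicking the success or failure icon on the canvas.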

Model Experiments

Access Trained Models: In the iiDrak AI section, you can explore experiments to review and manage your trained models effectively. Additionally, you have the option to download model details as a CSV file.

Model Overview: Users can view detailed information and the current status of their models, as well as register new models using the Register Model button.

Metrics and Artifacts: Examine key performance metrics and access artifacts such as model files for future use.
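To make the CSV download concrete, the sketch below serializes a few experiment records to CSV text. The field names (`model`, `algorithm`, `status`, `accuracy`) and the sample values are hypothetical placeholders; the columns in the file that AI Studio exports may differ.

```python
import csv
import io


def experiments_to_csv(experiments):
    """Serialize experiment records to CSV text, one row per trained model."""
    fieldnames = ["model", "algorithm", "status", "accuracy"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(experiments)
    return buffer.getvalue()


# Illustrative records only; not real experiment results.
experiments = [
    {"model": "churn-v1", "algorithm": "logistic_regression",
     "status": "registered", "accuracy": 0.91},
    {"model": "churn-v2", "algorithm": "random_forest",
     "status": "trained", "accuracy": 0.94},
]
csv_text = experiments_to_csv(experiments)
```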

Example Workflow 1

Optical Character Recognition (OCR) with AI Studio

Overview

AI Studio's OCR capabilities integrate with cloud-based services such as AWS Textract or open-source models such as Tesseract, and with data sources such as S3, Azure Blob Storage, and ABFS. Users can extract text and structured data, including tables and forms, from scanned documents, and can define specific rules and categories for classifying them. Extracted data is then stored in Iceberg tables for advanced querying and analytics.

Key Features

Classification Example: "Classify the document as follows: Toll Violation as 0, Government ID as 1, Medical Prescription as 2, Parking/Parking Violation as 3, and Speeding/SPEEDING as 4. Return only the digits as JSON output with the key 'id' for the number and the key 'type' for the mentioned type."
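A reply to a prompt like the one above is worth validating before it is stored. The sketch below parses a JSON reply with the keys 'id' and 'type' and checks the id against the five categories from the example. The `parse_classification` helper is hypothetical, and replacing the returned type with a canonical label is an assumption made here for consistency, not documented AI Studio behaviour.

```python
import json

# Category ids taken from the classification example above.
CATEGORIES = {
    0: "Toll Violation",
    1: "Government ID",
    2: "Medical Prescription",
    3: "Parking Violation",
    4: "Speeding",
}


def parse_classification(raw):
    """Parse and validate a model reply of the form {"id": ..., "type": ...}."""
    data = json.loads(raw)
    if not isinstance(data, dict) or "id" not in data or "type" not in data:
        raise ValueError("reply must be a JSON object with 'id' and 'type'")
    doc_id = int(data["id"])  # accept 3 as well as "3"
    if doc_id not in CATEGORIES:
        raise ValueError(f"unknown category id: {doc_id}")
    # Normalise the label so "Parking" and "Parking Violation" are stored alike.
    return {"id": doc_id, "type": CATEGORIES[doc_id]}


parsed = parse_classification('{"id": "3", "type": "Parking"}')
```

Rejecting unknown ids early keeps malformed model output from reaching the table where the extracted data is stored.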

Steps for OCR

Example Workflow 2

Retrieval-Augmented Generation (RAG) with AI Studio

Overview

Retrieval-Augmented Generation (RAG) allows users to connect to multiple data sources, retrieve relevant information, and enhance the results with advanced AI techniques such as embeddings. RAG helps users interact with large datasets, perform efficient searches, and gain actionable insights through natural language queries.
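The retrieval step can be sketched with a toy example: embed the query and each document, rank the documents by cosine similarity, and place the top matches into the prompt sent to the language model. The word-count "embedding" below is a deliberate stand-in for a real embedding model, and the function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, passages):
    """Augment the question with the retrieved passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Illustrative documents only.
documents = [
    "Invoices are stored in the finance warehouse and refreshed nightly.",
    "Parking violations get classified by the OCR flow.",
    "The finance warehouse holds invoices from 2020 onward.",
]
query = "Where are invoices stored?"
top = retrieve(query, documents, k=2)
prompt = build_prompt(query, top)
```

A production flow would swap `embed` for a learned embedding model and a vector index, but the ranking and prompt-building steps keep the same shape.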

Key Features

Steps for RAG Flow