How to guides

Data Hub

The Data Hub provides comprehensive data integration and management capabilities for seamlessly integrating and unifying enterprise data. It centralizes data management for enhanced operational efficiency, enables effortless upload and organization of documents at scale, and unlocks insights with robust business intelligence reporting tools.

Overview

The Data Hub serves as the central platform for all data-related operations in Vue.ai, offering three main components:

  • Connection Manager: Connect to any data source or destination with support for 250+ data sources and 200+ destinations
  • Document Manager: Intelligent document processing with AI-powered extraction and classification capabilities
  • Dataset Manager: Comprehensive dataset management with profiling, versioning, and relationship modeling

Connection Manager

The Connection Manager is the I/O layer of the Vue Platform. It enables connecting to any data source or destination through a simple user interface and supports data of any format and size from any data system. Data is brought into the system in the form of datasets.

With Sources, users can:

  • Establish Connection to Any Data Source: Read data from over 250 supported data sources out of the box
  • Custom Sources: Build custom sources via the Connector Development Kit (CDK), a low-code interface

Managing Sources

With Destinations, users can:

  • Establish Connection to Any Data Destination: Write data to over 200 supported data destinations out of the box
  • Custom Destinations: Build custom destinations via the Connector Development Kit (CDK), a low-code interface

Managing Destinations

With Connections, users can:

  • Establish Link Between Source & Destination: Create connections between any source and destination
  • Configure Sync Frequency: Set how often data should be synchronized
  • Define the Stream Configuration: Specify the stream and its configuration for syncing

Managing Connections

Data Ingestion Using Connectors

This guide explains how to configure data sources and destinations and establish connections for seamless data flow in Vue.ai's Connection Manager.

Getting Started

Prerequisites Before beginning, ensure that:

  • Basic data concepts like schemas, appending, and de-duplication are understood
  • Connector concepts such as sources, destinations, and CRON expressions are understood
  • Administrator access to the Vue.ai platform is available
  • Credentials for the data sources and destinations to be configured are available

Familiarity with basic data concepts like schemas, appending, and de-duplication is required.

Configuring a Source
  1. Navigation Navigate to Data Hub → Connection Manager → Sources

    Source Navigation

  2. Create Source On the Source Listing page, click Create New

    Create New Source

    • Enter a unique name for the source
    • Select a source type (e.g., PostgreSQL, Google Analytics)
    • Provide necessary credentials in the configuration form

    Source configuration

  3. Test Connection Verify the source connection by selecting Test Connection

Configuring a Destination
  1. Navigation Go to Data Hub → Connection Manager → Destination

    Destination Navigation

  2. Create Destination Click Create New on the Destination Listing page

    Create New Destination

    • Enter a unique name
    • Select the destination type (e.g., Vue Dataset)

    Destination Configuration

  3. Test Connection Verify the destination configuration

Establishing a Connection
  1. Navigation Go to Data Hub → Connection Manager → Connections

    Connection Navigation

  2. Create Connection Select Create New on the Connection Listing page

    Create New Connection

    • Enter a connection name
    • Choose the source and destination
  3. Configure Settings

    • Data Sync Frequency: Choose Manual or Scheduled (configure CRON expressions if needed; see the example after these steps)
    • Select streams or schemas for data transfer
    • Specify sync options: Full Refresh or Incremental

    Connection Configuration

  4. Run Connection Select Create Connection and execute it
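
When Scheduled sync is selected, the cadence is typically expressed as a standard five-field CRON expression. For example, 0 2 * * * runs the sync every day at 02:00 and 0 * * * * runs it every hour; the expressions shown here are illustrative.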

Sources Configuration

HubSpot Data Source Configuration

This guide provides step-by-step instructions on configuring HubSpot as a data source, covering prerequisites, authentication methods, and configuration steps for seamless integration.

Prerequisites Before beginning the integration, ensure the following are available:

  • HubSpot Developer Account
  • Access to HubSpot API Keys or OAuth Credentials

Ensure access to a HubSpot Developer Account and HubSpot API Keys or OAuth Credentials before starting.

Authentication Methods

HubSpot supports two authentication methods for data source configuration:

  • OAuth
  • Private App Authentication

OAuth Authentication

Credentials Needed:

  • Client ID
  • Client Secret
  • Refresh Token

To obtain OAuth Credentials:

  1. Access the HubSpot Developer Account Navigate to Apps within the account

    HubSpot Apps

  2. Identify an App with Required Scopes Create or identify an app with the required scopes:

    • tickets
    • e-commerce
    • media_bridge.read
    • crm.objects.goals.read
    • timeline
    • crm.objects.marketing_events.write
    • crm.objects.custom.read
    • crm.objects.feedback_submissions.read
    • crm.objects.custom.write
    • crm.objects.marketing_events.read
    • crm.pipelines.orders.read
    • crm.schemas.custom.read

    App Permissions

  3. In the app screen, navigate to the Auth Section to locate the Client ID and Client Secret

    Auth Section

    Client ID and Secret

  4. Open the Sample Install URL (OAuth) and authenticate your HubSpot account. Copy the authorization code from the redirect URL

  5. Use the code to obtain a Refresh Token by executing the following cURL command

curl --location 'https://api.hubapi.com/oauth/v1/token' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data-urlencode 'grant_type=authorization_code' \
--data-urlencode 'client_id=<placeholder_client_id>' \
--data-urlencode 'client_secret=<placeholder_client_secret>' \
--data-urlencode 'redirect_uri=<placeholder_redirect_uri>' \
--data-urlencode 'code=<placeholder_code>'
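
If scripting the exchange is preferred, the following minimal Python sketch performs the same request using the requests library (an assumption here, not a platform requirement); the placeholder values are the same as in the cURL command above.

import requests

# Same form-encoded parameters as the cURL command above; replace the placeholders
payload = {
    "grant_type": "authorization_code",
    "client_id": "<placeholder_client_id>",
    "client_secret": "<placeholder_client_secret>",
    "redirect_uri": "<placeholder_redirect_uri>",
    "code": "<placeholder_code>",
}

response = requests.post("https://api.hubapi.com/oauth/v1/token", data=payload)
response.raise_for_status()

# The JSON response includes the refresh_token needed for the source configuration
print(response.json().get("refresh_token"))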

Private App Authentication

Credentials Needed:

  • Private App Access Token

To set up Private App Authentication:

  1. Navigate to Private Apps Settings Go to Settings > Integrations > Private Apps in the HubSpot account

  2. Locate and select the desired Private App, then click View Access Token to copy the token

    Private App Access Token

For more details, visit the HubSpot API Documentation.

Google Sheets Data Source Configuration

A comprehensive guide to configuring Google Sheets as a data source, covering prerequisites, authentication methods, configuration steps, and supported functionalities.

Prerequisites Before beginning, ensure the following prerequisites are met:

  • Access to a Google Cloud Project with the Google Sheets API enabled
  • A service account key or OAuth credentials for authentication
  • Access to the Google Sheet intended for integration

Overview

Google Sheets, due to its flexibility and ease of use, is a popular choice for data ingestion. The platform supports two authentication methods—Service Account Key and OAuth Authentication—allowing secure connection of spreadsheets to the data ingestion tool.

Configuration Steps

  1. Prerequisites

    • Enable the Google Sheets API for your project
    • Obtain a Service Account Key or OAuth credentials
    • Ensure the spreadsheet permissions allow access to the service account or OAuth client
  2. Choose Authentication Method

    Service Account Key Authentication:

    1. Create a service account and grant appropriate roles (Viewer role recommended) in Google Cloud Console

    If the spreadsheet is viewable by anyone with its link, no further action is needed. If not, give your Service Account access to your spreadsheet.

    2. Generate a JSON key by clicking Add Key under the Keys tab
    3. Grant the service account viewer access to the Google Sheet

    OAuth Authentication:

    1. Create OAuth 2.0 Credentials by going to APIs & Services → Credentials
    2. Click Create Credentials → OAuth Client ID
    3. Configure the OAuth consent screen:
      • Provide app name, support email, and authorized domains
      • Select Application Type as Web application
      • Add your application's Redirect URI

    Generate Authorization URL

    Use the following format:

    https://accounts.google.com/o/oauth2/auth?
    client_id={CLIENT_ID}&
    response_type=code&
    redirect_uri={REDIRECT_URI}&
    scope=https://www.googleapis.com/auth/spreadsheets https://www.googleapis.com/auth/drive&
    access_type=offline&
    prompt=consent
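
    Replace {CLIENT_ID} and {REDIRECT_URI} with the values from the OAuth client created above; access_type=offline and prompt=consent ensure that a refresh token is included in the token response.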
    

    Exchange Authorization Code

    Make a POST request to:

    https://oauth2.googleapis.com/token
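
    The request body is form-encoded and carries the authorization code obtained from the consent flow together with the OAuth client details. The placeholders below mirror those used in the authorization URL; {AUTHORIZATION_CODE} is the code returned to the configured Redirect URI:

    client_id={CLIENT_ID}&
    client_secret={CLIENT_SECRET}&
    code={AUTHORIZATION_CODE}&
    redirect_uri={REDIRECT_URI}&
    grant_type=authorization_code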
    

    The response will include:

    {
    "access_token": "ya29.a0AfH6SMCexample",
    "expires_in": 3599,
    "refresh_token": "1//0exampleRefreshToken",
    "scope": "https://www.googleapis.com/auth/spreadsheets https://www.googleapis.com/auth/drive",
    "token_type": "Bearer"
    }
    
  3. Configure Data Source

    • Select Google Sheets as the source type
    • Provide authentication details:
      • For Service Account Key, paste the JSON key
      • For OAuth, provide the Client ID, Client Secret, and Refresh Token
    • Enter additional configuration details, such as the spreadsheet link and row batch size

    According to the Google Sheets API limits documentation, up to 300 requests can be sent per minute, but each request must complete within 180 seconds or it returns a timeout error. Consider network speed and the number of columns in the Google Sheet when choosing a row_batch_size value. The default is 200; at that setting, a 100,000-row sheet requires roughly 500 read requests, so consider increasing the batch size for sheets of that size or larger.

  4. Test Connection Verify integration by testing the connection. If the connection fails, recheck your credentials, API settings, and project permissions.

Additional Information

  • Supported sync modes: Full Refresh (Overwrite and Append)
  • Supported streams: Each sheet is synced as a separate stream, and each column is treated as a string field

API limits: Google Sheets API allows 300 requests per minute with a 180-second processing window per request. Adjust batch sizes accordingly.

PostgreSQL Data Source Configuration

Step-by-step instructions for configuring PostgreSQL as a data source with secure connections using SSL modes and SSH tunneling, and understanding advanced options like replication methods.

Prerequisites Before beginning, ensure the following details are available:

  • Database Details: Host, Port, Database Name, Schema
  • Authentication: Username and password

Ensure access to database details and authentication credentials before starting.

Overview

PostgreSQL, a robust and versatile relational database system, supports various integration methods for data sources. This guide explains essential configurations, optional security features, and advanced options such as replication and SSH tunneling.

Configuration Steps

  1. Select PostgreSQL as the Source Type

  2. Fill in the required details

    • Host: Provide the database host
    • Port: Specify the database port (default: 5432)
    • Database Name: Name of the database to connect
    • Schema: Schema in the database to use
    • Username: Database username
    • Password: Database password
  3. Additional Security Configuration (Optional)

    • SSL Mode: Choose from the available modes (e.g., require, verify-ca)
    • SSH Tunnel Method: Select the preferred SSH connection method if required
  4. Advanced Options (Optional) Replication Method: PostgreSQL supports two replication methods: Change Data Capture (CDC) and Standard (with User-Defined Cursor)

    1. Change Data Capture (CDC):

      • Uses logical replication of the Postgres Write-Ahead Log (WAL) to incrementally capture deletes using a replication plugin
      • Recommended for:
        • Recording deletions
        • Large databases (500 GB or more)
        • Tables with a primary key but no reasonable cursor field for incremental syncing
    2. Standard (with User-Defined Cursor):

      • Allows incremental syncing using a user-defined cursor field (e.g., updated_at)
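
With a cursor field such as updated_at, each sync fetches only the rows whose cursor value is greater than the maximum value seen in the previous sync.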

SSL Modes

PostgreSQL supports multiple SSL connection modes for enhanced security:

  • disable: Disables encrypted communication between the source and the platform
  • allow: Enables encrypted communication only when required by the source
  • prefer: Allows unencrypted communication only when the source doesn't support encryption
  • require: Always requires encryption. Note: The connection will fail if the source doesn't support encryption
  • verify-ca: Always requires encryption and verifies that the source has a valid SSL certificate
  • verify-full: Always requires encryption and verifies the identity of the source

SSH Tunnel Configuration (Optional)

To enhance connectivity, PostgreSQL supports SSH tunneling for secure database connections:

  • No Tunnel: Direct connection to the database
  • SSH Key Authentication: Use an RSA Private Key as your secret for establishing the SSH tunnel
  • Password Authentication: Use a password as your secret for establishing the SSH tunnel

Supported Sync Methods

The PostgreSQL source connector supports the following sync methods:

  • Full Refresh: Fetches all data and overwrites the destination
  • Incremental: Fetches only new or updated data since the last sync

Commonly used SSL modes are 'require' and 'verify-ca.' SSH tunneling is optional and typically used for enhanced security when direct database access is restricted.

Amazon Redshift Source Configuration

Step-by-step instructions for configuring Amazon Redshift as a data source, covering prerequisites, authentication methods, and configuration steps for seamless integration.

Prerequisites Before beginning, ensure the availability of the following:

  • Host: The hostname of the Amazon Redshift cluster
  • Port: The port number for the Amazon Redshift cluster (default is 5439)
  • Database Name: The name of the Redshift database to connect to
  • Schemas: The schemas in the specified database to access
  • Username: The Redshift username for authentication
  • Password: The Redshift password for authentication

Ensure access to the necessary prerequisites and authentication details for successful configuration.

Configuration Steps

  1. Select Amazon Redshift as the Source Type

  2. Provide Configuration Details

    • Enter the hostname of the Redshift cluster in the Host field
    • Enter the port number (default: 5439) in the Port field
    • Enter the database name in the Database Name field
    • List the schemas to access in the database in the Schemas field
    • Enter the Redshift username in the Username field
    • Enter the Redshift password in the Password field
  3. Test the Connection Ensure that the credentials and configuration are correct

    Ensure that network settings, such as firewalls or security groups, allow connections to the Redshift cluster.

Advanced Configuration Options

SSL Configuration

  • SSL Mode: Choose between disable, allow, prefer, require, verify-ca, or verify-full
  • Certificate: Upload SSL certificate if required by your Redshift cluster

Connection Pooling

  • Pool Size: Configure connection pool size for optimal performance
  • Timeout: Set connection timeout values
  • Retry Policy: Configure retry attempts for failed connections

Schema Selection

  • Include/Exclude: Use patterns to include or exclude specific schemas
  • Wildcards: Support for wildcard patterns in schema selection
  • Case Sensitivity: Configure case-sensitive schema matching
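
For example, a pattern such as sales_* could be used to include every schema with the sales_ prefix; the exact pattern syntax is defined by the connector configuration.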

Supported Sync Modes

The Amazon Redshift source connector supports the following sync modes:

  • Full Refresh: Fetches all data and overwrites the destination
  • Incremental: Fetches only new or updated data since the last sync

Amazon Redshift requires username and password authentication for connecting to the database. Ensure that the Redshift credentials have the necessary permissions to access the database and schemas.

Destinations Configuration

Redshift Destination Configuration

Step-by-step instructions for configuring Amazon Redshift as a destination with S3 staging for efficient data loading.

Prerequisites Before beginning, ensure the following are available:

  • An active AWS account
  • A Redshift cluster
  • An S3 bucket for staging data
  • Appropriate AWS credentials and permissions

Required Credentials include:

  • Redshift Connection Details:
    • Host
    • Port
    • Username
    • Password
    • Schema
    • Database
  • S3 Configuration:
    • S3 Bucket Name
    • S3 Bucket Region
    • Access Key Id
    • Secret Access Key

Redshift replicates data by first uploading to an S3 bucket and then issuing a COPY command, following Redshift's recommended best practices.

AWS Configuration

  1. Set up Redshift Cluster

    • Log into the AWS Management Console
    • Navigate to the Redshift service
    • Create and activate a Redshift cluster if needed
    • Configure VPC settings if the platform runs in a separate VPC
  2. Configure S3 Bucket

    • Create a staging S3 bucket
    • Ensure the bucket is in the same region as the Redshift cluster
    • Set up appropriate bucket permissions

Permission Setup

Execute the following SQL statements for required permissions:

-- Allow the user to create schemas in the target database
GRANT CREATE ON DATABASE database_name TO read_user;
-- Allow the user to use and create objects in the target schema
GRANT USAGE, CREATE ON SCHEMA my_schema TO read_user;
-- Allow the user to read table metadata from the SVV_TABLE_INFO system view
GRANT SELECT ON TABLE SVV_TABLE_INFO TO read_user;
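
Replace database_name, my_schema, and read_user with the database, staging schema, and connector user for your environment.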

Supported Sync Methods

The Redshift destination connector supports the following sync methods:

  • Full Refresh: Fetches all data and overwrites the destination
  • Incremental - Append Sync: Fetches only new or updated data and appends it to the destination
  • Incremental - Append + Deduped: Fetches new or updated data, appends it to the destination, and removes duplicates

Data Specifications

  • Naming Conventions: Standard identifiers must start with a letter or underscore, contain alphanumeric characters, have a length of 1-127 bytes, and contain no spaces or quotation marks. Delimited identifiers are enclosed in double quotes, can contain special characters, and are case-insensitive.
  • Data Size Limitations: Raw JSON records are limited to a maximum of 16MB and VARCHAR fields to 65,535 bytes. Oversized records are handled by nullifying values exceeding the VARCHAR limit while preserving primary keys and cursor fields when possible.

PostgreSQL Destination Configuration

Step-by-step instructions for configuring Postgres as a destination with secure connections and performance optimization.

Prerequisites Before beginning, ensure the following are available:

  • A PostgreSQL server version 9.5 or above
  • Database details and authentication credentials
  • Proper network access configuration

PostgreSQL, while an excellent relational database, is not a data warehouse. It should only be considered for small data volumes (less than 10GB) or for testing purposes. For larger data volumes, a data warehouse like BigQuery, Snowflake, or Redshift is recommended.

Database User Setup

A dedicated user should be created with the following command:

CREATE USER read_user WITH PASSWORD '<password>'; 
GRANT CREATE, TEMPORARY ON DATABASE <database> TO read_user;

The user needs permissions to:

  • Create tables and write rows
  • Create schemas

Configuration Steps

  1. Provide Connection Details

    • Host: Database server hostname
    • Port: Database port (default: 5432)
    • Database Name: Target database
    • Username: Database username
    • Password: Database password
    • Default Schema Name: Schema(s) for table creation
  2. Security Configuration (Optional)

    • SSL Mode: Choose appropriate encryption level
    • SSH Tunnel Method: Select if required
    • JDBC URL Parameters: Add custom connection parameters

Data type mappings and the raw table structure are defined for this destination; each stream creates a raw table with a specific set of columns. Final table mappings, supported sync modes, and naming conventions are also defined for this connector.

Vue Data Catalog Destination Configuration

Step-by-step instructions for configuring Vue Data Catalog as a destination with multiple access modes and performance optimization.

Prerequisites Before beginning, ensure the following are available:

  • Access to Enterprise AI Orchestration Platform | Vue.ai
  • Necessary permissions to create and manage datasets
  • Understanding of the data structure and volume

Dataset Creation Methods A dataset can be created through:

  • Enterprise AI Orchestration Platform
  • Datasets API
  • Vue SDK

Configuration Steps

  1. Choose Dataset Access Mode

    • File-based (CSV, JSON, Parquet, Delta)
    • Relational Database (PostgreSQL)
    • Polyglot (combination of both)
  2. Configure Storage Settings

    • For file-based: S3 or Azure Container configuration
    • For relational: PostgreSQL database details
    • For polyglot: Both storage configurations
  3. Set Performance Parameters

    • Buffer Size
    • CPU Limit
    • Memory Limit
  4. Configure Data Processing Options

    • Writing mode (append, append-dedupe, overwrite)
    • Schema handling preferences
    • Data type mappings

Supported Datatypes For File Datasets (Delta)

Input Datatype → Output Datatype

  • string → pyarrow string
  • integer → pyarrow int64
  • number → pyarrow float64
  • boolean → pyarrow bool_
  • timestamp → pyarrow timestamp(nanosecond)

Supported Datatypes For Relational Database

Input Datatype → Output Datatype for PostgreSQL

  • string → BIGINTEGER
  • integer → INTEGER
  • float → DOUBLE PRECISION
  • bool → BOOLEAN
  • datetime → TIMESTAMP

Document Manager

The Document Manager provides comprehensive capabilities for intelligent document processing (IDP), from defining document types and taxonomies to executing complex extraction workflows. It enables automated extraction and processing of structured and unstructured documents.

Key Capabilities:

  • OCR Processing: Convert images and PDFs to machine-readable text
  • Auto-Classification: Automatically identify document types
  • Data Extraction: Extract specific fields and values from documents
  • Review Workflow: Human-in-the-loop validation and correction
  • Batch Processing: Handle large volumes of documents efficiently

Advanced Features:

  • Live OCR: Annotate and extract data in real-time
  • Auto-Classification Models: Identify the correct document type automatically
  • Data Enrichment Techniques: Use methods like straight-through processing (STP) and matching to further enrich and organize extracted data
  • One-Click Features: Utilize one-shot learning and zero-shot learning for high output accuracy

Core Functionalities:

  • Document Type Management: Create & manage taxonomy and register new document types
  • Document Processing: Upload documents, review extracted data, and annotate extracted data
  • Performance Analytics: Analyze model performance and accuracy metrics based on provided feedback

Document Type

This guide will walk you through the step-by-step process of creating and registering a new Document Type. This is the foundational step for teaching the AI how to extract data from your specific documents.

Objective: To create a reusable template (Document Type) that can accurately extract data from a specific kind of document, such as a driver's license or an invoice.

Prerequisites:

  • Access to Document Manager
  • You must have at least one high-quality example image or PDF of the document you want to process.

Step 1: Navigate to the Document Type Manager
  1. From the main dashboard, hover over Data Hub in the top navigation bar.
  2. In the dropdown menu, under Document Manager, click on Document Type.

Navigating to the Document Type section.

This will take you to the "All Document Type" page, which lists all existing document types in your account.

The Document Type listing page.

Step 2: Create and Configure the New Document Type
  1. Click the + Create New Document Type button.
  2. On the "Upload Document" screen, fill in the initial details:
    • Document Type Name: Give your template a clear, unique name (e.g., US Drivers License - CA).
    • Layout: Select the layout that best describes your document (e.g., Structured).
    • Tags (Optional): Add any relevant tags for organization.
  3. In the "UPLOAD FILE" section, drag and drop your example document or click browse to upload it.
  4. Click Next Step.

Configuring the new Document Type and uploading a sample document.

Step 3: Review the Initial (0-Shot) Extraction

After uploading, you are taken to the annotation interface. The system automatically performs a 0-shot extraction—an initial attempt to identify and extract data without any prior training.

The initial 0-shot extraction result in the annotation interface.

On the right, you'll see two tabs representing the 0-shot results:

  • Taxonomy (The Field Names): The Taxonomy tab lists the names of the attributes the AI believes are present. This is your starting point for building the schema.

    A crop of the initial 0-shot taxonomy list.

  • Document Extraction (The Field Values): The Document Extraction tab shows the actual data extracted for each attribute, along with a confidence score.

    A crop of the initial 0-shot extracted values.

Your goal is to refine this initial result into a perfect, reusable taxonomy.

Step 4: Refine the Taxonomy

Now, you will edit, add, or delete attributes to match your exact requirements.

Editing Standard Attributes

For each attribute you want to keep or modify:

  1. Click on the attribute in the list. The configuration panel will open on the right.
  2. Define its properties:
    • Attribute Name: Change the raw name (e.g., DOB) to a user-friendly one (e.g., Date of Birth).
    • Annotation: Adjust the bounding box on the document image if it's incorrect.
    • Select Type: Choose the correct data type (e.g., Date, Free Form Text). This is critical for validation and formatting.
    • Description / Instruction: Add context for the model and human reviewers.
  3. Click Save.

Editing a Date attribute: Configuring a Date attribute with a specific format.

Editing a Free Form Text attribute: Configuring a standard text attribute with redaction disabled.

Configuring Table Attributes

If your document contains a table, the process is more detailed:

  1. When you create or edit a Table attribute, first define its approximate Columns and Rows in the right-hand panel. Then draw a bounding box around the entire table.

    Annotating a table and setting its dimensions.

  2. Click the Manage button under "Configure Columns" to define the table's internal schema.

    The detailed interface for managing a table's column schema.

  3. In this view, you can define each column's Header, Alias, Data Type, and more. This creates a standardized output for your table data.

Step 5: Verify the Final Taxonomy and Extraction

Once you have configured all your attributes, perform a final review.

  1. Switch to the Taxonomy tab. It should now show your clean, finalized list of attribute names.

    The completed taxonomy list after refinement.

  2. Switch to the Document Extraction tab. This view shows the extracted values based on your refined taxonomy. Check that the values are correct and properly formatted. Note the use of tags (date, name, etc.) for filtering.

    Verifying the final extraction results with the refined taxonomy.

    A crop showing the filtered, clean extraction results.

Step 6: Register the Document Type

When you are fully satisfied with the taxonomy and the extraction results, you are ready to finalize the Document Type.

  1. Click the Register button in the top-right corner of the page.
  2. The status of your Document Type will change from Draft to Registered.

Congratulations! Your Document Type is now a live, reusable model that can be used to automatically process new documents of the same kind.

Document Extraction

This guide provides step-by-step instructions for uploading, processing, and reviewing documents using the platform's user interface.

Step 1: Navigate the Documents Hub

The Documents Hub is your central dashboard for all processed documents. You can access it from Data Hub > Document Manager > Documents.

The Documents Hub listing all uploaded documents

From here, you can search, filter, assign documents for review, and access key actions like Annotate or View Job. The "View Job" action takes you to the Automation Hub to see the specific workflow run for that document.

The "View Job" action links to the Automation Hub workflow run.

Step 2: Upload New Documents
  1. Click the + Upload Documents button to open the upload modal.
  2. Provide a Document Batch Name and optional Tags for organization.
  3. Choose a Document Type:
    • Select a specific type if all documents are the same.
    • Choose Auto Classify to let the system identify the type for each document automatically.
  4. Drag and drop your files or browse to upload.

The Upload Documents modal.

Choose a specific Document Type or use Auto Classify.

Step 3: Review and Annotate Extraction Results

After processing, click the Annotate action for a document to open the review interface. This screen is divided into three panels for an efficient workflow.

The main annotation interface with its three-panel layout.

  • Left Panel (Navigator): Click page thumbnails to jump between pages.
  • Center Panel (Viewer): Interact with the document image and its bounding boxes.
  • Right Panel (Results): View and edit the extracted data.

The left panel provides page navigation.

Correcting Data

If an extracted value is incorrect:

  1. Click the attribute in the right panel to open the edit view.
  2. You can edit the text directly, re-draw the bounding box on the document, or provide natural language feedback to the model.

Editing an attribute and providing natural language feedback.

Step 4: Reviewing Extracted Tables

Table data has a specialized review interface.

  • Merged View: For tables spanning multiple pages, the system presents a single merged table first, which you can expand to see the tables from individual pages.

    Merged table result showing the master table and individual page tables.

  • Review Views: You can switch between two views:

    1. Spreadsheet View: A clean grid for easy scanning and editing. You can sort, filter, and even perform quick calculations like summing selected cells.
    2. Cell View ("Show Crops"): Displays the actual image snippet for each cell, perfect for verifying difficult-to-read characters.

Spreadsheet View (with Column Management): Table data in a clean, interactive spreadsheet format.

Cell View (Visual Crops): The "Show Crops" view displays the actual image snippet for each cell.

Step 5: Finalize the Review

Once all corrections are made, click Save and Exit. The document's status will update to Reviewed, and your corrections will be used to improve the model over time.

Dataset Manager

The Dataset Manager provides a centralized platform for uploading, organizing, and managing datasets efficiently. It supports multiple data formats and provides comprehensive data profiling capabilities.

Key Features:

  • Multi-Format Support: CSV, Delta, Parquet, JSON, and more
  • Data Profiling: Automatic analysis of data quality and statistics
  • Dataset Groups: Organize related datasets with ER diagrams
  • Version Control: Track dataset changes and maintain history
  • Access Control: Manage permissions and sharing settings

Core Capabilities:

  • Data Onboarding: Upload files simply and efficiently in all formats, sizes, and from any data system
  • Data Processing: Automatically profile and sample data to make it ready for consumption
  • Data Unification: Bring together data from different systems into Vue for unified analysis
  • Workflow Integration: Use data to build automated workflows and reports
  • Relationship Management: Form relationships between data using ER diagrams and summarize datasets within groups

Data Processing Pipeline: Once data is brought into the system, it is:

  • Profiled: Analyze data characteristics and quality
  • Sampled: Extract representative data samples
  • Available for Use: Utilize data in building automated workflows

Data Ingestion

Learn how to upload and manage datasets effectively in the Vue.AI platform.

Getting Started

Prerequisites

  • Access to Vue.AI Dataset Manager
  • Data files prepared for upload
  • Understanding of your data schema and relationships

Supported File Formats

  • CSV: Comma-separated values (primary format)
  • Delta: Delta Lake format for big data
  • Parquet: Columnar storage format
  • JSON: JavaScript Object Notation
  • Excel: .xlsx files (converted to CSV)

File Size Limits

  • Individual files: 50MB - 2GB depending on format
  • Batch upload: Up to 10GB total
  • Streaming ingestion: Unlimited with appropriate setup

Upload Process
  1. Navigate to Dataset Manager

    • Go to Data Hub → Dataset Manager → Datasets
    • Click "Upload Dataset" or use drag-and-drop interface
  2. File Selection and Configuration

    • Select files from your local system
    • Choose file format and encoding settings
    • Configure column separators and delimiters
    • Set header row and data type detection options
  3. Schema Configuration

    • Review auto-detected column types
    • Modify data types as needed (String, Integer, Float, Date, Boolean)
    • Set primary keys and unique constraints
    • Configure null value handling
  4. Data Validation

    • Preview sample data before upload
    • Validate data quality and format consistency
    • Review data profiling statistics
    • Address any validation warnings
  5. Upload and Processing

    • Initiate the upload process
    • Monitor upload progress and status
    • Review upload summary and any errors
    • Confirm successful dataset creation

Dataset Groups and Organization

Creating Dataset Groups

  • Group related datasets for better organization
  • Create Entity-Relationship (ER) diagrams
  • Define relationships between datasets
  • Set group-level permissions and access controls

ER Diagram Configuration

  • Identify primary and foreign key relationships
  • Create visual representations of data connections
  • Configure join conditions and relationship types
  • Enable cross-dataset queries and analysis

Organizational Features

  • Folder-based organization structure
  • Tag-based categorization system
  • Search and filter capabilities
  • Metadata management and documentation

Data Profiling and Quality

Automatic Profiling

  • Column statistics (min, max, mean, median)
  • Data type distribution and consistency
  • Null value analysis and missing data patterns
  • Unique value counts and cardinality

Data Quality Metrics

  • Completeness: Percentage of non-null values
  • Validity: Data format and type compliance
  • Consistency: Cross-column validation
  • Accuracy: Data range and constraint validation
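
For intuition only, the column statistics, null analysis, and completeness metric described above can be approximated in a few lines of pandas. This is an illustrative sketch, not the platform's internal profiler; the file name and columns are hypothetical.

import pandas as pd

# Hypothetical dataset exported as CSV
df = pd.read_csv("orders_sample.csv")

# Column statistics: min, max, mean, median for numeric columns
numeric_stats = df.describe().loc[["min", "max", "mean", "50%"]]

# Null value analysis and completeness (% of non-null values per column)
null_counts = df.isna().sum()
completeness = (1 - null_counts / len(df)) * 100

# Unique value counts / cardinality per column
cardinality = df.nunique()

print(numeric_stats)
print(completeness.round(1))
print(cardinality)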

Sampling Methods

  • Random sampling for large datasets
  • Stratified sampling for representative analysis
  • Time-based sampling for temporal data
  • Custom sampling rules and configurations

Metrics and Visualization

The Dataset Manager includes comprehensive reporting and visualization capabilities with extensive chart and control options.

Report Creation

Getting Started with Reports

  • Navigate to Dataset Manager → Metrics Overview
  • Select datasets for analysis
  • Choose visualization types and configurations
  • Configure filters and interactive controls

Chart Types Available

Bar, Line, and Area Charts

  • Compare values across categories
  • Show trends over time
  • Display cumulative data patterns
  • Configure multiple data series
  • Customize colors, labels, and data point styling

Scatter Plots

  • Analyze relationships between variables
  • Identify correlations and outliers
  • Configure bubble sizing and colors
  • Add trend lines and regression analysis
  • Enable interactive point selection and tooltips

Donut and Pie Charts

  • Show proportional data distribution
  • Compare category percentages
  • Configure color schemes and labels
  • Add interactive drilling capabilities
  • Support for exploded views and animations

Tables and Pivot Tables

  • Display detailed data with sorting and filtering
  • Create cross-tabulation analysis
  • Configure aggregation functions (sum, avg, count, etc.)
  • Export data in various formats (CSV, Excel, PDF)
  • Conditional formatting and custom styling options

Funnel Charts

  • Analyze conversion rates and processes
  • Track multi-step workflows
  • Identify bottlenecks and drop-off points
  • Configure stage labels and metrics
  • Support for both standard and multi-level funnels

Matrix Visualizations

  • Heat map representations of data
  • Cross-category analysis
  • Color-coded value ranges
  • Interactive cell exploration
  • Customizable color gradients and thresholds

KPI Metrics

  • Single-value displays for key indicators
  • Comparison with targets and benchmarks
  • Trend indicators and change calculations
  • Alert configuration for threshold breaches
  • Support for custom formulas and calculations

Advanced Visualization Features

Interactive Controls

  • Dropdown Controls: Filter data by category values with multi-select capability
  • Date Range Controls: Time-based filtering with preset ranges and custom selection
  • Range Slider Controls: Numeric value filtering with histogram background display
  • Text Search: Real-time filtering based on text input
  • Cascading Filters: Dynamic filter relationships based on selection

Dashboard Management

  • Drag-and-drop Layout: Flexible widget positioning with responsive grid system
  • Template System: Save and reuse dashboard configurations
  • Real-time Updates: Live data refresh with configurable intervals
  • Collaboration Features: Share dashboards with permission-based access
  • Export Options: Generate PDF reports, scheduled deliveries, and API access

Performance Optimization

  • Data Caching: Intelligent caching strategies for large datasets
  • Query Optimization: Automatic query optimization and indexing
  • Load Balancing: Distribute processing across multiple resources
  • Incremental Updates: Process only changed data for improved performance

Interactive Controls Configuration

Dropdown Controls

  • Filter data by category values with multi-select capability
  • Dynamic option loading based on data availability
  • Cascading filter relationships for complex filtering scenarios
  • Custom styling and validation rules

Date Range Controls

  • Time-based filtering with multiple preset options
  • Custom date selection with calendar interface
  • Relative date calculations (Last 7 days, Month to Date, etc.)
  • Time zone support and localization

Range Slider Controls

  • Numeric value filtering with min/max range selection
  • Real-time data updates with histogram background display
  • Step configuration for discrete value selection
  • Multiple range support for complex filtering

Dashboard and Sharing Configuration

Layout Management

  • Drag-and-drop widget positioning with responsive grid system
  • Full-screen and widget sizing options for different display modes
  • Template-based dashboard creation for consistent designs
  • Mobile-responsive layouts for on-the-go access

Sharing and Collaboration

  • Share dashboards with team members using permission-based access
  • Configure view and edit permissions with role-based access control
  • Export dashboards as PDFs, images, or interactive web links
  • Schedule automated report delivery via email or webhooks

Performance Optimization

  • Data refresh scheduling with configurable intervals
  • Intelligent caching strategies for large dataset handling
  • Query optimization and automatic indexing for faster response times
  • Real-time vs batch processing options based on use case requirements

Automation Hub

The Automation Hub provides powerful workflow orchestration capabilities with a comprehensive library of nodes for building end-to-end automation solutions. Design advanced analytics and machine learning workflows tailored to your needs.

  • Streamline the design and execution of workflows with advanced automation capabilities, enabling scalable and efficient data and computational processes
  • Create custom nodes and automate processes for specific problem statements

Agent Building

Build intelligent agents through a low-code/no-code setup on the Vue Platform Automation Hub.

Agent Service Guide

This guide will assist in using the Agent Builder and building agents using the builder interface.

Prerequisites

  • Understanding of the concepts and components involved in workflow creation
  • A clear plan for the agents to be created
  • Access to Agents and Workflows

Navigation Path Navigate to Home/Landing Page → Automation Hub → Workflow Manager → Agents

Navigation to Agents

Agent Listing Page This leads to the Agents Listing Page, where existing agents can be accessed and new agents can be created. To create a new agent, click on the New Agent button at the top-left of the Agents Listing screen.

Agents Listing

Agent Canvas - Top Bar The top bar provides essential information and controls:

  • Deployment Status: Indicates whether the agent is currently deployed or not
  • Refresh Button: Allows you to refresh the agent's state to reflect the deployment status
  • Workflow Navigation: Button to quickly navigate to the workflow associated with the agent
  • Update Status: Shows when the agent has unsaved changes
  • Update Button: Enables you to update the agent to the latest version or configuration

New Agent CTA

Agent Details Section This section contains three key components:

  • Agent Name: The display name of the agent for identification
  • Agent ID: A unique identifier assigned to the agent
  • Instructions: The system prompt that guides the behavior and response patterns of the agent

Agent Details

Agent Settings Section Configure the agent with the following options:

  • Model: Specifies the LLM that will serve as the brain of the agent
  • Temperature: Controls creativity/randomness of responses (0-1, default 0.7)
  • History Context: When enabled, uses recent chat history for answers
  • Similarity Context: When enabled, refers to similar older chats for responses
  • Top K: Number of recent chats for reference (when History Context enabled)
  • Top P: Similarity threshold for older chats (when Similarity Context enabled)

Agent Configuration and Settings

Chat Window

  • Users can interact with the agent through real-time communication
  • Give prompts, ask questions, upload files, and give commands
  • Chat generates output in easily readable formats like tables and charts
  • Reset chat history using the reset chat button for testing from scratch

Agent Chat Window

Actions Actions allow agents to interact with external entities. Three types of actions:

  1. Workflow Actions: Attach predefined workflows to automate multi-step processes

    • Configure Name, Description, Run Type (Async/Sync), Workflow, Input Schema

    Workflow Action Params
  2. API Actions: Integrate external APIs for system interactions

    • Support for HTTP Object and HTTP Curl formats

    API Action
  3. Agent Actions: Link other agents for collaborative systems

    • Configure Name, Description, Run Type, and target Agent

    Agent Action

If no actions are configured, the agent will perform chat completion using its base LLM knowledge.

Workflow Manager

The Workflow Manager provides comprehensive tools for creating, deploying, and managing automated workflows with an intuitive canvas-based interface.

Orchestration

Welcome to Workflow Orchestration: A Guide to Utilizing the Workflow Canvas! This guide explains the key functionalities of the Workflow Canvas and how to leverage it to create efficient workflows.

Who is this guide for? This guide is designed for users of the Workflow Canvas.

Ensure access to the Workflow Concepts documentation is available before starting.

Overview

The Workflow Canvas provides a straightforward means for users to connect nodes, enabling seamless automation of tasks and data processing. It allows users to create workflows, configure settings, deploy them, and monitor executions in real-time.

Prerequisites Before beginning, ensure the following has been reviewed:

  • Workflow Concepts documentation

Step-by-Step Instructions

  1. Navigation Path: Home/Landing Page → Automation Hub → Workflow Manager → Workflows.

Vue.ai Landing Page

This path leads to the Workflows Listing Page, where existing workflows can be accessed and new workflows can be created.

Workflow Listing

  2. Creating a New Workflow

To start a new workflow:

  • Click the New Workflow button at the top-left of the Workflows Listing screen. This will open the Workflow Canvas interface.

  3. The Workflow Canvas Top Bar

The top bar of the Workflow Canvas provides essential workflow information and controls, including:

Workflow Canvas Top Bar Functionalities

  • Workflow Name: Newly created workflows are named "workflow_#" by default. Use the edit button to give it a more meaningful name.
  • Workflow Status: Indicates the current state of the workflow, with common statuses like DRAFT, DEPLOYING, DEPLOYED, and FAILED.
  • Gear Icon - Workflow Configurations: Opens a settings menu where you can specify the workflow's runtime engine, schedule it, or choose to run it on sample data or the full dataset.

Workflow Configurations Settings Screen

  • Full Screen Icon: Switches the canvas to full-screen mode for a more focused view.
  • Save Button: Saves the workflow manually, though autosave is also enabled.
  • Deploy Button: Deploys the workflow to the selected engine.
  • Run Button: Becomes active after deployment is successful, initiating a job that can be viewed in real time.

  4. Workflow Left Pane (Node Sidebar)

The left sidebar is where all nodes are located, offering various functionalities for building your workflow:

Left Pane of the Workflow Canvas

  • Search: Quickly locate a specific node by name.
  • Refresh Icon: Updates the node list, especially useful when new nodes have been added.
  • Add Node: Opens a node creation page for building custom nodes.
  • Drag & Drop: Drag nodes onto the Workflow Canvas to start connecting and building data pipelines. Each node has a unique Node Configuration panel displayed in the right pane when selected.

  5. Additional Sidebar Functions
  • Zoom Controls: Zoom in/out, fit the view to screen, auto-arrange nodes, or use the outline view to see all nodes on the canvas at once.

Quality of Life Functionalities

  1. Using the Console

The Console at the bottom of the screen shows output and error messages, specifically at the node level. This feature is valuable for debugging, allowing you to trace issues back to the specific node that encountered an error.

The Error Console

  2. Additional Workflow Canvas Features
  • Workflow Arrangement Options: Choose between horizontal or vertical layout for workflow arrangement.
  • Mini-Map View: Located at the bottom-right, this provides a consolidated view of the entire workflow, highlighting the visible section on your screen to help you navigate larger workflows.

Troubleshooting

Common Issues and Solutions

Problem 1: Debugging Issues Cause: Errors occur at individual nodes during workflow execution. Solution:

  1. Use the Error Console to trace errors back to their respective nodes.

Problem 2: Deployment Failures Cause: Incorrect configurations. Solution:

  1. Ensure all configurations are set correctly before deployment.

Problem 3: Workflow Not Running Cause: Workflow not successfully deployed. Solution:

  1. Confirm the workflow is successfully deployed before execution.

Additional Information

The Workflow Canvas allows workflows to be scheduled for automated execution. Workflow configurations can also be adjusted to optimize performance.

  • The workflow's control flow follows the sequence in which nodes are added to the canvas.
  • Ensure appropriate node usage: Transform nodes and Custom Code nodes cannot be used together.
  • Keep node names concise and clear for better readability on the canvas, ensuring smoother workflow deployment.

FAQs

How do I create a new workflow?

  1. Navigate to Automation Hub → Workflow Manager → Workflows
  2. Click the "New Workflow" button at the top-left of the Workflows Listing screen
  3. This will open the Workflow Canvas interface where you can start building your workflow

How do I save my workflow?

Workflows can be saved in two ways:

  • Automatically through the autosave feature
  • Manually by clicking the Save button in the top bar of the Workflow Canvas

What are the different workflow statuses?

Common workflow statuses include:

  • DRAFT: Initial state of a new workflow
  • DEPLOYING: Workflow is in the process of being deployed
  • DEPLOYED: Workflow has been successfully deployed
  • FAILED: Deployment or execution has failed

How do I deploy and run a workflow?

  1. Click the Deploy button in the top bar
  2. Wait for the status to change to DEPLOYED
  3. Once deployed, the Run button will become active
  4. Click Run to initiate the workflow job

How can I debug issues in my workflow?

You can use the Console at the bottom of the screen which shows:

  • Output messages from nodes
  • Error messages at the node level
  • Specific node-related issues for debugging

How do I add nodes to my workflow?

There are two ways to add nodes:

  1. Drag & Drop: Drag nodes from the left sidebar onto the Workflow Canvas
  2. Right-click on the canvas and select nodes from the context menu

How can I navigate large workflows?

You can use several navigation features:

  • Zoom controls to zoom in/out
  • Fit to screen option
  • Mini-Map view at the bottom-right
  • Auto-arrange nodes feature
  • Outline view to see all nodes

Can I schedule workflows?

Yes, you can schedule workflows through the Workflow Configurations menu (gear icon) in the top bar, where you can specify when and how often the workflow should run.

Summary

  • This guide covered the following key points:
    • Navigating to the Workflows Listing page
    • Creating a new workflow and accessing the Workflow Canvas
    • An overview of key Workflow Canvas features and functionalities
    • Deployment, execution, and debugging techniques

With these insights, users are now equipped to create, deploy, and manage workflows efficiently using the Workflow Canvas.

Transform Node Workflows

Welcome to the Transform Node Workflows Overview! This guide will help users understand the features and benefits of Transform Node Workflows and learn how to create, configure, and deploy these workflows effectively.

Who is this guide for? This guide is designed for users of the Vue.ai platform.

Ensure access to the Vue.ai platform and familiarity with basic workflow concepts before starting.

Overview

Transform Node Workflows enable users to:

  • Create automated data processing pipelines.
  • Perform operations like filtering, joining, aggregating, and restructuring data.

Prerequisites Before beginning, ensure that:

  • Access to the Vue.ai platform is available
  • Basic workflow concepts are understood

Step-by-Step Instructions

Creating a Transform Node Workflow

Follow these steps to create a Transform Node Workflow:

  1. Navigate to the Workflows Listing Page

    • Go to Automation Hub → Workflow Manager → Workflows. Workflows Listing Page
  2. Create a New Workflow

    • Click + New Workflow to create a new workflow canvas.
    • To rename the workflow, click the Edit button and modify the name.
  3. Build the Workflow

    • Nodes can be added in two ways:

      1. Drag & Drop: Hover over the Nodes section, search for the required node, and drag it into the workflow canvas.
      2. Right-Click: Right-click on the workflow canvas, search for the node, and add it. Transform Nodes List
    • Load a dataset by adding the Dataset Reader Node to the workspace.

    • Transform Nodes include: SELECT, JOIN, GROUP BY, UNION, PARTITION, DROP, SORT.

      • SELECT: Extract specific columns or rows based on criteria.
      • JOIN: Merge rows from multiple tables using a related column.
      • GROUP BY: Group rows by specified columns, often used with aggregate functions.
      • UNION: Combine result sets from multiple queries, eliminating duplicates.
      • PARTITION: Divide the result set into partitions for window function operations.
      • DROP: Permanently remove a table, view, or database object.
      • SORT: Arrange the result set in ascending or descending order based on specified columns.
    • Click on the node after adding it to the workflow canvas. Define the parameters for that node. Select Node Parameters

    • Drag the end of one node to connect it to the start of another. Connected Transform Node Workflow

      Expected Outcome: Once all nodes are added and linked, the workflow structure is complete.

  4. SpeedRun the Workflow

    • This method serves as a trigger to execute workflows in synchronous mode using Pandas. It is designed for running lighter workloads, ensuring that the logic functions correctly by providing quick results for faster validation.
      1. Click the run icon on the sink node after each transformation to execute the speed run.

    Run icon on sink nodes

  5. Deploy and Run the Workflow

    • To modify the workflow configuration, click the gear icon at the top of the canvas. Workflow Configuration

    • Select the engine (Pandas/Spark) in which the workflow needs to be deployed Modify workflow configuration

    • Click Deploy to initiate the deployment process.

    • Once deployed, click Run to execute the workflow.

    • Navigate to the Jobs Page to check the workflow job status.

    Expected Outcome: The workflow is successfully deployed and executed.

    Redirect to Jobs Page

    Workflow Job Status

  6. Scheduling the Workflow

    • Before deploying the workflow, modify its configuration by clicking the gear icon at the top of the canvas.
    • You will find an option to schedule the workflow using either a daily format or a cron expression.

    Expected Outcome: The workflow is successfully scheduled.

    Adding schedule to the workflow

Troubleshooting

Common Issues and Solutions

Problem 1: Workflow Deployment Failure Cause: Nodes are not properly linked. Solution: Verify that all nodes are correctly linked before deploying.

Problem 2: Configuration Errors Cause: Incorrect node parameters. Solution: Verify the parameters of each node to avoid configuration errors.

Problem 3: Workflow Execution Failure Cause: Workflow errors. Solution: If a workflow fails, check the Job Status Page for error details.

Additional Information

  1. Speed Run on the workflow can be performed on both the sample and the entire dataset by toggling the Use Sample Dataset option. Additionally, the number of records in the output can be configured in the workflow settings.

    Workflow Configuration (Use sample data)

  2. The output of workflows can be persisted as a dataset, which will be available on the datasets listing page once the workflow is executed. This option is available for all sink datasets, allowing you to specify the required file format and dataset name. Currently, supported formats include CSV, Delta, and Parquet.

    Persisting Output Data

  • Workflows support both batch and real-time processing. Advanced nodes can be used to implement ranking, partitioning, and custom logic.
  • It is recommended not to persist the dataset when performing a Speed Run on the workflow using a sample dataset, as a proper dataset will not be created without all the necessary resources.
  • The workflow's control flow follows the sequence in which nodes are added to the canvas.
  • Ensure appropriate node usage: Transform nodes and Custom Code nodes cannot be used together.
  • Keep node names concise and clear for better readability on the canvas, ensuring smoother workflow deployment.
  • Ensure that the workflow configuration (gear icon) is correctly set before deploying a Transform Node workflow.

Resources

FAQ

What are Transform Nodes?

Transform nodes are specialized components that allow you to:

  • Filter, transform, and enrich data
  • Handle complex data manipulations
  • Combine data from multiple sources
  • Support both batch and real-time processing
  • Perform operations like joins, partitioning, and aggregations

How do I configure a Transform Node?

To configure a Transform node:

  1. Drag the desired transform node onto the workflow canvas
  2. Click on the node to open its configuration panel
  3. Select the input dataset or source
  4. Configure the transformation parameters (e.g., SELECT columns, JOIN conditions)
  5. Save the configuration

What types of Transform operations are available?

Common transform operations include:

  • SELECT: Extract specific columns or rows
  • JOIN: Merge data from multiple tables
  • GROUP BY: Aggregate data based on columns
  • UNION: Combine multiple result sets
  • PARTITION: Divide data for window operations
  • DROP: Remove tables or columns
  • SORT: Order results by specified columns
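These operations are configured through the Transform node's interface rather than written by hand; purely as a conceptual illustration, the rough Pandas equivalents of a few of them are sketched below (the table and column names are made up for the example).

```python
import pandas as pd

orders = pd.DataFrame({"order_id": [1, 2, 3], "customer_id": [10, 10, 20], "amount": [50.0, 75.0, 20.0]})
customers = pd.DataFrame({"customer_id": [10, 20], "region": ["NA", "EU"]})

selected = orders[["order_id", "customer_id", "amount"]]            # SELECT specific columns
joined = selected.merge(customers, on="customer_id", how="inner")   # JOIN data from two tables
grouped = joined.groupby("region", as_index=False)["amount"].sum()  # GROUP BY with aggregation
combined = pd.concat([grouped, grouped], ignore_index=True)         # UNION of two result sets
dropped = joined.drop(columns=["customer_id"])                      # DROP a column
ordered = joined.sort_values("amount", ascending=False)             # SORT by a column
```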

How can I verify my Transform Node is working correctly?

You can verify your transform node by:

  1. Running the workflow in test mode with sample data
  2. Checking the node output in the Console tab
  3. Viewing the transformed data preview
  4. Monitoring the node status for any errors
  5. Examining the logs for detailed execution information

Summary

  • This guide covered navigating to the Workflows Listing Page, creating a Transform Node Workflow, and deploying and running the workflow.
Custom Code Nodes Workflow

Welcome to the Custom Code Nodes Workflow guide! This guide will assist in understanding the flexibility provided by Custom Code Nodes within the Workflow Automation Hub and learning how to execute Python-based logic for custom data processing, transformation, and model training.

Who is this guide for? This guide is designed for users of the Workflow Automation Hub.

Ensure access to the Workflow Automation Hub, a registered dataset for input (if applicable), and basic knowledge of Python and Pandas before starting.

Overview

This guide serves as a comprehensive resource for:

  • Creating, configuring, and using Custom Code Nodes in workflows.
  • Ensuring the correct dataset formatting for seamless processing.

Prerequisites Before beginning, ensure the following requirements are met:

  • Access to the Workflow Automation Hub.
  • A registered dataset for input (if applicable).
  • Basic knowledge of Python and Pandas.

Step-by-Step Instructions

Adding a Code Node to the Workflow

To create a Code Node that can be used in workflows, please review the Create Custom Code Nodes documentation.

Two methods can be used to add a code node:

  1. Drag and Drop Method

    • Select the node from the left pane.
    • Drag it onto the workflow canvas.
  2. Right-Click Method

    • Right-click on the canvas.
    • Select the node from the context menu.
    • Place it on the canvas.

Adding Code Node to Workflow

Configuring the Code Node

Once the node is added, it can be configured with the following parameters:

  1. Name

    • Enter a unique name (must be under 20 characters).
  2. Description

    • Optionally, provide a description for clarity.
  3. Dataset

    • Select the dataset to be processed by the node.

The node name must be less than 20 characters to avoid configuration issues. Provide meaningful descriptions to improve clarity.

Configuring Code Node

Saving the Configuration

Click the Add button to save the node configuration.

Running the Workflow

Once the workflow is set up, it can be executed in Speed Run mode by clicking the Play button on the sink node. After execution, the output should be reviewed to verify correctness.

Running the Workflow

Viewing Sample Output

After execution, a sample output can be viewed to confirm the correctness of the workflow.

View Sample Output

The persist dataset feature is not supported in Speed Run mode. To persist a dataset, use the Deploy Run mode. Speed Run mode is intended for quick verification of the workflow.

Speed run currently allows users to execute one node at a time. To run multiple nodes, execute them in sequence.

Speed Run Workflows with Sample Data

Speed Run enables users to execute workflows with sample data. This mode extracts a chunk of data from the selected dataset and executes the workflow, allowing for validation of logic, efficient testing and debugging.

To use sample data while performing speed run, make sure to check the Sample Data Run checkbox.

Speed Run With Sample Data

The CSV Dataset Reader currently supports only the CSV file format.

Deploy the Workflow

Once the workflow is validated with a speed run, it can be executed in Deploy Run mode.

  • Click the Deploy button to initiate the deployment process.
  • Once the workflow is deployed, click the Run button to execute the workflow.

Scheduling of custom code workflows is currently unavailable and will be enabled in future releases.

Code Node Workflow Deploy

Refer to the Workflow Deployment Failures section below for details on deployment failures.

  • The workflow run will be triggered. Click Yes, Redirect to check the job status.

Code Node Job Redirect

Accessing Persisted Datasets

  • Once the job is completed, click on the Sink Node to view the persisted dataset; this redirects to the Datasets section of the Data Hub.

  • All persisted datasets are stored under a workflow_id_node_id_epoch name.

Code Node Persist Dataset

Ensure that the Persist checkbox is checked before deploying the workflow.

If there are any updates in the code node, the workflow must be undeployed and redeployed to reflect the changes. This ensures that the latest code changes are applied during execution.

Example Workflow

Consider you are building a model training workflow:

  • You have the training data registered.
  • You use the CSV Dataset Reader node to prepare the dataset, converting it into a Pandas DataFrame.
  • The Model node then reads this Pandas DataFrame and proceeds with training the model.

Example workflow
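The exact interface of a custom code node is platform-specific, so the following is only a minimal sketch under the assumption that the Model node's body is a Python function receiving the Pandas DataFrame produced by the CSV Dataset Reader; the function name, signature, and column names are illustrative, not the platform's actual API.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def train_model(df: pd.DataFrame) -> dict:
    """Illustrative Model node body: train on the DataFrame supplied by the CSV Dataset Reader."""
    # "label" is a placeholder target column; replace it with the registered dataset's actual schema.
    X = df.drop(columns=["label"])
    y = df["label"]
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    # Return whatever downstream nodes need, e.g. a status flag and a validation score.
    return {"action": "Success", "val_accuracy": float(model.score(X_val, y_val))}
```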

Workflow Deployment Failures

  • Deployment failures can occur in two key scenarios:

    • Case 1

      • If the deployment fails while setting up the execution environment.
      • The reasons for these failures are displayed on the workflow canvas next to the workflow name.
      • Users can reattempt DEPLOY since no deployment exists yet.

      Case1 Deployment Failure

    • Case 2

      • If the deployment initially succeeds but later encounters a pod failure, the existing setup becomes invalid.
      • In such cases, the reason for failure is displayed above the respective node, and the complete message can be viewed by hovering over it.
      • The correct action is to UNDEPLOY before making fixes and redeploying.

      Case2 Deployment Failure

Example Failures

  • CrashLoopBackOff - Pod Terminated: Error: Indicates an issue with code execution, potentially due to missing or incorrect imports.
  • CrashLoopBackOff - Pod Terminated: OOMKilled: Indicates insufficient resources for execution. This can be addressed by updating the deployment configuration of the node.
  • Node: your_node not found in configs list (or) node not active: Indicates that the node is not yet active, likely due to the absence of an image. Either wait for the image to be built or commit a change to trigger the image build and activate the node.

Troubleshooting

Common Issues and Solutions

Problem 1: Node Configuration Errors Cause: The name is over 20 characters. Solution: Ensure the name is under 20 characters.

Problem 2: Dataset Issues Cause: The dataset is not correctly registered or formatted. Solution: Verify that the dataset is correctly registered and formatted.

Problem 3: Execution Errors Cause: Missing data or incorrect syntax. Solution: Check the logs for errors.

Problem 4: Exceeded Retry Limit Error Cause: Insufficient memory in the pods. Solution: Increase the Memory Request and Memory Limit in the deployment config, then retry deploying the workflow.

Additional Information

Support for Multiple Input Nodes: Workflows support multiple input nodes with the help of CSV Dataset Readers, allowing users to integrate multiple data sources in a single execution. Each CSV Dataset Reader can be configured to fetch data from a specific dataset, which can then be processed by the subsequent nodes in the workflow.

Code Node Workflow Multiple Inputs

The CSV Dataset Reader node formats registered datasets for use in custom code nodes by converting them into a Pandas DataFrame. Subsequent code nodes can process this DataFrame directly, ensuring the dataset is correctly structured for advanced data operations.
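Purely as an illustration of how this fits together (the exact way inputs arrive in a code node depends on the platform), a code node placed downstream of two CSV Dataset Readers might receive two DataFrames and combine them:

```python
import pandas as pd

def combine_sources(customers: pd.DataFrame, transactions: pd.DataFrame) -> pd.DataFrame:
    """Illustrative code node body: merge the DataFrames produced by two CSV Dataset Readers."""
    # "customer_id" is a hypothetical join key; use whatever key links your registered datasets.
    return transactions.merge(customers, on="customer_id", how="left")
```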

  • The workflow's control flow follows the sequence in which nodes are added to the canvas.
  • Ensure appropriate node usage: Transform nodes and Custom Code nodes cannot be used together.
  • Keep node names concise and clear for better readability on the canvas, ensuring smoother workflow deployment.
  • Ensure that the correct deployment configuration is specified for custom nodes in the workflow to enable a seamless deployment process.

Consider a machine learning workflow where:

  • A registered dataset is used for training.
  • The CSV Dataset Reader node converts it into a Pandas DataFrame.
  • The Model Node reads the DataFrame and proceeds with training.

FAQ

What is the maximum length for a node name?

The node name must be under 20 characters to avoid configuration issues.

How do I pass a dataset to a custom code node?

Use the CSV Dataset Reader node to format the dataset as a Pandas DataFrame.

Can I use multiple code nodes in one workflow?

Yes, multiple code nodes can be used in a single workflow, depending on the complexity of the automation process.

Can I schedule code node workflows?

Scheduling of custom code workflows is currently unavailable and will be enabled in future releases.

Summary

  • Custom Code Nodes provide the flexibility to run custom logic in workflows.
  • These nodes can be added via drag and drop or by right-clicking on the canvas.
  • Configuration involves setting a name, description, and dataset.
  • The CSV Dataset Reader node ensures proper dataset formatting.
  • Execution is done via the Play button, with output verification available.
  • This structured approach ensures a smooth experience in setting up and using Custom Code Nodes within workflow automation.
Compute Node Workflows

Welcome to the Compute Node Workflows guide! This guide is designed to assist in understanding the role of Compute Node Workflows in automated document analysis and data extraction and learning how to create, configure, deploy, and execute a Compute Node Workflow.

Familiarity with the concept of Intelligent Document Processing (IDP) is assumed.

Overview

Compute Node Workflows are fundamental to Intelligent Document Processing (IDP), enabling:

  • Automation of model training
  • Dynamic dataset computation
  • Document segmentation
  • Intelligent content transformation
  • High-speed OCR and recognition
  • Real-time and batch processing

Prerequisites Before beginning, it is recommended that:

  • The "Getting Started with Workflows" documentation is thoroughly reviewed.

Navigation

To access the workflows, the following path is followed: Automation Hub → Workflow Manager → Workflows. This opens the Workflows listing page.

Step-by-Step Instructions

Creating a New Workflow

A new workflow is created by clicking + New Workflow, which opens the workflow canvas. The workflow can be renamed by clicking the Edit button and updating the name.

Building the Workflow

Nodes are added using:

  1. Drag & Drop: Hover over the Nodes section, search for a node, and drag it onto the canvas.

  2. Right-Click Menu: Right-click on the canvas, search for a node, and select it. Recently used nodes appear in the menu.

  3. In the left panel under the Compute Nodes section, you will find the following preset nodes:

    • Auto Classifier Training: Automates training for document classification models.
    • Auto Classifier: Automatically categorizes documents based on learned patterns.
    • Compute Dataset: Prepares datasets for analysis or training tasks.
    • Dataset Metrics: Provides performance and quality metrics for datasets.
    • Deskew: Corrects skew in document images for better readability and processing.
    • Embedding Generation: Generates vector representations of document content for machine learning.
    • ID Card Detection: Identifies and extracts information from ID cards.
    • idp_sink: Serves as a destination node for processed data.
    • Learn DocType: Learns and identifies document types based on input samples.
    • OCR Multithreaded: Performs high-speed OCR with multithreading.
    • Page Splitter: Splits multipage documents into individual pages.
    • Textract: Extracts text and data using advanced OCR techniques.
    • Trainconv: Trains models for conversational or document-specific tasks.
    • Artifact: Manages and stores intermediate or final workflow artifacts.
    • Section Generation Structured: Generates structured sections from documents.
    • sec_clf_trig: Triggers section classification workflows based on rules.
    • sec_clf_train: Trains models for accurate section classification.

Ensure all required parameters are correctly configured and all nodes are correctly linked.

Troubleshooting

Common Issues and Solutions

Problem: Issues while creating or executing a Compute Node Workflow Solution:

  1. Ensure all required parameters are correctly configured.
  2. Verify that all nodes are correctly linked.
  3. Review the workflow's status in the dashboard for error messages.
  4. Refer to the "Getting Started with Workflows" documentation for additional support.

Additional Information

Compute Node Workflows support real-time and batch processing, enabling seamless integration into existing automation pipelines. They provide robust performance for document classification, OCR, and machine learning-based transformations.

  • The workflow's control flow follows the sequence in which nodes are added to the canvas.
  • Ensure appropriate node usage: Transform nodes and Compute nodes cannot be used together.
  • Keep node names concise and clear for better readability on the canvas, ensuring smoother workflow deployment.

FAQ

What are Compute Node Workflows?

Compute Node Workflows are essential for Intelligent Document Processing (IDP), automating tasks like document classification, feature extraction, and embedding generation. They handle both structured and unstructured data efficiently.

How do I add nodes to a Compute Node Workflow?

You can add nodes in two ways:

  1. Drag & Drop: Hover over the Nodes section, search for the desired node, and drag it onto the workflow canvas.
  2. Right-Click Menu: Right-click on the canvas, search for the node, and select it to add. The menu also shows recently used nodes.

What types of nodes are available in Compute Node Workflows?

Available nodes include:

  • Auto Classifier Training
  • Auto Classifier
  • Compute Dataset
  • Dataset Metrics
  • Deskew
  • Embedding Generation
  • ID Card Detection
  • idp_sink
  • Learn DocType
  • OCR Multithreaded
  • Page Splitter
  • Textract
  • Trainconv
  • Artifact
  • Section Generation Structured
  • sec_clf_trig
  • sec_clf_train

How do I deploy and run a Compute Node Workflow?

  1. Click Deploy to deploy the workflow.
  2. Click Run to start the workflow.
  3. Monitor the workflow's job status in the dashboard.

Summary

  • The guide covered navigation to the Workflows listing.
  • Creation of a Compute Node Workflow was explained.
  • Building, configuring, and linking nodes were detailed.
  • Deployment and execution of the workflow were outlined.

By following these steps, complex Intelligent Document Processing (IDP) tasks can be effectively automated using Compute Node Workflows.

Spark Node Workflows

Welcome to the Spark Node Workflows guide! This guide is designed to assist users in understanding and utilizing Spark Node Workflows on the Vue.ai platform.

Before starting, ensure a basic understanding of Spark and workflows, along with familiarity with Vue.ai's Workflow Manager. For introductory details, refer to Getting Started With Workflows.

Overview

Spark Node Workflows support a wide range of use cases, including:

  • Feature engineering
  • Geospatial analysis
  • Natural language processing
  • Scalable data warehousing

With its in-memory processing, Spark efficiently handles structured, semi-structured, and unstructured data at high speed. In workflow automation, Spark nodes enable seamless execution of data processing tasks within a unified, scalable engine.

Prerequisites Before beginning, ensure the following are understood:

  • Basic understanding of Spark and workflows
  • Familiarity with Vue.ai's Workflow Manager

Step-by-Step Instructions

Navigating to Node Types

  1. Navigate to Node Types

    • Navigate to: Automation Hub → Workflow Manager → Node Types
    • This opens the Node Types Listing Page, where workflow nodes can be managed and configured. Node Types Listing
  2. Create a New Spark Node Type

    • To create a Spark Node Type:
      1. Click Create New Node Type to start from scratch.
      2. Enter the node type details.
      3. Set the Runtime as Spark (Note: The runtime cannot be changed later).

    Spark Code Node

  3. Commit Code to Git

    • After creating a Spark Node Type:
      1. Navigate to Code Server in the left panel.
      2. Write code in the provided workspace.
      3. Use GitHub Actions to commit and push the code.
    • Once committed, the new node type appears in the Node Types Listing Page. Accessing Terminal from Code Server
  4. Use the Spark Node in a Workflow

    • Add a Spark Node to a Workflow

    • Hover over the Nodes section, search for the required node, and drag it onto the Workflow Canvas.

    • Alternatively, right-click on the Workflow Canvas and search for the node you want to add. This also shows a history of recently used nodes. Adding Spark Node to Workflow

    • Load a Dataset:

      • If you need to load a dataset in the workflow, search for the Dataset Reader Node.
      • After selecting the Dataset Reader Node, enter the dataset that needs to be loaded in its configuration.
    • Link Nodes:

      • After adding the required nodes, you can connect them by dragging the end of one node and attaching it to the start of another.
      • Once completed, the workflow will display with the newly added Spark node.
  5. Adding Multiple Spark Nodes in a Workflow

    • Create new Spark Nodes with required code, following the previous steps.
    • Drag and drop the new nodes into the workflow canvas.
    • After adding the nodes, connect them by dragging the endpoint of one Spark node to the start of another Spark node. Adding Multiple Spark Nodes to a Workflow
    • Passing data between nodes:
      • If you need to pass the output from one node to another, return the required output DataFrame (Spark DF) in the predecessor node. The next node can then read this output as its input (see the sketch after these steps).
  6. Deploy and Run Workflow

    • After creating your workflow, follow these steps to deploy and run it:
      • Click Deploy to deploy the workflow. The deployment status will be displayed.
      • Once the workflow is deployed, click Run to execute it.
      • After selecting Run, navigate to the workflow job page to monitor the execution.
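As noted in step 5, data moves between Spark nodes by returning a Spark DataFrame from the predecessor node. The sketch below is only a rough illustration under the assumption that each Spark node's body is a Python function that receives and returns a Spark DataFrame; the function names and columns are made up.

```python
from pyspark.sql import DataFrame, functions as F

def clean_events(events: DataFrame) -> DataFrame:
    """Illustrative first Spark node: drop incomplete rows and normalize a column."""
    return events.dropna(subset=["user_id"]).withColumn("event_type", F.lower(F.col("event_type")))

def daily_counts(events: DataFrame) -> DataFrame:
    """Illustrative second Spark node: reads the predecessor's returned DataFrame as its input."""
    return events.groupBy("event_date", "event_type").count()
```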

Troubleshooting

Common Issues and Solutions

Problem 1: Deployment Failures Cause: Incorrect node configurations or missing dependencies. Solution: 1. Verify node configurations such as runtime settings and memory allocations. 2. Check for missing dependencies in the Spark environment.

Problem 2: Performance Optimization Cause: Inefficient resource configurations. Solution: 1. Adjust resource configurations such as executors, cores, and memory. 2. Enable Dynamic Resource Allocation to optimize scaling.

Problem 3: Execution Issues Cause: Unknown. Solution: Debug using the Console tab in the Job Page to view node logs and payload details.

Additional Information

When creating a Spark Node Type, the following settings can be configured:

  • Number of Drivers – Sets the number of driver nodes.
  • Number of Executors – Defines the executor processes.
  • Executor Memory – Allocates memory per executor.
  • Executor Cores – Sets cores per executor.
  • Driver Memory – Allocates memory for the driver.
  • Dynamic Resource Allocation – Enables auto-scaling of executors.
  • The workflow's control flow follows the sequence in which nodes are added to the canvas.
  • Keep node names concise and clear for better readability on the canvas, ensuring smoother workflow deployment.
  • Spark Nodes currently support only a single large dataset as input.
  • Spark Nodes return only a dataset, as they are primarily designed for large-scale ETL (Extract, Transform, Load) processes.
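These resource settings are defined in the Spark Node Type's deployment configuration through the UI; conceptually, they correspond to standard Spark properties. The snippet below is only an illustrative mapping to a plain SparkSession configuration, not the platform's internal mechanism.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("illustrative-spark-node")                 # hypothetical application name
    .config("spark.executor.instances", "4")            # Number of Executors
    .config("spark.executor.memory", "4g")              # Executor Memory
    .config("spark.executor.cores", "2")                # Executor Cores
    .config("spark.driver.memory", "2g")                # Driver Memory
    .config("spark.dynamicAllocation.enabled", "true")  # Dynamic Resource Allocation
    .getOrCreate()
)
```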

FAQ

What are the key configurations for a Spark Node?

Important Spark node configurations include:

  • Number of Drivers: Controls the number of driver nodes
  • Number of Executors: Sets the number of executor processes
  • Executor Memory: Defines memory allocation per executor
  • Executor Cores: Specifies cores per executor
  • Driver Memory: Controls memory allocation for the driver
  • Dynamic Resource Allocation: Enables automatic scaling of executors

How do I optimize Spark Node performance?

To optimize your Spark node:

  1. Configure appropriate memory settings based on data size
  2. Set the right number of executors and cores
  3. Enable dynamic resource allocation for variable workloads
  4. Monitor executor usage and adjust accordingly
  5. Use caching strategically for frequently accessed data
  6. Partition data effectively for parallel processing

How do I troubleshoot Spark Node issues?

Common troubleshooting steps include:

  1. Check the Console tab for error messages and stack traces
  2. Verify resource configurations (memory, cores)
  3. Monitor executor logs for performance bottlenecks
  4. Ensure proper data partitioning
  5. Review job progress in the Spark UI
  6. Check for data skew or memory pressure issues

Summary

  • The guide covered the following:
    • Navigating to the Node Types Listing Page
    • Creating a new Spark Node Type
    • Using the created Spark Node in workflows
    • Deploying and running a Spark workflow

Nodes

The Automation Hub provides an extensive library of nodes for building comprehensive workflows. Nodes are organized into functional categories for easy discovery and use.

Preset Nodes

Datasets & Connectors Nodes

These nodes provide data ingestion and dataset management capabilities for workflow integration.

CSV Dataset Reader

Welcome to the CSV Dataset Reader guide! This guide will assist users in understanding the features and benefits of the CSV Dataset Reader node and learning how to set up and use the node effectively.

Who is this guide for? This guide is designed for users who are working with the CSV Dataset Reader node and custom code nodes in a workflow.

Ensure access to a registered dataset and a basic understanding of Pandas for working with DataFrames before starting.

Overview

The CSV Dataset Reader node serves as a bridge between registered datasets and custom code nodes. It ensures that data is correctly formatted, enabling smooth workflow execution and simplifying complex data processing tasks. The node:

  • Takes the name of a registered dataset as input.
  • Converts the dataset into a Pandas DataFrame.
  • Outputs the DataFrame for use in subsequent custom code nodes.

Prerequisites Before beginning, the following are required:

  • A registered dataset for processing.
  • A custom code node in the workflow that requires a Pandas DataFrame as input.
  • Basic knowledge of Pandas for working with DataFrames.

Step-by-Step Instructions

Adding a CSV Dataset Reader to the Workflow

Two methods can be followed to add a CSV Dataset Reader:

  1. Drag and Drop Method

    • The node is selected from the left pane.
    • It is then dragged onto the workflow canvas.
  2. Right-Click Method

    • The canvas is right-clicked.
    • The node is selected from the context menu.
    • It is then placed on the canvas.

Adding CSV Dataset Reader to Workflow

Configuring the CSV Dataset Reader

Once the node is added, it can be configured with the following parameters:

  1. Name

    • A unique name is entered (must be under 18 characters).
  2. Description

    • Optionally, a description is provided for clarity.
  3. Dataset

    • The dataset to be processed by the node is selected.

Configuring CSV Dataset Reader

The node name must be less than 18 characters to avoid configuration issues. Providing meaningful descriptions improves clarity.

Troubleshooting

Common Issues and Solutions

Problem 1: The dataset is not recognized Cause: The dataset may not be correctly registered in the system. Solution: Ensure that the dataset is correctly registered in the system.

Problem 2: The DataFrame is empty Cause: The dataset may not contain data before passing it to the node. Solution: Check if the dataset contains data before passing it to the node.

Additional Information

The CSV Dataset Reader node aids in structuring data for workflows involving machine learning, data preprocessing, and analysis. It eliminates the need for manual data conversion, making workflows more efficient.

FAQ

Can multiple datasets be used in a single CSV Dataset Reader node?

No, each CSV Dataset Reader node processes only one dataset at a time. Multiple nodes can be used if multiple datasets need to be handled.

What if the dataset is too large?

Consider using data sampling with a speed run to test the workflow, and use a deploy run to handle large datasets.

Summary

The CSV Dataset Reader node streamlines data formatting for workflows by converting registered datasets into Pandas DataFrames. It ensures compatibility with custom code nodes, enabling efficient data processing and analysis. By following the provided guidelines, users can seamlessly integrate this node into their workflows for various use cases, including machine learning and data transformations.

Data Ingress Gateway

Welcome to the Data Ingress Gateway Node guide! This guide will help users understand the purpose and capabilities of the Data Ingress Gateway Node and learn how to set up and use the node effectively for seamless data ingestion.

Who is this guide for? This guide is intended for users integrating a Data Ingress Gateway Node within a workflow to automate external data ingestion for analysis.

Overview

The Data Ingress Gateway Node facilitates seamless data ingestion by acting as a bridge between external data sources and processing pipelines. The node:

  • Connects to external data sources using a configurable connector.
  • Ingests and preprocesses incoming data for further analysis.
  • Ensures data consistency and formatting for downstream processing.
  • Optionally logs data ingestion details for monitoring and troubleshooting.

Prerequisites

Before using the Data Ingress Gateway Node, ensure the following:

  • Access to an external data source with appropriate permissions.
  • A properly configured connector for seamless data ingestion. Refer to Connection Manager for creating a connector.

Step-by-Step Instructions

Adding a Data Ingress Gateway to the Workflow

Two methods can be followed to add a Data Ingress Gateway:

  1. Drag and Drop Method

    • The node is selected from the left pane under the Datasets & connectors section.
    • It is then dragged onto the workflow canvas.
  2. Right-Click Method

    • The canvas is right-clicked.
    • The node is selected from the context menu.
    • It is then placed on the canvas.

Configuring the Data Ingress Gateway

Once the node is added, it can be configured with the following parameters:

  1. Name

    • A unique name is entered (must be under 18 characters).
  2. Description

    • Optionally, a description is provided for clarity.
  3. Connection Name

    • The required Connection Name must be selected from the provided list before execution.

Adding Data Ingress Gateway Node Configuration

Output of the Node:

Upon successful execution, the node's output will include the Connection Run Summary, detailing the time taken and relevant run metrics.

Troubleshooting

Common Issues and Solutions

Problem 1: The connection is not recognized Cause: The connection may not be configured correctly as expected. Solution: Ensure that the source, destination, and connection are properly configured. Use the "Test Connection" option in the Connection Canvas to verify the setup.

Additional Information

The Data Ingress Gateway Node can be followed by a CSV Dataset Reader to load the data it ingests.

FAQ

Are we allowed to modify the connection details using the Data Ingress Gateway Node?

No, the Data Ingress Gateway Node only allows selection of the connection to be used. Any modifications must be made in the Connection Manager.

Is the ingested data considered the output of the Ingress Node?

No, the Ingress Node serves as a conduit for data ingestion and does not generate an output in itself. Instead, the ingested data is forwarded to the dataset selected while setting up the connection in the Connection Manager.

Summary

The Data Ingress Gateway Node ingests external data through a connector, enabling seamless integration and preprocessing for analysis.

Control Flow Nodes

Control Flow nodes enable conditional logic, branching, and human-in-the-loop processes for complex workflow scenarios.

HITL Form

Welcome to the HITL Form guide! This guide will assist users in passing varying types of inputs to workflows based on user choice.

Expected Outcome: By the end of this guide, an understanding of the HITL Form and its applications in the Vue.ai platform will be gained.

Overview

The HITL (Human-in-the-Loop) Node enables human validation and intervention within automated workflows. It allows users to review, modify, or approve data at critical decision points, ensuring higher accuracy and compliance with business requirements. This node is particularly useful in scenarios where automated processing alone may not be sufficient, such as handling ambiguous data, verifying critical outputs, or incorporating domain expertise. By integrating human oversight, the HITL Form enhances the reliability and trustworthiness of the workflow.

Prerequisites For a better understanding of Code Nodes and Workflows, it is recommended to review the Getting Started with Workflows documentation.

Navigation To begin, head to a Workflow Canvas to make use of the HITL Form: Home/Landing Page → Automation Hub → Workflow Manager → Workflows → New Workflow

Step-by-Step Instructions

Adding and Configuring the HITL Form

To integrate a HITL Form into the workflow, the following steps should be followed:

  1. Add a HITL Form
    • The HITL Form should be added to the workflow canvas.
    • The HITL Form configurations can be opened by clicking on the HITL Form.

Add fields in HITL Form

  2. Configure the HITL Form
    • Using the Form Builder
      • Various input types can be added under the Add Fields section.

Added fields in HITL Form in config

      • Once fields are added:
        • Entries of each field along with Labels & Type will appear.
        • Unnecessary fields can be deleted if not required.
        • The HITL form can be previewed.

Previewing HITL Form

      • The Upload option allows for file uploads to S3 & returns the file path.

  • Using the JSON Schema View
    • The HITL Form can be directly edited by viewing the exact JSON schema section at the top right of the node pane. This section also allows modification of the values for Dropdown Field Type.

JSON Schema view of the HITL Form
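The schema produced by the Form Builder is platform-specific; the snippet below is only a hedged illustration of how a form with an input, a dropdown, and a checkbox might appear in a JSON-schema-style view (field names and structure are assumptions, not the node's actual schema).

```json
{
  "title": "Review Extracted Document",
  "type": "object",
  "properties": {
    "comments": { "type": "string", "title": "Comments" },
    "action":   { "type": "string", "title": "Action", "enum": ["Approve", "Reject"] },
    "escalate": { "type": "boolean", "title": "Escalate to a supervisor" }
  }
}
```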

Accessing & Providing Input to the HITL Form in the Job Screen

  • To give an input to a HITL Form in a job run, wait for the control flow to reach the node and then click on it.
  • This opens a smaller screen where each field can be given values based on user preference.
  • Finally, click Submit to resume the job.

Giving inputs to HITL Form in a Job run

Available Fields for the HITL Form

Below is a list of available fields and their usage:

| Field | Description | Example Usage Downstream |
| --- | --- | --- |
| Checkbox | This is a simple boolean field | Flag-based conditional checks |
| Input | This is a flexible text input string | Passing in custom inputs for processes |
| Dropdown | This is an input with predefined options | Case-based conditional checks |
| Upload | This is an option to upload a file | Files needed downstream based on user intervention |

These field types provide flexibility to incorporate human validation, ensuring critical decisions are reviewed and refined as needed.

Troubleshooting

The workflow is not progressing past the HITL Form Possible Cause: Pending human validation or approval. Solution: Check if the workflow is waiting for user action. Ensure the HITL Form has an assigned reviewer, and the decision is submitted.

The HITL Form is not appearing in the workflow canvas Possible Cause: The node may not be enabled for the account. Solution: If the node is missing, check with the administrator to see if preset nodes are added.

The dropdown values are not appearing in the HITL Form Possible Cause: The dropdown options might not be configured correctly. Solution: Navigate to the Config section of the node and verify that the predefined options are correctly defined.

Additional Information

HITL Form Permissions

  • Ensure that the appropriate users have access to review and modify HITL Form decisions.
  • Workflow administrators may need to configure permissions based on business requirements.

Integration with Other Nodes

  • The HITL Form can be used in conjunction with other automation nodes to balance efficiency and accuracy.
  • Ensure that decisions made within the HITL Form are properly passed to subsequent nodes for execution.

Best Practices for Using HITL Forms

  • Use HITL Forms only in workflows where human intervention is necessary to avoid unnecessary delays.
  • Keep the number of fields minimal to ensure a streamlined review process.
  • Define clear guidelines for reviewers to ensure consistent decision-making.

FAQ

Do I need to use all the 4 fields present in the HITL Form?

No, any of the 4 fields can be used as per the need.

Summary

This guide covered how to use the HITL Form to involve human validation and intervention in workflows.

HTTP Node

Welcome to the HTTP Node guide! This guide will assist users in sending different types of inputs to APIs within workflows based on user configurations.

Expected Outcome: By the end of this guide, an understanding of the HTTP Node and its applications in the Vue.ai platform will be gained.

Overview

The HTTP (Hypertext Transfer Protocol) Node enables seamless integration with external APIs within automated workflows. It allows users to send HTTP requests by providing the API URL, request method, headers, authentication keys, query parameters, and request body via input fields. This node supports making API calls using cURL or direct HTTP configurations, facilitating real-time data exchange between systems. The response from the API is returned in a structured JSON format, enabling further processing within the workflow. The HTTP Node is particularly useful for fetching external data, triggering third-party services, or interacting with web-based applications, enhancing the flexibility and connectivity of automated processes.

Prerequisites For a better understanding of Code Nodes and Workflows, it is recommended to review the Getting Started with Workflows documentation.

Navigation To begin, the following path should be navigated: Home/Landing Page → Automation Hub → Workflow Manager → Workflows → New Workflow

How to Configure the HTTP Node in the Workflow

The HTTP Node allows users to make API requests by specifying the URL, method, headers, query parameters, route parameters, and request body. This enables seamless integration with external services.

Here's a structured step-by-step guide for using the HTTP Node in a workflow:

Step 1: Add the HTTP Node to the Workflow

  • Drag and drop the HTTP Node from the node panel onto the workflow canvas.
  • Click on the HTTP Node to open the configuration panel on the right side.

Adding HTTP Node to the Workflow Canvas

Step 2: Configure the HTTP Request

The HTTP Node provides two primary ways to configure the API request:

Option 1: Using Structured Input Fields

  • Select HTTP Object from the dropdown and give your inputs as below:
    • URL: Enter the endpoint you want to call.
    • Method: Select the HTTP method (GET, POST, PUT, DELETE, etc.).
    • Data: Provide the request body (if applicable) in JSON format.
    • Headers, Query Params, and Route Params:
      • Click Add Item under each section to include key-value pairs.
      • Use headers for authentication, content-type, etc.
      • Query parameters help refine API calls.
      • Route parameters define dynamic parts of the endpoint.

HTTP Node - Adding API Request fields by using HTTP Objects dropdown
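As a purely illustrative example of how these fields fit together (the key names and endpoint below are hypothetical, not the node's actual schema), a structured request might look roughly like this:

```json
{
  "url": "https://api.example.com/v1/orders/{order_id}",
  "method": "GET",
  "headers": { "Authorization": "Bearer <your-token>", "Content-Type": "application/json" },
  "query_params": { "status": "open" },
  "route_params": { "order_id": "12345" },
  "data": {}
}
```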

Option 2: Using cURL Input

Alternatively, you can use a cURL command to configure the HTTP request:

  • Select HTTP Curl from the dropdown.
  • Paste your cURL command into the provided text box.
  • The HTTP Node will parse the cURL request and execute it.

HTTP Node - Adding API Request cURL by using HTTP Curl dropdown
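For example, a generic cURL command such as the following could be pasted into the text box (the endpoint, token, and payload are placeholders):

```bash
curl -X POST "https://api.example.com/v1/orders?status=open" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-token>" \
  -d '{"customer_id": 10, "amount": 75.0}'
```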

Step 3: Review and Modify JSON Schema

  • Once fields are configured, you can check the JSON schema under the Config section.
  • Modify dropdown field types or adjust parameters as needed.

HTTP Object - Review request in the Config section HTTP Curl - Review request in the Config section

Step 4: Execute the API Request

  • After configuring the request, click Add at the bottom right of the panel to create a sink data node. Then use Speed Run / Deploy Run to execute it and receive a JSON-formatted response.
  • The response will contain the API's output, which can be further processed in the workflow.

HTTP Node - Speed Run Workflow

Troubleshooting

1. The HTTP Node is not appearing in the workflow canvas Possible Cause: The node may not be enabled for the account. Solution: If the node is missing, check with the administrator to see if preset nodes are added.

2. API request is failing with a 401 Unauthorized error Possible Cause: Missing or incorrect authentication credentials. Solution: Verify that the correct API key, token, or credentials are provided in the headers or authentication fields.

3. API response returns an empty or unexpected result Possible Cause: Incorrect query parameters, request body, or endpoint. Solution: Double-check the API documentation and ensure the parameters match the expected format.

4. The response format is incorrect or unreadable Possible Cause: API returns data in a different format (XML, plain text, etc.). Solution: Check the API's Content-Type and use a transformation node if necessary to parse the response.

5. API call is timing out Possible Cause: Slow API response or network issues. Solution: Increase the timeout setting, optimize the request payload, or check for API performance issues.

6. API call works in Postman but fails in the workflow Possible Cause: Differences in headers, authentication, or request body formatting. Solution: Compare request details and ensure they match what works in Postman, including headers and payload structure.

Additional Information

Best Practices for Using HTTP Nodes

  • Use the HTTP Node efficiently to integrate APIs and automate processes without unnecessary API calls.
  • Minimize the number of query parameters and headers to optimize request performance.
  • Ensure API authentication is securely managed to prevent unauthorized access.
  • Validate API responses to handle errors and unexpected data effectively.
  • Structure API calls in a way that enhances workflow efficiency and minimizes execution delays.

Integration with Other Nodes

  • The HTTP Node can be used in conjunction with other automation nodes to balance efficiency and flexibility when retrieving data from external environments.
  • Ensure that the API's response within the HTTP Node is properly utilized to retrieve relevant information and automate subsequent tasks within the workflow.

FAQ

Can I dynamically set query parameters, headers, or request bodies?

Yes, you can use variables or workflow outputs to dynamically populate fields in the HTTP Node, allowing flexible API calls based on workflow data.

Summary

This guide covered how to use the HTTP Node to send API requests and retrieve external data for workflow automation.

Branching Node

Welcome to the Branching Node guide! This guide will assist users in understanding how to use the Branching Node to make decisions based on previous node values and utilizing the Branching Node to allow workflows to take different execution paths depending on specified conditions.

Who is this guide for? This guide is designed for users who need to implement conditional logic in workflows on the Vue.ai platform.

Overview

The Branching Node enables dynamic decision-making within workflows by evaluating the output of a preceding node and directing execution along different paths based on predefined conditions. Similar to an if-else statement, it allows workflows to adapt to varying inputs, ensuring that the appropriate actions are taken based on the context. This enhances automation flexibility, optimizes processing efficiency, and supports complex logic by enabling conditional execution at key decision points.

Prerequisites Before using the Branching Node, it is recommended that the following documentation be reviewed for a better understanding of workflows and related components:

  • Getting Started with Workflows – Provides foundational knowledge about workflows and their structure.
  • Code Node Creation Guide – Explains how to create a Code Node that can send values as input for decision-making.
  • HITL Node Documentation – Describes how the HITL Node can be used for human-in-the-loop validation.

Navigation To begin, head to a Workflow Canvas to make use of the Branching Node: Home/Landing Page → Automation Hub → Workflow Manager → Workflows → New Workflow

Step-by-Step Instructions

Adding and Configuring the Branching Node

To integrate a Branching node into the workflow, the following steps should be followed:

  1. Add a Branching Node
    • Drag and drop the Branching Node onto the workflow canvas from the control panel.
    • Connect the Branching Node to a preceding node, such as a Custom Code Node or a HITL Node.

Branching Node in Workflow Canvas

  2. Configure the Branching Node
    • Using the Form Builder
      • Click on Add Item to create a Condition in Branching Node.
      • Specify the Field to be evaluated.
      • Choose the Comparison Operator
        • Select one of the available operators: ==, !=, >, <, >=, <=, in, isin, beginswith, endswith, like, is_empty.
      • Set the value to be compared against.

Input Configuration Form

  • Using the JSON Schema View
    • The Branching Node Form can be directly edited by viewing the exact JSON schema section at the top right of the node pane.

Input JSON Schema

  3. Field and Value Format Requirements Based on the Preceding Node

    • HITL Node

      • If the HITL Node has an input with the label action, it should be referenced in the Branching Node using {{data.0.action}} as the field.
    • Custom Code Node

      • If the Custom Code Node returns a dictionary with a key like {"action": "Success"}, it can be referenced in the same way, using {{data.0.action}} as the field.

Input Config Example

  4. Defining Multiple Conditions (if-elif-else Logic)

    Multiple conditions can be added, and these will be evaluated sequentially:

    • The first condition that matches will execute its corresponding action.
    • If it does not match, the next condition will be checked, and so on.

Multiple Conditions Config Example
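As a hedged illustration only (the node's actual JSON schema may differ), a pair of conditions evaluated in sequence could look roughly like this, using the {{data.0.action}} field reference described above; anything that matches neither condition falls through to the else path:

```json
[
  { "field": "{{data.0.action}}", "operator": "==", "value": "Success" },
  { "field": "{{data.0.action}}", "operator": "==", "value": "Retry" }
]
```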

  5. Connecting Subsequent Nodes

    For each condition added in the Branching Node, connect the corresponding endpoint to the next required node, which could be a Custom Code Node or other preset nodes. The workflow will dynamically follow the designated path based on the evaluated condition, ensuring smooth execution.

Make sure that every condition, including the else clause, is linked to a successor node to ensure proper execution.

Troubleshooting

Workflow Failure in the Branching Node

Possible Cause: The specified field name in the Branching Node conditions might be incorrect. Solution:

  • Verify that the condition logic and input field name match the output key from the preceding node.
  • Confirm the data format is structured as expected.

Unexpected Execution Path in the Branching Node

Possible Cause: An incorrect comparison operator or mismatched value may have been used. Solution:

  • Validate that the specified value is accurate and the appropriate comparison operator is applied, or it will default to the else clause of the branching node.

Additional Information

Branching Node Permissions

  • Ensure that the appropriate users have access to review and modify Branching Node Conditions.
  • Workflow administrators may need to configure permissions based on business requirements.

Best Practices for Using the Branching Node

  • Use Descriptive Field Names: When configuring conditions, ensure that the field names are clear and align with the output from preceding nodes.
  • Optimize Condition Order: Arrange conditions in a logical sequence to minimize unnecessary evaluations and improve workflow efficiency.

FAQ

Can multiple conditions be defined in the Branching Node?

Yes, the Branching Node allows multiple conditions, which are evaluated sequentially in an if-elif-else manner.

Can the Branching Node have multiple active paths?

No, the Branching Node follows a single execution path at a time. Conditions are evaluated in sequence, and once a matching condition is found, the corresponding path is taken while the others are ignored.

Which node types can be connected after a Branching Node?

The Branching Node can be followed by any node type, including Custom Code Nodes, HITL Nodes, and other preset nodes.

Summary

This guide covered the functionality of the Branching Node in Vue.ai workflows, including its role in decision-making based on the defined conditions. It explained how to configure conditions, connect preceding and subsequent nodes, and troubleshoot common issues.

Trigger Node

Welcome to the Trigger Node guide! This guide will assist users in understanding the Trigger Node, its functionality, and how it enables seamless workflow execution within the Vue.ai platform.

Expected Outcome By the end of this guide, you will have a clear understanding of the Trigger Node, its role in automating workflows, and how it facilitates workflow transitions within the Vue.ai platform.

Overview

The Trigger Node in the Vue.ai platform enables seamless workflow automation by initiating a new workflow from an existing one without relying on specific conditions or events. It allows users to link workflows together, ensuring smooth transitions and eliminating the need for manual intervention. By automating workflow execution, the Trigger Node enhances efficiency, scalability, and flexibility, making it easier to manage complex processes while reducing repetitive tasks.

Prerequisites For a better understanding of workflows, it is recommended to review the Getting Started with Workflows documentation.

Navigation To begin, the following path should be navigated: Home/Landing Page → Automation Hub → Workflow Manager → Workflows → New Workflow

Step-by-Step Instructions

  • Build Your Workflow – Start by adding the required code nodes to construct your workflow.
  • Add the Trigger Node – Drag and drop the Trigger Node into your workflow and connect it to the node from which you want to trigger another workflow.
    • Multiple trigger nodes can be added in a workflow.
  • Configure the Trigger Node – In the Trigger Node panel, select the workflow that should be triggered. You can also choose whether to enable the "Wait Till Job Completes" option:
    • Enabled – The current workflow will pause until the triggered workflow finishes execution.
    • Disabled – The current workflow will continue running without waiting for the triggered workflow to complete.
  • Save and Deploy – Click the Save button and deploy your workflow.

Trigger Node Trigger node along with branching node

Troubleshooting

Common Issues and Solutions

  • Problem 1: If your workflow doesn't show up in the trigger workflow listing panel
    • Solution: Click on the refresh button to ensure the latest workflows show up there.
  • Problem 2: Trigger node is in progress for a long time
    • Solution: Ensure the workflow that needs to be triggered is deployed.

Additional Information

Note:

  • The workflow being triggered must also be deployed for successful execution.
  • Speed Run doesn't work in workflows where trigger nodes are involved.

FAQ

  • Can multiple Trigger nodes be attached to a single code node?
    • Yes, multiple trigger nodes can be present in a workflow and also attached to a single code node.
  • Do Trigger nodes work in Speed Run?
    • No, the workflow has to be deployed for it to work.
  • Can Trigger node be used along with any custom code node?
    • Yes, it is possible to use it with a custom code node.

Summary

This guide covered the following:

  • Clear understanding of the Trigger Node.
  • How to build workflows using Trigger Node.
ML/DS Nodes

Machine Learning and Data Science nodes provide automated model training, inference, and data preprocessing capabilities.

AutoML - Data Preprocessor

Welcome to the AutoML - Data Preprocessor guide! This guide will assist users in understanding the features and benefits of the AutoML - Data Preprocessor node and learning how to set up and use the node effectively.

Who is this guide for? This guide is designed for users who are working with custom AutoML - Data Preprocessors in a workflow.

Ensure access to a registered dataset and a basic understanding of data preprocessing for ML models.

Overview

The AutoML - Data Preprocessor node refines input data by handling missing values, encoding categorical variables, scaling numerical features, and applying necessary transformations to optimize model performance. The node:

  • Converts raw data into a structured Pandas DataFrame.
  • Outputs the processed DataFrame for seamless integration with custom Model Training Nodes.

Prerequisites Before beginning, the following are required:

  • A registered dataset for processing.
  • A custom AutoML - Data Preprocessor in the workflow that requires a Pandas DataFrame as input.
  • Knowledge of Data Preprocessing for ML models

Adding AutoML - Data Preprocessor to the Workflow

Two methods can be followed to add an AutoML - Data Preprocessor:

  1. Drag and Drop Method

    • The node is selected from the left pane.
    • It is then dragged onto the workflow canvas.
  2. Right-Click Method

    • The canvas is right-clicked.
    • The node is selected from the context menu.
    • It is then placed on the canvas.

Adding AutoML - Data Preprocessor

Configuring the AutoML - Data Preprocessor

Once the node is added, it can be configured with the following parameters:

  1. Name

    • A unique name is entered (must be under 18 characters).
  2. Description

    • Optionally, a description is provided for clarity.
  3. ignore_columns_for_training

    • Specifies columns that should be excluded from training the machine learning model. These might include identifiers, metadata, or features not relevant to the prediction task.
  4. fit_numerical_to_categorical

    • Lists numerical columns that should be treated as categorical features. This is useful when certain numbers represent categories rather than continuous values.
  5. preproc_steps Defines a sequence of preprocessing steps applied to the dataset before model training. Each step includes:

    Step Attributes

    • step - The type of preprocessing (e.g., imputation, encoding, scaling, handling skewness, or managing outliers).
    • method - The specific approach used for the selected preprocessing step.
    • columns_to_include - Specifies which columns the preprocessing step should be applied to.

    Step Options:

| Step | Method | Description |
| --- | --- | --- |
| Impute | mean | Replaces missing values with the mean of the column. |
| Impute | median | Replaces missing values with the median of the column. |
| Impute | mode | Replaces missing values with the most frequently occurring value in the column. |
| Encode | label | Assigns a unique integer to each category. Suitable for ordinal categorical variables. |
| Encode | one_hot | Creates binary columns for each category, representing presence (1) or absence (0). Ideal for nominal categorical data. |
| Scale | standard | Transforms data to have a mean of 0 and a standard deviation of 1, useful for models sensitive to scale. |
| Scale | min_max | Rescales data to a fixed range, typically [0,1], preserving relative distances between values. |
| Skew | yeo_johnson | Handles both positive and negative skewness without requiring non-negative values. |
| Skew | cube_root | Applies the cube root transformation, reducing right-skewed distributions. |
| Skew | exponential | Raises values to an exponent, which can be used to compress large values. |
| Skew | absolute | Converts all values to their absolute form, reducing the impact of negative values. |
| Skew | square | Squares the values, amplifying larger values and potentially reducing left-skewed distributions. |
| Outlier | handle | Modifies outliers, such as capping extreme values to a predefined threshold. |
| Outlier | drop | Removes rows containing outliers, ensuring they do not influence model training. |
  6. persist

    • A boolean which indicates whether the processed pipeline and steps should be saved for reuse, ensuring consistency during model inference.

Configuring AutoML - Data Preprocessor
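Tying the parameters above together, the following is only a hedged illustration of what a preprocessing configuration might look like; the key names mirror the parameter and step names listed above, the column names are made up, and the node's actual schema may differ.

```json
{
  "ignore_columns_for_training": ["customer_id"],
  "fit_numerical_to_categorical": ["zip_code"],
  "preproc_steps": [
    { "step": "Impute", "method": "median",   "columns_to_include": ["age", "income"] },
    { "step": "Encode", "method": "one_hot",  "columns_to_include": ["region"] },
    { "step": "Scale",  "method": "standard", "columns_to_include": ["age", "income"] }
  ],
  "persist": true
}
```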

Output

If persist is enabled:

  • pipeline_path – File path where the trained model pipeline is saved for future use.
  • preproc_config_path – Path storing preprocessing settings to maintain consistency in data transformation.
  • processed_data – The dataset after preprocessing, prepared for training or inference.

If persist is not enabled:

  • processed_data – The dataset after preprocessing, prepared for training or inference.

Avoid including the target variable in data preprocessing steps.

Best Practices for Using AutoML Nodes: Vue.ai provides two approaches for AutoML training and inference:

Case 1: Standard Workflow

  • Training: Use Vue.ai's built-in data preprocessing and AutoML trainer.
  • Inference:
    • Generate predictions using Vue.ai's AutoML inference code.
    • Store the results as a dataset and load them into a notebook via the SDK for further analysis.

Case 2: Custom Workflow

  • Training: Perform custom data preprocessing before training with Vue.ai's AutoML trainer.
  • Inference:
    • Develop custom inference code.
    • Ensure test data undergoes the same preprocessing steps as the training data.

Troubleshooting

Common Issues and Solutions

Problem 1: The dataset is not recognized Cause: The dataset may not be correctly registered in the system. Solution: Ensure that the dataset is correctly registered in the system.

Problem 2: Unintended Feature Transformations Cause: Different preprocessing steps applied to overlapping column sets without clear rules. Solution: Carefully choose the columns for each preprocessing step.

Additional Information

The AutoML - Data Preprocessor node aids in structuring data for workflows involving machine learning, data preprocessing, and analysis. It eliminates the need for manual data conversion, making workflows more efficient.

All preprocessing steps, such as handling missing values, encoding categorical variables, and scaling numerical features, are performed on the entire dataset before it is split into training and testing sets. This ensures consistency in feature transformations and prevents data leakage.

FAQ

Can multiple datasets be used as input to AutoML - Data Preprocessor node?

No, each AutoML - Data Preprocessor node processes only one dataset at a time. Multiple nodes can be used if multiple datasets need to be handled.

What if the dataset is too large?

Consider using data sampling to test the workflow with a speed run, then use a deploy run of the workflow to handle the full dataset.

Summary

The Data Preprocessing Node in machine learning transforms raw data into a structured format by handling missing values (imputation), encoding categorical variables, scaling numerical features, correcting skewness, and managing outliers. It also excludes irrelevant columns and ensures consistency in data processing. These steps improve model performance by enhancing data quality, reducing bias, and preventing inconsistencies in training and evaluation.

AutoML - Model Trainer

Welcome to the AutoML Trainer Node guide! This guide will help users understand the features and benefits of the AutoML Trainer Node, learn how to set up and use the node effectively for training models. Vue provides two preset Model Trainer Nodes: one for Regression models and another for Classification models. Additionally, Vue provides tailored traditional regression and classification models as distinct presets.

Who is this guide for? This guide is intended for users integrating an AutoML Trainer Node within a workflow to automate machine learning model training.

Ensure access to a preprocessed dataset and a basic understanding of model training techniques for optimal results.

Overview

The AutoML Trainer Node streamlines model training by selecting the best algorithm, tuning hyperparameters, and optimizing performance based on input data. The node:

  • Automates model selection and hyperparameter tuning.
  • Trains multiple models and selects the best-performing one.
  • Optionally logs the trained model, metrics, and artifacts to MLflow for streamlined tracking and further evaluation.
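
Conceptually, the node runs a search like the one sketched below: fit several candidate models, optionally tune their hyperparameters with cross-validation, and keep the best one according to the chosen metric. This is only an illustration using scikit-learn, not the platform's internal implementation:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

candidates = {
    "random_forest": (RandomForestClassifier(random_state=0), {"n_estimators": [100, 300]}),
    "logistic_regression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
}

best_name, best_model, best_score = None, None, -1.0
for name, (estimator, grid) in candidates.items():
    # cv and scoring play the role of "Number of CV Folds" and "Focus"
    search = GridSearchCV(estimator, grid, cv=3, scoring="accuracy")
    search.fit(X_train, y_train)
    if search.best_score_ > best_score:
        best_name, best_model, best_score = name, search.best_estimator_, search.best_score_

print(best_name, round(best_score, 3), round(best_model.score(X_val, y_val), 3))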

Prerequisites

Before using the AutoML Trainer Node, ensure the following:

  • A preprocessed dataset formatted as a Pandas DataFrame.
  • An AutoML Trainer Node integrated into the workflow for automated model training.
  • Basic knowledge of various machine learning models, techniques, and evaluation metrics.

Adding AutoML Model Trainer to the Workflow

Two methods can be followed to add an AutoML Model Trainer node:

  1. Drag and Drop Method

    • The node is selected from the left pane.
    • It is then dragged onto the workflow canvas.
  2. Right-Click Method

    • The canvas is right-clicked.
    • The node is selected from the context menu.
    • It is then placed on the canvas.

Adding AutoML Model Trainer

Input Requirements This node supports the following types of datasets:

  • Fully Processed Datasets: These are preprocessed datasets that are ready for model training. You can preprocess datasets in Jupyter Notebooks available within the Developer Hub.
  • Raw Datasets: These datasets require preprocessing, which can be handled directly within the workflow using available preprocessing nodes.

Configuring the AutoML Model Trainer

Once the node is added, it can be configured with the following parameters:

| Parameter | Description | Default & Accepted Values |
| --- | --- | --- |
| Name | A unique name (must be under 18 characters). | None (String, max 18 chars) |
| Description | Optional description for clarity. | None (String) |
| Target Column | Specifies the dependent variable (output). | None (Column Name) |
| Include Features | Number of features to select (all for all). | all (Integer ≥ 1 or all) |
| Validation Split Size | Proportion of data used for validation. | 0.2 (Float between 0 and 1) |
| Number of CV Folds | Number of cross-validation folds. | None (Integer ≥ 2) |
| Ensemble | Use ensemble techniques to improve performance. | False (True or False) |
| Stacking | Enable stacking of multiple models. | False (True or False) |
| Tune | Perform hyperparameter tuning. | False (True or False) |
| Include Models | Models to be considered for training. | All available (list of models below) |
| Focus | Key evaluation metric for optimization (see below). | r2 (Regression) or accuracy (Classification) |
| Register Model | Register the trained model in MLflow. | False (True or False) |
| Experiment Name | Experiment name for tracking the model. | None (String) |

Include Models

Specifies the models to be considered during training.

Available Regression Models

  • Linear Regression
  • K-Nearest Neighbors Regressor
  • Lasso Regression
  • Decision Tree Regressor
  • Random Forest Regressor
  • XGBoost Regressor
  • Support Vector Regressor
  • Stochastic Gradient Descent Regressor
  • Ridge Regressor
  • Multi-Layer Perceptron Regressor
  • Poisson Regression
  • Elastic Net Regression

Available Classification Models

  • K-Nearest Neighbors Classifier
  • Naive Bayes Classifier
  • Decision Tree Classifier
  • Random Forest Classifier
  • XGBoost Classifier
  • Support Vector Classifier
  • Stochastic Gradient Descent Classifier
  • Multi-Layer Perceptron Classifier
  • Ridge Classifier

Focus

Specifies the key evaluation metric used to optimize the model's performance. Helps in guiding the selection of the best model configuration.

Available Focus Metrics For Regression Tasks (Defaults to r2)

  • r2
  • mean_absolute_error
  • mean_squared_error
  • root_mean_squared_error
  • explained_variance
  • mean_absolute_percentage_error

Available Focus Metrics For Classification Tasks (Defaults to accuracy)

  • precision
  • recall
  • accuracy
  • f1

The Focus metric and Number of CV Folds are taken into account only when Tune is enabled

Experiment Name is required when Register Model is enabled

Adding AutoML Model Trainer Node Configuration

Output For each model listed in Include Models, the following outputs are provided:

  • MLflow Run Link: Provides direct access to the registered model and its metadata for tracking, analysis, and reuse.
  • Metrics and Model Artifact Path: Enables retrieval of model details and artifacts for further evaluation or deployment when Register Model is enabled.
  • Metrics Only: If Register Model is not enabled, the output includes only performance metrics for evaluation.

For more information about model artifacts and MLflow, refer to MLOps: Experiment and Model Tracking Flow

Artifacts listed in MLflow

Best Practice of using AutoML Nodes: Vue.ai provides two approaches for AutoML training and inference:

Case 1: Standard Workflow

  • Training: Use Vue.ai's built-in data preprocessing and AutoML trainer.
  • Inference:
    • Generate predictions using Vue.ai's AutoML inference code.
    • Store the results as a dataset and load them into a notebook via the SDK for further analysis.

Case 2: Custom Workflow

  • Training: Perform custom data preprocessing before training with Vue.ai's AutoML trainer.
  • Inference:
    • Develop custom inference code.
    • Ensure test data undergoes the same preprocessing steps as the training data.

Troubleshooting

Common Issues and Solutions

Problem 1: The dataset is not recognized Cause: The dataset may not be correctly registered in the system. Solution: Ensure that the dataset is correctly registered in the system.

Problem 2: Model training is taking too long Cause: Large dataset, excessive hyperparameter tuning, or complex models. Solution: Reduce dataset size, limit tuning space, increase node deployment resources or choose simpler models.

Problem 3: Model performance is unexpectedly low Cause: Poor feature selection, incorrect hyperparameters, or data imbalance. Solution: Perform feature engineering, tune hyperparameters, and balance the dataset using custom workflows or Jupyter notebooks.

Additional Information

The AutoML Model Trainer node aids in structuring data for workflows involving machine learning, data preprocessing, and analysis. It eliminates the need for manual data conversion, making workflows more efficient.

FAQ

Can multiple datasets be used as input to the AutoML Model Trainer node?

No, each AutoML Model Trainer node processes only one dataset at a time. Multiple nodes can be used if multiple datasets need to be handled.

What if the dataset is too large?

Consider using data sampling to test the workflow with a speed run, then use a deploy run to process the full dataset.

Do I need to preprocess my data before using AutoML?

Vue.ai's AutoML trainer includes built-in preprocessing, but for custom workflows, you may need to perform preprocessing steps such as handling missing values, encoding categorical features, or normalizing numerical data.

Can I specify which features to include in the model?

Yes, you can define the number of features to select using the Include Features parameter. You can specify an integer value to select the top-ranked features or use all to include all features.

Does AutoML support hyperparameter tuning?

Yes, AutoML includes an option for hyperparameter tuning. Enabling the Tune parameter will automatically optimize model configurations for better performance.

How does AutoML handle model evaluation?

AutoML evaluates models using predefined metrics. You can specify an evaluation metric using the Focus parameter (e.g., accuracy for classification or r2 for regression).

Can trained models be registered for tracking?

Yes, AutoML supports model registration. Enabling the Register Model option will store the trained model in MLflow for tracking, evaluation, and reproducibility.

What happens if the target column is missing?

The model training process will fail. Ensure the target column exists in the dataset before proceeding.

Can I enable both ensemble and stacking?

Yes, but stacking requires a large dataset to avoid overfitting. Ensure model diversity for effective ensemble learning.

Why is my model taking too long to train?

This may be due to hyperparameter tuning (tune = true), a large dataset, or complex models like XGBoost Regressor. Consider reducing the search space or using fewer models.

What if I forget to provide an experiment name when registering a model?

If register = true, an experiment name is required. Otherwise, the model registration process will fail.

Summary

The AutoML - Model Trainer Node trains the selected models using the scikit-learn library, analyzes performance metrics, and optionally logs the model, metrics, and artifacts to MLflow for streamlined tracking and management.

AutoML Inference

Welcome to the AutoML Inference Node guide! This guide will help users understand the purpose and capabilities of the AutoML Inference Node and learn how to set up and use the node effectively for generating predictions.

Who is this guide for? This guide is intended for users integrating an AutoML Inference Node within a workflow to automate predictions using trained machine learning models.

Ensure access to a trained model and a properly formatted dataset for accurate predictions.

Overview

The AutoML Inference Node simplifies the prediction process by leveraging a pre-trained model to generate outputs based on new input data. The node:

  • Loads a trained model for inference.
  • Processes new data using the preprocessing pipeline stored by the AutoML - Data Preprocessor node and generates predictions efficiently.
  • Optionally logs inference results to MLflow for tracking and analysis.

Prerequisites

Before using the AutoML Inference Node, ensure the following:

  • A trained machine learning model saved using the MLOps service.
  • A properly formatted dataset for making predictions.
  • An AutoML Inference Node integrated into the workflow to automate inference.
  • Basic understanding of model deployment and evaluation.

Adding AutoML Inference to the Workflow

Two methods can be followed to add an AutoML Inference node:

  1. Drag and Drop Method

    • The node is selected from the left pane.
    • It is then dragged onto the workflow canvas.
  2. Right-Click Method

    • The canvas is right-clicked.
    • The node is selected from the context menu.
    • It is then placed on the canvas.

Adding AutoML Inference

Input Requirements This node supports the following types of datasets:

  • Preprocessed Datasets: These datasets have already undergone feature engineering and transformation, making them ready for inference. You can preprocess datasets in Jupyter Notebooks available within the Developer Hub.

    • Refer to Vue.ai Notebooks User Guide for more information on Notebooks.
    • Ensure the dataset follows the same preprocessing steps used during model training for accurate predictions.
  • Raw Datasets: If the dataset is not preprocessed, it must be transformed to match the model's expected input format. This can be achieved in one of the following ways:

    • Using preprocessing nodes within the workflow.
    • Providing a custom saved pipeline path.
    • Applying the preprocessing pipeline saved during model training.

Configuring the AutoML Inference

Once the node is added, it can be configured with the following parameters:

  1. Name

    • A unique name is entered (must be under 18 characters).
  2. Description

    • Optionally, a description is provided for clarity.
  3. Experiment Name

    • The MLflow experiment under which the trained model was logged.
  4. Model Name

    • The specific trained model (within that experiment) to be used for inference.

Experiment Name and Model Name can be retrieved from the AutoML Trainer Node Output.

  5. Use Preprocessor Pipeline

    • If enabled, the preprocessing pipeline saved by the AutoML - Data Preprocessor node is applied to the input dataset before predictions are generated.
  6. Get Object Paths from User

    • If enabled, the user must provide paths for:
      • Model Path (model_path) – Trained model file.
      • Preprocessor Path (preprocessor_path) – Saved preprocessing pipeline.
      • Preprocessor Config Path (preprocessor_config_path) – Pipeline config file.

If neither Use Preprocessor Pipeline nor Get Object Paths from User is selected, the provided input dataset will be used directly for inference with the trained model.
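
When Get Object Paths from User is enabled, the node behaves roughly like the sketch below: load the saved model and preprocessor, transform the incoming data, and predict. The paths, joblib format, and scikit-learn objects here are assumptions for illustration, not the platform's storage format (the snippet creates its own artifacts so it can run end to end):

import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Stand-ins for artifacts produced at training time
train_X = pd.DataFrame({"age": [25, 32, 41, 29], "salary": [50000, 64000, 58000, 61000]})
train_y = [0, 1, 1, 0]
preprocessor = StandardScaler().fit(train_X)
model = LogisticRegression().fit(preprocessor.transform(train_X), train_y)
joblib.dump(model, "model.joblib")                # would be supplied as model_path
joblib.dump(preprocessor, "preprocessor.joblib")  # would be supplied as preprocessor_path

# What the inference step does conceptually: load, transform, predict
model = joblib.load("model.joblib")
preprocessor = joblib.load("preprocessor.joblib")
new_rows = pd.DataFrame({"age": [35], "salary": [72000]})
print(model.predict(preprocessor.transform(new_rows)))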

Adding AutoML Inference Node Configuration

Output The output consists of the inference results generated by the trained model. This includes:

  • Predictions: The model's output based on the provided test data.

Best Practice of using AutoML Nodes: Vue.ai provides two approaches for AutoML training and inference:

Case 1: Standard Workflow

  • Training: Use Vue.ai's built-in data preprocessing and AutoML trainer.
  • Inference:
    • Generate predictions using Vue.ai's AutoML inference code.
    • Store the results as a dataset and load them into a notebook via the SDK for further analysis.

Case 2: Custom Workflow

  • Training: Perform custom data preprocessing before training with Vue.ai's AutoML trainer.
  • Inference:
    • Develop custom inference code.
    • Ensure test data undergoes the same preprocessing steps as the training data.

Troubleshooting

Common Issues and Solutions

Problem 1: The dataset is not recognized Cause: The dataset may not be correctly registered or formatted as expected. Solution: Ensure the dataset is properly registered and matches the expected format.

Problem 2: Inference is taking too long Cause: Large dataset size, complex model, or insufficient computing resources. Solution: Reduce dataset size, optimize model selection, and increase node deployment resources if needed.

Problem 3: Model predictions are inaccurate Cause: Mismatched preprocessing, incorrect input format, or outdated model. Solution: Verify preprocessing steps, ensure input data is in the correct format, and use an updated model version.

Additional Information

The AutoML Inference Node automates model predictions by streamlining input preprocessing and inference execution, eliminating the need for manual intervention.

FAQ

Can multiple datasets be used as input to the AutoML Inference node?

No, the AutoML Inference Node processes one dataset at a time. Use multiple nodes if multiple datasets need to be handled.

What if the dataset format does not match the trained model?

Ensure the dataset is preprocessed using the same pipeline as used during training. Use preprocessing nodes or provide a saved preprocessing pipeline.

What happens if I don't provide a saved preprocessing pipeline?

If the model requires preprocessing, you must either apply it within the workflow or specify the saved pipeline path. Otherwise, the inference may fail.

Can I perform inference on raw datasets?

Yes, but preprocessing must be applied first using either workflow preprocessing nodes or a saved preprocessing pipeline.

Why is my model producing unexpected results?

Check if the input data matches the expected schema and ensure the correct preprocessing pipeline is applied before inference.

How can I speed up inference?

Reduce dataset size, use optimized model versions, and ensure sufficient computational resources are allocated.

What if the model path is incorrect or missing?

Inference will fail. Ensure the correct model path is provided, either manually or via an automated pipeline.

Summary

The AutoML Inference Node generates predictions from a trained model, applying the stored preprocessing pipeline or user-supplied object paths when configured, and returns the prediction results for downstream analysis or storage as a dataset.

VizQL Nodes

VizQL nodes provide comprehensive data manipulation capabilities with SQL-like operations for data transformation and analysis.

Select Node

Welcome to the Select Node guide! This guide will assist users in retrieving specific columns from a dataset, renaming a column while selecting it, and adding a new column using expressions.

Expected Outcome: By the end of this guide, you will gain an understanding of the Select Node and its applications in the Vue.ai platform.

Overview

The Select Node is utilized to refine and streamline a dataset by choosing specific columns for further processing. It enables the extraction of relevant data by selecting only the necessary columns, effectively reducing data clutter and optimizing workflow efficiency. With the capability to rename columns during selection, the Select Node ensures clarity and consistency in the dataset. Additionally, the Select Node enables the creation of new columns using Python expressions, providing flexibility in data transformation. This node is essential for organizing and preparing data for analysis or integration with other nodes in the workflow.

Prerequisites For a better understanding of Transform Nodes, it is recommended to review the Getting Started with Workflows documentation.

Navigation To begin, the following path should be navigated: Home/Landing Page → Automation Hub → Workflow Manager → Workflows → New Workflow

Step-by-Step Instructions

  1. Load a dataset by adding the Dataset Reader Node to the workspace.
  2. Drag and drop the Select node under the Transform node section in the left pane.
  3. Select the necessary columns from the Field dropdown.
  4. Add an Alias for the selected columns if required.
  5. Click Add Item to include new columns.

Select Node

Use of Expressions in Select Node

You can use expressions in the Select node to perform calculations or transformations on existing fields

Example Usage If you want to add a new column salary_inr by performing calculations on salary_in_usd by multiplying it by 87, you can define the expression as follows:

Expression

salary_inr = salary_in_usd * 87

Select Node With Expression Select Node Output

The transformed data will include a new field, salary_inr, containing the converted salary values.

The Select node allows you to apply similar expressions for arithmetic operations, string manipulations, or conditional logic, making it a powerful tool for data transformation.

The newly added column name should also be included under the alias field

Other Examples

null_column = ""
monthly_rent = annual_rent / 12
full_name = first_name + " " + last_name
temp_fahrenheit = (temp_celsius * 9/5) + 32
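
For intuition, these expressions behave much like column assignments in pandas; the node evaluates the expression for you, so the snippet below is only an analogy:

import pandas as pd

df = pd.DataFrame({"first_name": ["Ada"], "last_name": ["Lovelace"],
                   "annual_rent": [24000], "temp_celsius": [20.0]})

df["full_name"] = df["first_name"] + " " + df["last_name"]   # string concatenation
df["monthly_rent"] = df["annual_rent"] / 12                  # arithmetic
df["temp_fahrenheit"] = (df["temp_celsius"] * 9 / 5) + 32    # unit conversion
df["null_column"] = ""                                       # empty placeholder column
print(df)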

Troubleshooting

Common Issues and Solutions

Problem 1: No values are listed in the Fields dropdown Cause: A Dataset Reader Node is not added before the Select Node or it is not linked to it. Solution: Ensure that a Dataset Reader Node is added before the Select Node and it is linked to it.

Problem 2: Warning sign above the Select Node Cause: The Select Node has not been successfully added. Solution: Click on Add to add the node to the workflow; the warning sign will disappear.

Additional Information

Ensure that a Dataset Reader Node is added before the Select Node and it is linked to it if no values are listed in the Fields dropdown.

FAQ

How to rename a column in the output?

The Alias text box can be used to rename a column in the output.

How to unselect a column?

The bin button under the Actions section associated with that column name can be used to unselect a column.

How to delete a Select Node?

The bin button that is present in the Select node can be used to delete a Select Node.

Summary

This guide covered the following:

  • Retrieving specific columns from a dataset using the Select Node.
  • Renaming a column while selecting it using the Select Node.
  • Adding a new column using expressions in the Select Node.

Drop Node

Welcome to the Drop Node Guide! This guide will assist users in removing specific columns from a dataset.

Expected Outcome: By the end of this guide, you will gain an understanding of the Drop Node and its applications in the Vue.ai platform.

Overview

The Drop Node is used to remove unwanted or unnecessary columns from a dataset. It allows you to clean up the data by keeping only the relevant fields required for further analysis or processing.

Prerequisites For a better understanding of Transform Nodes, it is recommended to review the Getting Started with Workflows documentation.

Navigation To begin, the following path should be navigated: Home/Landing Page → Automation Hub → Workflow Manager → Workflows → New Workflow

Step-by-Step Instructions

  1. Load a dataset by adding the Dataset Reader Node to the workspace.
  2. Drag and drop the Drop node under the Transform node section in the left pane.
  3. Choose the columns you want to exclude from a dataset from the Drop Columns dropdown.

Drop Node

Troubleshooting

Common Issues and Solutions

Problem 1: Only Select All Value Listed in Drop Columns Dropdown Cause: Absence of a Dataset Reader Node before the Drop Node or lack of linkage between them. Solution:

  1. Ensure a Dataset Reader Node is added before the Drop Node.
  2. Ensure the Dataset Reader Node is linked to the Drop Node.

Problem 2: Warning Sign Above the Drop Node Cause: The Drop Node has not been successfully added. Solution:

  1. Click on Add to add the node to the workflow.

Expected Outcome: The warning sign will disappear upon successful addition of the Drop Node.

Additional Information

Ensure a Dataset Reader Node is added before the Drop Node and it is linked to it if only Select All value is being listed in the Drop Columns dropdown.

FAQ

Is it possible to drop multiple columns at once?

Yes, all the columns that need to be dropped can be selected in the Drop Columns dropdown at once.

How can a Drop Node be deleted?

The bin button present in the Drop node can be used for deletion.

Summary

The guide covered how to remove specific columns from a dataset using the Drop Node.

Filter Node

Welcome to the Filter Node guide! This guide will assist users in filtering the rows of a dataset based on defined criteria.

Expected Outcome: By the end of this guide, you will gain an understanding of the Filter Node and its applications in the Vue.ai platform.

Overview

The Filter Node is utilized to refine datasets by applying specific conditions to include or exclude rows based on defined criteria. It allows focusing on relevant data by eliminating unnecessary or irrelevant records, thereby enhancing the quality and accuracy of the dataset. This node ensures that the data meets the required conditions before proceeding to subsequent steps, making it an essential tool for precise data preparation and analysis.

Prerequisites For a better understanding of Transform Nodes, it is recommended to review the Getting Started with Workflows documentation.

Navigation To begin, the following path should be navigated: Home/Landing Page → Automation Hub → Workflow Manager → Workflows → New Workflow

Step-by-Step Instructions

  1. A dataset should be loaded by adding the Dataset Reader Node to the workspace.
  2. Drag and drop the Filter node under the Transform node section in the left pane.
  3. The required columns should be selected from the Field dropdown.
  4. An appropriate condition operator (e.g., column value equals, not equals, greater than, less than, etc.) should be selected to filter the rows from the Conditional Operator dropdown.
  5. Multiple conditions can be combined using logical operators like AND or OR to refine the filter logic.

Filter Node Filter Node Configurations

Available Operators for the Filter Node

The Filter Node supports a variety of operators to help define precise conditions for filtering rows in the dataset. Below is a list of available operators and their usage:

Comparison Operators:

| Operator | Description | Example |
| --- | --- | --- |
| == | Equal to | columnValue == 'value' |
| != | Not equal to | columnValue != 'value' |
| > | Greater than | columnValue > 100 |
| < | Less than | columnValue < 100 |
| >= | Greater than or equal to | columnValue >= 50 |
| <= | Less than or equal to | columnValue <= 50 |

Membership Operators:

| Operator | Description | Example |
| --- | --- | --- |
| in | Matches exact value | columnValue in 'A' |
| isin | Checks if values are in a list or series | columnValue.isin(['value1', 'value2']) |

String Operators:

| Operator | Description | Example |
| --- | --- | --- |
| beginswith | Matches strings starting with a specified substring | columnValue.beginswith('prefix') |
| endswith | Matches strings ending with a specified substring | columnValue.endswith('suffix') |
| like | Matches strings containing a specific pattern | columnValue.like('%pattern%') |

Other Operators:

| Operator | Description | Example |
| --- | --- | --- |
| is_empty | Checks if the column contains empty or null values | columnValue.is_empty() |

These operators provide flexibility to filter rows based on numerical, categorical, or textual criteria, ensuring the dataset is tailored to specific needs.

In the Filter node, you can apply multiple conditions using a combination of AND and OR operators to refine your data selection.

How to Use AND & OR Conditions in the Filter Node

AND Condition All specified conditions must be met for a record to be included in the output.

Example: age > 25 AND salary > 50000 This filters records where age is greater than 25 and salary is greater than 50,000.

OR Condition At least one of the specified conditions must be met for a record to be included.

Example: city = 'New York' OR city = 'Los Angeles' This filters records where the city is either New York or Los Angeles.

Combining AND & OR Conditions You can group conditions using parentheses to control evaluation order.

Example: (age > 25 AND salary > 50000) OR (city = 'New York') This filters records where either:

  • Age is greater than 25 and salary is greater than 50,000, or
  • The city is New York.
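
As a rough pandas analogy (the node builds this logic from the configured conditions, so the snippet is illustrative only), the combined example above corresponds to:

import pandas as pd

df = pd.DataFrame({"age": [22, 31, 45], "salary": [40000, 52000, 80000],
                   "city": ["Boston", "New York", "Los Angeles"]})

# (age > 25 AND salary > 50000) OR (city = 'New York')
filtered = df[((df["age"] > 25) & (df["salary"] > 50000)) | (df["city"] == "New York")]
print(filtered)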

Filter Node Configurations

Example Usage

Multiple Conditions in Filter Node

Troubleshooting

Common Issues and Solutions

Problem 1: Error while filtering a column Cause: A numerical operator was applied to a categorical column. Solution:

  1. Ensure numerical columns are filtered using numerical operators and categorical columns using membership or string operators.

Problem 2: No results after filtering Cause: No valid column value added to filter out the data. Solution:

  1. Ensure a valid column value is added to filter out the data.

Problem 3: No values are listed in the Fields dropdown Cause: No Dataset Reader Node added before the Filter node and linked to it. Solution:

  1. Ensure that a Dataset Reader Node is added before the Filter node and it is linked to it.

Problem 4: Warning sign above the Filter Node Cause: The Filter Node has not been successfully added. Solution:

  1. Click on Add to add the node to the workflow; the warning sign will disappear.

Additional Information

If no values are listed in the Fields dropdown, ensure that a Dataset Reader Node is added before the Filter Node and linked to it.

FAQ

Can NULL values in a column be filtered out?

Yes, NULL values can be filtered out by using the is_empty Conditional Operator.

What is the use of the two "Select the logic operator of the condition" controls, one shown as a button and the other as a dropdown?

Both determine how the filters are applied to the data. The top-level logic operator allows the addition of a second filter if necessary, while the next-level logic operator allows the addition of more criteria within a single filter.

Can multiple filters be stacked?

Yes, multiple filters can be stacked by clicking the Add Item.

How can a Filter Node be deleted?

The bin button that is present in the Filter node can be used to delete a Filter Node.

Summary

This guide covered the following:

  • How to filter a dataset using the Filter Node.

OrderBy Node

Welcome to the OrderBy Node guide! This guide will assist users in understanding how to sort one or more columns in ascending or descending order using the OrderBy Node.

Expected Outcome: By the end of this guide, you will gain an understanding of the OrderBy Node and its applications in the Vue.ai platform.

Overview

The OrderBy Node is utilized to organize the dataset by arranging records in a specific order based on one or more columns. It enables efficient data structuring by sorting values in ascending or descending order, ensuring consistency and ease of analysis. This node is essential for data preparation, making it easier to identify patterns, optimize workflows, and integrate with other processing nodes.

Prerequisites For a better understanding of Transform Nodes, it is recommended to review the Getting Started with Workflows documentation.

Navigation To begin, the following path should be navigated: Home/Landing Page → Automation Hub → Workflow Manager → Workflows → New Workflow

Step-by-Step Instructions

  1. Load a dataset by adding the Dataset Reader Node to the workspace.
  2. Drag and drop the OrderBy node under the Transform node section in the left pane.
  3. Select the column(s) to sort by from the Field dropdown.
  4. Choose the sort order (Ascending or Descending) for each selected column.
  5. Click Add Item to include multiple sort criteria.

OrderBy Node

Utilize the Add Item option to incorporate additional sort criteria.
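
In pandas terms (illustrative only; the node performs the sorting for you), multiple sort criteria correspond to something like:

import pandas as pd

df = pd.DataFrame({"region": ["West", "East", "West"], "sales": [120, 340, 90]})

# Sort by region ascending, then by sales descending
ordered = df.sort_values(by=["region", "sales"], ascending=[True, False])
print(ordered)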

Example Usage

OrderBy Node Configuration with multiple sort criteria

Troubleshooting

Common Issues and Solutions

Problem 1: Error After Adding an OrderBy Node Cause: The sort order was not selected after choosing the sort column. Solution:

  1. Ensure the sort order is selected after choosing the sort column.

Problem 2: No Values Listed in the Fields Dropdown Cause: Dataset Reader Node not added before the OrderBy Node or not linked to it. Solution:

  1. Ensure a Dataset Reader Node is added before the OrderBy Node and it is linked to it.

Problem 3: Warning Sign Above the OrderBy Node Cause: The OrderBy Node has not been successfully added. Solution:

  1. Click on Add to add the node to the workflow; the warning sign will disappear.

Additional Information

If no values are listed in the Fields dropdown, ensure that a Dataset Reader Node is added before the OrderBy Node and linked to it.

FAQ

Can multiple columns be sorted?

Yes, multiple columns can be sorted by adding the required columns one by one with their sort orders by clicking Add Item.

How can an OrderBy Node be deleted?

The bin button present in the OrderBy node can be used to delete an OrderBy Node.

Summary

This guide covered how to sort one or more columns of a dataset using the OrderBy Node in the Vue.ai Platform.

GroupBy Node

Welcome to the GroupBy Node guide! This guide will assist in aggregating data based on one or more columns and summarizing large datasets by grouping similar values.

Expected Outcome: By the end of this guide, you will gain an understanding of the GroupBy Node and its applications in the Vue.ai platform.

Overview

The GroupBy Node is utilized to aggregate and summarize data by grouping it based on one or more columns. This node aids in organizing data into meaningful groups and applying aggregate functions to generate insights, such as totals, averages, counts, or other statistical measures. It is essential for analyzing data at a grouped level and preparing it for further processing or visualization.

Prerequisites For a better understanding of Transform Nodes, it is recommended to review the Getting Started with Workflows documentation.

Navigation To begin, the following path should be navigated: Home/Landing Page → Automation Hub → Workflow Manager → Workflows → New Workflow

Step-by-Step Instructions

  1. Load a dataset by adding the Dataset Reader Node to the workspace.
  2. Drag and drop the GroupBy node under the Transform node section in the left pane.
  3. Select the columns by which the data should be grouped from the Columns dropdown. These columns will serve as the keys for creating groups.
  4. Select a column that needs to be aggregated from the Field dropdown.
  5. Select an aggregation logic (e.g., sum, average, count, max, min, etc.) to be applied for the selected column from the Aggregation dropdown.
  6. Rename aggregated columns for clarity in the Alias text box.
  7. Click Add Item to add multiple aggregations for various columns as needed.

The Alias field is mandatory for GroupBy aggregations

GroupBy Node

Available Aggregation Functions:

  • Count: Counts the number of records in each group
  • Sum: Calculates the total sum of values in each group
  • Average (Mean): Computes the average value for each group
  • Min: Finds the minimum value in each group
  • Max: Finds the maximum value in each group
  • First: Returns the first value in each group
  • Last: Returns the last value in each group

Utilize the Add Item option to incorporate additional aggregation functions for different columns.
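
For intuition (illustrative only; the node performs the grouping for you), grouped aggregations with aliases correspond to the following pandas pattern:

import pandas as pd

df = pd.DataFrame({"region": ["East", "East", "West"],
                   "sales": [100, 150, 90],
                   "units": [10, 12, 8]})

# Group by region and apply several aggregations, each with an alias
summary = df.groupby("region").agg(
    total_sales=("sales", "sum"),
    avg_units=("units", "mean"),
    order_count=("sales", "count"),
).reset_index()
print(summary)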

Example Usage

GroupBy Node Configuration With Multiple Aggregation

Troubleshooting

Common Issues and Solutions

Problem 1: No Values Listed in the Fields or Columns Dropdown Cause: Dataset Reader Node not added before the GroupBy Node or not linked to it. Solution:

  1. Ensure a Dataset Reader Node is added before the GroupBy Node and it is linked to it.

Problem 2: Warning Sign Above the GroupBy Node Cause: The GroupBy Node has not been successfully added. Solution:

  1. Click on Add to add the node to the workflow; the warning sign will disappear.

Additional Information

Ensure that a Dataset Reader Node is added before the GroupBy Node and it is linked to it if no values are listed in the Fields or Columns dropdown.

FAQ

Can multiple columns be used for grouping?

Yes, multiple columns can be selected for grouping to create more granular groups.

Can multiple aggregations be applied to the same or different columns?

Yes, multiple aggregations can be applied by clicking Add Item and configuring additional aggregation functions.

How can a GroupBy Node be deleted?

The bin button that is present in the GroupBy node can be used to delete a GroupBy Node.

Summary

This guide covered how to aggregate and summarize data by grouping it based on one or more columns using GroupBy Node.

Partition Node

Welcome to the Partition Node guide! This guide will assist users in dividing a dataset into multiple subsets based on the values of a specific column.

Expected Outcome: By the end of this guide, you will gain an understanding of the Partition Node and its applications in the Vue.ai platform.

Overview

The Partition Node is utilized to divide a dataset into smaller, manageable subsets based on the unique values of a specified column. This node enables efficient data organization by creating separate datasets for each distinct value, making it easier to process, analyze, or route data to different workflows. It is essential for scenarios where data needs to be segmented for specialized processing or analysis.

Prerequisites For a better understanding of Transform Nodes, it is recommended to review the Getting Started with Workflows documentation.

Navigation To begin, the following path should be navigated: Home/Landing Page → Automation Hub → Workflow Manager → Workflows → New Workflow

Step-by-Step Instructions

  1. Load a dataset by adding the Dataset Reader Node to the workspace.
  2. Drag and drop the Partition node under the Transform node section in the left pane.
  3. Select the column to partition by from the Partition Column dropdown. This column's unique values will determine how the data is divided.
  4. The node will automatically create separate outputs for each unique value in the selected column.

Partition Node

Use Cases for Partition Node:

  • Data Segmentation: Divide customer data by region, category, or status
  • Workflow Routing: Route different data types to specialized processing workflows
  • Parallel Processing: Enable concurrent processing of data subsets
  • Analysis Preparation: Prepare data for group-specific analysis or reporting
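
Conceptually (illustrative only; the node creates one output per distinct value for you), partitioning by a column behaves like this pandas sketch:

import pandas as pd

df = pd.DataFrame({"region": ["East", "West", "East"], "sales": [100, 90, 150]})

# One subset per unique value of the partition column
partitions = {region: subset.reset_index(drop=True) for region, subset in df.groupby("region")}
print(partitions["East"])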

Example Usage

Partition Node Configuration

Troubleshooting

Common Issues and Solutions

Problem 1: No Values Listed in the Partition Column Dropdown Cause: Dataset Reader Node not added before the Partition Node or not linked to it. Solution:

  1. Ensure a Dataset Reader Node is added before the Partition Node and it is linked to it.

Problem 2: Warning Sign Above the Partition Node Cause: The Partition Node has not been successfully added. Solution:

  1. Click on Add to add the node to the workflow; the warning sign will disappear.

Problem 3: Too Many Partitions Created Cause: Selected column has too many unique values. Solution:

  1. Consider using a different column with fewer unique values or preprocessing the data to reduce distinct values.

Additional Information

The Partition Node creates separate outputs for each unique value in the selected column. Ensure the selected column has an appropriate number of distinct values to avoid performance issues.

FAQ

How many partitions can be created?

The number of partitions depends on the number of unique values in the selected partition column. Be mindful of performance when dealing with columns having many distinct values.

Can multiple columns be used for partitioning?

No, the Partition Node works with one column at a time. For multi-column partitioning, consider preprocessing the data to create a combined column.

How can a Partition Node be deleted?

The bin button that is present in the Partition node can be used to delete a Partition Node.

Summary

This guide covered how to divide a dataset into multiple subsets based on column values using the Partition Node.

Join Node

Welcome to the Join Node guide! This guide will assist users in merging two datasets into a single dataset based on a common key.

Expected Outcome: By the end of this guide, you will gain an understanding of the Join Node and its applications in the Vue.ai platform.

Overview

The Join Node is utilized for merging multiple datasets based on a common key. This allows for the combination of relevant information from different sources into a single dataset. It ensures seamless data integration by aligning rows based on matching key values. This node is essential for enriching datasets, performing relational operations, and enabling comprehensive analysis.

Prerequisites For a better understanding of Transform Nodes, please review the Getting Started with Workflows documentation.

Navigation To begin, the following path should be navigated: Home/Landing Page → Automation Hub → Workflow Manager → Workflows → New Workflow

Step-by-Step Instructions

  1. Load two datasets by adding Dataset Reader Nodes to the workspace.
  2. Drag and drop the Join node under the Transform node section in the left pane.
  3. Link the two datasets to the Join Node.
  4. The Left Dataset and Right Dataset fields are filled automatically based on the selected datasets and the order in which they were connected to the Join Node.
  5. Select the type of join (e.g., Left, Inner, Outer, Right, Semi, Left_anti, Right_anti, Cross) from the Join Type dropdown.
  6. Select a column from the left dataset to be used as the join key from the Left Field dropdown under the Query section.
  7. Select a column from the right dataset to be used as the join key from the Right Field dropdown under the Query section.
  8. Select the join operator to be used (e.g., Equals, Greater Than, Less Than) from the Join Operator dropdown.
  9. Click Add Item to include additional join conditions.

Join Node Join Node Configurations

Available Operators for the Join Node are:

| Operator | Description | Example |
| --- | --- | --- |
| == | Equal to | Left Dataset Column Value == Right Dataset Column Value |
| != | Not equal to | Left Dataset Column Value != Right Dataset Column Value |
| > | Greater than | Left Dataset Column Value > Right Dataset Column Value |
| < | Less than | Left Dataset Column Value < Right Dataset Column Value |
| >= | Greater than or equal to | Left Dataset Column Value >= Right Dataset Column Value |
| <= | Less than or equal to | Left Dataset Column Value <= Right Dataset Column Value |

Utilize the Add Item option to incorporate additional join conditions effectively.
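
As a rough pandas analogy (illustrative only; the node performs the join for you), a Left join on a common key with the == operator corresponds to:

import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ana", "Ben", "Cara"]})
orders = pd.DataFrame({"customer_id": [1, 1, 3], "amount": [250, 90, 40]})

# Keep every customer, attaching matching orders where the key values are equal
joined = customers.merge(orders, on="customer_id", how="left")
print(joined)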

Join Node Configurations

Example Usage

Join Node Configuration With Multiple Conditions

Troubleshooting

Common Issues and Solutions

Problem 1: The Left Dataset and Right Dataset values are not automatically filled Cause: The Dataset Reader Nodes have not been added or linked. Solution: Ensure that the Dataset Reader Nodes have been added and linked.

Problem 2: No values listed in Left Field or Right Field under the Query section Cause: A Dataset Reader Node has not been added and linked to the Join Node. Solution: Ensure that a Dataset Reader Node is added and linked to the Join Node.

Problem 3: Warning sign above the Join Node Cause: The Join Node has not been successfully added. Solution: Click on Add to add the node to the workflow; the warning sign will disappear.

Additional Information

Ensure that the Dataset Reader Nodes have been added and linked if the Left Dataset and Right Dataset values are not automatically filled. If no values are listed in Left Field or Right Field under the Query section, ensure that a Dataset Reader Node is added and linked to the Join Node.

FAQ

Is it possible to join three datasets?

Yes, a join operation for three datasets can be performed. Join two datasets and use another Join Node to merge the third dataset with the output of the previous Join Node to get the result.

How can a Join Node be deleted?

The bin button that is present in the Join Node can be used to delete a Join Node.

Summary

This guide covered how to merge two datasets with a common key using Join Node in a workflow.

Union Node

Welcome to the Union Node guide! This guide will assist users in combining two or more datasets with identical structures into a single dataset.

Expected Outcome: By the end of this guide, you will gain an understanding of the Union Node and its applications in the Vue.ai platform.

Overview

The Union Node is utilized to combine multiple datasets with identical column structures into a single unified dataset. This node enables the consolidation of data from various sources by stacking records vertically, ensuring seamless integration of datasets with matching schemas. It is essential for aggregating data from multiple sources, merging historical and current data, or combining datasets from different time periods or regions.

Prerequisites For a better understanding of Transform Nodes, it is recommended to review the Getting Started with Workflows documentation.

Navigation To begin, the following path should be navigated: Home/Landing Page → Automation Hub → Workflow Manager → Workflows → New Workflow

Step-by-Step Instructions

  1. Load two or more datasets by adding Dataset Reader Nodes to the workspace.
  2. Ensure all datasets have identical column structures (same column names and data types).
  3. Drag and drop the Union node under the Transform node section in the left pane.
  4. Connect all the datasets that need to be combined to the Union Node.
  5. The node will automatically combine all connected datasets into a single output dataset.

Union Node

Important Requirements:

  • Identical Column Structure: All input datasets must have the same column names and data types
  • Column Order: Columns should be in the same order across all datasets
  • Data Type Consistency: Ensure data types match for corresponding columns

Use Cases for Union Node:

  • Historical Data Consolidation: Combine data from different time periods
  • Multi-Source Integration: Merge data from multiple sources with identical schemas
  • Regional Data Aggregation: Combine datasets from different geographical regions
  • Batch Processing: Consolidate multiple batch files into a single dataset
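
In pandas terms (illustrative only; the node stacks the connected datasets for you), a union of two inputs with identical schemas corresponds to:

import pandas as pd

q1 = pd.DataFrame({"region": ["East"], "sales": [100]})
q2 = pd.DataFrame({"region": ["West"], "sales": [90]})

# Both inputs must share identical column names (and compatible types) before stacking
assert list(q1.columns) == list(q2.columns)
combined = pd.concat([q1, q2], ignore_index=True)
print(combined)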

Example Usage

Union Node Configuration

Troubleshooting

Common Issues and Solutions

Problem 1: Schema Mismatch Error Cause: Input datasets have different column structures or data types. Solution:

  1. Ensure all datasets have identical column names and data types.
  2. Use Select or Transform nodes to standardize column structures before union.

Problem 2: No Values Listed or Empty Result Cause: Dataset Reader Nodes not properly connected to the Union Node. Solution:

  1. Ensure all Dataset Reader Nodes are properly connected to the Union Node.
  2. Verify that input datasets contain data.

Problem 3: Warning Sign Above the Union Node Cause: The Union Node has not been successfully added. Solution:

  1. Click on Add to add the node to the workflow; the warning sign will disappear.

Additional Information

The Union Node requires all input datasets to have identical column structures. Use data transformation nodes to align schemas before applying union operations.

FAQ

Can datasets with different column structures be unioned?

No, all datasets must have identical column structures. Use transformation nodes to align schemas before union.

Is there a limit to the number of datasets that can be unioned?

There is no strict limit, but performance may be affected with a very large number of datasets.

How can a Union Node be deleted?

The bin button that is present in the Union node can be used to delete a Union Node.

Summary

This guide covered how to combine multiple datasets with identical structures using the Union Node.

Transform Node

Welcome to the Transform Node guide! This guide will assist users in applying custom transformations and calculations to dataset columns using expressions.

Expected Outcome: By the end of this guide, you will gain an understanding of the Transform Node and its applications in the Vue.ai platform.

Overview

The Transform Node is utilized to apply custom transformations, calculations, and data manipulations to existing columns or create new columns using expressions. This node provides flexibility in data processing by enabling complex mathematical operations, string manipulations, conditional logic, and data type conversions. It is essential for data preparation, feature engineering, and custom business logic implementation.

Prerequisites For a better understanding of Transform Nodes, it is recommended to review the Getting Started with Workflows documentation.

Navigation To begin, the following path should be navigated: Home/Landing Page → Automation Hub → Workflow Manager → Workflows → New Workflow

Step-by-Step Instructions

  1. Load a dataset by adding the Dataset Reader Node to the workspace.
  2. Drag and drop the Transform node under the Transform node section in the left pane.
  3. Define the transformation expression in the Expression field using Python-like syntax.
  4. Specify the Output Column Name for the transformed data.
  5. Click Add Item to include multiple transformations.

Transform Node

Expression Examples:

Mathematical Operations:

# Calculate total price with tax
total_price = price * (1 + tax_rate)

# Calculate age from birth year
age = 2024 - birth_year

# Convert temperature
fahrenheit = (celsius * 9/5) + 32

String Operations:

# Concatenate columns
full_name = first_name + " " + last_name

# Extract substring
domain = email.split("@")[1]

# Convert to uppercase
upper_name = name.upper()

Conditional Logic:

# Conditional assignment
category = "High" if score > 80 else "Low"

# Multiple conditions
grade = "A" if score >= 90 else ("B" if score >= 80 else "C")

Example Usage

Transform Node Configuration

Troubleshooting

Common Issues and Solutions

Problem 1: Expression Syntax Error Cause: Invalid Python syntax in the expression field. Solution:

  1. Verify the expression syntax follows Python conventions.
  2. Check for proper parentheses, quotes, and operators.

Problem 2: Column Not Found Error Cause: Referenced column name doesn't exist in the dataset. Solution:

  1. Ensure all column names used in expressions exist in the input dataset.
  2. Check for correct spelling and case sensitivity.

Problem 3: Data Type Mismatch Cause: Operation not supported for the given data types. Solution:

  1. Ensure operations are compatible with column data types.
  2. Use appropriate type conversion functions if needed.

Additional Information

The Transform Node uses Python-like expressions. Ensure familiarity with Python syntax for optimal usage.

FAQ

What types of expressions are supported?

The Transform Node supports Python-like expressions including mathematical operations, string manipulations, conditional logic, and function calls.

Can multiple columns be transformed simultaneously?

Yes, use the Add Item option to define multiple transformations in a single node.

How can a Transform Node be deleted?

The bin button that is present in the Transform node can be used to delete a Transform Node.

Summary

This guide covered how to apply custom transformations and calculations to dataset columns using the Transform Node.

Custom Nodes

Custom nodes allow you to extend the Automation Hub with your own functionality and integrate external services and libraries.

Create Custom Code Nodes

Custom Code Nodes are essential components in workflow automation, allowing users to execute custom logic within their workflows with integrated development environments.

Prerequisites

  • Access to the Automation Hub
  • Familiarity with JSON Schema for defining node configurations
  • Basic knowledge of Python for implementing custom logic
  • GitHub access for version control

Setting Up a Custom Code Node

  1. Access the Nodes Page

    • Navigate to the Nodes section within the Automation Hub
    • Click on the + New Node button
  2. Fill in Node Details

    • Name: Provide a user-friendly name
    • Group Name: Select appropriate group or add new group
    • Runtime: Choose runtime (Python/Spark)
    • Description: Add brief description
    • Tags: Optionally add searchable tags

Creating a New Node

  3. Define Node Schema Define the structure using JSON Schema:
{
  "id": "loginFormUI",
  "type": "object",
  "properties": {
    "username": {
      "type": "string",
      "title": "Username"
    },
    "password": {
      "type": "string",
      "title": "Password",
      "minLength": 6
    }
  },
  "required": ["username", "password"]
}

Configuring Node Form Schema Form Data Preview

Use the chat option in the lower-right corner for assistance with building basic forms.

  4. Access and Edit Code in VS Code Server
    • Navigate to Code Server section
    • Use integrated VS Code editor to write and modify code
    • Include necessary packages in requirements.txt file
    • Clone repositories, create files, or open existing projects

VS Code Server Environment

Exclude these pre-installed base requirements from your requirements.txt:

  • requests==2.30.0
  • pandas
  • numpy==1.*
  5. Deployment Configurations Configure resource allocation in Deployment Config section:
    • CPU Limit/Request: Define CPU usage constraints
    • Memory Limit/Request: Set memory bounds
    • Number of Replicas: Specify instances for scaling
    • Idle Replica Count: Default 0, can be removed to reduce wait time

Setting Up Deployment Configurations

Code Server Project Structure

codenode/
│
├── main.py
├── requirements.txt
├── README
└── .gitignore

  • main.py: Primary script where logic is implemented
  • requirements.txt: Lists dependencies
  • README: Includes setup instructions
  • .gitignore: Specifies ignored files

Enabling the Python Environment

python3 -m venv myenv  
source myenv/bin/activate  
pip install -r requirements.txt

Accessing Terminal from Code Server

Accessing Node Data and Configuration

Previous Node Data:

# Pick the upstream node's name (the first key under 'payload'), then read its output data
previous_node_name = list(payload['payload'].keys())[0]
input_data = payload["payload"][previous_node_name]["result"]["data"]

Current Node Data:

# current_node_data_key is the key of the form field defined in the node's schema (e.g. 'prompt')
current_node_data = payload['node_details']['node_config']['query_value'][current_node_data_key]

Accessing Secrets Secrets are stored in the Secrets Manager:

from meta.global_constants import get_secrets_data
# Secret names are prefixed with the client/account ID; replace <your-secret-name> with the secret's name
secret_json = get_secrets_data(f'{client_id}-<your-secret-name>')

Accessing Secrets Manager

SDK Client Initialization Sample Python code to initialize SDK client:

import logging
import os

from vue_pre_sdk.client import AdminClient, AccountClient, ConnectorClient, DatasetsClient, MLOpsClient, UserClient, WorkflowClient

def main_process(payload: dict, logger: logging.Logger = None, **kwargs):
    client_id = payload["account_id"]

    # Initialize the required client (the other clients listed above are initialized the same way)
    workflow_client = WorkflowClient(
        base_url=os.environ.get("RBAC_URL"),
        api_key="your_api_key",
        account_id=client_id
    )

This functionality is available for nodes created after version 2.3.7 release.

Users can generate API keys in the tool under API Keys section in Account Settings.

Example Code Node

import logging
import json
from meta.global_constants import get_secrets_data
from openai import OpenAI

def main_process(payload, **kwargs):
    logger = logging.getLogger("CODENODE_APP")
    logger.setLevel(logging.INFO)
    
    secret_json = get_secrets_data('<your-secret-name>')
    client = OpenAI(api_key=secret_json["OPENAI_API_KEY"])

    input_data = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
    prompt = payload['node_details']['node_config']['query_value']['prompt']

    response = client.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=[{"role": "user", "content": json.dumps(input_data) + prompt}]
    )
    result = response.choices[0].message.content
    logger.info(result)
    return result

Committing and Pushing Changes

Using VS Code UI:

  1. Open Source Control Tab
  2. Stage and commit changes
  3. Click Commit and Push
  4. Click Publish Branch (if first time)

Push and Commit from VS Code UI

Using Terminal:

git add .
git commit -m "<Your Commit Message>"
git push

Monitoring Docker Builds Monitor Docker image builds using GitHub Actions within VS Code server:

Sign in using a GitHub account to monitor Docker builds

  • Access GitHub Actions: Open GitHub Actions Tab and log in to GitHub account
  • Track Build Progress: View detailed logs and troubleshoot errors
  • Apply Changes: Navigate to Workflow section to update workflows after successful builds

Monitor Multiple Docker Builds

Developer Hub

Equip data scientists and engineers with cutting-edge notebooks and MLOps solutions.

  • Streamline development workflows for scalable and production-ready applications
  • Equip developers with tools for advanced data science and machine learning operations

Notebooks

Interactive and powerful tools for writing, executing, and visualizing code. Widely used in data science, machine learning, and scientific computing for combining code, text, and visualizations in a single environment.

Overview

This guide assists users in creating and managing notebooks efficiently, writing and executing code within notebook cells.

Prerequisites

  • Access to Vue.ai Platform Developer Hub → Notebooks
  • Understanding of Jupyter notebook concepts

Navigation Path

Navigate to Home/Landing Page → Developer Hub → Notebooks

If the page is not visible, navigate to File → Hub Control Panel and click Stop my server.

Server Option

  • Select the required environment depending on needs (Python / Spark) and click "Start"
  • This will start setting up the selected environment and redirect once ready

Notebook Home

Notebook Interface

Left Sidebar

Notebooks Left Sidebar

  • File Browser: Displays directory structure where Notebook is running
  • Running Terminals and Kernels: Shows open files, running kernels, and terminals
  • Table of Contents: Automatically generated from markdown cells, with linked sections
  • Extension Manager: JupyterLab extensions for customizing themes, file viewers, editors, and renderers

Launcher Workspace

Notebook Launcher

  • Notebooks: Click Python 3 (ipykernel) icon to create new interactive Python notebook
  • Console: Click Python 3 (ipykernel) icon to open interactive Python shell
  • Other: Terminal, Text Files, Markdown, Python File

Menu Bar

Notebook Menubar

  • File: Manage notebooks and files (create, save, export, close)
  • Edit: Perform actions like undo, redo, cut, copy, paste, find/replace
  • View: Customize appearance (toggle line numbers, cell toolbar)
  • Run: Execute code cells in notebook or console
  • Kernel: Manage execution environment, restart or shut down
  • Tabs: Manage open tabs or notebooks in workspace
  • Settings: Customize notebook behavior and theme

Step-by-Step Instructions

Creating a New Notebook

Click the Python 3 (ipykernel) button in the Notebook section from the Launcher.

The new notebook opens in a new tab with the default name Untitled.ipynb. Rename it by clicking the current name at the top and entering a new name.

Create Notebook

Upload a Notebook or File
  • Click Upload icon in Left Sidebar File Browser
  • Choose file to upload and click Open
  • Upload existing notebooks or datasets for use in notebooks
  • Selected file will be added to workspace

Creating & Using Datasets via Vue.ai SDK

The DatasetClient of Vue.ai SDK allows users to seamlessly create, list, edit, and delete datasets within notebooks.

For more information, visit Vue.ai Datasets and Data Service SDK - Datasets.
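
A minimal sketch of using the DatasetsClient (imported earlier as DatasetsClient, referred to as DatasetClient above) inside a notebook, assuming it follows the same initialization pattern as the other Vue.ai SDK clients shown in this guide. The method names (datasets.create, datasets.list, datasets.delete) are illustrative placeholders; refer to the Data Service SDK documentation linked above for the exact API.

from vue_pre_sdk.client import DatasetsClient

# Assumed to use the same constructor arguments as the other SDK clients
datasets_client = DatasetsClient(
    base_url="your_base_url",
    access_token="your_access_token",
    api_key="your_api_key",
    account_id="your_account_id"
)

# Hypothetical calls -- check the Data Service SDK reference for the exact method names
new_dataset = datasets_client.datasets.create({"dataset_name": "my-dataset"})
all_datasets = datasets_client.datasets.list()
datasets_client.datasets.delete(new_dataset["dataset_id"])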

Switching between Environments

Notebooks support multiple execution environments based on workload requirements:

  • Python environments: Small, Medium, and Large configurations for standard computations
  • Spark environments: Small and Large configurations for distributed processing

Switching with Hub Control Panel

  1. Navigate to File → Hub Control Panel and click Stop My Server

Hub Control Panel Stop My Server

  2. Once the server is stopped, the Start My Server button appears - click it to select a new environment

Start My Server

  3. Select an environment based on your requirements and click Start

Server Option

Switching using LogOut

  1. Navigate to File → LogOut and click Stop My Server

Logout

  2. Click Login to select a new environment - this opens the Server Options page
  3. Select an environment and click Start

Additional Information

  • Command mode: Used for notebook-level actions such as navigating, adding, or running cells. Press the Esc key to enter Command mode
  • Editor mode: Used to write or edit code. Click on any cell to enter Editor mode

To save a notebook in various formats, go to File → Save and Export Notebook As and choose from HTML (.html), LaTeX (.tex), Markdown (.md), PDF (.pdf), or Executable Python Script (.py).

MLOps

The MLOps SDK integrates with MLflow to facilitate creation of experiments, logging of multiple models, and comparison of different models using the MLflow UI.

Overview

This guide assists in understanding how to use the MLOps SDK to create experiments, log models, and compare different models in MLflow with a breast cancer classification problem example.

Prerequisites

  • Access to the MLOps SDK
  • MLflow server running and accessible
  • Dataset for breast cancer classification

MLflow Authentication MLflow UI

Using the MLOps SDK

Importing MLOpsClient

from pprint import pprint
from vue_pre_sdk.client import MLOpsClient

Initializing MLOpsClient

client = MLOpsClient(
    base_url="your_base_url",
    access_token="your_access_token",
    api_key="your_api_key",
    account_id="your_account_id"
)

Create an Experiment

create_experiment_payload = {
   "experiment_name": "Breast-Cancer-Experiment",
   "experiment_description": "Experiment to predict the breast cancer for a given patient",
   "tags": ['classification'],
}

created_experiment = client.experiments.create(create_experiment_payload)
pprint(created_experiment)

Create Experiment Response MLflow UI

Train and Log Models

Multiple models (Logistic Regression, Random Forest, SVC) are trained and logged using the create_model API.

Preprocessing

import pandas as pd
import cloudpickle
import base64
import io
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

X_train, X_val, y_train, y_val = train_test_split(df.drop('target', axis=1), df['target'], test_size=0.2, random_state=42)
pipeline = Pipeline([
   ('imputer', SimpleImputer(strategy='mean')),
   ('encoder', OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1))
])

X_train_processed = pipeline.fit_transform(X_train)
X_val_processed = pipeline.transform(X_val)

Input Dataset

Metrics Calculation and Model Serialization

def calculate_metrics(model, X_train_processed, X_val_processed, y_train, y_val):
   y_train_pred = model.predict(X_train_processed)
   y_val_pred = model.predict(X_val_processed)

   train_accuracy = accuracy_score(y_train, y_train_pred)
   train_precision = precision_score(y_train, y_train_pred)
   train_recall = recall_score(y_train, y_train_pred)
   train_f1 = f1_score(y_train, y_train_pred)
   val_accuracy = accuracy_score(y_val, y_val_pred)
   val_precision = precision_score(y_val, y_val_pred)
   val_recall = recall_score(y_val, y_val_pred)
   val_f1 = f1_score(y_val, y_val_pred)

   metrics_dict = {
      "training_metrics": {
            "accuracy": train_accuracy,
            "precision": train_precision,
            "recall": train_recall,
            "f1-score": train_f1
      },
      "validation_metrics": {
            "accuracy": val_accuracy,
            "precision": val_precision,
            "recall": val_recall,
            "f1-score": val_f1
      }
   }
   return metrics_dict

def serialize_object(model):
   model_bytes = cloudpickle.dumps(model)
   model_base64 = base64.b64encode(model_bytes).decode('utf-8')
   return model_base64
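
For completeness, a serialized model can be restored by reversing the same steps. This is a minimal sketch that reuses the cloudpickle and base64 imports from the preprocessing snippet above:

def deserialize_object(model_base64):
   # Reverse of serialize_object: base64 string -> bytes -> Python object
   model_bytes = base64.b64decode(model_base64.encode('utf-8'))
   return cloudpickle.loads(model_bytes)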

Model 1 - Logistic Regression

model = LogisticRegression()
model.fit(X_train_processed, y_train)
metrics_dict = calculate_metrics(model, X_train_processed, X_val_processed, y_train, y_val)
serialized_model = serialize_object(model)
serialized_pipeline = serialize_object(pipeline)

create_model_payload = {
   "model_name": "LogisticRegression",
   "model_description": "Logistic Regression model to predict breast cancer",
   "tags": ['Classifier', 'sklearn'],

   "experiment_name": "Breast-Cancer-Experiment",
   "task": "Classification",
   "is_automl": False,

   "model_parameters": {
      "model_architecture": "LogisticRegression",
      "library": "scikit-learn",
      "library_version": "1.5.0", 
      "model_args": dict(model.get_params())
   },
   "metrics": metrics_dict,
   "artifact_config": {
      "model_object": serialized_model,
      "data_preprocessing_pipeline": [{"step_name": "pipeline", "preproc_object": serialized_pipeline}]
   },
   'model_interpretability': {
      'feature_scores': {
            'visual_representation_object': "",
            'tabular_representation_object': ""
      }
   }
}

created_model = client.models.create(create_model_payload)

MLflow UI

Model 2 - Random Forest and Model 3 - SVC follow the same pattern with their respective configurations; a sketch for the Random Forest case is shown below.

MLflow UI
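
A sketch of how Model 2 (Random Forest) could be logged with the same create_model pattern; Model 3 (SVC) follows identically. The payload mirrors the Logistic Regression example above and reuses the helpers and serialized pipeline defined there.

model = RandomForestClassifier()
model.fit(X_train_processed, y_train)
metrics_dict = calculate_metrics(model, X_train_processed, X_val_processed, y_train, y_val)
serialized_model = serialize_object(model)

create_model_payload = {
   "model_name": "RandomForest",
   "model_description": "Random Forest model to predict breast cancer",
   "tags": ['Classifier', 'sklearn'],
   "experiment_name": "Breast-Cancer-Experiment",
   "task": "Classification",
   "is_automl": False,
   "model_parameters": {
      "model_architecture": "RandomForestClassifier",
      "library": "scikit-learn",
      "library_version": "1.5.0",
      "model_args": dict(model.get_params())
   },
   "metrics": metrics_dict,
   "artifact_config": {
      "model_object": serialized_model,
      "data_preprocessing_pipeline": [{"step_name": "pipeline", "preproc_object": serialized_pipeline}]
   }
}

created_model = client.models.create(create_model_payload)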

Retrieve a Specific Model

model_id = "<model-id>"
retrieved_model = client.models.get(model_id)
pprint(retrieved_model)

Get Model Response MLflow UI

Compare Model Performance

MLflow comparison feature is used to evaluate model metrics and determine the best model.

MLflow UI MLflow UI

Load Model and Do Inference

import joblib

model = client.models.get("<model-id>")
X_val = <input-data>

SERVICE_PROVIDER = "<SERVICE_PROVIDER>"
REGION = "<REGION>"

artifact_path = model['data']['artifact_config']['model_path']
MLOPS_BUCKET_NAME = artifact_path.split('/')[2]
model_path = '/'.join(artifact_path.split('/')[3:])

storage = PolyCloudStorageSupport(SERVICE_PROVIDER, REGION, MLOPS_BUCKET_NAME)
model_bytes = storage.read_file_from_cloud(model_path)

ml_model = joblib.load(model_bytes)
prediction = ml_model.predict(X_val)

Delete Models

model_id = "<model-id>"
response = client.models.delete(model_id)
pprint(response)

Delete Model Response MLflow UI

MLOps v2 User Guide

Overview

This guide helps you learn how to use the MLOps SDK to manage experiments and models, interact with MLflow UI, and explore various logging features with different ML frameworks.

Prerequisites

  • Access to the MLOps SDK
  • MLflow server running and accessible

MLflow Authentication MLflow UI

Using the MLOps SDK

Initializing MLOpsClient

from pprint import pprint
from vue_pre_sdk.client import MLOpsClient

client = MLOpsClient(
    base_url="your_base_url",
    access_token="your_access_token",
    api_key="your_api_key",
    account_id="your_account_id",
    version="v2"
)

1. Create Experiment

create_experiment_payload = {
   "experiment_name": "Breast-Cancer-Experiment",
   "experiment_description": "Experiment to predict the breast cancer for a given patient", 
   "tags": ['classification'], 
}

created_experiment = client.experimentsv2.create(create_experiment_payload)
pprint(created_experiment)

Create Experiment Response MLflow UI

2. Create Model with Extensive Logging The flow includes:

  • Load data, preprocess, split into train/validation sets, train model
  • Encode model, image, and figure
  • Log model, params, metrics, text, dictionary, image, figure, table, artifacts
  • Create model and compare in MLflow UI

Data Loading and Preprocessing

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

X_train, X_val, y_train, y_val = train_test_split(df.drop('target', axis=1), df['target'], test_size=0.2, random_state=42)
pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='mean')),
    ('encoder', OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1))
])

X_train_processed = pipeline.fit_transform(X_train)
X_val_processed = pipeline.transform(X_val)

Input Dataset

Only the following models can be encoded:

  1. Sklearn
  2. XGBoost
  3. Lightgbm
  4. Statsmodels

Model Training - Logistic Regression

lr_model = LogisticRegression()
lr_model.fit(X_train_processed, y_train)
encoded_lr_model = client.models.encode_model(lr_model)
metrics = calculate_metrics(lr_model, X_train_processed, X_val_processed, y_train, y_val)

Logging Capabilities

Log Model

loggers = {
    "model": {
        "model_library": "sklearn",
        "encoded_model": encoded_lr_model
    }
}

Alternatively, the model can be referenced by a path in cloud storage:

loggers = {
    "model": {
        "model_library": "sklearn",
        "model_path": "s3://bucket-name/path/to/model"
    }
}

The path can start with s3://, gs://, or abfs:// depending on the cloud provider.

Log Params

loggers = {
    "params": lr_model.get_params()
}

Log Metrics

loggers = {
    "metrics": metrics
}

Log Text

text = "This is a logistic regression model. It predicts breast cancer."
loggers = {
    "text": [
        {
            "file_name": "notes.txt",
            "text_value": text
        }
    ]
}

Log Dictionary

dictionary = {
    "model_name": "Logistic Regression",
    "model_library": "sklearn",
    "library_version": "1.24.3",
}
loggers = {
    "dictionary": [
        {
            "file_name": "model_dict.json",
            "dict_value": dictionary
        }
    ]
}

Log Image

from PIL import Image
image_path = "<image-path>"
image_object = Image.open(image_path)
encoded_image = client.models.encode_image(image_object, "PNG")

loggers = {
    "image": [
        {
            "file_name": "image.png",
            "encoded_image": encoded_image
        }
    ]
}

Log Figure

import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)

fig, ax = plt.subplots()
ax.plot(x, y, label="Sine Wave")

encoded_figure = client.models.encode_figure(fig)

loggers = {
    "figure": [
        {
            "file_name": "model_dict.txt",
            "encoded_figure": encoded_figure
        }
    ]
}

Log Table

table = {
    "Features": ["BP", "Haemoglobin", "Sugar level"],
    "Importance": [0.1, 0.2, 0.3]
}
loggers = {
    "table": [
        {
            "file_name": "feature_importance.json",
            "table_value": table
        }
    ]
}

Log Artifacts

files = ["s3://bucket-name/path/to/file1", "s3://bucket-name/path/to/file2"]
folders = ["s3://bucket-name/path/to/folder1", "s3://bucket-name/path/to/folder2"]
loggers = {
    "artifact": {
        "files": files,
        "folders": folders
    }
}
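
In practice, the individual logger entries shown above can presumably be combined into a single loggers dictionary before creating the model. A minimal sketch, reusing the objects defined in the previous snippets (include only the keys you actually want to log):

loggers = {
    "model": {
        "model_library": "sklearn",
        "encoded_model": encoded_lr_model
    },
    "params": lr_model.get_params(),
    "metrics": metrics,
    "text": [{"file_name": "notes.txt", "text_value": text}],
    "dictionary": [{"file_name": "model_dict.json", "dict_value": dictionary}],
    "image": [{"file_name": "image.png", "encoded_image": encoded_image}],
    "figure": [{"file_name": "sine_wave.png", "encoded_figure": encoded_figure}],
    "table": [{"file_name": "feature_importance.json", "table_value": table}]
}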

Create Model

create_model_payload = {
    "model_name": "Logistic Regression",
    "model_description": "Logistic Regression model to predict breast cancer",
    "tags": ['Classifier', 'sklearn'],
    "experiment_name": "Breast-Cancer-Experiment",
    "task": "Classification",
    "loggers": loggers
}
create_model_response = client.models.create(create_model_payload)
pprint(create_model_response)

Create Model Response

Multiple Models Comparison

MLflow UI MLflow UI MLflow UI MLflow UI

Examples with Different ML Libraries

Lightgbm Model

import lightgbm as lgb
train_data = lgb.Dataset(X_train_processed, label=y_train)
params = {"objective": "multiclass", "num_class": 3}
model_lgb = lgb.train(params, train_data)

encoded_lgb_model = client.models.encode_model(model_lgb)
cloud_path = client.models.save_model_to_cloud(model=model_lgb, model_library="lightgbm", model_name="lightgbm_model")

Statsmodels

import statsmodels.api as sm

X_train_sm = sm.add_constant(X_train_processed)
model_sm = sm.MNLogit(y_train, X_train_sm).fit()
encoded_sm_model = client.models.encode_model(model_sm)
cloud_path = client.models.save_model_to_cloud(model=model_sm, model_library="statsmodels", model_name="statsmodels")

XGBoost Model

import xgboost as xgb

model_xgb = xgb.XGBClassifier(eval_metric="logloss")
model_xgb.fit(X_train_processed, y_train)
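
Following the same pattern as the other encodable libraries above, the trained XGBoost model can presumably be encoded and saved to cloud storage as well (the "xgboost" library identifier is an assumption):

encoded_xgb_model = client.models.encode_model(model_xgb)
cloud_path = client.models.save_model_to_cloud(model=model_xgb, model_library="xgboost", model_name="xgboost_model")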

Tensorflow Model

from tensorflow import keras
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

tf_model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

tf_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
tf_model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))

cloud_path = client.models.save_model_to_cloud(model=tf_model, model_library="tensorflow", model_name="tensorflow_model")

Pytorch Model

import torch
import torch.nn as nn
import torch.optim as optim

class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

torch_model = SimpleNN(10, 20, 2)
cloud_path = client.models.save_model_to_cloud(model=torch_model, model_library="pytorch", model_name="pytorch_model")

Autologging Functionality The sketch below shows autologging-like behavior: metrics are appended and the model record is updated at every epoch. It assumes a training loop that produces loss and accuracy for each epoch and an existing model_id.

metrics = {"training_metrics": []}
for epoch in range(5):
    # ... train for one epoch, producing `loss` and `accuracy` ...
    metrics["training_metrics"].append({
        "loss": loss.item(),
        "accuracy": accuracy
    })
    updated_model = client.models.update(model_id, {
        "loggers": {"metrics": metrics}
    })

MLflow UI

Customer Hub

The Customer Hub empowers users to build customer profiles, configure personalized recommendations, manage digital campaigns, and optimize performance through experiments.

  • Segmentation: Native industry-agnostic audiences for immediate personalization, custom visitor 360 profiles with custom audience capabilities, hyper-tuning audience parameters for refined visitor segmentation, and auto visitor segmentation
  • Personalization: Industry-best out-of-the-box recommendation models, fine-tune dynamic personalized recommendation models to meet business needs, curation & bundling enhanced with personalized models for maximum gains, automated campaign management system for personalized content delivery, and dynamic experimentation capabilities for continuous recommendation optimization
  • Performance Analytics: Analyze campaign performance with built-in analytics and build custom reports

Audiences

The Audience Manager enables businesses to create, manage, and analyze audience segments. It helps in understanding customer behavior and optimizing content targeting by segmenting the audience based on specific criteria.

Overview

Users can work with both preset audience segments and custom-built audiences, gaining insights into:

  • Audience Size and Growth Trends: Monitor how your audience is expanding over time
  • Engagement and Conversion Performance: Evaluate how well different segments are engaging with your content
  • Comparative Analysis of Different Audience Segments: Compare performance across various audience groups

With Audience Builder

  • Segment Visitors and Customers: Group based on shared characteristics, behaviors, and preferences
  • Create Customized Audience Groups: Align with business goals, such as personalized recommendations
  • Analyze Audience Engagement Levels: Enhance customer experience through detailed insights

With Audience Parameters

  • Access Comprehensive Parameters: Define visitor/user attributes such as demographics, interests, and affinities
  • Leverage Parameters for Targeting: Create precise and targeted custom audience segments

Audience Hub allows you to create custom audiences, access preset audiences, and view their related metrics & overlap information.

Key Features

  • Create Custom Audiences: Design tailored audiences based on user segments and behaviors
  • Preset Audiences: Access pre-built audiences that provide insights and help with targeting
  • Custom Audience Metrics: View detailed metrics for custom and preset audiences
  • Key Benefits: Optimize audience targeting, improve user engagement, and drive marketing strategies

Audience Performance Tracking

  • Audience Performance Tracking: Monitor the performance of various audiences using detailed metrics
  • Overlap Information: Understand the relationship between audiences and how they intersect
  • Data-Driven Insights: Leverage audience data to refine marketing strategies and optimize campaigns

Latest Activities

  • Track Changes: View the latest activities from you and your colleagues within Audience Hub
  • Edit Activity: Click the 'Edit' icon to access and modify any activity
  • Keep Track: Stay updated with the ongoing changes and updates to audience configurations

Audience Listing

The Audience Manager allows users to view, manage, and upload audiences within the system. You can search, filter, and sort audiences in a table format, with options for uploading custom audiences, checking performance data, and exploring audience overlap information.

Prerequisites

  • Access to Audience Hub → Audience Manager → Audience Listing

Navigation Path Navigate to Audience Hub → Audience Manager → Audience Listing

Step-by-Step Instructions

Search for an Audience

  1. In the Audience Manager, use the search bar to search for audiences by name or recent keywords
  2. The table will display results based on your search criteria

Sort Columns

  1. Hover over a column header to find the sort icon Sort
  2. Click the sort icon to toggle between ascending and descending alphabetical order
  3. If multiple columns have sorting applied, the last applied sort will take precedence

Filter Columns

  1. Hover over a column header to find the filter icon Filter
  2. Click the filter icon to open a dropdown menu where you can multi-select filter values
  3. You can also search for specific filter values within the dropdown
  4. The table will update to reflect the filtered results

Upload a Custom Audience

  1. Navigate to the Audience Hub → Audience Manager
  2. Click on the 'Upload Audience' button
  3. Download the .csv template and populate it with the necessary data (optional)
  4. Drag and drop your .csv file or browse and select the file to upload
  5. Enter a name and description for your audience
  6. Click on 'Upload'
  7. The uploaded audience will appear in the Audience listing table

Performance Metrics In the Audience listing table, you will find performance metrics next to each audience name:

| Metrics | Description |
| --- | --- |
| Conversion Rate | Number of orders placed per visit |
| Revenue Per Visitor | Total revenue generated per visit |
| AOV | Average revenue generated per order |
| Unique Visitors | Number of unique visitors |
| Last Run Date | Date when the audience was last updated |

Checking Overlap Information

  1. In the Audience listing table, click on the 'Overlap' icon next to the audience name
  2. View the overlapping details for the selected audience compared to all other available audiences

Export an Audience

  1. In the Audience listing table, click on the export icon next to the desired audience
  2. Export the audience data as a CSV file for further use

Audience Builder

Audience Builder allows you to segment your visitors into different audiences based on your business goals. It helps you define custom audiences by identifying visitors who share similar characteristics.

Prerequisites

  • Familiarity with the platform's Audience Manager and basic setup for defining events

Navigation Path Navigate to App Drawer → Audience Manager → Audience Builder or Audience Manager → Audience Builder on the navigation bar

Step-by-Step Instructions

Step 1: Create a New Audience

  1. On the Audience Listing Screen, click Create Audience
  2. Choose Build via Form to define an audience using conditions
  3. Alternatively, choose Upload via CSV to create an audience based on visitor IDs from a CSV file

Step 2: Define Audience Details

General Settings

  • Audience Name: Give your audience a unique name
  • Description: Add a brief description of the audience
  • Duration Settings:
    • Refresh Frequency: Options are 'Once every 4 hours', 'Once every 6 hours', 'Once every 8 hours', 'Once every 12 hours', or 'Once a day'
    • Lookback Period: Options include 'Last 1 day', 'Last 7 days', 'Last 30 days', 'Last 60 days', 'Last 90 days', or 'Last 180 days'

Define Audience Criteria

  • Condition Groups: You can define criteria using Parameters, such as demographics, interests, or behaviors
    • A Condition Group contains rules with conditions (e.g., 'Brand Equals Nike', 'Country Equals India')
    • A Sequence Group allows defining sequential behaviors (e.g., 'User added to cart, then bought within 2 days')

Step 3: Create the Audience Once the criteria are defined, click Create to finalize the audience.

It may take 20-30 minutes to create the audience. You can save your progress by clicking Continue Later.

Example: Creating a Simple Audience

  1. Audience: Male iPhone users
    • Group 1: Gender = Male
    • Group 2: iPhone User = Yes

Example: Creating an Audience with Sequence

  1. Audience: Users who added shoes to cart on Tatacliq and bought them within 1 day
    • Group 1 (Add to Cart Event): Event = Add to Cart, Product = Shoes
    • Group 2 (Buy Event): Event = Buy, Product = Shoes, Followed by = 1 day

Step 4: Upload Audience via CSV

  1. Click the down arrow next to Create Audience
  2. Select Upload via CSV
  3. Provide a name, description, and upload the CSV file

The maximum file size for the CSV is 50 MB, and user IDs must be anonymous.

Audience Metrics and Information After creating the audience, you can view metrics and performance insights, such as:

  • Number of Sessions
  • Revenue Generated
  • Average Order Value
  • Conversion Rates

Glossary

| Term | Definition |
| --- | --- |
| Parameter | A dimension or field that describes a visitor/user, such as gender, number of visits, or category |
| Condition Group | A group of conditions used to define an audience based on specific parameters and values |
| Sequence Group | A series of events that occur in sequence, defining specific user behavior over time |
| Logical Operator | Used to combine conditions: 'AND' (both conditions must be true), 'OR' (either condition must be true) |
| Boolean Operator | Operators like 'Equals', 'Greater than', 'Less than' used for defining conditions in rules |
| Time Operator | Specifies time-based conditions such as 'Within', 'After', or 'None' to define event relationships |

Audience Presets

Preset Audience computation is done once every 24 hours at the time specified by you.

Audience Segments

| Audience Segment | Description | Lookback | Supported Industries |
| --- | --- | --- | --- |
| New visitors | Customers who are visiting your site for the first time | lifetime | Yes, for all Industries |
| Customers with only 1 order | Customers who have made only one order in their lifetime on the site | lifetime | Yes, for all Industries |
| Repeat visitors with no cart additions or purchases | Customers who have repeatedly visited your site but never made any cart additions or purchases | lifetime | Only for Retail |
| Cart Abandoners | Customers who have added a product to cart in the last 30 days but not made a purchase | 30 days | Yes, for all Industries |
| Repeat Buyers | Customers who have made more than 1 purchase in their lifetime on the site | lifetime | Yes, for all Industries |
| High Spenders | Customers who spend more than an average spender in the last 3 months | 90 days | Yes, with data constraints |
| Full Price Purchasers | All products purchased at full price in the last 3 months | 90 days | Yes, with data constraints |
| Discount Purchasers | All products purchased at discounted price in the last 3 months | 90 days | Yes, with data constraints |
| Bulk Purchasers | Customers/Dealers/Wholesale purchasers who make bulk purchases in the last 3 months | 90 days | Yes, with data constraints |
| Browsers without Vue.ai engagement | Customers who view products without clicking on recommendations in the last 3 months | 90 days | Yes, for all Industries |
| Browsers without any Vue.ai exposure | Customers who view products without viewing any recommendations in the last 3 months | 90 days | Yes, for all Industries |
| Purchasers without any Vue.ai engagement | Customers who have made purchases without any clicks on Vue.ai modules in the last 3 months | 90 days | Yes, for all Industries |
| Purchasers without any Vue.ai exposure | Customers who have made purchases without exposure to Vue.ai modules in the last 3 months | 90 days | Yes, for all Industries |
| Desktop Visitors | Users who have visited the site from Desktop | 90 days | Yes, for all Industries |
| Mobile Visitors | Users who have visited the site from Mobile | 90 days | Yes, for all Industries |

Digital Experience Manager

The Digital Experience Manager (DXM) allows businesses to create, personalize, and manage user experiences across different digital touchpoints. It helps in delivering tailored content and optimizing customer journeys.

Overview

With DXM, users can:

  • Create and Deploy Personalized Experiences: Improve customer engagement with tailored content
  • Perform Multivariate Testing (A/B Testing): Determine the most effective content variations
  • Configure and Manage Recommendation Strategies: Deliver dynamic, relevant content
  • Track Real-Time Performance Metrics: Assess user behavior, conversion rates, and engagement levels

Key Features

Metrics

  • View metrics for each feature in DXM Hub
  • Navigate to the Metrics feature via the Metrics Card by clicking on the 'Go to' CTA

Support Documents and FAQs

  • Access support documentation, the FAQ, and an inspiration library related to DXM Hub

Latest Activities

  • View a list of the 50 latest changes made by you and your colleagues across DXM Hub
  • Click the Edit icon on any activity to be redirected to the detailed screen of that activity

Strategy

  • View details about the latest three created or modified strategies
  • You can navigate to the strategy listing screen by clicking on the "View All" CTA

Experiences

  • View details about the latest three created or modified experiences
  • You can navigate to the experience listing screen by clicking on the "View All" CTA

Pages

  • View details about the latest three created or modified pages
  • You can navigate to the page listing screen by clicking on the "View All" CTA

Experiments

  • View details about the latest three created or modified experiments
  • You can navigate to the experiment listing screen by clicking on the "View All" CTA

Strategies

With Strategies

Users can:

  • Select a Model for Personalized Content Recommendations: Choose the best model for your needs
  • Configure Model Parameters and Define Business Rules: Tailor recommendations to business objectives
  • Set Up Events to Trigger Tailored Recommendations: Automate content delivery based on user actions

Creating a Strategy

Creating a strategy is the foundational step in crafting a personalized user experience. This process involves tailoring model parameters to meet your needs and customizing content recommendations based on user behavior, business rules, and various other configurations.

Navigation Path Navigate to Strategy → Strategy Listing → Create Strategy

Strategy Listing

Step 1: Create a New Strategy Click the Create Strategy button to begin the configuration process. You will be prompted to:

  1. Enter a unique name for the strategy
  2. Select the catalog that will be used to serve recommendations
  3. Choose a model: Depending on the model selected, you will be presented with relevant parameters to configure

Strategy Configuration Screen

Step 2: Configure Model Parameters You can configure model parameters based on the catalog you select. These include:

  • Content Attributes: Select attributes such as brand, color, or pattern, and assign a priority score to indicate their importance
  • Indexed Fields: Content attributes available during catalog onboarding in Content Hub will be used for these configurations

Step 3: Configure Events Events allow you to personalize recommendations based on user actions:

  • Add to Cart: Display products added to the customer's cart in the last X days
  • Add to Wishlist: Show products added to the wishlist in the last X days
  • Pageview: Recommend products viewed by the customer in the last X days
  • Buy: Display products purchased by the customer in the last X days

Choose a Look Back Period (Daily, Weekly, Monthly, etc.) and assign priority scores to these events.

Configuring Events for Strategy

Step 4: Apply Business Rules (Optional) Business rules allow you to filter the recommendation output based on defined conditions:

  • Filtering Conditions: Select content attributes to apply conditions like "Brand is Gucci" or "Price greater than $200"
  • Apply To: Specify which content attributes on the source content page the filter should apply to

Example:

  • Filtering Conditions: Brand is Gucci
  • Apply To: Price greater than $200 and Category is 'bags'

These rules ensure that recommendations align with your business needs.

Step 5: Save and Create the Strategy Once the strategy is configured, click Create to save it. The strategy will be listed on the strategy table and available for use.

If you want to save your progress and continue later, click Save & Exit, and the strategy will be saved in draft state.

Final Strategy Configuration

Tips for Strategy Configuration:

  • Segment & Boosting Content Attributes: Use content attributes to segment and boost relevant content for recommendations
  • Attribute Deduplication: Ensure uniqueness by deduplicating content based on specific attributes
  • 1:1 Personalization: Enable this for personalized recommendations based on individual user affinities

Templates

With Templates

Users can:

  • Design and Structure Layouts for Recommendation Widgets: Customize how recommendations appear
  • Customize Widget Appearance on the Platform: Ensure visual consistency with your brand

Template Management

Template enables you to build layouts which will be used for rendering recommendation widgets on your platform. For example, you can set up a recommendation on your home page with a carousel template that allows customers to scroll through a collection of products.

Template Layout

Navigation Path Navigate to Vue menu bar → Digital Experience Hub → Templates

Template Listing Screen The Template Listing screen provides you with the list of templates created in your account. From the listing screen, you can request creation of a new template, view details about each template configuration, preview, and delete a template.

Search Templates

  1. You can search for created Templates using the Template name via the search bar or use one of the suggested/recently searched keywords
  2. The search results will populate in the table

Sort Columns

  1. Hover over the column header to find the sort icon next to each column
  2. Click the sort icon to sort the column alphabetically in either ascending or descending order
  3. If sorting is applied to multiple columns, the column for which sorting was applied last will take precedence

Filter Columns

  1. Hover over the column header to find the filter icon next to each column
  2. Click the filter icon to open up a dropdown from which you can multi-select the values to be filtered
  3. You can also search for a filter value within the dropdown
  4. The table will be populated with the filtered results

Delete Template

  1. From the Template listing table, next to each Template name, click on the 'Delete' icon under actions
  2. Clicking Delete will prompt you with an overlay modal which lists all the entities (strategies and modules) linked to this template
  3. From here, you can access any entity's config screen and make necessary changes before deleting the template
  4. Deleting the template will unlink it from all linked features and permanently delete the template and its content from the system

Viewing Template Details

  1. From the Template listing table, next to each Template name, click on the 'Info' icon under actions
  2. Template configuration details will open in an overlay modal
  3. From here, you can also access all the entities linked (Strategy & Module) to this template and navigate to them

Request Creation of a New Template

  1. Navigate to Digital Experience Manager (DXM) via Vue Menu Bar and click on 'Assets' → 'Template'
  2. Click on 'Request New'
  3. Fill out the necessary details in the form. We will get back to you with your template within 7 to 14 business days
  4. Provide the following details:
    • Template type (Carousel, Carousel with Tabs, Grid, Dressing Room, Product Cards for email)
    • Number of tiles
    • Styling
    • Attributes

Modules

With Modules

Users can:

  • Combine Strategies and Templates to Deliver Personalized Content: Integrate various elements for a cohesive experience
  • Deploy Modules Across Multiple Platforms: Use Embed Code, API, and Email for distribution

Module Management

Module enables you to (i) combine a strategy/combination of strategies/contents with a template, (ii) configure the number of results to be shown, and (iii) link strategy(s)/content(s) to specific tiles on the template.

Navigation Path Navigate to Vue menu bar → Digital Experience Manager → Module

Module Listing Page

Module Listing Screen The Module Listing screen provides you with the list of modules created in your account. From the listing screen you can request creation of a new module, view details about each module configuration, Preview and Delete a module.

Viewing a Module config

  1. From the Module listing table, next to each Module name, click on the 'Info' icon under actions
  2. Module configuration details will open in an overlay modal
  3. From here, you can also access all the entities linked (Strategy, Template, Experience) to this Module and navigate to them

Search Modules

  1. You can search for the created Modules using the Module name via search bar or use one of the suggested/recently searched keywords
  2. You can find the search results populated in the table

Sort Columns

  1. Hover over the column header to find the sort icon next to each column
  2. Click the Sort icon to sort the column alphabetically either in ascending or descending order
  3. If the sort is applied to multiple columns, the column for which sort was applied last will take precedence

Filter Columns

  1. Hover over the column header to find the filter icon next to each column
  2. Click the filter icon to open up a dropdown from which you can multi-select the values to be filtered
  3. You can also search for a filter value within the dropdown
  4. The table will be populated with the filtered results

Delete Module

  1. From the Module listing table, next to each module name, click on the 'Delete' icon under actions
  2. Clicking Delete will prompt the user with an overlay modal which lists all the entities (Experiences) this Module is linked with
  3. From here, you can access any entity config screen, make necessary changes before deleting the module
  4. Deleting the module will unlink it from all the linked features and permanently delete it & the content from our system

Request creation of a New Module

  1. Navigate to Digital Experience Manager (DXM) via Vue Menu Bar and click on 'Assets' > Module
  2. Click on 'Request New'
  3. Fill out the necessary details in the form provided, and we will get back to you with your Module within 7 to 14 business days
  4. Details to be provided:
    • Module Type: Embed Code, API, Email
    • Template
    • Strategy(s)
    • Min & Max number of items

Pages

With Pages

Users can:

  • Configure Key Website Pages: Set up home pages, product listings, and cart pages for different platforms
  • Define Placements on the Website: Control where recommendations are displayed

Page Management

Pages on a website, such as the home page, product listing page, product detail page, and cart page, can be configured for use in different experiences. These pages can be customized based on the user's needs, offering flexibility in content display and interaction.

Prerequisites

  • Ensure that you have access to the Digital Experience Manager (DXM) via the Vue menu bar to manage your Pages

Navigation Path Navigate to DXM → Pages

Pages Listing

Search Pages You can search for the created Pages using the page name via the search bar or use one of the suggested/recently searched keywords. The search results will be populated in the table.

Sort and Filter Pages

  • Sort Columns: Hover over the column header to find the sort icon next to each column. Click the icon to sort the column alphabetically in either ascending or descending order
  • Filter Columns: Hover over the column header to find the filter icon. Click it to open a dropdown and select multiple filter values. You can also search for a specific filter value within the dropdown

Delete Pages

  1. From the Pages listing table, click the Delete icon next to the page name
  2. Deleting the page will unlink it from all the linked experiences and permanently delete it from the system

View Page Details

  1. From the Pages listing table, click the 'Info' icon next to the page name
  2. The page configuration details will open in an overlay modal, where you can also navigate to linked entities and preview the page on supported device types

Request New Page Creation

  1. Navigate to the DXM via the Vue menu bar and click on 'Assets' → 'Page'
  2. Click 'Request New' and fill out the form with necessary details. You will receive your page within 7 to 14 business days

Types of Supported Pages

| Page Type | Description |
| --- | --- |
| All | Used for overlay placement |
| Home Page | Main website page |
| Category Page | Category overview page |
| Brand Page | Brand overview page |
| Product Listing Page (PLP) | Category-based product listing page |
| Product Details Page (PDP) | Description of a specific product in view |
| Cart | Consists of all added-to-cart products |
| Checkout | Proceed to purchase, add address and payment details |
| Order Confirmation | Order confirmed page with order ID and other details |
| Dressing Room | Virtual dressing room page |
| Account | Users' personal page |
| Wishlist | Products added to wishlist page |
| Search & Listing | Lists all Pages created in your account |
| Orders | Users' order history |
| Other Pages - Custom Pages | Customized pages apart from the mentioned types |

Experiences

With Experiences

Users can:

  • Define Customer Touchpoints Across the Website: Map out where users interact with your content
  • Configure Personalized Recommendations and A/B Test Variations: Experiment with different setups
  • Publish and Experiment with Different Modules: Test across multiple placements
  • Set Targeting Conditions: Control audience visibility based on behavior

Experience Creation

An experience is a customer touchpoint—the point of contact or interaction that a customer has with your assets throughout the customer journey. These touchpoints can be pages on the website/app, a marketing email, an ad, ratings, purchasing an item or subscribing to a service.

Experience enables you to configure personalized recommendations on any touchpoint and A/B test different variations to find the best suited one for each customer.

Navigation Path Navigate to Vue Home Page → Digital Experience Manager (DXM) → Experience or hover over the DXM on the navigation bar and click on Experience.

Experience Creation Page

Creating an Experience To create an Experience, click on the 'Create Experience' CTA on the Experience Listing Screen. You will be brought to the Experience Configurations where you can select the touchpoint you want to configure an Experience on with options to target and test.

Experience Details:

  1. Click the 'Create Experience' CTA on the Experience Listing Screen
  2. Give your Experience a unique name

You can save your Experience config anytime by clicking on "Continue Later" CTA & clicking on "Save & Exit". The Experience is displayed on the listing screen in a draft state. (Partially filled details are not saved).

Select a Page

Select a Page

In this section, you can choose the Touchpoint on which you want to set up an Experience. A touchpoint can be any point of contact or interaction your customer shares with your assets.

  1. Select the Page Type you want to place the modules on
  2. Click the 'Pages' dropdown to view and select the Page you want to place the modules on
  3. By clicking the 'Preview' Icon, you can preview each Page to help select the relevant Page

Once the Experience is published, it is not possible to change the Page Type & Page.

Experience Settings:

Experience Settings

Targeting Conditions Optionally, you can configure Targeting Conditions, which enables you to decide:

  1. Who? - which visitor(s) should see the experience (Ex: Audience, Traffic source and more)
  2. Where? - on which specific page/screen should the experience be shown (Ex: Attributes from your Catalog like Brand, Category, etc.)
  3. When? - render the experience only during specified date & time (Ex: Date, Day or Time)

When no Targeting Conditions are configured, the Experience is shown to All Visitors on your platform, all the time & on all pages/screens.

Targeting Conditions:

| Target Based On | Targeting Condition | Description |
| --- | --- | --- |
| Who | Audience | Target a group of defined users who should see the Experience. You will be able to select from a list of predefined Audiences or any created Audiences. Ex: New Visitors |
| Who | Device Type | Target an Experience to users based on the Device or Platform they are using. Ex: Mobile User |
| Who | User(s) | Target a list of custom users which you can add directly via Visitor ID or MAD ID |
| Who | Traffic Source | Target an Experience based on where users have landed on your platform from. Ex: From a Search Engine |
| Who | Country | Target an Experience to users from a specific Country. Ex: Australia |
| Where | Attributes | Target an Experience based on any Attribute marked as 'Facet' during Catalog Onboarding. Ex: Brand |
| When | Date | Target an Experience to be displayed within a particular Date Range. Ex: A sale period |
| When | Day | Target an Experience to be displayed on particular Day(s). Ex: Weekends |
| When | Time of Day | Target an Experience to be displayed between a particular Time of Day. Ex: 8 AM to 8 PM |

Set Experience Priority

Set Experience Priority

When multiple Experiences are configured & published simultaneously on the same page, Priority helps Vue determine which Experience should take precedence. This is typically required when targeting conditions on the same page result in one visitor qualifying for more than one Experience.

  1. To set priority, reorder the Live Experience based on your preferences within the 'Set Experience Priority' accordion

The Experience at the top of the list has the highest priority, with the order of priority decreasing from top to bottom. Any newly created Experience or Experience being created is by default placed at the top of the Priority List.

Only Experiences whose status is not Draft or Archived are shown in the Priority List.

Link Module to Placement

Link Module to Placement

Add modules to any placement on the page. Using the 'Link Module' CTA you can select which modules you want to link to the experience, where it should be displayed on the selected page and how it should behave.

  1. Select the Platform you want to link your modules to
  2. If you have already linked modules to a platform you can also easily import modules already set up on one platform to another Platform using the 'Import From' CTA & select a platform to import from
  3. Click 'Link Module' CTA on the placement where you wish to link your module(s)
  4. Select module(s) from the module listing and click 'Done'. The selected module is now linked & listed within the placement
  5. Click on 'x' icon to unlink a module from the placement
  6. Click on the 'Preview' icon to preview the selected module, placement & the page
  7. Note: You can also click on 'Manage Module' icon to link more Module(s) or unlink already linked Module(s)

A precondition for this is that pages need to be set up with predefined placements where modules can be placed; this can be done from the Page setup section. If no module is linked to a placement, nothing will be rendered in that placement.

Placement Behavior In this section, you can define how you would like the module(s) to behave on your site.

  1. Settings - by clicking the 'Settings' icon on each placement, you can configure:
    • Trigger - to define when the module(s) within the placement first appear on the page Eg: On page load, on exit intent and more
    • Frequency - to define how often the module(s) within the placement should render on your site Eg: Once per page view, Once per user and more
    • Button Behavior - to define how the module should appear on the page on click of the recommendation button Eg: Inline and overlay
    • Button Style - select the style of the button from the Vue Button Styles library or select any custom buttons shared
    • Enable/Disable - by checking or unchecking the 'Enable' checkbox, you can control whether or not the linked entities within a Placement are displayed on the front end of your platform
    • Import From - If you have already linked modules to a platform you can also easily import modules already set up on one platform to another Platform

Placement Behaviors:

| Behavior | Options | Description |
| --- | --- | --- |
| Trigger | On Page Load | To display as soon as the page loads |
| Frequency | Once per page view | To render on each page view |
| Button Behavior (only for Button Placements) | Inline | To open the module inline |
| Button Style (only for Button Placements) | Button Style Library | Select the style of the button from the Vue Button Styles library or any custom buttons shared |

Business Rules If you would like to add Business Rules, you can click the 'Business Rules' icon in the Actions column of any linked module. A business rule acts as a filter to narrow down results based on a business goal or condition.

  1. To add new/manage existing business rule, click the 'Business Rules' icon in the Actions column of any linked module
  2. You can name the rule
  3. If you have more than one catalog, select the catalog to apply the business rule
  4. You can choose any attributes to apply as a filter

Attributes provided for business rules are the fields that are "Indexed" during catalog configuration.

Attributes can be any metadata. For example, Brand, Category, Price etc

  • You can click on 'And' to add another condition to a rule
  • For a contextual rule, you can apply a filter with the value option 'Same as Source'. For example, if you have set up a business rule as Brand is 'Same as Source', recommendations will be filtered based on the brand of each Source Content
  • Optionally, you can choose how the Business Rule is applied with 'Apply to'. Select the condition for when you want the Business Rule to filter results:
  • 'All' - All the conditions should satisfy for this Business Rule to Apply
  • 'Any' - Business Rule can apply as long as any one of the conditions are satisfied

Experiments Experiments enable you to allocate user traffic, test performance based on a business goal or metric & measure the results to determine each touchpoint's winning Experience. There are two types of experiments:

  1. Within placement: This is used to test between multiple modules within a single placement. Typically used when the layout of the website / app is fixed and the question is around which module will work best
  2. Between placements: This is used to test modules placed across different placements. Typically used when the question is around where on the page would be the best to place the module

Within a Placement You can configure a multivariate Experiment between two or more modules + control to measure and determine the best performing module.

  1. Click 'Link Module/Manage Module' on the placement to link module(s) and perform the Experiment
  2. Select one or more Modules from the Module listing. Note: When multiple modules are selected 'Control' is automatically linked to the placement. Alternatively you can also link one module + control to configure an A/B test
  3. Click 'Done'. The selected module(s) + control are now linked & listed within the placement
  4. Click on the 'Settings' icon on the placement header to configure the Experiment name, goal, metric & confidence score

Between Placements After you have linked modules to more than one placement on the page, you can run an Experiment between the Placement(s) + Control to determine the best performing combination of module and placement.

Live Preview

Live Preview

To view the modules on your platform, you can use the Live Preview feature. It enables you to view all your linked modules on the selected page of your site, directly from the Experiences section of the tool.

  1. To preview the selected page, click on the 'Live Preview' CTA
  2. You will see your live preview with the option to toggle between desktop and mobile platforms
  3. You can also choose to view configured placements on the page by enabling the toggle 'Display Placements'
  4. View all your linked modules on the page
  5. If more than one module is linked to a placement, you will be able to select the module you wish to preview by clicking on the dropdown above the placement

Publish Experiences Once you have set up the desired recommendation modules and/or A/B test, you can publish the experience:

  1. Click the 'Publish' CTA at the top-right corner of the screen. Once you publish an Experience, it will be labelled as 'Live' on the Experience Listing Page
  2. After publishing an Experience, it should be available to view on the relevant touchpoint on your platform!

Visitors will be able to see the experience on your platform if they meet the configured experience priority and targeting conditions.

You can unpublish any live Experience by clicking the 'Unpublish' icon under the Actions column on the Experience Listing page

Glossary:

| Term | Definition |
| --- | --- |
| Experience | An Experience is any touchpoint on your website where you have added one or more recommendations and/or set up an Experiment |
| Module | A Module is a combination of one or more recommendation Strategies with an optional Template |
| Placement | A Placement is a configured position/location on your platform page/screen where you would like the linked Module(s) to render |
| Business Rule | A Business Rule is a filter that can be applied on recommendations based on a business goal or use case |
| Experiment | An Experiment is a test run based on a business goal and metric to measure and determine a winning Experience. A test can be run between modules, placements, and/or a control group. |
| Control | Visitors who are shown this variation (Control) do not see any modules; they see your default platform. |
| Left Navigation Bar | The Left Navigation Bar enables you to view and navigate between the two steps of Experience configuration: Experience Settings (Page and Targeting Conditions) and Placement Settings |
| Continue Later CTA | The Continue Later CTA allows you to save or discard changes made to the Experience Settings and return to the Experience Listing Page. |

Metrics

With Metrics

Users can:

  • Access Detailed Data on Business Impact: Measure feature performance and experimentation results
  • Visualize Data in Multiple Formats: Use flexible filtering options to analyze trends
  • Analyze Trends and Measure Effectiveness: Evaluate the success of different experiences over time

Metrics Overview

Metrics provide you with exhaustive data ranging from business impact to performance to experiment data across all the features configured by you in DXM. You will be able to visualize data in different formats, and slice & view data using various filters for any date range.

Navigation Path

  • Choose Metrics from the top navigation bar
  • Alternatively, click on the app drawer → 'Digital Experience Manager' → 'Metrics'

You can view and navigate through the following metrics:

  • Vue Impact Metrics
  • Performance Metrics
  • Experiment Metrics

Vue Impact Metrics

Vue Impact Metrics

To understand the impact of Vue on your business, we provide a host of impact metrics. These metrics help you gain valuable insights into the incremental revenue and improved user engagement that Vue is driving for your website.

  1. Within the Metrics screen, click on 'Vue Impact' on the left navigation panel
  2. By default, the metrics shown are for the last 7 days
    • To change the time period, click on the date selector and choose the desired date range
  3. To query & filter metrics by different parameters, click on the 'Advanced Filter' CTA
  4. The following key metrics are displayed by default:
    • Assisted Revenue (visit)
    • Direct Revenue (7 days)
    • Click-Through Rate
    • Direct Cart Additions (7 days)
    • Direct Product Purchases (7 days)
    • User Engagement Rate

You can use the 'Manage' icon to access the list of all available metrics and add/remove metrics to display.

Performance Metrics

Performance Metrics

Performance metrics enable you to view the performance of your customer experience at different levels of granularity: experiences, modules, and strategies.

  1. Choose Metrics from the top navigation bar
  2. Select 'Performance' on the left navigation panel
  3. Click on the drop-down and select the time period to be used for calculating the metrics. The default time period is the last 7 days
  4. Switch between the following sub-tabs for detailed information:

| Filter | Description |
| --- | --- |
| Page | Performance data aggregated at the Page level |
| Experience | Performance data for different experiences published by you |
| Module | Performance data for different modules configured |
| Strategy | Performance data for different strategies configured |
| Facet | Performance data for different facets configured |

Experiment Metrics

Experiment Metrics

Experiment metrics enable you to view the data of all the experiments configured in one place. You can also control your experiments from here.

  1. Choose Metrics from the top navigation bar
  2. Select 'Experiment' on the left navigation panel
  3. Click on the drop-down and select the time period to be used for calculating the metrics. The default time period is the last 7 days
  4. In the All Experiments table:
    • Click on the 'Pause/Play' icon to update the state of the experiment
    • Click on the 'info' icon to view details about each experiment

Experiment Details

  1. View the details of each variation within the experiment: the status of each variation, the Experiment Metric, Uplift & Confidence Score (a short calculation sketch follows this list)
    • Click on the 'Pause/Play' icon to update the state of the experiment
    • Click on the 'Export' CTA to download the metrics screen as a CSV or PDF
    • Click on the 'Gear' CTA, select the Metrics you want to access, and click on 'Done'
    • Click on the 'Advanced Filters' CTA, configure the query to filter & click on 'Done'
    • Filtered data will be displayed
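
The Uplift and Confidence Score shown here compare each variation against the Control. The platform's exact statistical method is not documented in this guide, but as a rough illustration of what these numbers represent, the sketch below derives uplift and a confidence level from raw conversion counts using a standard two-proportion z-test (an assumption made for this example, not necessarily the method DXM uses):

```python
from math import sqrt, erf

def uplift_and_confidence(control_conv, control_visits, variant_conv, variant_visits):
    """Illustrative only: relative uplift over Control, plus a two-sided
    confidence level from a two-proportion z-test."""
    p_c = control_conv / control_visits
    p_v = variant_conv / variant_visits
    uplift = (p_v - p_c) / p_c                       # 0.20 means +20% over Control

    # Pooled standard error for the difference between the two proportions
    p_pool = (control_conv + variant_conv) / (control_visits + variant_visits)
    se = sqrt(p_pool * (1 - p_pool) * (1 / control_visits + 1 / variant_visits))
    z = (p_v - p_c) / se
    confidence = erf(abs(z) / sqrt(2))               # 0.95 ~ "95% confident"
    return uplift, confidence

# Example: Control converts 200/10,000 visits, the variation converts 240/10,000
print(uplift_and_confidence(200, 10_000, 240, 10_000))
```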

Glossary

| Field | Description |
| --- | --- |
| Export Metrics | Click on the 'Export' CTA to download the metrics displayed as a CSV or PDF |
| Manage Metrics | Click on the 'Gear' CTA, select the Metrics you want to access, and click on 'Done' |
| Advanced Filters | Click on the 'Advanced Filters' CTA, configure the query to filter & click on 'Done'. Filtered data will be displayed |
| Search | Search for any Metrics using any feature name |
| Filters | Hover over any column header where you'd like to apply filters |
| Sort | Hover over any column header where you'd like to apply sorting |

Metrics Description

| Field | Description |
| --- | --- |
| Unique Visitors | The total number of unique visitors to your website over a selected time period |
| Product Views | The total number of times products were viewed over a selected time period |
| Products Purchased | The total number of products purchased over a selected time period |
| Total Revenue | The total revenue from sales over a period of time |
| Incremental Revenue through Vue | Revenue resulting from the uplift in conversion rate and average order value in journeys powered by Vue |
| Assisted Revenue (visit) | Revenue from the sale of any product in a visit with at least 1 click on Vue modules |
| Direct Revenue (visit) | Revenue from the sale of products clicked and purchased in the same session, recommended by Vue |
| Direct Cart Additions (visit) | The total number of products clicked and added to the cart in the same session, recommended by Vue |
| Direct Product Purchases (visit) | The total number of products clicked and purchased in the same session, recommended by Vue |
| Direct Revenue (7 days) | Revenue from the sale of products clicked and purchased within 7 days, recommended by Vue |
| Click-Through Rate (CTR) | The number of clicks on Vue recommendations divided by the number of times the module is viewed |
| Average Order Value (AOV) | Average amount spent each time a customer places an order on your website |
| Average Order Size | Average number of items sold in a single purchase |
| User Engagement Rate | Percentage of unique visitors that click at least once on your recommendations |
| Cart Abandonment Rate | Percentage of customers who add items to their shopping cart but abandon the cart and end the session before completing the purchase |
| Average Revenue per User (ARPU) | Average revenue each user brings, calculated by dividing total revenue by unique users |
| Revenue per Visit (RPV) | Total revenue generated in each visit, calculated by dividing total revenue by the total number of visits |
| Conversion Rate | Percentage of orders placed divided by the total number of unique visits |
| Product Views per Visit | Average number of product pages viewed per visit, calculated as a ratio of product views to unique visits |
| Opens | Number of Vue recommendation emails opened by customers |
| Click to Open Rate (CTOR) | Ratio of clicks to opens for Vue recommendation emails |
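
Many of the metrics above are simple ratios of the counts listed at the top of the table. As an illustration of how the definitions fit together (the numbers below are made up, and the platform may apply additional attribution rules), the calculations look like this:

```python
# Made-up counts for one reporting period; formulas follow the table above
unique_visits = 50_000
unique_visitors = 40_000
orders = 1_500
items_sold = 3_300
total_revenue = 120_000.0
recommendation_clicks = 9_000
module_views = 180_000
visitors_clicking_recs = 6_500
email_opens = 2_000
email_clicks = 450

ctr = recommendation_clicks / module_views                   # Click-Through Rate (CTR)
aov = total_revenue / orders                                 # Average Order Value (AOV)
average_order_size = items_sold / orders                     # Average Order Size
engagement_rate = visitors_clicking_recs / unique_visitors   # User Engagement Rate
arpu = total_revenue / unique_visitors                       # Average Revenue per User (ARPU)
rpv = total_revenue / unique_visits                          # Revenue per Visit (RPV)
conversion_rate = orders / unique_visits                     # Conversion Rate
ctor = email_clicks / email_opens                            # Click to Open Rate (CTOR)

print(f"CTR {ctr:.2%}  AOV {aov:.2f}  Conversion {conversion_rate:.2%}  CTOR {ctor:.2%}")
```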

Accounts

Welcome to Vue's Account Settings! This guide will help you understand the basics of managing your account settings and configurations.

Account Settings provides a centralized location to manage your:

  • User Profile: Update your personal information and preferences
  • Team Management: Add and manage team members and their roles

Quick Start Guide

Accessing Account Settings

To access your account settings:

  1. Log in to your Vue account
  2. Click on your profile icon in the top-right corner
  3. Select "Account Settings" from the dropdown menu

Managing Your Profile

In the Profile section, you can:

  • Update your name and contact information
  • Change your password
  • Set your notification preferences
  • Configure your timezone and language settings

Organization Settings

The Organization section allows you to:

  • Update organization details
  • Manage billing information
  • Configure organization-wide preferences
  • Set up custom branding

Team Management

Under Team Management, you can:

  • Invite new team members
  • Assign roles and permissions
  • Manage access levels
  • Review team activity

Account Settings Overview

Managing and reviewing large quantities of data can be challenging for individuals or small teams. Our user management features are designed to help you efficiently manage your teams and distribute the workload.

Navigation

To access your account settings from any screen, click on the User Profile icon located at the top right corner of your screen. Then, click on 'Account Settings' to view your account details and permissions.

Account Settings Navigation

User Profile

In the User Profile tab, you can view your basic account information. For any edits, including changing your account password, you need to contact the admin of the account.

User Profile

Roles and Permissions

Admin Users

Admin users can access the User Roles tab to create roles, assign permissions, and manage account access across their team:

  • Click on 'Manage User Roles' from the side navigation of your Account Settings.

Manage User Roles

  • Here, you will see existing roles or have the option to create new ones.
  • To create a new role, click the '+ New Role' button. Provide a Name and/or Description for your new role, then assign access permissions per entity as required. Click "Save" when you're ready to return to the listing.

Manage User Roles

  • To edit, duplicate, or delete existing roles, use the icons provided under the Actions column listed with each created role on the listing.

Manage User Roles

Users & User Groups

Admin users can manage Users and User Groups from the respective tabs within their Account Settings:

  • To navigate, click on the 'Manage Users' or 'Manage User Groups' tab using the side navigation of your Account Settings.

Manage Users

  • To create a new User, click the '+ New User' button on the Manage Users tab. Enter relevant information such as Name & Credentials, assign access roles & permissions, and click "Create" when you're ready.

  • To create a new User Group, click the '+ New User Group' button on the Manage User Groups tab. Enter a Group Name, select the Users to include in this group, assign access roles and permissions, and click "Create" when you're ready.

  • You will be able to manage and edit User configurations from the Manage Users listing.

Manage Users

Assignment

Admin users can assign entities to other users and/or user groups as follows:

  • Navigate to the Entity Listing, where you will see an 'Assign User' column.
  • To assign users at a row level, use the dropdown to select User(s) and/or User Groups.
  • To bulk assign entities, multi-select the required entities and use the 'Assign Users' button above the listing to select User(s) and/or User Groups.
  • Non-admin users will only be able to view the entities assigned to them on the relevant listings.

Managing API Keys

Welcome to the Creating and Managing API Keys guide. This guide will assist users in understanding the purpose and use of API Keys and learning the process of creating and managing API Keys.

Who is this guide for? This guide is designed for users who need to integrate their applications or services with the platform's APIs.

Ensure that the necessary permissions for generating API Keys are granted before starting.

Overview

This guide covers the following topics:

  • Navigating to the API Keys section.
  • Creating an API Key.
  • Managing an API Key.
  • Best practices for securing API Keys.

Prerequisites Before starting, ensure the following requirements are met:

  • Necessary permissions to generate API Keys are granted.
  • Understanding of the role-based access control system in the platform.
  • A secure location to store the API Key is available, as it will not be retrievable later.

Step-by-Step Instructions

Navigating to API Keys Section

Follow these steps to navigate to the API Keys section:

  1. Click on the Profile Icon
  2. Go to Account Settings
  3. Click on API Keys

Creating an API Key

To create an API Key:

  1. Click on +New Key
  2. Provide a unique name and description for the key
  3. Select the role(s) for which the API Key needs to be created
  4. Click on Create
  • The API Key will be generated immediately.
  • It should be copied and saved in a secure location, as it will not be available later.

Managing an API Key

To manage an API Key:

  1. Identify the Key to be managed from the API Key listing table
  2. Click on the Edit (pencil) icon
  3. Update the User Roles as required
  4. Click on Save

Troubleshooting

Common Issues and Solutions

Problem 1: Unable to find the API Key after creation
Cause: API Keys are only visible once during creation.
Solution:

  1. Generate a new key if the previous one is lost.

Problem 2: Access is denied when using the API Key
Cause: The assigned user role does not have the necessary permissions.
Solution:

  1. Ensure that the assigned user role has the necessary permissions.

Additional Information

  • Revoking an API Key will disable access immediately.
  • For enhanced security, consider rotating API Keys periodically.
  • API Keys should be stored securely and not shared publicly.
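
As an example of keeping a key out of source code, the snippet below reads it from an environment variable and sends it with a request. The environment variable name, endpoint URL, and Authorization header format are illustrative assumptions, not documented values; consult the API reference for the exact header your integration expects.

```python
import os

import requests

# Assumptions for illustration: the env var name, URL, and header format are placeholders
api_key = os.environ["VUE_API_KEY"]          # set once in your environment; never hard-code or commit the key

response = requests.get(
    "https://api.example.com/v1/datasets",   # placeholder endpoint
    headers={"Authorization": f"Bearer {api_key}"},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```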

FAQ

What are API Keys?

API Keys provide a way to authenticate and access platform features programmatically via APIs.

Can an API Key be retrieved after creation?

No, API Keys are displayed only once. If lost, a new one needs to be generated.

What happens if an API Key is deleted?

The API Key will be permanently revoked, and any services using it will lose access.

Summary

  • This guide provided instructions on navigating to the API Key section, creating and managing API Keys, and best practices for secure storage.
  • It is imperative to handle API Keys securely to prevent unauthorized access.

Managing Secrets

Welcome to the Creating and Managing Secrets Guide! This guide will assist in understanding the functionality and benefits of the Secrets Manager and learning how to create, manage, and use secrets effectively.

Who is this guide designed for? This guide is intended for users who need to manage sensitive data within the system.

Ensure access to the Secrets Manager section in Account Settings is available before starting.

Overview

The Secrets Manager allows users to:

  • Store credentials and other sensitive information in a centralized location.
  • Use stored secrets in various parts of the system, such as Custom Code Nodes.
  • Manage secrets by adding, updating, or deleting key-value pairs.

Prerequisites Before beginning, ensure that:

  • Access to the Secrets Manager section in Account Settings is available.
  • The necessary permissions to create and manage secrets are granted.
  • Familiarity with Custom Code Nodes is helpful, although not mandatory.

Step-by-Step Instructions

Navigation

To navigate to the Secrets Manager section:

  1. Click on the Profile Icon.
  2. Select Account Settings.
  3. Choose Secrets Manager.

Creating a Secret

To create a new secret:

  1. Click on +New Key.
  2. Provide a unique Secret name.
  3. Click on Add Item to add a Key-Value Pair.
  4. Under Keyname, input the access name.
  5. Under Keyvalue, input the access secret.
  6. Repeat steps 3 to 5 to add more Key-Value Pairs to the Secret, if necessary.
  7. Click on Create to finish creating the Secret.

Managing a Secret

To manage a secret:

  1. From the Secrets listing table, locate the Secret to be managed.
  2. Click on the Edit (pencil) icon.
  3. Update the Key-Value Pair as required.
  4. Click on Save to apply changes.

Using a Secret

To use a secret in a Custom Code Node:

  1. Create a Custom Code Node.
  2. While building a Custom Code Node, use the following snippet of code:
from meta.global_constants import get_secrets_data
# client_id is available within the Custom Code Node; the Secret is looked up as '<client_id>-<secret name>'
secret_json = get_secrets_data(f'{client_id}-<your-secret-name>')

Replace <your-secret-name> with the actual name of the Secret to be accessed.
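
For example, if the Secret was created with a Keyname of api_token (a hypothetical key used only for illustration), and assuming get_secrets_data returns the stored Key-Value Pairs as a dictionary, the value can be read like this:

```python
# Hypothetical key name; assumes the Secret's Key-Value Pairs come back as a dict
api_token = secret_json["api_token"]
```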

Now, any user in the organization (with relevant permissions) can use Secrets securely and efficiently.

Troubleshooting

Common Issues and Solutions

Problem 1: Unable to Access Secrets Manager
Cause: Lack of necessary permissions.
Solution:

  1. Verify that the necessary permissions are granted.
  2. Contact the administrator if access is restricted.

Problem 2: Secret Not Found
Cause: Incorrect Secret name used while retrieving it in the Custom Code Node.
Solution:

  1. Double-check the Secret name.

Problem 3: Incorrect Key-Value Pair
Cause: Incorrect credentials stored and retrieved.
Solution:

  1. Verify that the correct credentials have been stored and retrieved.

Problem 4: Changes Not Saved
Cause: Page not refreshed.
Solution:

  1. Refresh the page and confirm if the updated Key-Value Pair is reflected.

Additional Information

  • Secrets are encrypted and stored securely.
  • Only users with the necessary permissions can create, modify, or access secrets.
  • Secrets can be used in different workflows, including automation and API authentication.

FAQ

Can multiple Key-Value Pairs be stored in a single Secret?

Yes, multiple Key-Value Pairs can be stored in one Secret.

Who can access stored Secrets?

Only users with the appropriate permissions can access stored Secrets.

Can a Secret be deleted?

Currently, this guide does not cover deleting Secrets. Refer to the Secrets Manager documentation for details.

Summary

  • The Secrets Manager enables secure storage, management, and retrieval of sensitive data.
  • This guide covered:
    • Navigation to Secrets Manager
    • Steps to create and manage a Secret
    • Usage of Secrets in a Custom Code Node
    • Troubleshooting common issues

By following these steps, sensitive information within the system can be securely managed.

Login to Vue!

Welcome to Vue! You're on the verge of exploring something amazing. The first step, however, is to log in.

Login Methods

Logging into the Vue App can be done through:

  • Email Credentials
  • SSO (Supported providers: Google and Okta)

Let's delve into each method for a clear understanding of how to access your Vue AI suite.

Login

Email-based Access

Login

If you already have Vue login credentials:

  1. Navigate to the Login screen.
  2. Enter your registered email address and password.
  3. Click 'Sign In' to access your Vue AI suite.

Login

Forgot Password

If you've forgotten your password:

  1. Click the 'Forgot Password' button on the Login screen.
  2. Enter your registered email address and click 'Send Reset Password Link'.
  3. Check your email for a link to reset your password.

Forgot Password

Request Access

If you're new to Vue and don't have credentials:

  • Click the 'Contact Us' button to request a demo and access credentials to the Vue AI suite.

Request Access

SSO-based Access

Activate Account

To activate your SSO login:

  1. Have your company admin add your name and email through the '+ New User' form within Account Settings.
  2. Look for a Welcome email in your inbox with an authentication link.
  3. Click the link to activate your account for a seamless sign-in experience.

Sign in

For signing in with activated SSO credentials:

  1. Click on your SSO provider's logo.
  2. Enter your credentials.
  3. You're now logged into Vue!

Sign in

This guide aims to make your login process as smooth as possible, ensuring you get to your AI suite with ease.

Microsoft Entra ID Configuration

Azure

  1. Navigate to the Microsoft Entra ID section of the Azure portal and open App registrations.

    Navigate to Microsoft Registered App

  2. Select new registration.

    Select New Registration

  3. The created application will contain a client ID and client secret, which will be used in the following steps.

    Client ID and Secret

  4. Under Optional claims, add the necessary fields, such as email. (For a field to work, Azure users must have the corresponding details present in their accounts.)

    Add Optional Claims

  5. Go to Authentication and fill in the redirect URL.

    Fill in Redirect URL

Google Cloud Platform

To use Google SSO to log in to the Vue Platform, do the following:

  1. In your Google Cloud Console, navigate to the APIs & Services section.

Navigate to APIs & Services

  2. Next, select Credentials.

Select Credentials

  3. Create an OAuth client ID credential and choose Web application as the application type.

Create OAuth Client ID

  4. Send the Client ID and Client Secret from the created application to us at Vue.

Copy Client ID and Secret

  5. We will generate a Redirect URI, which needs to be added to the Authorized redirect URIs in your application.

Copy & Paste Redirect URIs

  6. Once the above steps are completed, you can use Google SSO to log in to the Vue Platform.

Okta Configuration

Setting up Okta OIDC application

  1. Head to your Okta project.

  2. Under Okta Project, navigate to Applications

Navigate to Applications

  3. Create a new application and select the following configurations:
    • Sign-in method: OIDC - OpenID Connect
    • Application type: Web Application

Create New Application

  4. Provide a name for the application and select the Grant type options as illustrated in the screenshot below.

Application Configuration

  5. Under Controlled access, select the 'Allow everyone in your organization to access' option and enable immediate access.

Access Control

  6. The application will contain a client ID and client secret; copy both values.

Client ID and Secret

  7. Paste the Client ID and client secret into Vue's SSO configuration screen and click Confirm.

Vue SSO Configuration

  8. Copy and paste the generated Redirect URIs into Okta's Sign-in redirect URIs section.

Redirect URIs

And that's it. Okta OIDC SSO integration is enabled for your account.
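
All three providers above are integrated through the standard OpenID Connect authorization-code flow, which is why each setup boils down to the same three values: a client ID, a client secret, and a redirect URI. Vue performs the actual exchange once these values are configured; the sketch below, with placeholder endpoints and credentials, only illustrates what that flow looks like and is not something you need to run:

```python
import secrets
from urllib.parse import urlencode

import requests

# Placeholder values: substitute your identity provider's endpoints and your app's credentials
AUTHORIZE_URL = "https://idp.example.com/oauth2/v1/authorize"
TOKEN_URL = "https://idp.example.com/oauth2/v1/token"
CLIENT_ID = "<client-id>"
CLIENT_SECRET = "<client-secret>"
REDIRECT_URI = "https://app.example.com/sso/callback"   # must match the URI registered with the provider

# Step 1: the user's browser is sent to the provider's authorize endpoint
state = secrets.token_urlsafe(16)
login_url = AUTHORIZE_URL + "?" + urlencode({
    "client_id": CLIENT_ID,
    "response_type": "code",
    "scope": "openid profile email",
    "redirect_uri": REDIRECT_URI,
    "state": state,
})

# Step 2: after sign-in, the provider redirects back with a one-time code,
# which is exchanged for tokens using the client secret
def exchange_code(code: str) -> dict:
    resp = requests.post(
        TOKEN_URL,
        data={
            "grant_type": "authorization_code",
            "code": code,
            "redirect_uri": REDIRECT_URI,
            "client_id": CLIENT_ID,
            "client_secret": CLIENT_SECRET,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # contains the id_token and access_token
```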