
Tutorial: Data Science Tutorial Part 1 of 3

Background on Lucd

The Lucd Enterprise AI Data Science Platform is a highly secure, scalable, open, and flexible platform for persisting and fusing large and numerous datasets and for training production AI models against those datasets. Lucd is an end-to-end platform: it can be deployed in public cloud environments or on-premises on bare-metal hardware, or accessed directly through the Lucd multi-tenant PaaS. The platform consists of:

  • A scalable, open data ingest capability
  • A petabyte-scale Unified Data Space data repository
  • 3D visualization and exploration
  • An Exploratory Data Analysis (EDA) REST service
  • A Kubernetes environment to train PyTorch and TensorFlow models
  • NLP word embedding and explainable AI assets
  • Model results visualization and export to an internal or external serving capability

Introduction and Prerequisites

This tutorial demonstrates the steps required to train an AI model on data using the Lucd Data Science Platform. It is a toy example based on the Iris dataset, designed to show the basic steps of training a model. In the example, a Virtual Data Set (VDS) is created, and a custom operation adds a categorical feature to the existing continuous features. A custom PyTorch model is then developed and trained in the platform. Both the Lucd 3D UI and the Lucd Python Client are used during the tutorial. The tutorial is broken up into three parts:

  1. Part 1: Creating a Virtual Data Set (VDS)
  2. Part 2: Performing a Custom Operation during Exploratory Data Analysis: https://github.com/jmstadt/Tutorials/blob/master/Lucd_Part_2_of_3_Data_Science_Tutorial.ipynb
  3. Part 3: Developing a Custom AI Model and Training in the Lucd Platform: https://github.com/jmstadt/Tutorials/blob/master/Lucd_Part_3_of_3_Data_Science_Tutorial-TF_1.ipynb

Prerequisites are:

Step 1: Log In to the Platform, Visualize and Explore Data

At the end of step 2 of the prerequisites, a Lucd Client icon should be on your desktop.

Double-click the shortcut; the login screen should appear.

Enter the username, password, and domain that you obtained in the prerequisites, then click LOGIN. When the next screen appears, refresh it.

A screen will appear showing the datasets that were ingested through NiFi (per the prerequisites). From this screen you can examine various cardinalities of the datasets. This is outside the scope of this tutorial; for details, refer to the Lucd Client User Guide: https://community.lucd.ai/hc/en-us/articles/360022853292-Lucd-Client-User-Guide

The user guide also describes the many ways you can search data in the Unified Data Space, visualize data, and perform transformations (Exploratory Data Analysis). These are likewise outside the scope of this tutorial.

E.g., searching data in the Unified Data Space:

E.g., visualizing the results of a search:

E.g., performing transformations (Exploratory Data Analysis) on the results of a search:

Step 2: Create a Virtual Data Set (VDS)

In this section we will create a VDS from the Iris dataset. There are many ways to search data in the Lucd Unified Data Space, fuse and merge multiple datasets in the UI, and perform EDA operations, but for the purposes of this tutorial we will focus only on creating a VDS from the Iris dataset.

From the Home screen, select the "Data" tab.

For now, we will search on a single source (the Iris dataset). Click "Sources".

Select Iris and click "Search".

The search results should appear. Click "Transform".

On the next screen, click "Add most recent search to the workspace".

On the resulting screen, give the search a name and click the green check box.

The search will appear on the workspace. Note: many operations can be performed on a saved search within the UI, but for now we will just save the result as a VDS.

Click the search and select "Save VDS".

Give the VDS a name, then click "Select Data".

A VDS is typically created for training an AI model, so the next screen is where you select the features and labels used for training. In this tutorial we will create a custom operation first, but we still select the features and labels here and then click the green check mark.
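Conceptually, choosing features and labels splits each row into an input vector (X) and a target value (y). The plain-Python sketch below illustrates that split; it is not the Lucd UI or client API. Apart from Flower_Petal_Length, the column names are hypothetical placeholders, so substitute the names shown for your ingested Iris source.

```python
# Minimal sketch: separate feature columns from the label column in tabular rows.
# ASSUMPTION: every column name except Flower_Petal_Length is a hypothetical
# placeholder; use the names shown for your ingested Iris source in the Lucd UI.
from typing import Dict, List, Tuple

Row = Dict[str, object]

# Hypothetical column names (only Flower_Petal_Length appears in this tutorial).
FEATURES = [
    "Flower_Sepal_Length",
    "Flower_Sepal_Width",
    "Flower_Petal_Length",
    "Flower_Petal_Width",
]
LABEL = "Flower_Species"  # hypothetical label column


def split_features_label(
    rows: List[Row], features: List[str], label: str
) -> Tuple[List[List[float]], List[str]]:
    """Return (X, y): one numeric feature vector and one label per row."""
    X = [[float(row[name]) for name in features] for row in rows]
    y = [str(row[label]) for row in rows]
    return X, y


# Two sample Iris rows (one setosa, one versicolor).
rows: List[Row] = [
    {"Flower_Sepal_Length": 5.1, "Flower_Sepal_Width": 3.5,
     "Flower_Petal_Length": 1.4, "Flower_Petal_Width": 0.2,
     "Flower_Species": "setosa"},
    {"Flower_Sepal_Length": 7.0, "Flower_Sepal_Width": 3.2,
     "Flower_Petal_Length": 4.7, "Flower_Petal_Width": 1.4,
     "Flower_Species": "versicolor"},
]

X, y = split_features_label(rows, FEATURES, LABEL)
print(X)  # one 4-element feature vector per row
print(y)  # ['setosa', 'versicolor']
```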

In the next pop-up, click the green check mark again; the VDS should appear on the workspace.

You can verify that the VDS was saved as an asset by clicking the "Assets" tab and then refreshing.

The new VDS should now be listed among your assets.

At this point, we have a VDS that is ready for training an AI model. For the purposes of this tutorial, however, we will assume that we want to create a "helper" column before training. That helper column is a True/False categorical column indicating whether a row's "Flower_Petal_Length" is greater than the mean Flower_Petal_Length across the entire VDS. This operation is not currently available as a standard UI EDA operation, so we will create a custom EDA operation using the Lucd Python Client. For this, continue to Part 2 of this tutorial.
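The logic of the helper column can be sketched in plain Python before turning to the Lucd Python Client in Part 2. This sketch only illustrates the computation and does not use any Lucd API. The output column name Petal_Length_Above_Mean is hypothetical. The key point is that the mean must be computed over all rows first, and only then can each row be compared with it.

```python
# Minimal sketch of the "helper" column logic in plain Python.
# ASSUMPTION: this is NOT the Lucd Python Client API (covered in Part 2); it only
# illustrates the computation. The output column name is hypothetical.
from statistics import mean
from typing import Dict, List

Row = Dict[str, object]


def add_above_mean_flag(
    rows: List[Row],
    column: str = "Flower_Petal_Length",
    new_column: str = "Petal_Length_Above_Mean",  # hypothetical name
) -> List[Row]:
    """Return copies of rows with a True/False flag: value > mean over ALL rows."""
    # Pass 1: compute the mean over the entire dataset.
    column_mean = mean(float(row[column]) for row in rows)
    # Pass 2: compare each row's value with that dataset-wide mean.
    return [{**row, new_column: float(row[column]) > column_mean} for row in rows]


rows: List[Row] = [
    {"Flower_Petal_Length": 1.4},
    {"Flower_Petal_Length": 4.7},
    {"Flower_Petal_Length": 6.0},
]
flagged = add_above_mean_flag(rows)
for row in flagged:
    print(row)
# The mean here is about 4.03, so only the 1.4 row is flagged False.
```

Because the flag depends on a dataset-wide statistic, it cannot be computed row by row in isolation. Whatever tool performs the operation has to aggregate the mean over the whole VDS before assigning flags.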
