Lucd JedAI Client | User Guide | 6.2.7

User Guide

The Lucd JedAI Client is downloaded locally to your device and interfaces with the Lucd Platform. The client enables users to visualize, transform, and prepare data for use in modeling frameworks (TensorFlow, PyTorch, etc.). Models can be uploaded to and trained in the platform. The client supports touchscreen input, but a touchscreen is not required.

System Requirements

The following specifications are required in order to run the client.

  • Windows, Linux, or macOS
  • 4 GB Memory
  • Modern CPU

Although not required, we recommend the following specifications in order to maximize the performance of the client.

  • A GPU to support accelerated rendering
  • 1600x900 display resolution minimum

Installation Instructions

The client is distributed via Lucd’s Steam Store.

A Steam account is required to access the client download.


Usage Instructions

Login

Log in to the client using the credentials provided to you.

  1. Username
  2. Password
  3. Domain
    Cloud customers will leave the domain field blank when logging in.
    Private build customers will be provided a domain to use when logging in.
  4. Login
  5. Exit Application

After successful authentication, the user is brought to the Home screen. The buttons along the left edge navigate to other 2D screen overlays. The buttons in the right corner manipulate camera perspective and visualization behavior.

[Screenshot: Home screen navigation]

  1. Home
  2. Data
  3. Modeling
  4. Assets
  5. Governance
  6. Epidemiology
  7. Collapse Sidebar
  8. Options
  9. Logout
  10. Reset Perspective

Home

The primary feature of the Home screen is the Sources histogram, which displays the ingested record sources, the number of records per source, and date/time information about when the records were ingested.

[Screenshot: Home screen Sources histogram]

  1. List of currently visible ingested sources
  2. Source Histogram of ingestion timeline for each visible source
    The actual data of the records are not displayed in the histogram. When browsing the sources histogram, the Lucd JedAI Client makes it easy to drill down on a time range of ingestion across all sources, down to the hour.
  3. Click-and-Drag date filter
    To narrow the range of displayed data, click and drag the date filter handles to set the desired window of time.
  4. Source Toggles
    Sometimes a source has so much data ingested at a single time that it skews the histogram's scaling. In these cases, hide that source from the histogram by clicking its axis label. The chart then automatically rescales the remaining visible sources, giving a better, proportional display.
  5. Selectable bars
    To expand a single unit of time on the graph (e.g., a single year of data), click its bar to zoom in across all sources and change the axis scale to that unit of time. The Lucd JedAI Client allows scaling down to the day, so that a 24-hour period can be seen across all sources on the histogram.
  6. Active filters

Data & Visualization

The Data and Visualization screen is where users will query, visualize, perform Exploratory Data Analysis (EDA) functions, and transform the dataset into a Virtual Data Set (VDS) to be used with the machine learning model. The screen will initially open with a blank panel on the right and the Query option selected on the left.


Query Data

[Screenshot: Query tab]

  1. Query tab
    To execute a query, begin by navigating to the Query tab. The Lucd JedAI Client provides four ways of querying data: Sources, Facets, Keywords / Dates, and Concepts. These can be combined to get a very specific result set.
  2. Data Sources
    To narrow which sources are queried, navigate to Sources and select the boxes of the desired sources. By default, none are selected, and so all sources will be queried.
  3. Facets filters
    To filter by data facet, navigate to Facets, open the drop-down for a data model, and click “Add Filter” next to the desired facet. Selected facet filters appear below the list of available facets. To remove a facet filter, click its red X button. Multiple filters can be applied to the same facet.
  4. Keyword/Dates filter
    To search by keyword/date-time range, navigate to the Keywords/Dates tab and enter values in the desired fields.
  5. Concepts filter
    To search by concepts, navigate to the Concepts tab and enter a keyword into the first input field. Optionally, specify a similarity threshold in the second input field. Acceptable values range from 0 to 1. A list of concepts will display below the input field. Select one to see similar concepts in a list below the threshold input.
  6. Lucene Query
    Lucene queries can be run directly from the client.
  7. Execute Search
    Once query parameters have been specified, click Search to see a basic table of the resulting dataset.
  8. Reset Parameters
    Click this button to reset search parameters.

Note: Sources (2) are OR’d together and Facets (3) are AND’d together, each in its own sub-block, before being combined with the other parameters inside a parent ‘must’ or ‘must_not’ block of the query.
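The combination rule in the note above maps naturally onto a boolean query. The sketch below assumes an Elasticsearch-style `bool` query (suggested by the ‘must’/‘must_not’ wording) and uses hypothetical field names (`source`, plus facet fields passed as `(field, value)` pairs). It is not Lucd's actual query builder, and the hypothetical `negate` flag merely stands in for whatever decides between ‘must’ and ‘must_not’.

```python
import json


def build_query(sources, facet_filters, keywords=None, negate=False):
    """Sketch: combine query parameters into one bool query.

    Assumptions (hypothetical, not Lucd's schema): sources are stored in a
    `source` field, and each facet filter is a (field, value) pair.
    """
    clauses = []

    # Sources are OR'd: a record matches if it comes from ANY selected source.
    # An empty selection adds no clause, so every source is queried.
    if sources:
        clauses.append({
            "bool": {
                "should": [{"term": {"source": s}} for s in sources],
                "minimum_should_match": 1,
            }
        })

    # Facet filters are AND'd: a record must satisfy EVERY filter.
    if facet_filters:
        clauses.append({
            "bool": {
                "must": [{"term": {field: value}} for field, value in facet_filters]
            }
        })

    # Other parameters (here, a free-text keyword search) join the same parent block.
    if keywords:
        clauses.append({"query_string": {"query": keywords}})

    parent = "must_not" if negate else "must"  # hypothetical switch
    return {"query": {"bool": {parent: clauses}}}


if __name__ == "__main__":
    q = build_query(["news", "social"], [("lang", "en"), ("type", "article")], "vaccine")
    print(json.dumps(q, indent=2))
```

Leaving `sources` empty adds no source clause, which mirrors the default behavior of querying all sources.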


Visualizations

[Screenshot: Visualize tab]

  1. Visualize
    The Visualize tab provides numerous ways to view your data.
  2. Options
    To load a visualization, select it from the list.

Table

To view all fields of the query results in a table, use Table. Each row displays all fields of a record. To see more in-depth detail about a record, select it from the table.

[Screenshot: Table visualization]


Scatterplot

To see a 3D plot of the query results, use Scatterplot. It can display numerical and categorical data in an interactive plot. Drag the plot to rotate it, and pinch to zoom. On a device without a touchscreen, use the mouse scroll wheel or the keyboard zoom shortcut.

[Screenshot: Scatterplot]

  1. Data Points
    The red orbs are the data points in space.
  2. Projection
    The blue squares appear on all six walls of the plot and represent the projection of every data point onto that wall's 2D plane. Select a point to see its x, y, and z values.
  3. Dialog Box
    Click on the dialog box to close it.
  4. Zoom
    Select the zoom button to focus on that point and see more of its facets.
  5. Numerical Range Filter
    Numerical fields can have a range applied to them by dragging the filter handles.
  6. Recenter
    To re-center the plot, select Recenter.
  7. Reset Filters
    To remove categorical filters, select Reset.
  8. Rotate
    To toggle the graph rotation, select Rotate.
  9. Axis Feature
    To change the x, y, or z field, select a new value from its drop-down.
  10. Categorical Filter
    Categorical data can be filtered by selecting Filter and toggling the desired values.
  11. Submit
    Click Submit to apply changes.

Parallel Coordinate Plot

To see trends in features across a result set, use a Parallel Coordinate Plot. Each line from end to end represents a record. Drag to rotate, and move along the length of the plot by holding shift and dragging or by dragging with two fingers.

[Screenshot: Parallel Coordinate Plot]

  1. Field
    Each blue plane represents a field.
  2. Field minimum
  3. Field maximum
  4. Recenter
    To re-center the plot, click the Recenter button.
  5. Add Field
    To add an additional field, click the [+] button.
  6. Field Select
    To change a field, select a new value from its drop-down.
  7. Display Order
    A Field can be moved up or down in its display order.
  8. Remove Field

Histogram

To see how data is distributed across values, use a Histogram.

[Screenshot: Histogram]

  1. Add Histogram
    To add an additional field histogram, select the [+] button. At this time, only numerical fields will be automatically added. The collection of charts can be scrolled across by dragging.
  2. Field Select
    To change a chart’s field, select it from the drop-down.
  3. Remove
    To remove the chart, select the [X] button.
  4. Filter
    Each chart can be filtered by dragging the yellow handles. These filters will be applied across all open charts. The new maximum and minimum will be displayed below.
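Because each histogram's filter applies across all open charts, the charts behave like a crossfilter: a record remains visible only if it falls inside every active range. The minimal sketch below illustrates that rule on hypothetical records; the field names and the `apply_range_filters`/`new_extent` helpers are illustrative assumptions, not part of the client.

```python
def apply_range_filters(records, filters):
    """Keep records whose values fall inside EVERY active [low, high] range.

    `filters` maps a field name to a (low, high) tuple, one entry per chart
    whose handles have been moved. Records missing a filtered field are dropped.
    """
    kept = []
    for record in records:
        if all(
            field in record and low <= record[field] <= high
            for field, (low, high) in filters.items()
        ):
            kept.append(record)
    return kept


def new_extent(records, field):
    """Return (minimum, maximum) of a field among the remaining records, or None."""
    values = [r[field] for r in records if field in r]
    return (min(values), max(values)) if values else None


records = [
    {"age": 34, "income": 52000},
    {"age": 61, "income": 48000},
    {"age": 25, "income": 91000},
]
# Filtering the "age" chart also narrows what the "income" chart shows.
visible = apply_range_filters(records, {"age": (30, 70)})
print(new_extent(visible, "income"))  # (48000, 52000)
```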

Box Plot

To see the statistical distribution across values, use a Box Plot.

[Screenshot: Box Plot]

  1. Add Box Plot
    To add an additional field box plot, select the [+] button. At this time, only numerical fields will be automatically added. The collection of charts can be scrolled across by dragging.
  2. Field Select
    To change a chart’s field, select it from the drop-down.
  3. Remove
    To remove the chart, select the red [X] button.

2D Scatterplot

To see how values are distributed on an XY plane, use a 2D Scatterplot.

[Screenshot: 2D Scatterplot]

  1. Add Scatterplot
    To add an additional scatterplot, select the blue [+] button. The collection of charts can be scrolled across by dragging.
  2. Field Select
    To change a chart’s field, select it from the drop-down.
  3. Remove
    To remove the chart, select the red [X] button.

Pearson Correlation

To see how each field relates to all the other fields, use the Pearson Correlation matrix. Only numerical fields are displayed, and the matrix can be rotated by dragging. Each bar's height represents the correlation between its two contributing fields on a scale from –1 (red) to 1 (blue). Select a bar to see more information about it; select it again to hide the details.

[Screenshot: Pearson Correlation matrix]
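For reference, the Pearson coefficient of two fields X and Y is r = cov(X, Y) / (σ_X σ_Y), which always lies between –1 and 1. The stdlib-only sketch below computes the pairwise matrix for a few hypothetical numeric columns. It shows the statistic itself, not how the client computes it or maps values to bar heights and colors.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when a field is constant
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise Pearson coefficients for a dict of numeric columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


cols = {
    "height": [150, 160, 170, 180],
    "weight": [50, 60, 70, 80],
    "rank": [4, 3, 2, 1],
}
matrix = correlation_matrix(cols)
print(round(matrix[("height", "weight")], 6))  # 1.0
print(round(matrix[("height", "rank")], 6))    # -1.0
```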

Exploratory Data Analysis

The Exploratory Data Analysis (EDA) tab is where data is transformed and shaped before it is used to train a model. Once a query has been run, its results can be shaped and filled using EDA operations.

[Screenshot: EDA tab]

  1. Create Tree
    To begin EDA on the most recent search, select the floppy disk icon.
  2. Existing Trees
    Saved searches for EDA will appear in the scroll view.
  3. Tabletop
    Once a saved search has been selected, it will show up on the tabletop.
  4. Operations
    EDA operations that have been added to the saved search will show up as white nodes.
  5. Menu
    Clicking a node brings up a menu of the options available for that node.
  6. Statistics
    To see overview statistics on a selected node, choose it from the dropdown.

Save VDS

When saving a Virtual Dataset, complete the creation process by entering a name and description, selecting the features to include, and specifying whether the data should be persisted.

[Screenshot: Save VDS dialog]


New Op

The Lucd JedAI Client provides flexible options for data transformation without having to leave the GUI.

[Screenshot: New Op menu]

  1. Operation Type
    When adding an operation to a saved search during EDA, choose between standard operations like Fill/Filter/Replace, NLP operations, custom-defined operations, and image-specific operations.
  2. Operation Selection
    Select the desired operation from the dropdown.
  3. Operation Parameters
    Parameters must be specified before saving an operation.

Preparing Text Data for Model Training

Lucd provides special operations for preparing text data for model training, saving model developers the time of manually coding text transformation routines.

[Figure: NLP operations in the New Op menu]

  • After creating an EDA tree based on a query of a text data source, a developer can add a new operation to the tree based on NLP operations as shown above.
  • NLP operations (e.g., stopword removal, whitespace removal, lemmatization) can be applied in any sequence.
  • It’s important to select the correct facet as the “text attribute.”
  • Tokenization can be applied at the document level (i.e., one sequence of tokens for the entire facet value per record) or at the sentence level (i.e., one token sequence per sentence in the facet value for a record).
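Lucd's implementation of these NLP operations is not shown in this guide. As a rough, stdlib-only sketch of what whitespace normalization, stopword removal, and document-level versus sentence-level tokenization produce, assuming a tiny hypothetical stopword list and a simple regex tokenizer (lemmatization is omitted because it needs a lexical resource such as WordNet):

```python
import re

# A tiny illustrative stopword list (assumption); real pipelines use a fuller list.
STOPWORDS = {"a", "an", "the", "is", "was", "and", "of", "to", "it"}


def normalize_whitespace(text):
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Lowercase word tokens; a simple regex stands in for a real tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]


def document_level(text):
    """One token sequence for the whole facet value."""
    return remove_stopwords(tokenize(normalize_whitespace(text)))


def sentence_level(text):
    """One token sequence per sentence (naive split after . ! or ?)."""
    sentences = re.split(r"(?<=[.!?])\s+", normalize_whitespace(text))
    return [remove_stopwords(tokenize(s)) for s in sentences if s]


review = "The plot was thin.   It dragged,\nand the ending fell flat!"
print(document_level(review))
# ['plot', 'thin', 'dragged', 'ending', 'fell', 'flat']
print(sentence_level(review))
# [['plot', 'thin'], ['dragged', 'ending', 'fell', 'flat']]
```

Because each operation takes and returns plain text or tokens, the steps can be chained in any order, which matches the guide's note that NLP operations can be applied in any sequence.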

Saving VDS with Processed Text

When a developer wants to create a new virtual dataset including the transformed text data, they must choose the “processed_text” facet as the “sole” feature of the virtual dataset as shown below.

[Figure: selecting the “processed_text” facet when saving a VDS]

Currently, Lucd does not support text model training with multiple feature columns; only the “processed_text” facet may be selected.

Multi-column text model training will be supported in a future release.


Applying Custom Operations

Once custom operations have been defined and uploaded using the Lucd Python Client library, they are available in the GUI for usage in data transformation.

[Figure: custom operation details]

As shown above, clicking a custom operation shows further details, specifically the features the operation uses and the source code that defines it. As mentioned in the documentation for defining custom operations via the Lucd Python Client, one must select how to apply the operation using one of the following three Dask dataframe approaches:


Image Workflows

The Lucd framework supports image-based workflows. Binary image data contained within fields of a record is automatically rendered in the 3D client. The images below are from the Stanford Dogs dataset.

[Figure: Stanford Dogs images rendered in the client]


Applying Image Operations

To apply image operations, select the Image Ops tab within the New Op menu in an EDA tree.

[Figure: Image Ops tab in the New Op menu]

  • It’s important to select an image facet as the “Feature.”
  • The currently provided operations are as follows:
    Vertical and horizontal flips
    Grayscale Contrast normalization
    Normalize (0 mean and unit variance)
    Resize width & height
    Color inversion
    Crop borders
    Gaussian blur
    Rotate
    Min-max scaling
    To array (converts binary data to a NumPy array)
    Reshape dimensions

Note: Operations can be applied to a percentage of a dataset rather than the entire dataset, and can also be used to augment existing data rather than operating in place.
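To illustrate what several of these operations compute, the sketch below applies them to a grayscale image represented as nested Python lists of 0–255 pixel values. The representation, the helper names, and the seeded random sampling in the hypothetical `augment` helper are assumptions for illustration. The client's own implementation is not shown in this guide; its To array operation produces a NumPy array, and nested lists stand in for one here.

```python
import math
import random

# Images are modeled as 2D lists of 8-bit grayscale pixel values (0-255).


def flip_horizontal(img):
    return [row[::-1] for row in img]


def flip_vertical(img):
    return img[::-1]


def invert(img):
    """Color inversion for 8-bit pixels."""
    return [[255 - p for p in row] for row in img]


def min_max_scale(img):
    """Rescale pixel values linearly into [0, 1]."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid dividing by zero on a constant image
    return [[(p - lo) / span for p in row] for row in img]


def standardize(img):
    """Shift and scale pixels to zero mean and unit (population) variance."""
    flat = [p for row in img for p in row]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((p - mean) ** 2 for p in flat) / len(flat)) or 1
    return [[(p - mean) / std for p in row] for row in img]


def augment(images, op, fraction, seed=0):
    """Apply `op` to a random fraction of images and APPEND the results,
    leaving the originals untouched (augmentation rather than in place)."""
    rng = random.Random(seed)
    k = round(len(images) * fraction)
    chosen = rng.sample(range(len(images)), k)
    return images + [op(images[i]) for i in chosen]


img = [[0, 128], [255, 64]]
print(flip_horizontal(img))      # [[128, 0], [64, 255]]
print(invert(img))               # [[255, 127], [0, 191]]
print(min_max_scale(img)[1][0])  # 1.0
print(len(augment([img] * 4, flip_vertical, 0.5)))  # 6
```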


Modeling

The Lucd JedAI Client provides an intuitive and practical dashboard for data science/machine learning modeling.

[Screenshot: Modeling screen]

  1. View Select
    On the Modeling screen, review available model definitions by selecting that option from the dropdown.
  2. Model Upload
    Upload new Python model files (TensorFlow, PyTorch, XGBoost, etc.).
  3. Refresh
    Select to retrieve model statuses from the backend & refresh the GUI.
  4. Existing Model Definitions
    Model definitions are displayed in the center.
  5. Status Indicator Lights
    Each model definition indicates whether it has runs in training, runs with training complete, or errors.
  6. Filters
    Display only the models matching the selected filters (TensorFlow, XGBoost, Classification, etc.).
  7. Group/Sort
    Drop-down boxes for grouping and sorting model definitions.
  8. Distribution
    Model library and type distribution can be seen at the bottom.
  9. Model Details
    The currently selected model’s details can be seen on the right.
  10. Train
    To begin training the selected model, click “START TRAINING”.
  11. Performance
    See all training runs for a selected model by viewing the performance analysis.

Start Training

Training runs require the selection of a VDS and specification of parameters/assets.

[Screenshot: Start Training]

  1. Asset
    To set up a training run, begin by selecting an Asset to include, if any.
  2. Virtual Dataset
    Choose an existing VDS to train against.
  3. Parameters
    Set the parameters for the training run.

Trained Models

Trained models can also be inspected within the dashboard.

[Screenshot: Trained Models view]

  1. View Select
    To review a training run, first select “Trained Models” from the dropdown.
  2. Training Runs
    Select a training run and view its details. Current status of the run is designated by the colored corner of the list item.
  3. Confusion Matrix
    When available, view the Confusion Matrix for a run.
  4. Training Artifact Files
    Download run artifacts.
  5. Governance
    Submit the run for governance approval.
  6. TensorBoard
    Open the run in TensorBoard. This feature is deprecated.
  7. Stop/Restart
    Pause a training run and restart it. This can also be used to begin a new run after a training run has completed.

Assets

The Assets page provides a single view of all existing user “Assets” (e.g., VDSs and embeddings).

[Screenshot: Assets page]

  1. View Select
    To see available Virtual Datasets, select that option from the dropdown.
  2. Usage
    Counters and indicator lights displaying training run usage of an Asset.
  3. Pre-Op Heatmap
    Heatmap before running the selected EDA operations.
  4. Post-Op Heatmap
    Heatmap after running the selected EDA operations.
  5. Operations
    EDA operations applied to the Asset.
  6. EDA Tree View
    The VDS can be viewed in the context of its parent saved search by clicking 3D.
  7. Embedding
    Create an embedding from the given VDS (discussed below).

Embeddings

The Lucd JedAI Client provides the ability to easily generate word embedding Assets for use in modeling.

[Screenshot: Embeddings view]

  1. View Select
    To see available Embeddings, select that option from the dropdown.
  2. Download
    Embeddings can be downloaded locally.
  3. PCA/TSNE
    View PCA/TSNE charts for the selected embedding.
  4. Restart
    Restart the embedding training.

PCA/TSNE

Embeddings can be viewed using PCA/TSNE techniques for visualization.

[Screenshot: PCA/TSNE view]

  1. Style
    When viewing an embedding’s PCA/TSNE, click to see terms instead of points.
  2. Region Select
    Toggle to select a cluster of points using a bounding box.
  3. Multiple Select
    Use to add multiple bounding boxes.
  4. Word Search
    Search for a term. All matching terms will be highlighted, as well as shown in a list to the right until there is only one matching term.
  5. Filter
    Narrow the displayed terms to those whose number of occurrences falls within a chosen range.
  6. Technique Select
    Toggle between PCA and TSNE.

Governance

The Governance view illustrates what data, data transformations, and assets (e.g., VDS, word embeddings) were used as inputs to training a given model. This lets a user quickly gain insight into which data led a model to yield certain performance results. The following figure shows an overview of the Governance view.

[Figure: Governance view overview]

The main panel in the middle illustrates, for a selected model, what data and assets were used for training the model. The top half of the view shows information about the data which was used to create a virtual dataset for training the model.

  1. Submitted Models
    The main panel on the left-hand side displays what models are available for viewing in the Governance view.
    The dropdown menus at the top allow the user to select from models based on their governance approval status (i.e., “pending approval,” “approved,” or “rejected”) as well as sort the models based on various criteria.
  2. Query
    This represents the query that was used to generate the initial dataset, whether for the purposes of model training data or word embedding generation. Clicking the query will show query details at the bottom of the view.
  3. Transformation
    This represents the transformations performed on the initial dataset to establish either a virtual dataset (as in the case of training a model) or a word embedding. These are the same transformations that were applied in the Exploratory Data Analysis section of the tool. Clicking the transformation box shows details at the bottom of the view, as in the figure below.
    [Figure: transformation details]
  4. Heatmap
    Visualization of selected attributes (or facets) of queried or transformed data for virtual datasets or word embeddings. Dropdown selectors underneath each visual enable a user to customize the visualization (“feature 1” selects data for the y-axis and “feature 2” selects data for the x-axis). The “metric” selector chooses what statistic of the selected data to use for defining the heatmaps. In the current release, only total “counts” are available. Clicking “fetch metrics” will populate the visualization. Comparing and visualizing data heatmaps (or distributions) before and after a set of transformations is helpful for governance purposes since it can reveal, for example, if data biases exist and what transformation operations might have introduced them.
  5. Embedding Details
    Shows the name of the asset produced by word embedding generation. The bottom half of the view shows details about word embedding data for models that require embeddings for training.
  6. Trained Model
    This represents the trained model after all previous operational flows are complete.
  7. Metadata & Performance Statistics
    Information like start / end time, model type, assets used, and training parameters are displayed here.
  8. Submit Report
    Clicking the green button enables the user to submit a governance report, either approving or rejecting the model for usage.
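The heatmap's “counts” metric can be sketched as a two-way tally of records by the two selected features. The example below uses hypothetical records, field names, and a hypothetical filtering transformation to show how comparing the tallies before and after a transformation can expose a shift in the data, which is the governance use described for the Heatmap item above.

```python
from collections import Counter


def count_heatmap(records, feature1, feature2):
    """Count records for each (feature1, feature2) pair.

    feature1 indexes rows (y-axis) and feature2 indexes columns (x-axis),
    mirroring the heatmap selectors. Returns row labels, column labels,
    and a grid of counts.
    """
    counts = Counter((r[feature1], r[feature2]) for r in records)
    rows = sorted({r[feature1] for r in records})
    cols = sorted({r[feature2] for r in records})
    grid = [[counts[(y, x)] for x in cols] for y in rows]
    return rows, cols, grid


raw = [
    {"region": "north", "label": "pos"},
    {"region": "north", "label": "neg"},
    {"region": "south", "label": "pos"},
    {"region": "south", "label": "pos"},
]
# A hypothetical transformation that (perhaps unintentionally) drops every "north" record.
transformed = [r for r in raw if r["region"] != "north"]

print(count_heatmap(raw, "region", "label"))
# (['north', 'south'], ['neg', 'pos'], [[1, 1], [0, 2]])
print(count_heatmap(transformed, "region", "label"))
# (['south'], ['pos'], [[2]])
```

In this toy case the post-transformation heatmap has lost every “neg” label, the kind of bias the before/after comparison is meant to reveal.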

Explainability Analysis

Lucd provides the ability to visualize “explanations” of a model’s output given specific inputs. Generally, explanations take the form of computed attribute weights indicating how much each attribute contributed to a model’s decision. This helps a user either debug a model or scrutinize the data fed to it. The feature is provided through integration of the LIME (Local Interpretable Model-agnostic Explanations) framework. The figure below illustrates the explainability panel in the Governance view.

[Figure: explainability panel]

This panel is displayed when the user clicks the model element (6) in the Governance view. Currently, model explainability only works for text classification models. Support for tabular data and image data will be available soon.

  1. Input Text
    For analyzing a text classification model, the user enters sample text into the input text box and clicks the “explain text” button underneath the box. The time required to run explanation analysis is dependent on the amount of text entered and the complexity of the model.
  2. Probability Output
    This is a simple bar chart showing the probabilities of a given model’s outputs. In the figure, the classes are “negative” and “positive”; however, more classes may be displayed depending on the model. The class labels are obtained from the labels returned by a user’s model, as explained in documentation for the Lucd modeling framework.
  3. Features Output
    This illustrates the weights of the most significant features determined to affect the model’s output. For instance, in the figure, the tag "<UNKNOWN>" is highly indicative of a piece of text (in this case, a movie review) having a “negative” sentiment. The user is encouraged to try multiple examples to understand the explainability feature.
  4. Output Text
    The text on the right shows the major features (words) highlighted in the text. Note that the text shown is the output of the transformation operations used for embedding creation (which the user specified with NLP operations before creating the embedding set). This shows the user what is done to the text before it is input to a model, which can offer extra insight into the model’s decision logic.
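Lucd uses LIME for these explanations. LIME fits a local surrogate model to many perturbed copies of the input, which is too involved for a short example, so the sketch below swaps in a simpler technique, leave-one-word-out occlusion, to convey the same idea of per-word weights. The `toy_sentiment` classifier and its word scores are hypothetical stand-ins for a trained model; this is not how Lucd computes its weights.

```python
import math


def toy_sentiment(text):
    """Hypothetical stand-in for a trained text classifier.

    Returns [P(negative), P(positive)] from a hand-made word score.
    """
    scores = {"great": 2.0, "fun": 1.0, "boring": -2.0, "<unknown>": -1.5}
    z = sum(scores.get(w, 0.0) for w in text.lower().split())
    p_pos = 1 / (1 + math.exp(-z))
    return [1 - p_pos, p_pos]


def occlusion_weights(text, predict, class_index):
    """Score each word by how much removing it lowers one class probability.

    A positive weight means the word pushes the prediction toward `class_index`.
    (Leave-one-out occlusion, a simpler substitute for LIME.)
    """
    words = text.split()
    base = predict(text)[class_index]
    weights = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        weights.append((word, base - predict(reduced)[class_index]))
    # Most influential words first, by absolute effect.
    return sorted(weights, key=lambda wv: abs(wv[1]), reverse=True)


review = "a boring plot with <UNKNOWN> fun moments"
for word, weight in occlusion_weights(review, toy_sentiment, class_index=0):
    print(f"{word:>10} {weight:+.3f}")
```

As in the explainability panel, words such as “boring” and “<UNKNOWN>” receive positive weights toward the “negative” class, while “fun” pulls against it.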

Epidemiology

Lucd provides the ability to visualize epidemics and supply chain breakdowns on a map. Trained models can predict future infection rates and supply shortages down to the census tract level.

[Screenshot: Epidemiology screen]

  1. Train
    To start training an Epidemiology model, click “Train Model…”

  2. Trained Models
    A list of previously trained models can be found here.


Train Epidemiology Model

This view appears after selecting “Train Model” in the previous view.

[Screenshot: Train Epidemiology Model]

  1. Dataset
    A dataset to train against must be selected before training can start.
  2. Parameters
    Enter any custom parameters for training.
  3. Confirm

3D Map View

Selecting a trained Epidemiology model will display a 3D map view.

[Screenshot: 3D Map View]

  1. Map View
    Census tracts, counties, and states can all be displayed.
  2. Details
    Information regarding a selected region on the map.
  3. Disease Statistic
    Selecting the disease statistic changes the value used when polygons are extruded.
  4. Civilian Features
    Selecting a civilian feature displays its values as a bar chart on each census tract.
  5. Search
    The map can be searched to snap to a specific location.
  6. Style
    The map style can be changed via drop-down menu.
  7. Extent
    Configuration for the extent of the map.
  8. Terrain
    Toggle switch for map terrain.
  9. Save Settings
    The current zoom level and location can be saved to the model object to reload later.
  10. Polygon Extrude
    Polygon extruding can be toggled to make the underlying map easier to read.
