Lucd Python Client | User Guide | 6.2.7

User Guide

The Lucd Python Client provides capabilities for data scientists and AI model developers to prototype AI model solutions before uploading them to the Lucd Unity Client for extended training and performance analysis.

The Lucd Python Client provides the following features:

  • functions for accessing raw data and other assets in Lucd for general analysis and custom visualization;

  • functions for uploading user-defined feature transformation operations to Lucd, which can then be applied in the Lucd Unity Client to create a virtual dataset;

  • functions for ingesting data into TensorFlow and PyTorch models, which can be used for prototyping models.

Installation

Install the lucd-python-client Python package with pip from a Python wheel file that you build from the package source.

Instructions are as follows:

  1. Download or clone the lucd-python-client package (unzip if needed) from here: Lucd Python Client Project
  2. Open a command prompt and change to the package directory.
  3. Type python setup.py bdist_wheel. The wheel file will appear in the dist directory.
  4. Switch to the dist directory and type pip install <wheel filename>.

Requirements

  • Python 3.6.5 is required for custom feature operations to work correctly.
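
The version requirement above can be checked before any custom feature operation is defined. The following is a minimal sketch, not part of the Lucd Python Client: the helper name is_required_python is hypothetical, and treating the requirement as an exact match on 3.6.5 (rather than a minimum version) is an assumption.

```python
import sys

# Version stated in the Requirements section above.
REQUIRED_PYTHON = (3, 6, 5)


def is_required_python(version=None):
    """Return True if `version` (default: the running interpreter) is exactly 3.6.5.

    Hypothetical helper; exact matching is an assumption about the requirement.
    """
    if version is None:
        version = sys.version_info
    # Compare only major, minor, and micro; ignore release level and serial.
    return tuple(version[:3]) == REQUIRED_PYTHON


if not is_required_python():
    found = ".".join(str(part) for part in sys.version_info[:3])
    print("Warning: custom feature operations expect Python 3.6.5, found " + found)
```

Running this at the top of a script prints a warning instead of failing, so exploratory work that does not involve custom feature operations can continue on another interpreter.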

APIs

The Lucd Python Client uses Python and REST APIs. Code examples using both API types are available in the examples directory of the project.

Examples

Example code illustrating how to perform tasks such as authenticating to Lucd, performing queries, obtaining virtual datasets and training models resides in the examples directory of the project.

Below are specific examples of how to access Lucd data using the client as well as how to create and upload a custom feature transformation operation.

Accessing Data

from lucd import LucdClient, log

from eda.int import asset
from eda.int import vds
from eda.int import uds

from eda.lib import lucd_uds


if __name__ == "__main__":
    username = 'xxx'
    password = 'xxx'
    domain = 'xxx'

    client = LucdClient(username=username, password=password, domain=domain)

    log.info("Connected to Lucd platform.")

    # Queries follow the Elasticsearch query DSL.
    # See: https://www.elastic.co/guide/en/elasticsearch/reference/6.8/query-dsl.html
    query = \
        {
            "query": {
                "bool": {
                    "must": [
                        {
                            "bool": {
                                "should": [
                                    {
                                        "match_phrase": {
                                            "source": "iris"
                                        }
                                    }
                                ]
                            }
                        },
                        {
                            "bool": {
                                "should": []
                            }
                        }
                    ],
                    "filter": [
                        {
                            "bool": {
                                "filter": [
                                ]
                            }
                        }
                    ]
                }
            },
            "size": 2000,
            "dataset": "iris"
        }

    results, http = uds.search(query)
    print(f"Search Results ({http}):\n{results}\n")

    hits, stats = client.search_to_dataframe(results)
    print(f"Search Results:\n{hits.head()}\n")
    print(f"Search Statistics:\n{stats}\n")

    all_models, http = client.rest('lucd/model/read', {"uid": username})
    print(f"All Models ({http}):\n{all_models}\n")

    all_vds, http = vds.read({"uid": username})
    print(f"All Virtual Datasets ({http}):\n{all_vds}\n")

    all_assets, http = asset.read({"uid": username})
    print(f"All Asset Embeddings ({http}):\n{all_assets}\n")

    #
    # Lucd Library Calls to fetch assets and VDSes
    #
    # When limiting asset size, you could encounter issues with missing index entries.

    embeddings_index, embedding_matrix, embedding_size, word_index_mapping, word_index_mapping_padded = \
        lucd_uds.get_asset("xxx", limit=100)
    print(embeddings_index, embedding_matrix, embedding_size, word_index_mapping, word_index_mapping_padded)

    # Without a limit on data size, you may encounter delays bringing back large
    # amounts of data over the network, and may run the client out of memory.

    all_vds, http = vds.read({"uid": None})
    print(f"All Virtual Datasets ({http}):\n{all_vds}\n")

    df = lucd_uds.get_dataframe("xxx", limit=100)
    print(f"Dataframe Data\n{df.head(20)}")

    client.close()
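
The nested query dictionary in the listing above is easy to mistype by hand. As a sketch, it can be produced by a small helper that takes the source name as a parameter. The function build_source_query is hypothetical and not part of the client; it only mirrors the structure shown above, including its empty clauses, and the default of reusing the source name as the dataset is an assumption based on the iris example.

```python
def build_source_query(source, size=2000, dataset=None):
    """Build a query dict with the same shape as the example above.

    Hypothetical helper, not a Lucd API. `dataset` defaults to `source`,
    which matches the iris example but is otherwise an assumption.
    """
    if dataset is None:
        dataset = source
    return {
        "query": {
            "bool": {
                "must": [
                    # Match documents whose "source" field equals the given name.
                    {"bool": {"should": [{"match_phrase": {"source": source}}]}},
                    # Empty clause kept only to mirror the original example.
                    {"bool": {"should": []}},
                ],
                "filter": [
                    {"bool": {"filter": []}},
                ],
            }
        },
        "size": size,
        "dataset": dataset,
    }


query = build_source_query("iris")
```

The resulting dictionary could then be passed to uds.search(query) exactly as in the listing above.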

Custom Feature Transformation

from eda.int import custom_operation
import lucd

def create_greater_than_mean_column(df):
    column_mean = df["flower.petal_length"].mean()
    df["flower.petal_length_Mean"] = df["flower.petal_length"] > column_mean
    return df


if __name__ == "__main__":
    client = lucd.LucdClient(domain="xxx",
                             username="xxx",
                             password="xxx",
                             login_domain="xxx"
                             )

    data = {
            "operation_name": "create_greater_than_mean_column_JBstyle",
            "author_name": "J. Black",
            "author_email": "j.black@lucd.ai",
            "operation_description": "Sample operation",
            "operation_purpose": "add a new column",
            "operation_features": ["flower.petal_length"],
            "operation_function": create_greater_than_mean_column
    }

    response_json, rv = custom_operation.create(data)

    client.close()
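
Before calling custom_operation.create, it may help to check the payload locally. The following is a minimal sketch under stated assumptions: find_payload_problems is a hypothetical helper, not part of the client, and treating every key from the example above as mandatory is an assumption rather than documented behavior.

```python
# Keys used in the example payload above. Treating all of them as
# mandatory is an assumption, not documented behavior.
EXPECTED_KEYS = (
    "operation_name",
    "author_name",
    "author_email",
    "operation_description",
    "operation_purpose",
    "operation_features",
    "operation_function",
)


def find_payload_problems(data):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = ["missing key: " + key for key in EXPECTED_KEYS if key not in data]
    # The example passes the function object itself, so it should be callable.
    if "operation_function" in data and not callable(data["operation_function"]):
        problems.append("operation_function is not callable")
    # The example passes the feature names as a list of column names.
    if "operation_features" in data and not isinstance(
        data["operation_features"], (list, tuple)
    ):
        problems.append("operation_features is not a list")
    return problems


def passthrough(df):
    """Placeholder operation used only to illustrate the check."""
    return df


example = {
    "operation_name": "passthrough_example",
    "author_name": "xxx",
    "author_email": "xxx",
    "operation_description": "Sample operation",
    "operation_purpose": "illustrate payload checking",
    "operation_features": ["flower.petal_length"],
    "operation_function": passthrough,
}

for problem in find_payload_problems(example):
    print(problem)
```

A check like this only catches local mistakes such as a misspelled key or a string passed where the function object was intended; it says nothing about how the platform validates the operation.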
