Machine Learning Data Engineer

Lucd is relentlessly focused on revolutionizing how the AI development and production lifecycles are executed and governed in enterprise settings. This includes how technical experts traverse and collaborate on AI development workflows (e.g., performing data feature transformations, AI experimentation, and AI governance and integration), as well as how business experts extract actionable insights, value, and interpretability from data analytics and AI model output. Highly performant, well-architected distributed ML data management is core to Lucd's success. Enterprise AI development is a rapidly evolving space, so our customers need a highly capable and dynamic ML platform for exploiting large multi-modal datasets for both experimentation and production. As an ML Data Engineer, you will be responsible for the Lucd platform's ML data processing capabilities.

This primarily includes the following responsibilities:

  • architecting and implementing large-scale data services for data transformation, visual analytics, and distributed AI experimentation and training;

  • managing data-level security and privacy-preserving data processing capabilities;

  • contributing to AI workflow governance solutions, especially regarding data quality and provenance;

  • contributing to AI model production management and monitoring solutions for various types of platforms (e.g., enterprise APIs, embedded devices).

The ideal candidate is expected to embody the following characteristics:

  • exemplary communication and documentation skills, and comfort pitching to both team and C-suite audiences;

  • extreme passion for understanding and anticipating users’ requirements and preferences;

  • boundless curiosity for identifying emerging experience trends, and enthusiasm for using them as a catalyst for change that makes experiences more effective, relevant, and successful;

  • a self-starter with excellent time management skills who is comfortable dealing with ambiguity and can synthesize complex work-streams into a single narrative;

  • a highly collaborative, innovative, and creative non-linear thinker, with the execution skills to make it all count;

  • the ability to work within an exciting, distributed startup environment.

Basic Qualifications

  • Solid understanding of machine learning, data engineering, and the overall AI development workflow.

  • Strong command of computer science fundamentals (object-oriented design, data structures, algorithms, etc.).

  • Solid understanding of scalable (enterprise) software architectures, including Kubernetes and application monitoring.

  • Strong Python and C-based language development skills.

  • Working knowledge of Linux systems.

  • 2+ years of experience with agile development, including knowledge of CI/CD toolsets (e.g., GitLab).

  • 2+ years of experience with distributed AI model training on large-scale datasets (e.g., using a distributed framework such as Horovod).

  • 2+ years of experience working with text and image data.

  • 2+ years of experience using Dask for large-scale feature engineering.

  • 2+ years of experience developing AI models and workflows with industry-leading frameworks such as TensorFlow, PyTorch, XGBoost, Scikit-Learn, and MXNet.

  • 2+ years of experience creating tools for visual data analytics.

Preferred Qualifications

  • Experience with AI development and deployment using NVIDIA.

  • Strong understanding of statistics and data science.

  • 2+ years of experience with processing, managing, and visualizing spatiotemporal data.

  • 2+ years of experience with Accumulo.

  • 2+ years of experience with AI model deployment and monitoring for enterprise applications.

  • 2+ years of experience with workflow management frameworks (e.g., Apache Airflow).

  • M.S. in Computer Science or related field of study, with concentration in AI or data science.