PRODUCT

Organizations leading the future with endless challenges and changes, organizations delivering fast and convenient

AI/HPC Solution

 

AI Pub: An MLOps tool with web user interface

AI Pub is built on Coaster, a container platform, and is designed to
support efficient management of GPU infrastructure resources for AI development and training.

INFRASTRUCTURE

Coaster, AI Pub Dev, and AI Pub Ops enable you to create value throughout the MLOps lifecycle.
They facilitate an efficient process for AI development and operation, making it easier to achieve your goals.

  • AI Pub Dev
  • AI Pub Ops
  • Coaster

AI Pub Dev is a resource management tool designed for AI development and training.

It enables the allocation of limited AI infrastructure to multiple AI developers, facilitating their efficient use of the allotted GPU infrastructure resources for their specific tasks.
AI Pub Dev also allows the administrator to manage the GPU infrastructure resources according to various infrastructure patterns.

Training result - NAS

Discover AI Pub Dev’s functions

With Coaster at its core, AI Pub Dev offers fully-managed services for model training as well as resource and workload management.

Main Services Service Description
Create workload
  • Manages the user’s development environment via Docker images
  • Creates workspaces using development images
  • Facilitates integration with Jupyter Notebook and TensorBoard
Model training
  • Automatically allocates necessary resources for each AI training
  • Lets users request GPU and CPU resources
Resource management
  • Restricts resource usage per user account
  • Reclaims idle resources
  • Manages workspace of each node
  • Configures MIG for each node
  • Monitors the entire infrastructure
Workload management
  • Manages suspension/resumption of schedulers
  • Oversees job scheduling and prioritization
User history management
  • Manages resource usage history for each user account
  • Downloads usage history

AI Pub Ops streamlines the operation of your AI services.

Utilizing Coaster’s GPU fragmentation function, which divides the GPU into 100 separate blocks,
AI Pub Ops ensures efficient allocation of GPU blocks to various AI services in accordance with their specific requirements.
Furthermore, it provides an intuitive and user-friendly web UI, making the creation and management of AI services accessible even to non-developers.

Divided into 100 blocks

Discover AI Pub Ops’s functions

With Coaster at its core, AI Pub Ops provides fully managed services for service creation, service management, and resource management.

Main Services Service Description
Service creation and update
  • Offers a UI for creating, suspending, deleting, and deploying services
  • Facilitates non-stop service updates via UI
  • Manages service versions and enables rollbacks
Service monitoring
  • Monitors operational status using service list and details
  • Provides service error alerts and enables troubleshooting through log analysis
Resource group management
  • Enables administrators to create resource groups and specify user entitlements
  • Edits resource groups
Resource management
  • Enables allocation of GPU blocks for specific services
  • Monitors real-time operation rates of GPU blocks and servers
Usage history management
  • Manages resource usage history of each service
  • Downloads usage history

Maximize GPU resource efficiency with Coaster’s fragmentation

Coaster empowers you to divide the utilization and memory of a single GPU unit into 100 blocks, enabling efficient multi-container deployment.
By preventing resource interference between containers, Coaster enhances stability and facilitates concurrent operation of multiple processes.
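
The block-based sharing described above can be pictured with a small model. This is only an illustrative sketch, not Coaster's actual API or implementation: the `FractionalGpu` class, its methods, and the container names are hypothetical, and the only detail taken from the text is that one GPU is divided into 100 blocks. It shows the bookkeeping idea that each container gets its own reservation and a request that would oversubscribe the GPU is refused.

```python
from dataclasses import dataclass, field

BLOCKS_PER_GPU = 100  # from the text: one GPU is divided into 100 blocks


@dataclass
class FractionalGpu:
    """Toy model of one GPU split into blocks (hypothetical, not Coaster's API)."""

    allocations: dict = field(default_factory=dict)  # container name -> blocks

    @property
    def free_blocks(self) -> int:
        return BLOCKS_PER_GPU - sum(self.allocations.values())

    def allocate(self, container: str, blocks: int) -> bool:
        """Reserve blocks for a new container; refuse rather than oversubscribe."""
        if blocks <= 0 or container in self.allocations:
            raise ValueError("blocks must be positive and container must be new")
        if blocks > self.free_blocks:
            return False
        self.allocations[container] = blocks
        return True

    def release(self, container: str) -> None:
        """Return a container's blocks to the free pool."""
        self.allocations.pop(container, None)


gpu = FractionalGpu()
results = [
    gpu.allocate("ocr-service", 30),
    gpu.allocate("chatbot", 50),
    gpu.allocate("recommender", 40),  # only 20 blocks remain, so this is refused
]
```

In this sketch the third request fails until another container releases its blocks, which is the sense in which reservations do not overlap.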

Divided into 100 blocks

Discover the operational management functions of Coaster, the advanced container platform

Main Services Service Description
GPU fragmentation
  • Divides the utilization and memory of a single GPU unit into 100 blocks
GPU resource inquiry and allocation
  • Queries and allocates computing resources across the entire cluster with extended Kubernetes commands
User entitlement management (individual and group)
  • Assigns policies to individuals and groups
Job scheduling and priority management
  • Uses a Kubernetes-based job scheduler to start jobs automatically, and lets operators manually re-prioritize jobs through the GUI
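
As a rough illustration of automatic scheduling with manual re-prioritization, the sketch below uses a plain in-memory priority queue. It is not the Kubernetes-based scheduler itself: the `JobQueue` class, the job names, and the convention that a lower number runs first are all assumptions made for this example.

```python
import heapq
import itertools
from typing import Optional


class JobQueue:
    """Toy priority queue: lower number runs first, ties run in submit order."""

    def __init__(self) -> None:
        self._heap = []    # entries are (priority, sequence, job)
        self._latest = {}  # job -> the (priority, sequence) that is still valid
        self._seq = itertools.count()

    def submit(self, job: str, priority: int) -> None:
        entry = (priority, next(self._seq))
        self._latest[job] = entry
        heapq.heappush(self._heap, (entry[0], entry[1], job))

    def reprioritize(self, job: str, priority: int) -> None:
        """Manual operator override; the older heap entry becomes stale."""
        if job not in self._latest:
            raise KeyError(job)
        self.submit(job, priority)

    def pop_next(self) -> Optional[str]:
        """Return the next job to start, skipping stale entries."""
        while self._heap:
            priority, seq, job = heapq.heappop(self._heap)
            if self._latest.get(job) == (priority, seq):
                del self._latest[job]
                return job
        return None


queue = JobQueue()
queue.submit("train-a", 5)
queue.submit("train-b", 5)
queue.submit("eval-c", 9)
queue.reprioritize("eval-c", 1)  # an operator moves this job to the front
order = [queue.pop_next() for _ in range(3)]
```

Re-prioritizing pushes a fresh entry instead of editing the heap in place, so the queue stays valid and the outdated entry is simply skipped when it surfaces.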

See a demo of AI Pub Dev’s functions.

Visit TEN’s YouTube channel to see demonstrations showing AI Pub Dev’s functions.

 

Ready to find out more about AI Pub?
