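To advance AI research, National Cheng Kung University (NCKU) brought in multiple NVIDIA DGX-1 supercomputers and adopted Infinitix’s AI-Stack platform to manage them. AI-Stack provides automated scheduling, flexible resource allocation, and precise billing, which have effectively raised GPU utilization. Its customization capabilities also met NCKU’s various AI resource management needs, speeding up the university’s AI research.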

Artificial Intelligence (AI) is reshaping many industries and has become a major force for change. This trend has driven many businesses to invest in AI applications and digital transformation. Many universities, seeking to connect academic study with practice, have also launched AI programs that build students’ problem-solving skills and hands-on AI experience in preparation for their careers.

Among colleges and universities developing AI, National Cheng Kung University (NCKU) has shown remarkable ambition. Professor Pau-Chu Chan, Director of the Computer and Network Center at NCKU, said that since President Huey-Jen Su took office in 2015, AI has been listed as a key initiative, with a focus on cross-disciplinary AI applications and research. Subsequent projects include AI biotechnology and medical innovation research funded by the Ministry of Science and Technology, collaboration with Tainan City Government to calculate dengue fever mosquito hotspots, cooperation with industry to create smart epidemic prevention bracelets, and a pioneering application of AI in campus security operations.

To build a solid foundation for AI research, NCKU has invested in multiple NVIDIA DGX-1 supercomputers. To get full value from these machines, however, they must operate as a controllable resource pool rather than as standalone servers, which requires a management platform that can meet the operational requirements of multiple research projects. On the recommendation of its long-term IT service partner, Stark Technology, the Computer and Network Center learned about Infinitix’s AI-Stack machine learning collaboration management platform. After evaluation, the center found that its management mechanism suited NCKU’s needs and that the vendor had mature software development and technical support capabilities and could customize the platform for NCKU. The center therefore adopted AI-Stack as the management core of NCKU’s AI resource platform.

Director Chan stated that the AI resource platform operates 24 hours a day and is open to faculty and students from all colleges and departments who apply for GPU resources, so manual scheduling and pricing are not feasible. With AI-Stack, NCKU uses automated scheduling to control every GPU resource systematically and uses the platform’s wallet top-up function to calculate the cost of each usage accurately, implementing the user-pays principle.
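The wallet top-up idea can be illustrated with a minimal Python sketch. It is not AI-Stack’s implementation: the Wallet class, its method names, and the rate of 30 per GPU-hour are hypothetical, chosen only to show how a prepaid balance can enforce the user-pays principle.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Wallet:
    """Hypothetical prepaid wallet for one project or user."""
    owner: str
    balance: Decimal = Decimal("0")

    def top_up(self, amount: Decimal) -> None:
        """Add credit to the wallet; the amount must be positive."""
        if amount <= 0:
            raise ValueError("top-up amount must be positive")
        self.balance += amount

    def charge(self, gpus: int, hours: Decimal, rate_per_gpu_hour: Decimal) -> Decimal:
        """Deduct the cost of a job (GPUs x hours x rate) and return it.

        The job is refused when the balance cannot cover the cost.
        """
        cost = gpus * hours * rate_per_gpu_hour
        if cost > self.balance:
            raise RuntimeError("insufficient balance for this job")
        self.balance -= cost
        return cost


# Usage: a lab tops up 1000 credits, then runs 2 GPUs for 1.5 hours.
wallet = Wallet(owner="lab-a")
wallet.top_up(Decimal("1000"))
job_cost = wallet.charge(gpus=2, hours=Decimal("1.5"), rate_per_gpu_hour=Decimal("30"))
print(job_cost, wallet.balance)
```

Decimal is used instead of float so that repeated small charges do not accumulate rounding error in the balance.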

Customized Assistance in Compliance with University Application Processes

NCKU has high expectations for the AI resource platform and wants it to deliver as much flexibility and value as possible. During the implementation of AI-Stack, the university therefore raised many adjustment requests, which Infinitix addressed actively and appropriately.

First, so that users always have an up-to-date training environment, Infinitix provided NCKU’s Computer and Network Center with self-upgrade operation guidelines, allowing the center to update the AI resource platform to the latest NGC (NVIDIA GPU Cloud) AI frameworks and CUDA driver on its own.

Secondly, the university places great importance on the compatibility between AI model training and AI training data, as well as the smoothness of data flow. To this end, Infinitix demonstrated its technical capabilities through a unique design that binds personal NFS data spaces, allowing users to mount their exclusive storage space each time they create an AI training environment. This eliminates the need for additional downloading or moving of required files, improving data usage and transfer efficiency while ensuring effective isolation of data files between different projects.
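The per-user binding described above can be sketched as follows. This is an illustration only, not AI-Stack’s actual design: the export root /exports/users, the container mount point /workspace/data, and the user_volume function are all assumed names.

```python
from pathlib import PurePosixPath

# Hypothetical locations; a real deployment would define its own.
NFS_EXPORT_ROOT = PurePosixPath("/exports/users")
CONTAINER_MOUNT_POINT = "/workspace/data"


def user_volume(user_id: str) -> dict:
    """Build a mount description that binds one user's NFS space.

    Only alphanumeric IDs are accepted, so an ID such as "../bob"
    cannot be used to reach another user's directory.
    """
    if not user_id.isalnum():
        raise ValueError(f"invalid user id: {user_id!r}")
    return {
        "source": str(NFS_EXPORT_ROOT / user_id),
        "target": CONTAINER_MOUNT_POINT,
        "read_only": False,
    }


# Usage: every new training environment for "alice" gets the same volume.
print(user_volume("alice"))
```

Because the source path is derived only from the authenticated user ID, each environment sees its owner’s files at the same location without copying, and projects stay isolated from one another.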

Moreover, DGX-1 systems previously purchased by other university units were left unmanaged, so the university decided to have the Computer and Network Center centrally control them. Thanks to AI-Stack’s powerful horizontal management capabilities, these additional GPU servers could be immediately added to the resource pool and made available for users to apply for use independently. Conversely, when the Computer and Network Center lends DGX-1 to NCKU Hospital, the platform can quickly remove these servers from the cluster. This excellent flexibility in expansion and contraction has greatly benefited the center’s staff.
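A small Python sketch shows what adding and removing servers from a shared pool involves. The GpuPool class and the node names are hypothetical and do not reflect AI-Stack’s internals; the only outside fact used is that a DGX-1 contains 8 GPUs.

```python
class GpuPool:
    """Hypothetical GPU resource pool keyed by server name."""

    def __init__(self) -> None:
        self._nodes: dict[str, dict[str, int]] = {}

    def add_node(self, name: str, gpus: int) -> None:
        """Register a server and make all of its GPUs available."""
        if name in self._nodes:
            raise ValueError(f"node {name!r} is already in the pool")
        self._nodes[name] = {"total": gpus, "in_use": 0}

    def remove_node(self, name: str) -> None:
        """Take a server out of the pool, but only when it is idle."""
        if self._nodes[name]["in_use"] > 0:
            raise RuntimeError(f"node {name!r} still has running jobs")
        del self._nodes[name]

    def allocate(self, gpus: int) -> str:
        """Reserve GPUs on the first node with enough free capacity."""
        for name, node in self._nodes.items():
            if node["total"] - node["in_use"] >= gpus:
                node["in_use"] += gpus
                return name
        raise RuntimeError("no node has enough free GPUs")

    def free_gpus(self) -> int:
        """Count the GPUs that are currently unreserved across the pool."""
        return sum(n["total"] - n["in_use"] for n in self._nodes.values())


# Usage: two DGX-1 servers join the pool, then the idle one is lent out.
pool = GpuPool()
pool.add_node("dgx-center", 8)
pool.add_node("dgx-dept", 8)
busy_node = pool.allocate(4)
pool.remove_node("dgx-dept")
print(busy_node, pool.free_gpus())
```

Refusing to remove a busy node is one possible policy; a real platform might instead drain or migrate running jobs before releasing the server.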

Furthermore, Infinitix fully complied with the university’s application mechanism and information regulations, providing comprehensive customization assistance for the overall website visual design, integration of identity authentication portal, project form design, application order planning, approval and rejection review process system, GPU resource trial mode, and other matters.

Self-Service Mechanism Facilitates Faster AI Training for Faculty and Students

Director Chan summarized that although AI-Stack has been in operation for only a few months, it has already brought NCKU substantial benefits. The most obvious is easier management and a lighter workload for the Computer and Network Center staff. This is mainly due to AI-Stack’s comprehensive web interface, which supports self-service application for and use of AI resources, together with its integration with the campus-wide faculty and student identity authentication mechanism. Resource usage rights pass directly to users, who can quickly generate development environments from the contents of their application forms and start routine AI training. As a result, the center’s administrators no longer need to watch the system constantly or set up training environments for users; they only need to review usage reports periodically to understand GPU utilization, so GPU management takes far less effort than before.

Moreover, the wallet top-up function not only enables precise cost calculation but also helps faculty and students keep their research projects, special topics, and program budgets under control. This allows for more efficient and fair use of GPU computing resources across the university, which can be considered an added value of AI-Stack.

In addition, most research units in colleges and universities have previously applied for GPU resources from the National Center for High-performance Computing’s TWCC. Since Infinitix is a founding member of the TWGC prototype system development team for TWCC, these users can quickly become familiar with AI-Stack, both in the intuitive design of its operation interface and in the ease of its resource deployment process. This greatly accelerates AI resource introduction plans and new system launches.

More importantly, throughout this project, from requirement planning and environment setup to the official launch of AI-Stack for users, Infinitix demonstrated its enthusiasm and professionalism in the AI field. Its team worked closely with the university’s staff and became thoroughly familiar with NCKU’s AI application scenarios and environment, ensuring that the launch went smoothly. Looking ahead, NCKU will continue to use this platform to strengthen the AI research capabilities of the university, its faculty, and its students. The university hopes to catalyze more cross-disciplinary intelligent applications that contribute to Taiwan’s industrial transformation and upgrading, the practical implementation of smart healthcare, and the development of public services.