- NVIDIA (Santa Clara, CA)
- …be doing: + Build internal perf/power profiling and analysis tools and platform for AI workloads at cluster scale + Build debugging tools for common ... frameworks like PyTorch, TensorFlow, etc. + Knowledge of AI cluster job scheduling, storage management and...GPU cluster scale continuous profiling & analysis tools/platforms + Solid experience in large AI …
- NVIDIA (Santa Clara, CA)
- …performance for a variety of AI/HPC workloads. + Working knowledge of cluster configuration management tools such as Ansible, Puppet, Salt. + Experience ... parallel computing. More recently, GPU deep learning ignited modern AI - the next era of computing. NVIDIA is...join us today! As a member of the GPU AI/HPC Infrastructure team, you will provide leadership in the…
- NVIDIA (Santa Clara, CA)
- …5K GPUs cluster. + Deep understanding of GPU computing and AI infrastructure. + Passion for solving complex technical challenges and optimizing system ... NVIDIA is the leader in AI, machine learning and datacenter acceleration. NVIDIA is...Solid experience with GPU clusters, and working knowledge of cluster configuration management tools such as BCM…
- NVIDIA (Santa Clara, CA)
- We are now seeking a Senior AI Infrastructure Engineer! NVIDIA's Compute Architecture Group is growing our team of AI focused Infrastructure Engineers who ... What you'll be doing: + Administer an NVIDIA Internal AI cluster composed of Linux systems ranging...updates, and maintenance of system availability using modern DevOps tools (Ansible, GitLab, etc.) + Plan and maintain new…
- NVIDIA (Santa Clara, CA)
- NVIDIA is leading the way in the AI revolution, revolutionizing industries with our brand-new GPU technology. Our GPUs drive groundbreaking innovations, from ... in computer vision, speech recognition, and more. As "the AI computing company," we constantly push the limits of...leaders to join us on an exciting journey as Senior SRE Engineering Leader. Lead our globally distributed clusters,…
- Bloomberg (New York, NY)
- …a critical step on the MDLC to realize the business value for Bloomberg AI applications and the advent of large language models (LLMs) presents new opportunities for ... KServe which is a production ready inference solution for both generative and predictive AI applications. We are poised for enormous user growth this year and have…
- NVIDIA (Santa Clara, CA)
- …, HW, and SW engineering and research teams to define a vision and roadmap for AI/HPC cluster observability. + Architect and lead teams to develop, test, and ... NVIDIA's Hardware Infrastructure organization is seeking a Senior or Principal Data and Observability Architect....We serve and collaborate directly with NVIDIA's rapidly growing AI, HW, and SW engineering and research teams across…
- NVIDIA (Santa Clara, CA)
- …experienced software engineers with Kubernetes experience to help scale up its AI Infrastructure. We expect you to have significant software engineering experience ... with Kubernetes including cluster operations, operator development, node health monitoring and working...deploy leading infrastructure solutions for a broad range of AI-based applications. If you're creative, passionate about Kubernetes and…
- NVIDIA (Santa Clara, CA)
- NVIDIA is looking for a dedicated and motivated senior build and continuous integration (CI/CD) engineer for its GenAI Frameworks (NeMo, ... working on Large Language Models (LLM), Multimodal (MM), and Speech AI. NeMo provides end-to-end model training, including data curation, alignment, customization,…
- NVIDIA (Santa Clara, CA)
- …both on-premises and cloud based. + 12+ years of proven experience with cluster management and related tools, including Docker Containers, Slurm, Kubernetes, and ... part of a team that's revolutionizing the field of AI with data center scale solutions? We are looking...are the voice of experience, using Kubernetes, SaaS, infrastructure-as-code tools, network debugging, and problem solving skills to help…