Blog | 4 of 25 | PyTorch

December 12, 2023

From PyTorch Conference 2023: From Dinosaurs to Seismic Imaging with Intel

December 05, 2023

Snowflake Joins the PyTorch Foundation as a General Member

November 30, 2023

Accelerating Generative AI with PyTorch II: GPT, Fast

This post is the second part of a multi-part blog series on accelerating generative AI models with pure, native PyTorch. We are excited to share a breadth of newly released PyTorch performance features alongside practical examples to see how far we can push PyTorch native performance. In part one, we showed how to accelerate Segment Anything by over 8x using only pure, native PyTorch. In this post, we focus on LLM optimization.

November 29, 2023

PyTorch 2.1 Contains New Performance Features for AI Developers

We are excited to see the release of PyTorch 2.1. In this blog, we discuss the five features in PyTorch 2.1 to which Intel made significant contributions.

November 16, 2023

🎉 PyTorch Docathon H2 2023 Wrap-up 🎉

We are thrilled to announce the successful completion of the Fall 2023 PyTorch Docathon! The event was a resounding success, and we want to extend our heartfelt gratitude to all the participants who made it possible. The dedication, expertise, and tireless efforts of our open-source contributors have once again helped us improve the PyTorch documentation.

November 16, 2023

Accelerating Generative AI with PyTorch: Segment Anything, Fast

This post is the first part of a multi-part blog series on accelerating generative AI models with pure, native PyTorch. We are excited to share a breadth of newly released PyTorch performance features alongside practical examples of how these features can be combined to see how far we can push PyTorch native performance.

November 07, 2023

PyTorch compile to speed up inference on Llama 2

In this blog, we discuss how to improve the inference latency of the Llama 2 family of models using PyTorch native optimizations such as native fast kernels, compile transformations from torch.compile, and tensor parallelism for distributed inference. Our approach results in a latency of 29 ms/token for single-user requests on the 70B Llama 2 model (as measured on 8 A100 GPUs). We are excited to share our findings with the community and make our code available here.
