Advice to Students New to Research

Here’s some advice for grad students getting into research. It’s mainly for Computer Science folks, but engineering students in general can find it helpful too!

📚 Learn to learn

If you were to ask me what distinguishes graduate studies from any prior education, I would say it is developing the ability to learn how to learn (interestingly, machine learning calls this concept meta-learning).

In your earlier education, you are led step by step through a structured schedule that specifies what to learn and when, with everyone following a similar schedule and pace.

However, in graduate studies, you delve into research on topics that may have never been explored before, where no one knows the correct answer. You must independently search for resources and attempt to uncover an answer that might not even exist.

The value of graduate education lies in the process of seeking answers, not the answers themselves. Through this process, you learn how to think critically, search for resources, and solve problems.

🚴 Work-Life Balance: Avoiding Constant Busyness

Research can be challenging—most of your ideas might not work, and many of your submissions could end up with rejections. So, be ready for those ups and downs! Make sure to take breaks when you need them and keep a healthy work-life balance.

If you're always busy, you're likely not engaging in creative and innovative work.

⏱️ Read Papers Regularly

Papers are a great source of knowledge and inspiration. Make it a habit to read them regularly and think about their contributions and innovations.

🤷 How to Quickly Dive into a Research Field

📖 Understand a Paper

As you read through a paper, remember to keep these key points in mind.

Understand the problem

The first thing to think about is the problem or task the paper aims to tackle, especially if you’re new to this area. A paper typically either presents a new approach to an existing problem or introduces a completely new problem.

When trying to grasp the problem, focus on the input and the output. For example, in an image classification task, the input is an image, and the output is the category that the image is predicted to belong to.

Understanding the problem only at this level is not sufficient! You need a more concrete grasp of the input and output. What do you really mean when you say the input is an "image"? How do computers represent an image? And what exactly is meant by the "category" of the image? How do computers represent that "category"?

Computers always work with (digitized) numbers for input and output, so it's important to understand how these numbers are organized. In image classification, a solid understanding means recognizing that the input is an image represented as an M x N x 3 array of numbers (two spatial dimensions and three color channels), while the output is a vector holding the probability that the input image belongs to each category.
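To make this concrete, here is a minimal sketch in plain Python of what the input and output look like as numbers. The 2 x 2 image, the category names, and the probability values are all made up for illustration; a real model would produce the probabilities.

```python
# A tiny 2 x 2 RGB "image": M = 2 rows, N = 2 columns, 3 color channels.
# Each pixel holds three integers in [0, 255] (red, green, blue).
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]


def shape(nested):
    """Return the dimensions of a regularly nested list, e.g. (M, N, 3)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# Hypothetical label set and hypothetical model output:
# one probability per category, summing to 1.
categories = ["cat", "dog", "bird"]
probabilities = [0.7, 0.2, 0.1]

# The predicted category is the one with the highest probability.
predicted_index = max(range(len(probabilities)), key=probabilities.__getitem__)
predicted_label = categories[predicted_index]

print(shape(image))     # (2, 2, 3)
print(predicted_label)  # cat
```

So "an image" is really a block of M x N x 3 numbers, and "a category" is really an index into a list of probabilities.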

The method

The next step is to grasp the proposed method. To truly "understand" it, you should have a clear mental picture of how the method processes the input step by step until it produces the output. This includes knowing the input and output for each step along the way.
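One way to build that mental picture is to write down every step together with the shape of its input and output. The sketch below traces a toy three-step classifier in plain Python. The steps (flatten, linear scores, softmax) and the weights are my own made-up illustration, not the method of any particular paper.

```python
import math


def flatten(image):
    """Step 1: (M, N, 3) nested list -> flat list of M*N*3 values scaled to [0, 1]."""
    return [channel / 255.0 for row in image for pixel in row for channel in pixel]


def linear(features, weights, biases):
    """Step 2: D features -> K raw scores, one per category."""
    return [
        sum(w * x for w, x in zip(weight_row, features)) + b
        for weight_row, b in zip(weights, biases)
    ]


def softmax(scores):
    """Step 3: K raw scores -> K probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


image = [[[255, 0, 0], [0, 255, 0]]]  # shape (1, 2, 3)

# Made-up parameters for K = 2 categories and D = 6 features.
weights = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.5, 0.0],
]
biases = [0.0, 0.0]

features = flatten(image)                   # 6 numbers
scores = linear(features, weights, biases)  # 2 numbers
probs = softmax(scores)                     # 2 probabilities

print(len(features), len(scores), len(probs))
```

If you can fill in a trace like this for a paper's method, with the right shape at every step, you probably understand how the method works.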

If you're a junior engineer putting methods from papers into practice, understanding how a method works is perfectly fine for now. However, if you're looking to dive into research and develop your own methods, you'll also need to understand why these methods work, particularly why the older methods fell short, and what inspired the authors to create the new ones.

The motivation and intuition

Once you grasp how the method works, the next, deeper level is understanding why the method works: why the authors proposed it and what motivated them to develop it.

A good paper will make you feel both surprised and satisfied. When you first look at the paper, you may feel surprised by the new approach or idea presented, thinking, "Wow, this is a different way to tackle this issue!" However, as you read further and understand the reasoning and methodology behind the paper, you come to see that the ideas are logical and well-founded, leading you to agree that the approach is indeed effective in solving the problem at hand.

A simple Chinese phrase captures this feeling: 意料之外情理之中 (unexpected, yet perfectly reasonable).

👨‍💻 Coding

Courses and Material

I strongly recommend The Missing Semester of Your CS Education from MIT.

Tools and setup

Here are the setup and tools I personally use for AI coding and experiments.

  • ssh for remote access to servers.

  • vscode for editing local and remote files.

  • tmux for terminal multiplexing and running programs in the background. Useful tmux cheat sheet.

Debug AI code

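Basics

Programming languages fall into two kinds: compiled languages and scripting languages. Source code in a compiled language must be compiled into binary code before it can run; typical examples are C++ and CUDA. Source code in a scripting language can be run directly; typical examples are JavaScript, Python, and MATLAB.

Scripting languages are relatively easy to install, configure, and debug, whereas compiled languages can be very hard to set up because of their complex dependencies.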

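CUDA / torch version issues

Many Python packages, such as PyTorch, have a high-level interface written in Python (a scripting language) and low-level computation written in C++ or CUDA. Such packages are a mix of scripting and compiled languages.

Usually, when you install a Python package such as torch, its developers have already precompiled the binaries for you to download and install directly. When you run pip install torch, pip automatically downloads and installs a precompiled binary (a wheel file). However, the environment in which the developers built that wheel, such as the C++ compiler version and the CUDA version, may differ from the one on your local machine.

In that case, if you only use the Python interface in your development, there is usually no problem. But if you need to install other packages that depend on both CUDA and torch's Python interface, you may run into trouble. The CUDA/C++ version used to compile torch differs from the CUDA/C++ version on your local machine, so compiling the third-party package fails (building it uses both your local CUDA/C++ toolchain and torch). You will typically see an error message like this: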

RuntimeError: The detected CUDA version (XX.X) mismatches the version that was used to compile PyTorch (YY.Y)

"The detected CUDA version (XX.X)" refers to your local CUDA, and "the version that was used to compile PyTorch (YY.Y)" is the CUDA version used to compile the torch you installed.
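You can inspect both versions yourself before building anything. Here is a minimal sketch, assuming your local toolkit version comes from the text printed by nvcc --version and the torch side comes from torch.version.cuda. The helper names are hypothetical, and the sample strings are hard-coded so the snippet runs without a GPU or torch installed. It compares major and minor versions, which is a simplification and may be stricter than the check PyTorch itself performs.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract 'major.minor' from the text printed by `nvcc --version`, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def versions_match(local_cuda, torch_cuda):
    """Compare two 'major.minor' strings; None means the version is unknown."""
    if local_cuda is None or torch_cuda is None:
        return False
    return local_cuda.split(".")[:2] == torch_cuda.split(".")[:2]


# Sample text in the shape nvcc prints. On a real machine you could obtain it with
# subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
sample_nvcc_output = "Cuda compilation tools, release 12.1, V12.1.105"
local_cuda = parse_nvcc_release(sample_nvcc_output)

# On a real machine this would be torch.version.cuda
# (None for CPU-only builds); hard-coded here for illustration.
torch_cuda = "11.8"

print(local_cuda, torch_cuda, versions_match(local_cuda, torch_cuda))
```

With these sample values the two versions differ, which is exactly the situation that triggers the error above.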

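Compute-intensive third-party packages often depend on torch's Python interface for ease of use and on the low-level CUDA interface for speed. Typical examples are https://github.com/rahul-goel/fused-ssim and https://github.com/open-mmlab/mmcv.

In this situation you must make sure that the CUDA version used to compile torch matches your local CUDA version. There are three solutions:

  1. Find a wheel file that matches your local CUDA version (recommended)
  2. Download the torch source code and compile torch locally (not really recommended)
  3. Reinstall CUDA locally to match the torch wheel you downloaded (strongly discouraged)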

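I'll only cover the first approach here. You can go to https://download.pytorch.org/whl/torch/ and find a wheel file that matches your local Python version, CUDA version, and operating system. For example, if you use Python 3.10 on Linux and your local CUDA is 12.1, you can download torch-2.1.1+cu121-cp310-cp310-linux_x86_64.whl

Note the cu121, cp310, and linux parts of the file name, which correspond to the CUDA version, the Python version, and the operating system.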
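If you want to check a wheel file name programmatically, the tags can be pulled apart with a few lines of Python. This is only a sketch: the function names are hypothetical, and it assumes the five-field name-version-python-abi-platform.whl form of the example above (no optional build tag).

```python
from urllib.parse import unquote


def parse_torch_wheel(filename):
    """Split a wheel file name into its tags (five-field form only)."""
    stem = unquote(filename)  # index pages may encode '+' as '%2B'
    if not stem.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    stem = stem[: -len(".whl")]
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    public_version, _, local_label = version.partition("+")
    return {
        "name": name,
        "version": public_version,
        "cuda": local_label,  # e.g. 'cu121'
        "python": python_tag,  # e.g. 'cp310'
        "abi": abi_tag,
        "platform": platform_tag,
    }


def expected_tags(python_version, cuda_version):
    """Build the tags to look for, e.g. ((3, 10), '12.1') -> ('cp310', 'cu121')."""
    major, minor = python_version
    return f"cp{major}{minor}", "cu" + cuda_version.replace(".", "")


info = parse_torch_wheel("torch-2.1.1+cu121-cp310-cp310-linux_x86_64.whl")
python_tag, cuda_tag = expected_tags((3, 10), "12.1")

print(info)
print(info["python"] == python_tag and info["cuda"] == cuda_tag)
```

On your own machine you would pass sys.version_info[:2] and your local CUDA version to expected_tags, then look for a file whose tags match.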

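General installation advice for hybrid Python packages

For hybrid Python packages that depend on both CUDA and torch, I suggest not installing them directly with pip. Instead, download the source code, then compile and install it locally.

With so many dependencies, it is hard to guarantee that your local Python/CUDA/C++/torch versions match those of the wheel file that pip downloads. Compiling from source and then installing is the most reliable approach, though the downside is that compilation can take a long time.

Taking fused-ssim as an example, the steps to compile and install locally are: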

git clone https://github.com/rahul-goel/fused-ssim
cd fused-ssim
pip install .