* This post was written for English writing practice and to keep a record of the information.
I bought a GIGABYTE RTX 3080 Gaming OC 10GB for deep learning and used it to train a model.
The training loss was fine, but the validation loss was NaN.
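A quick way to confirm this symptom is to scan the recorded losses for NaN. Below is a minimal sketch, assuming the losses are stored in a Keras-style `History.history` dictionary (metric name mapped to a list of per-epoch values). The helper name `nan_epochs` and the example numbers are hypothetical and are not taken from my actual run.

```python
import math


def nan_epochs(history):
    """Return {metric_name: [epoch indices whose value is NaN]}.

    `history` is assumed to look like Keras' History.history:
    a dict mapping metric names to lists of per-epoch values.
    """
    report = {}
    for name, values in history.items():
        bad = [i for i, v in enumerate(values) if math.isnan(float(v))]
        if bad:
            report[name] = bad
    return report


# Hypothetical values for illustration only (not my real training log).
example_history = {
    "loss": [0.9, 0.7, 0.6],
    "val_loss": [float("nan"), float("nan"), float("nan")],
}

print(nan_epochs(example_history))
```

With a real run, `model.fit(...).history` could be passed in place of `example_history`.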
I tested the same script in 4 environments (OS: Windows 10 x64):
1. 3700x + RTX 3080 (CUDA 10.1)
2. 3700x only (no GPU)
3. Other laptop (i7 8750H + GTX 1050ti)
4. 3700x + RTX 3080 (CUDA 11.0 + cuDNN 8.0.3)
The validation losses were fine in every environment except the 1st.
So I think there is some issue with the RTX 3080 + CUDA 10.1 setup.
If you have issues with an RTX 3080, using the TensorFlow nightly build with CUDA 11.0 can be a solution. A TensorFlow contributor also said that TensorFlow 2.4.0 will support CUDA 11.0.
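To see which CUDA and cuDNN versions an installed TensorFlow was built against, something like the following sketch may help. It assumes a TensorFlow version that provides `tf.sysconfig.get_build_info()` (I believe 2.3 or later) and falls back gracefully otherwise. The helper `summarize_build` is a hypothetical name, and the exact keys and version strings in the returned dictionary can differ between builds.

```python
def summarize_build(build_info):
    """Format the CUDA-related entries of a TensorFlow build-info dict.

    Missing keys (e.g. in a CPU-only build) are reported as 'unknown'.
    """
    is_cuda = build_info.get("is_cuda_build", False)
    cuda = build_info.get("cuda_version", "unknown")
    cudnn = build_info.get("cudnn_version", "unknown")
    return f"CUDA build: {is_cuda}, CUDA {cuda}, cuDNN {cudnn}"


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment.")
    else:
        print("TensorFlow version:", tf.__version__)
        # get_build_info may be absent in older releases, so look it up defensively.
        get_info = getattr(tf.sysconfig, "get_build_info", None)
        if get_info is None:
            print("tf.sysconfig.get_build_info is not available in this version.")
        else:
            print(summarize_build(get_info()))
        # An empty list here means TensorFlow does not see the GPU at all.
        print("Visible GPUs:", tf.config.list_physical_devices("GPU"))
```

If the reported CUDA version is 10.1 while the card is an RTX 3080, that matches the 1st environment above, where the validation loss became NaN.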
Edit (10/21/2020): I tested the TensorFlow nightly build + CUDA 11.1 + cuDNN 8.0.4 combination, and it worked.