Preface
The versions of tensorflow-gpu, CUDA, and cuDNN must match each other!
For details, see the tested build configurations in: Build from source on Windows | TensorFlow (google.cn)
1. Environment
My environment: Windows 11 (Home, x64) + tensorflow-gpu 2.6.0 + cudatoolkit 11.2.0 + cudnn 8.1.0.77 + Anaconda3 (2024.06_1) + PyCharm 2023.3.2 + RTX 4050 GPU
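Since everything depends on the three versions matching, it can help to write the expected combinations down and check against them. The sketch below is only an illustration: the table holds a few rows that I believe match the tested build configurations on the TensorFlow page, and the names `TESTED_GPU_BUILDS`, `major_minor`, and `check_versions` are made up for this example. Always confirm against the official table before relying on it.

```python
# A few rows from the "tested build configurations" table (GPU builds).
# Assumption: copied by hand, so verify against the TensorFlow site.
TESTED_GPU_BUILDS = {
    "2.6.0": {"cuda": "11.2", "cudnn": "8.1"},
    "2.5.0": {"cuda": "11.2", "cudnn": "8.1"},
    "2.4.0": {"cuda": "11.0", "cudnn": "8.0"},
}


def major_minor(version):
    """Reduce a version such as '8.1.0.77' to its 'major.minor' part ('8.1')."""
    return ".".join(version.split(".")[:2])


def check_versions(tf_version, cuda_version, cudnn_version):
    """Return a list of mismatch messages; an empty list means the trio matches."""
    expected = TESTED_GPU_BUILDS.get(tf_version)
    if expected is None:
        return [f"tensorflow-gpu {tf_version} is not in the local table"]
    problems = []
    if major_minor(cuda_version) != expected["cuda"]:
        problems.append(f"CUDA {cuda_version} != expected {expected['cuda']}")
    if major_minor(cudnn_version) != expected["cudnn"]:
        problems.append(f"cuDNN {cudnn_version} != expected {expected['cudnn']}")
    return problems


if __name__ == "__main__":
    # The versions from the environment described above.
    print(check_versions("2.6.0", "11.2.0", "8.1.0.77"))
```

An empty list means the combination is in the table; any message in the list names the component to change.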
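2. Installation steps
1. Installation order
1. Install PyCharm.
2. Install Anaconda3.
3. Open Anaconda Prompt as administrator; the packages in the following steps are installed from there with conda or pip.
4. Install cudatoolkit with conda.
5. Install cudnn with conda.
6. Install tensorflow-gpu with pip.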
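2. Reference links
Installing in the order above completes the environment setup. The posts I followed are:
Anaconda3 installation: 1. Windows下的Anaconda详细安装教程_windows安装anaconda-CSDN博客 (a detailed Anaconda installation tutorial for Windows)
2. 还是搞不懂Anaconda是什么?读这一篇文章就够了-CSDN博客 (an explanation of what Anaconda is; supplementary reference)
Steps 3-6: 十分钟安装Tensorflow-gpu2.6.0+本机CUDA12 以及numpy+matplotlib各包版本协调问题_tensorflow cuda12-CSDN博客 (installing tensorflow-gpu 2.6.0 on a machine with CUDA 12, plus reconciling numpy and matplotlib versions)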
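Notes:
1. For the "changing the default conda virtual environment path" part of the first Anaconda3 reference link, method 1 is the one that took effect for me (method 2 also works).
2. When configuring tensorflow-gpu, cuda, and cudnn by following the referenced post, the package versions that `conda list` finally shows will differ slightly from those the author shows. There is no need to rush to change them. My `conda list` output after installation was as follows: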
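To compare your own `conda list` output against the versions you meant to install, a small parser can save some squinting. This is only a sketch: `parse_conda_list`, `find_mismatches`, `SAMPLE`, and `WANTED` are names invented for the example, and the sample text is hypothetical (the build strings are placeholders, not real output from my machine).

```python
def parse_conda_list(text):
    """Map package name -> version from the text printed by `conda list`."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '#' header lines
        fields = line.split()
        if len(fields) >= 2:
            versions[fields[0]] = fields[1]
    return versions


def find_mismatches(installed, wanted):
    """Return {name: (wanted, found)} for packages that are missing or differ."""
    return {
        name: (version, installed.get(name))
        for name, version in wanted.items()
        if installed.get(name) != version
    }


# Hypothetical sample; build strings are placeholders, not real output.
SAMPLE = """\
# Name                    Version                   Build  Channel
cudatoolkit               11.2.0            placeholder_0
cudnn                     8.1.0.77          placeholder_0
tensorflow-gpu            2.6.0                    pypi_0    pypi
"""

# The three versions this post cares about.
WANTED = {"cudatoolkit": "11.2.0", "cudnn": "8.1.0.77", "tensorflow-gpu": "2.6.0"}

if __name__ == "__main__":
    print(find_mismatches(parse_conda_list(SAMPLE), WANTED))
```

Paste your real `conda list` text in place of `SAMPLE`; an empty dict means the three key packages are at the intended versions, and differences in other packages can usually wait.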
3. Testing and verification
1. TensorFlow-gpu test
The test code is as follows:
import tensorflow as tf

# tensorflow-gpu 2.6.0 sanity check
print(tf.__version__)
# Name of the first GPU device, or an empty string if none is found
print(tf.test.gpu_device_name())
# Physical devices visible to TensorFlow
print('GPU:', tf.config.list_physical_devices('GPU'))
print('CPU:', tf.config.list_physical_devices(device_type='CPU'))
# Deprecated in TF 2.x, kept only as a cross-check
print(tf.test.is_gpu_available())
# Number of available GPUs
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
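Running it produced errors such as "Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll", all of them about dynamic libraries that could not be loaded.
Most of what I found online says the same thing: whichever DLL is missing, download it and put it under C:\Windows\System. That treats the symptom, not the cause. Thinking it over, my tensorflow-gpu, cuda, and cudnn versions all follow the recommendations on the TensorFlow website, so they should not be wrong. So I first checked whether the missing DLLs were present in my conda virtual environment, which I created at D:\ProgramData\Data\Anaconda_envs\envs\tf-gpu-2.6\Library\bin:
Every DLL reported as unloadable was there. I suspected that the system simply could not find that directory, so I opened the system environment variables and looked at Path. Sure enough, the directory had not been added. After adding it and running the test code again, the result was as follows:
Getting the result above means the test passed and the TensorFlow-gpu environment is, as a first step, configured correctly. Congratulations!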
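Before editing the system Path by hand, it is easy to check from Python whether any directory on PATH actually contains the DLL named in the error. The sketch below uses only the standard library; `dirs_containing` is a helper name invented for this example. Note that Windows also searches a few locations other than PATH when loading a DLL, so treat an empty result as a strong hint rather than proof.

```python
import os


def dirs_containing(filename, path_value=None):
    """Return the PATH directories that contain `filename`.

    `path_value` defaults to the PATH of the current process.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for directory in path_value.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # DLL name taken from the error message above.
    name = "cudart64_110.dll"
    found = dirs_containing(name)
    print(name, "->", found if found else "not found on PATH")
```

If the DLL exists in the environment's `Library\bin` directory but this prints "not found on PATH", you are most likely in the same situation I was.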
2. CUDA acceleration test
The test code is as follows:
import cv2
import tensorflow as tf
from mtcnn import MTCNN
config = tf.compat.v1.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.6
config.gpu_options.allow_growth = True
sess = tf.compat.v1.Session(config=config)
detector = MTCNN()
img = cv2.imread("Lena3.png")
output = detector.detect_faces(img)
print(output)
This produced the output below. The main problems are:
Error 1. Call to CreateProcess failed. Error code: 2
Warning 2. Couldn't get ptxas version string: Internal: Couldn't invoke ptxas.exe --version
Warning 3. Internal: Failed to launch ptxas
D:\ProgramData\Data\Anaconda_envs\envs\tf-gpu-2.6\python.exe D:\Codes\Python\tf_gpu_test\cuda_test.py
2024-08-29 17:45:39.732544: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-08-29 17:45:40.595161: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3684 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4050 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.9
2024-08-29 17:45:40.618800: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3684 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4050 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.9
2024-08-29 17:45:40.827761: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2024-08-29 17:45:41.754506: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8100
2024-08-29 17:45:42.639977: E tensorflow/core/platform/windows/subprocess.cc:287] Call to CreateProcess failed. Error code: 2
2024-08-29 17:45:42.641084: E tensorflow/core/platform/windows/subprocess.cc:287] Call to CreateProcess failed. Error code: 2
2024-08-29 17:45:42.641227: W tensorflow/stream_executor/gpu/asm_compiler.cc:77] Couldn't get ptxas version string: Internal: Couldn't invoke ptxas.exe --version
2024-08-29 17:45:42.648187: E tensorflow/core/platform/windows/subprocess.cc:287] Call to CreateProcess failed. Error code: 2
2024-08-29 17:45:42.648554: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Failed to launch ptxas
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
2024-08-29 17:45:45.448664: I tensorflow/stream_executor/cuda/cuda_blas.cc:1760] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
[{'box': [202, 182, 158, 210], 'confidence': 0.9971387386322021, 'keypoints': {'left_eye': (267, 265), 'right_eye': (336, 267), 'nose': (316, 317), 'mouth_left': (265, 348), 'mouth_right': (322, 351)}}]
Process finished with exit code 0
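For error 1, Call to CreateProcess failed, method 1 from the blog post I consulted is to run conda install -c nvidia cuda-nvcc so that ptxas is present in the conda environment. (My cudatoolkit was installed inside the conda virtual environment rather than from the installer on the official website, so method 2 does not apply to me.) The full command is: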
conda install -c nvidia cuda-nvcc
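I tried it and still got errors, identical to the ones above, with no improvement at all. (Don't rush to uninstall cuda-nvcc, though.)
Searching my computer for ptxas.exe, I found that it was right there under my conda virtual environment path:
So, same trick as before: add that directory to Path in the system environment variables. After closing PyCharm, reopening it, and running the test code, all the errors and warnings above were gone. Nice! The correct output is as follows: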
D:\ProgramData\Data\Anaconda_envs\envs\tf-gpu-2.6\python.exe D:\Codes\Python\tf_gpu_test\cuda_test.py
2024-08-29 19:11:11.700032: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-08-29 19:11:14.105552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3684 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4050 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.9
2024-08-29 19:11:14.129957: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3684 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4050 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.9
2024-08-29 19:11:14.347828: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2024-08-29 19:11:15.332231: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8100
2024-08-29 19:11:21.279191: I tensorflow/stream_executor/cuda/cuda_blas.cc:1760] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
[{'box': [202, 182, 158, 210], 'confidence': 0.9971387386322021, 'keypoints': {'left_eye': (267, 265), 'right_eye': (336, 267), 'nose': (316, 317), 'mouth_left': (265, 348), 'mouth_right': (322, 351)}}]
Process finished with exit code 0
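The log itself says "Modify $PATH to customize ptxas location", so an alternative to editing the system Path is to adjust PATH inside the script, before TensorFlow is imported. I have not verified this variant on my machine, so take it as an assumption-laden sketch: `prepend_to_path` and `locate` are helper names invented here, and the directory you pass in would be wherever your own search found ptxas.exe.

```python
import os
import shutil


def prepend_to_path(directory, env=None):
    """Put `directory` at the front of PATH in `env` (default: os.environ)."""
    if env is None:
        env = os.environ
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if directory in parts:
        parts.remove(directory)  # avoid duplicate entries
    env["PATH"] = os.pathsep.join([directory] + parts)
    return env["PATH"]


def locate(tool, env=None):
    """Return the full path of `tool` as resolved through PATH, or None."""
    if env is None:
        env = os.environ
    return shutil.which(tool, path=env.get("PATH", ""))


if __name__ == "__main__":
    # None here means a CreateProcess call for ptxas would fail as well.
    print("ptxas:", locate("ptxas"))
```

Calling `prepend_to_path(...)` at the top of the test script and then checking `locate("ptxas")` tells you whether the change worked without restarting PyCharm; the change only lasts for that one process.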
3. Face detection code test
Now let's run a piece of face detection code as a test:
import cv2
import tensorflow as tf
from mtcnn import MTCNN
config = tf.compat.v1.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.6
config.gpu_options.allow_growth = True
sess = tf.compat.v1.Session(config=config)
detector = MTCNN()
img = cv2.imread('Lena3.png')
output = detector.detect_faces(img)
print(output)
x,y,width,height = output[0]['box']
left_eye_X,left_eye_Y = output[0]['keypoints']['left_eye']
right_eye_X,right_eye_Y = output[0]['keypoints']['right_eye']
nose_X,nose_Y = output[0]['keypoints']['nose']
mouth_left_X,mouth_left_Y = output[0]['keypoints']['mouth_left']
mouth_right_X,mouth_right_Y = output[0]['keypoints']['mouth_right']
# OpenCV channel order is BGR (Blue, Green, Red)
cv2.rectangle(img,pt1=(x,y),pt2=(x+width,y+height),color=(0,255,0),thickness=2)
cv2.circle(img,center=(left_eye_X,left_eye_Y),color=(0,0,255),thickness=2,radius=1)
cv2.circle(img,center=(right_eye_X,right_eye_Y),color=(0,0,255),thickness=2,radius=1)
cv2.circle(img,center=(nose_X,nose_Y),color=(255,0,0),thickness=2,radius=1)
cv2.circle(img,center=(mouth_left_X,mouth_left_Y),color=(255,0,0),thickness=2,radius=1)
cv2.circle(img,center=(mouth_right_X,mouth_right_Y),color=(255,0,0),thickness=2,radius=1)
cv2.imshow('result',img)
cv2.waitKey(0)
The result is as follows:
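The script above indexes `output[0]` directly, so it raises an IndexError on an image where no face is found, and it ignores any additional faces. A minimal sketch of a more defensive variant is to convert every detection into drawing coordinates first and loop over the result. `face_geometry` is a name invented for this example; the sample detection is the one printed in the log earlier in this post.

```python
def face_geometry(detections):
    """Convert MTCNN-style detections into rectangle corners and keypoints.

    Each detection is a dict with 'box' = [x, y, width, height] and
    'keypoints' = {name: (x, y)}, as in the printed output above.
    """
    faces = []
    for det in detections:
        x, y, width, height = det["box"]
        faces.append({
            "pt1": (x, y),                   # top-left corner
            "pt2": (x + width, y + height),  # bottom-right corner
            "keypoints": dict(det["keypoints"]),
        })
    return faces


# Detection copied from the log output earlier in this post.
sample = [{'box': [202, 182, 158, 210], 'confidence': 0.9971387386322021,
           'keypoints': {'left_eye': (267, 265), 'right_eye': (336, 267),
                         'nose': (316, 317), 'mouth_left': (265, 348),
                         'mouth_right': (322, 351)}}]

for face in face_geometry(sample):
    print(face["pt1"], face["pt2"])  # (202, 182) (360, 392)
print(face_geometry([]))             # [] means nothing to draw
```

Each `pt1`/`pt2` pair can be passed to `cv2.rectangle` and each keypoint to `cv2.circle`, exactly as in the script above, once per detected face.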