CUDA Study Notes (22) - Alibaba Cloud Developer Community

2018-02-09

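OpenGL Interoperability:
OpenGL (Open Graphics Library) is a cross-language, cross-platform specification for a professional graphics programming interface. It is used to render 3D (and also 2D) graphics, and is a powerful, easy-to-call low-level graphics library.
The OpenGL resources that can be mapped into the CUDA address space are OpenGL buffer, texture, and renderbuffer objects.
A buffer object is registered using cudaGraphicsGLRegisterBuffer(). In CUDA it appears as a device pointer, so it can be read and written by kernels or via cudaMemcpy() calls.
A texture or renderbuffer object is registered using cudaGraphicsGLRegisterImage(). In CUDA it appears as a CUDA array. Kernels can read from the array by binding it to a texture or surface reference. If the resource was registered with the cudaGraphicsRegisterFlagsSurfaceLoadStore flag, kernels can also write to it through the surface write functions. The array can also be read and written via cudaMemcpy2D() calls. cudaGraphicsGLRegisterImage() supports all texture formats with 1, 2, or 4 components and an internal type of float (e.g. GL_RGBA_FLOAT32), normalized integer (e.g. GL_RGBA8, GL_INTENSITY16), or unnormalized integer (e.g. GL_RGBA8UI). Note that unnormalized integer formats require OpenGL 3.0, so they can only be written by shaders, not by the fixed-function pipeline.
The OpenGL context whose resources are being shared must be current to the host thread that makes any OpenGL interoperability API call.
Please note: once an OpenGL texture has been made bindless (for example, by requesting an image or texture handle through the glGetTextureHandle / glGetImageHandle APIs), it can no longer be registered with CUDA. The application needs to register the texture for interop before requesting an image or texture handle.
The following code sample uses a kernel to dynamically modify a 2D width x height grid of vertices stored in a vertex buffer object: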

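// CUDA runtime and OpenGL interop headers; the OpenGL/GLUT headers
// are assumed to be included as well.
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

// width, height and time are used below but are not declared in this
// listing; they are assumed to be defined elsewhere.
GLuint positionsVBO;
struct cudaGraphicsResource* positionsVBO_CUDA;

// Forward declarations for functions defined further down
void display();
__global__ void createVertices(float4* positions, float time,
    unsigned int width, unsigned int height);

int main()
{
    // Initialize OpenGL and GLUT for device 0
    // and make the OpenGL context current
    ...
    glutDisplayFunc(display);
    // Explicitly set device 0
    cudaSetDevice(0);
    // Create buffer object and register it with CUDA
    glGenBuffers(1, &positionsVBO);
    glBindBuffer(GL_ARRAY_BUFFER, positionsVBO);
    // One float4 (x, y, z, w) per vertex
    unsigned int size = width * height * 4 * sizeof(float);
    glBufferData(GL_ARRAY_BUFFER, size, 0, GL_DYNAMIC_DRAW);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    // CUDA only writes to the buffer, so its previous contents may be discarded
    cudaGraphicsGLRegisterBuffer(&positionsVBO_CUDA,
        positionsVBO,
        cudaGraphicsRegisterFlagsWriteDiscard);
    // Launch rendering loop
    glutMainLoop();
    ...
}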
void display()
{
    // Map buffer object for writing from CUDA
    float4* positions;
    cudaGraphicsMapResources(1, &positionsVBO_CUDA, 0);
    size_t num_bytes;
    cudaGraphicsResourceGetMappedPointer((void**)&positions,
        &num_bytes,
        positionsVBO_CUDA);
    // Execute kernel (this launch assumes width and height
    // are multiples of the 16x16 block size)
    dim3 dimBlock(16, 16, 1);
    dim3 dimGrid(width / dimBlock.x, height / dimBlock.y, 1);
    createVertices<<<dimGrid, dimBlock>>>(positions, time,
        width, height);
    // Unmap buffer object so OpenGL can use it again
    cudaGraphicsUnmapResources(1, &positionsVBO_CUDA, 0);
    // Render from buffer object
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glBindBuffer(GL_ARRAY_BUFFER, positionsVBO);
    glVertexPointer(4, GL_FLOAT, 0, 0);
    glEnableClientState(GL_VERTEX_ARRAY);
    glDrawArrays(GL_POINTS, 0, width * height);
    glDisableClientState(GL_VERTEX_ARRAY);
    // Swap buffers
    glutSwapBuffers();
    glutPostRedisplay();
}
void deleteVBO()
{
    cudaGraphicsUnregisterResource(positionsVBO_CUDA);
    glDeleteBuffers(1, &positionsVBO);
}
__global__ void createVertices(float4* positions, float time,
    unsigned int width, unsigned int height)
{
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
    // Calculate uv coordinates
    float u = x / (float)width;
    float v = y / (float)height;
    u = u * 2.0f - 1.0f;
    v = v * 2.0f - 1.0f;
    // calculate simple sine wave pattern
    float freq = 4.0f;
    float w = sinf(u * freq + time)
        * cosf(v * freq + time) * 0.5f;
    // Write positions
    positions[y * width + x] = make_float4(u, w, v, 1.0f);
}
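
The launch in display() sizes the grid as width / dimBlock.x by height / dimBlock.y. Integer division truncates, so when width or height is not a multiple of the block size, the trailing vertices are never written. The following is a minimal host-side sketch of the arithmetic, in plain C++ so it can be compiled without CUDA or OpenGL. The helpers ceilDiv and coveredVertices are hypothetical names introduced here for illustration; they are not part of the CUDA API or of the original listing.

```cpp
// Hypothetical helper (not a CUDA API): integer division rounded up.
// Assumes n + d - 1 does not overflow unsigned int.
constexpr unsigned int ceilDiv(unsigned int n, unsigned int d)
{
    return (n + d - 1) / d;
}

// Hypothetical helper: number of grid vertices that receive a thread when
// a gridX x gridY grid of block x block thread blocks is launched over a
// width x height vertex grid, with out-of-range threads doing nothing.
constexpr unsigned int coveredVertices(unsigned int width, unsigned int height,
                                       unsigned int gridX, unsigned int gridY,
                                       unsigned int block)
{
    // Threads beyond width/height are clamped away, as a bounds check would do
    const unsigned int spanX = gridX * block < width ? gridX * block : width;
    const unsigned int spanY = gridY * block < height ? gridY * block : height;
    return spanX * spanY;
}

// Exact multiples: truncating and rounding up agree
static_assert(96 / 16 == 6 && ceilDiv(96, 16) == 6, "multiple of block size");

// 100 x 50 vertices with 16 x 16 blocks:
// truncation gives a 6 x 3 grid, which covers only 96 x 48 vertices
static_assert(coveredVertices(100, 50, 100 / 16, 50 / 16, 16) == 96 * 48,
              "truncated grid misses vertices");
// rounding up gives a 7 x 4 grid, which covers all 100 x 50 vertices
static_assert(coveredVertices(100, 50, ceilDiv(100, 16), ceilDiv(50, 16), 16)
                  == 100 * 50,
              "rounded-up grid covers every vertex");
```

If this approach were applied to the listing, the grid would presumably be built from ceilDiv(width, dimBlock.x) and ceilDiv(height, dimBlock.y), and createVertices would need an early return when x >= width or y >= height, since the rounded-up grid launches threads that fall outside the buffer.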

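On Windows and with Quadro GPUs, cudaWGLGetDevice() can be used to retrieve the CUDA device associated with the handle returned by wglEnumGpusNV(). In a multi-GPU configuration where OpenGL rendering is performed on the Quadro GPU and CUDA computation is performed on the other GPUs in the system, Quadro GPUs offer higher-performance OpenGL interoperability than GeForce and Tesla GPUs.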