Computing the sum of array values in parallel with Metal and Swift


I am trying to compute the sum of a large array in parallel with Metal and Swift.

Is there a good way to do it?

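My plan is to divide my array into subarrays, compute the sum of each subarray in parallel, and then, when the parallel computation is finished, compute the sum of those sums.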

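For example, if I have

array = [a_0, ..., a_n]

I divide the array into subarrays:

array_1 = [a_0, ..., a_i], array_2 = [a_i+1, ..., a_2i], ..., array_n/i = [a_n-1, ..., a_n]

The sums of these subarrays are computed in parallel, and I get

sum_1, sum_2, sum_3, ..., sum_n/i

Finally, I just compute the sum of these sums.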
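The plan above can be tried out on the CPU before moving it to the GPU. Below is a minimal sketch in current Swift (not the Swift 2 used in the rest of this thread). `chunkedParallelSum` and `chunkSize` are hypothetical names, and `DispatchQueue.concurrentPerform` stands in for the GPU threads. Each iteration sums one subarray into its own slot, and the partial sums are then added sequentially.

```swift
import Foundation

/// Sums `values` by splitting them into subarrays of `chunkSize` elements,
/// summing each subarray on its own worker, then summing the partial sums.
func chunkedParallelSum(_ values: [Int], chunkSize: Int) -> Int {
    precondition(chunkSize > 0, "chunkSize must be positive")
    guard !values.isEmpty else { return 0 }
    // Number of subarrays, rounded up so a shorter last subarray is still summed.
    let chunkCount = (values.count + chunkSize - 1) / chunkSize
    var partialSums = [Int](repeating: 0, count: chunkCount)
    values.withUnsafeBufferPointer { input in
        partialSums.withUnsafeMutableBufferPointer { buffer in
            let sums = buffer
            // Each iteration writes only its own slot of `sums`, so no locking is needed.
            DispatchQueue.concurrentPerform(iterations: chunkCount) { chunk in
                let start = chunk * chunkSize
                let end = min(start + chunkSize, input.count)
                var s = 0
                for i in start..<end { s += input[i] }
                sums[chunk] = s
            }
        }
    }
    // Final sequential step: the sum of the partial sums.
    return partialSums.reduce(0, +)
}

let values = Array(1...1_000)
print(chunkedParallelSum(values, chunkSize: 100))  // 500500
```

The last subarray may be shorter than `chunkSize`. The rounded-up `chunkCount` and the `min` on the end index handle that case.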

I created an app that runs my Metal shader, but there are some things I don't quite understand.

var array: [[Float]] = [[1,2,3],[4,5,6],[7,8,9]]

// get device
let device: MTLDevice! = MTLCreateSystemDefaultDevice()
// get library
let defaultLibrary: MTLLibrary! = device.newDefaultLibrary()
// queue
let commandQueue: MTLCommandQueue! = device.newCommandQueue()
// function
let kernerFunction: MTLFunction! = defaultLibrary.newFunctionWithName("calculateSum")
// pipeline with function
let pipelineState: MTLComputePipelineState! = try device.newComputePipelineStateWithFunction(kernerFunction)
// buffer for function
let commandBuffer: MTLCommandBuffer! = commandQueue.commandBuffer()
// encode function
let commandEncoder: MTLComputeCommandEncoder = commandBuffer.computeCommandEncoder()
// add function to encode
commandEncoder.setComputePipelineState(pipelineState)
// options
let resourceOption = MTLResourceOptions()
let arrayBiteLength = array.count * array[0].count * sizeofValue(array[0][0])
let arrayBuffer = device.newBufferWithBytes(&array, length: arrayBiteLength, options: resourceOption)
commandEncoder.setBuffer(arrayBuffer, offset: 0, atIndex: 0)

var result: [Float] = [0,0]
let resultBiteLenght = sizeofValue(result[0])
let resultBuffer = device.newBufferWithBytes(&result, length: resultBiteLenght, options: resourceOption)
commandEncoder.setBuffer(resultBuffer, offset: 0, atIndex: 1)

let threadGroupSize = MTLSize(width: 1, height: 1, depth: 1)
let threadGroups = MTLSize(width: (array.count), height: 1, depth: 1)
commandEncoder.dispatchThreadgroups(threadGroups, threadsPerThreadgroup: threadGroupSize)
commandEncoder.endEncoding()
commandBuffer.commit()
commandBuffer.waitUntilCompleted()

let data = NSData(bytesNoCopy: resultBuffer.contents(), length: sizeof(Float), freeWhenDone: false)
data.getBytes(&result, length: result.count * sizeof(Float))
print(result)

That is my Swift code.

My shader is:

kernel void calculateSum(const device float *infloat [[buffer(0)]],
                         device float *result [[buffer(1)]],
                         uint id [[ thread_position_in_grid ]]) {
    float * f = infloat[id];
    float sum = 0;
    for (int i = 0 ; i < 3 ; ++i) {
        sum = sum + f[i];
    }
    result = sum;
}

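I don't know how to declare infloat as an array of arrays.
I don't know exactly what threadGroupSize and threadGroups are.
I don't know what device and uint mean in the shader's parameter attributes.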

Is this the right approach?
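On the first question: a Swift `[[Float]]` is not one contiguous block of floats. `&array` points at the outer array's storage, and that storage holds references to the inner arrays rather than the floats themselves. So `newBufferWithBytes` does not receive the nine floats the kernel expects.

A common workaround is to flatten the data to a `[Float]` in row-major order and index it in the kernel as `id * rowLength + i`. The sketch below assumes every row has the same length. `flattenRowMajor` and `rowSum` are hypothetical helpers, and `rowSum` only mimics on the CPU what GPU thread `id` would compute.

```swift
/// Flattens a rectangular [[Float]] into one contiguous [Float] in row-major order,
/// which is the layout a `const device float*` kernel parameter can index directly.
func flattenRowMajor(_ rows: [[Float]]) -> ([Float], Int) {
    let rowLength = rows.first?.count ?? 0
    precondition(rows.allSatisfy { $0.count == rowLength }, "rows must all have the same length")
    return (rows.flatMap { $0 }, rowLength)
}

/// CPU stand-in for the work of GPU thread `id`: the sum of row `id`,
/// which starts at offset `id * rowLength` in the flat buffer.
func rowSum(_ flat: [Float], rowLength: Int, id: Int) -> Float {
    var sum: Float = 0
    for i in 0..<rowLength {
        sum += flat[id * rowLength + i]
    }
    return sum
}

let (flat, rowLength) = flattenRowMajor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
let perRow = (0..<3).map { rowSum(flat, rowLength: rowLength, id: $0) }
print(flat)    // [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
print(perRow)  // [6.0, 15.0, 24.0]
```

With this layout, the byte length passed to the buffer is simply `flat.count * sizeof(Float)`. Each thread can also write its own result slot instead of all threads sharing one.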

I took the time to create a complete working example of this problem with Metal. The explanation is in the comments:
import Metal

let count = 10_000_000
let elementsPerSum = 10_000

// Data type, has to be the same as in the shader
typealias DataType = CInt

let device = MTLCreateSystemDefaultDevice()!
let parsum = device.newDefaultLibrary()!.newFunctionWithName("parsum")!
let pipeline = try! device.newComputePipelineStateWithFunction(parsum)

var data = (0..<count).map{ _ in DataType(arc4random_uniform(100)) } // Our data, randomly generated
var dataCount = CUnsignedInt(count)
var elementsPerSumC = CUnsignedInt(elementsPerSum)
let resultsCount = (count + elementsPerSum - 1) / elementsPerSum // Number of individual results = count / elementsPerSum (rounded up)

let dataBuffer = device.newBufferWithBytes(&data, length: strideof(DataType) * count, options: []) // Our data in a buffer (copied)
let resultsBuffer = device.newBufferWithLength(strideof(DataType) * resultsCount, options: []) // A buffer for individual results (zero initialized)
let results = UnsafeBufferPointer<DataType>(start: UnsafePointer(resultsBuffer.contents()), count: resultsCount) // Our results in convenient form to compute the actual result later

let queue = device.newCommandQueue()
let cmds = queue.commandBuffer()
let encoder = cmds.computeCommandEncoder()

encoder.setComputePipelineState(pipeline)
encoder.setBuffer(dataBuffer, offset: 0, atIndex: 0)
encoder.setBytes(&dataCount, length: sizeofValue(dataCount), atIndex: 1)
encoder.setBuffer(resultsBuffer, offset: 0, atIndex: 2)
encoder.setBytes(&elementsPerSumC, length: sizeofValue(elementsPerSumC), atIndex: 3)

// We have to calculate the sum `resultsCount` times => amount of threadgroups is `resultsCount` / `threadExecutionWidth` (rounded up) because each threadgroup will process `threadExecutionWidth` threads
let threadgroupsPerGrid = MTLSize(width: (resultsCount + pipeline.threadExecutionWidth - 1) / pipeline.threadExecutionWidth, height: 1, depth: 1)

// Here we set that each threadgroup should process `threadExecutionWidth` threads, the only important thing for performance is that this number is a multiple of `threadExecutionWidth` (here 1 times)
let threadsPerThreadgroup = MTLSize(width: pipeline.threadExecutionWidth, height: 1, depth: 1)

encoder.dispatchThreadgroups(threadgroupsPerGrid, threadsPerThreadgroup: threadsPerThreadgroup)
encoder.endEncoding()

var start, end: UInt64
var result: DataType = 0

// Note: mach_absolute_time() counts ticks; dividing by NSEC_PER_SEC treats 1 tick as 1 ns, which holds on Intel Macs
start = mach_absolute_time()
cmds.commit()
cmds.waitUntilCompleted()
for elem in results {
    result += elem
}
end = mach_absolute_time()
print("Metal result: \(result), time: \(Double(end - start) / Double(NSEC_PER_SEC))")

result = 0
start = mach_absolute_time()
data.withUnsafeBufferPointer { buffer in
    for elem in buffer {
        result += elem
    }
}
end = mach_absolute_time()
print("cpu result: \(result), time: \(Double(end - start) / Double(NSEC_PER_SEC))")
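The threadgroup arithmetic in the comments above reduces to a ceiling division. The same arithmetic also answers the question's `threadGroupSize` / `threadGroups` confusion: the first is threads per group, the second is the number of groups.

The small sketch below works through that arithmetic. It assumes a `threadExecutionWidth` of 32 purely for illustration; the real value is read from the pipeline at run time. `threadgroupCount` is a hypothetical helper name.

```swift
/// Number of threadgroups needed so that each of `resultsCount` partial sums
/// gets one thread, with `threadsPerGroup` threads in every group.
func threadgroupCount(resultsCount: Int, threadsPerGroup: Int) -> Int {
    precondition(threadsPerGroup > 0, "threadsPerGroup must be positive")
    // Ceiling division: round up so a partly filled last group is still dispatched.
    return (resultsCount + threadsPerGroup - 1) / threadsPerGroup
}

let count = 10_000_000
let elementsPerSum = 10_000
// Same rounding-up division the answer uses for the number of partial sums.
let resultsCount = (count + elementsPerSum - 1) / elementsPerSum
// Assumption for illustration only: the real value is pipeline.threadExecutionWidth.
let assumedWidth = 32
let groups = threadgroupCount(resultsCount: resultsCount, threadsPerGroup: assumedWidth)
print(resultsCount, groups, groups * assumedWidth)  // 1000 32 1024
```

With 1,000 partial sums and 32-wide groups, 32 threadgroups launch 1,024 threads. The 24 surplus threads have no work of their own, so the kernel must tolerate indices past the end of the data.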

Shader:

// Data type, has to be the same as in the Swift file
typedef int DataType;

kernel void parsum(const device DataType* data [[ buffer(0) ]],
                   const device uint& dataLength [[ buffer(1) ]],
                   device DataType* sums [[ buffer(2) ]],
                   const device uint& elementsPerSum [[ buffer(3) ]],
                   const uint tgPos [[ threadgroup_position_in_grid ]],
                   const uint tPerTg [[ threads_per_threadgroup ]],
                   const uint tPos [[ thread_position_in_threadgroup ]]) {
    uint resultIndex = tgPos * tPerTg + tPos; // This is the index of the individual result, this var is unique to this thread
    uint dataIndex = resultIndex * elementsPerSum; // Where the summation should begin
    uint endIndex = dataIndex + elementsPerSum < dataLength ? dataIndex + elementsPerSum : dataLength; // The index where summation should end
    for (; dataIndex < endIndex; dataIndex++)
        sums[resultIndex] += data[dataIndex];
}
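The kernel's index math can be checked without a GPU by emulating the dispatch on the CPU. The sketch below is a hypothetical test harness, not part of the answer. `emulateParsum` visits every (threadgroup, thread) pair the dispatch would create and runs the kernel body for it. It uses `Int32` to match `typedef int DataType`.

```swift
/// CPU emulation of the `parsum` kernel: each (tgPos, tPos) pair plays one GPU thread.
/// Unlike the kernel, Swift's `+=` traps on Int32 overflow, so keep test data small.
func emulateParsum(data: [Int32], elementsPerSum: Int, threadsPerThreadgroup: Int) -> [Int32] {
    let resultsCount = (data.count + elementsPerSum - 1) / elementsPerSum
    let threadgroups = (resultsCount + threadsPerThreadgroup - 1) / threadsPerThreadgroup
    var sums = [Int32](repeating: 0, count: resultsCount)
    for tgPos in 0..<threadgroups {                  // threadgroup_position_in_grid
        for tPos in 0..<threadsPerThreadgroup {      // thread_position_in_threadgroup
            let resultIndex = tgPos * threadsPerThreadgroup + tPos
            var dataIndex = resultIndex * elementsPerSum
            let endIndex = min(dataIndex + elementsPerSum, data.count)
            // Surplus threads in the last group start at or past endIndex and do nothing,
            // so they never touch sums[resultIndex] out of bounds.
            while dataIndex < endIndex {
                sums[resultIndex] += data[dataIndex]
                dataIndex += 1
            }
        }
    }
    return sums
}

let data: [Int32] = (1...25).map { Int32($0) }
let partial = emulateParsum(data: data, elementsPerSum: 10, threadsPerThreadgroup: 4)
print(partial)               // [55, 155, 115]
print(partial.reduce(0, +))  // 325
```

Here 25 elements give 3 partial sums. One group of 4 threads is launched, and the fourth thread falls entirely past the data.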

I tested it on my Mac, but it should work just as well on iOS.

Output:

Metal result: 494936505, time: 0.024611456
cpu result: 494936505, time: 0.163341018

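The Metal version is about 7 times faster in this run (0.0246 s versus 0.1633 s, a ratio of roughly 6.6). The Metal timing covers the GPU dispatch plus the final CPU loop over the partial results. Creating and filling dataBuffer happens before the timer starts. I'm sure you can get even more speed if you implement something like divide-and-conquer with a cutoff, or anything along those lines.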
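Below is one way to read the divide-and-conquer-with-cutoff suggestion, sketched on the CPU only. The range is split in half recursively and the halves are summed in parallel. Once a range is at most `cutoff` elements long, the code falls back to a plain loop.

`dcSum` and its default `cutoff` of 1,000 are hypothetical, arbitrary choices. On the GPU, the analogous idea would be a tree-shaped reduction written in the shader, which this code does not show.

```swift
import Foundation

/// Divide-and-conquer sum with a cutoff: ranges of at most `cutoff` elements are
/// summed with a plain loop; longer ranges are split in half and the two halves
/// are summed in parallel.
func dcSum(_ a: UnsafeBufferPointer<Int>, _ range: Range<Int>, cutoff: Int) -> Int {
    if range.count <= cutoff {
        var s = 0
        for i in range { s += a[i] }
        return s
    }
    let mid = range.lowerBound + range.count / 2
    let halves = [range.lowerBound..<mid, mid..<range.upperBound]
    var partial = [0, 0]
    partial.withUnsafeMutableBufferPointer { buffer in
        let out = buffer
        // Two iterations, one per half; each writes only its own slot.
        DispatchQueue.concurrentPerform(iterations: 2) { k in
            out[k] = dcSum(a, halves[k], cutoff: cutoff)
        }
    }
    return partial[0] + partial[1]
}

/// Convenience entry point over a whole array (hypothetical default cutoff).
func dcSum(_ values: [Int], cutoff: Int = 1_000) -> Int {
    precondition(cutoff > 0, "cutoff must be positive")
    return values.withUnsafeBufferPointer { dcSum($0, 0..<values.count, cutoff: cutoff) }
}

print(dcSum(Array(1...100_000)))  // 5000050000
```

The cutoff keeps the recursion from spawning work for tiny ranges, where the scheduling overhead would outweigh the summation itself.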


Original article: http://www.outofmemory.cn/web/1089826.html
