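Is there a good way to compute the sum of an array's values in parallel with Metal and Swift?
My plan is to divide my array into subarrays, compute the sum of each subarray in parallel, and then, once the parallel computation has finished, compute the sum of the sub-sums.
For example, if I have
    array = [a_0, ..., a_n]
I divide the array into subarrays:
    array_1 = [a_0, ..., a_i], array_2 = [a_i+1, ..., a_2i], ..., array_n/i = [..., a_n]
The sums of these subarrays are computed in parallel, and I get
    sum_1, sum_2, sum_3, ..., sum_n/i
Finally, I just compute the sum of the sub-sums.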
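As a CPU-only illustration of the plan described above, here is a minimal sketch that splits an array into chunks, sums the chunks concurrently, and then adds the partial sums. It uses `DispatchQueue.concurrentPerform` rather than Metal, and the names `chunkedSum` and `chunkSize` are hypothetical, chosen only for this example.

```swift
import Dispatch

/// Splits `values` into chunks of `chunkSize`, sums each chunk in parallel,
/// then adds the partial sums together on the calling thread.
func chunkedSum(_ values: [Int], chunkSize: Int) -> Int {
    precondition(chunkSize > 0, "chunkSize must be positive")
    // Number of chunks, rounded up so a short final chunk is still counted.
    let chunkCount = (values.count + chunkSize - 1) / chunkSize
    var partialSums = [Int](repeating: 0, count: chunkCount)

    partialSums.withUnsafeMutableBufferPointer { buffer in
        let slots = buffer
        // Each iteration writes only its own slot, so no locking is needed.
        DispatchQueue.concurrentPerform(iterations: chunkCount) { chunk in
            let start = chunk * chunkSize
            let end = min(start + chunkSize, values.count)
            var sum = 0
            for i in start..<end { sum += values[i] }
            slots[chunk] = sum
        }
    }
    // Final step of the plan: the sum of the sub-sums.
    return partialSums.reduce(0, +)
}

print(chunkedSum(Array(1...10), chunkSize: 3)) // 55
```

With a chunk size of 3 the partial sums are 6, 15, 24 and 10, which add up to 55.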
I created an application that runs my Metal shader, but there are some things I don't quite understand.
    var array:[[Float]] = [[1,2,3],[4,5,6],[7,8,9]]

    // get device
    let device: MTLDevice! = MTLCreateSystemDefaultDevice()

    // get library
    let defaultlibrary: MTLLibrary! = device.newDefaultLibrary()

    // queue
    let commandQueue: MTLCommandQueue! = device.newCommandQueue()

    // function
    let kernerFunction: MTLFunction! = defaultlibrary.newFunctionWithName("calculateSum")

    // pipeline with function
    let pipelinestate: MTLComputePipelineState! = try device.newComputePipelineStateWithFunction(kernerFunction)

    // buffer for function
    let commandBuffer: MTLCommandBuffer! = commandQueue.commandBuffer()

    // encode function
    let commandEncoder: MTLComputeCommandEncoder = commandBuffer.computeCommandEncoder()

    // add function to encode
    commandEncoder.setComputePipelineState(pipelinestate)

    // options
    let resourceOption = MTLResourceOptions()

    let arrayBiteLength = array.count * array[0].count * sizeofValue(array[0][0])
    let arrayBuffer = device.newBufferWithBytes(&array, length: arrayBiteLength, options: resourceOption)
    commandEncoder.setBuffer(arrayBuffer, offset: 0, atIndex: 0)

    var result:[Float] = [0,0]
    let resultBiteLenght = sizeofValue(result[0])
    let resultBuffer = device.newBufferWithBytes(&result, length: resultBiteLenght, options: resourceOption)
    commandEncoder.setBuffer(resultBuffer, offset: 0, atIndex: 1)

    let threadGroupSize = MTLSize(width: 1, height: 1, depth: 1)
    let threadGroups = MTLSize(width: (array.count), height: 1, depth: 1)
    commandEncoder.dispatchThreadgroups(threadGroups, threadsPerThreadgroup: threadGroupSize)
    commandEncoder.endEncoding()
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()

    let data = NSData(bytesNoCopy: resultBuffer.contents(), length: sizeof(Float), freeWhenDone: false)
    data.getBytes(&result, length: result.count * sizeof(Float))
    print(result)
That is my Swift code.
My shader is:
    kernel void calculateSum(const device float *infloat [[buffer(0)]],
                             device float *result [[buffer(1)]],
                             uint id [[ thread_position_in_grid ]]) {
        float * f = infloat[id];
        float sum = 0;
        for (int i = 0 ; i < 3 ; ++i) {
            sum = sum + f[i];
        }
        result = sum;
    }
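I don't know how to declare infloat as an array of arrays.
I don't know exactly what threadGroupSize and threadGroups are.
I don't know what device and uint mean in the shader's parameters.
Is this the right approach?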
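On the array-of-arrays question: a Swift `[[Float]]` keeps each inner array in its own storage, so the bytes of the outer array are not the floats themselves. One common way around this, assuming all rows have the same length, is to flatten the data into a single `[Float]` and have each thread read `rowLength` consecutive elements starting at `id * rowLength`. The sketch below shows that indexing on the CPU only; `flatten` and `rowSum` are hypothetical helper names, and on the shader side the parameter could then remain a plain `const device float *`.

```swift
/// Flattens equally sized rows into one contiguous array (row-major order).
func flatten(_ rows: [[Float]]) -> (values: [Float], rowLength: Int) {
    let rowLength = rows.first?.count ?? 0
    precondition(rows.allSatisfy { $0.count == rowLength }, "rows must have equal length")
    return (rows.flatMap { $0 }, rowLength)
}

/// What one thread would do for row `id`: sum `rowLength` consecutive elements.
func rowSum(_ values: [Float], rowLength: Int, id: Int) -> Float {
    var sum: Float = 0
    for i in 0..<rowLength { sum += values[id * rowLength + i] }
    return sum
}

let (values, rowLength) = flatten([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
let sums = (0..<3).map { rowSum(values, rowLength: rowLength, id: $0) }
print(sums) // [6.0, 15.0, 24.0]
```

A flat array can be copied into a Metal buffer as one block of bytes, which is not true of the nested array.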
I took the time to create a fully working example of this problem with Metal. The explanation is in the comments:

    import Metal

    let count = 10_000_000
    let elementsPerSum = 10_000

    // Data type, has to be the same as in the shader
    typealias DataType = CInt

    let device = MTLCreateSystemDefaultDevice()!
    let parsum = device.newDefaultLibrary()!.newFunctionWithName("parsum")!
    let pipeline = try! device.newComputePipelineStateWithFunction(parsum)

    var data = (0..<count).map{ _ in DataType(arc4random_uniform(100)) } // Our data, randomly generated
    var dataCount = CUnsignedInt(count)
    var elementsPerSumC = CUnsignedInt(elementsPerSum)
    let resultsCount = (count + elementsPerSum - 1) / elementsPerSum // Number of individual results = count / elementsPerSum (rounded up)

    let dataBuffer = device.newBufferWithBytes(&data, length: strideof(DataType) * count, options: []) // Our data in a buffer (copied)
    let resultsBuffer = device.newBufferWithLength(strideof(DataType) * resultsCount, options: []) // A buffer for individual results (zero initialized)
    let results = UnsafeBufferPointer<DataType>(start: UnsafePointer(resultsBuffer.contents()), count: resultsCount) // Our results in convenient form to compute the actual result later

    let queue = device.newCommandQueue()
    let cmds = queue.commandBuffer()
    let encoder = cmds.computeCommandEncoder()

    encoder.setComputePipelineState(pipeline)

    encoder.setBuffer(dataBuffer, offset: 0, atIndex: 0)
    encoder.setBytes(&dataCount, length: sizeofValue(dataCount), atIndex: 1)
    encoder.setBuffer(resultsBuffer, offset: 0, atIndex: 2)
    encoder.setBytes(&elementsPerSumC, length: sizeofValue(elementsPerSumC), atIndex: 3)

    // We have to calculate the sum `resultsCount` times => amount of threadgroups is `resultsCount` / `threadExecutionWidth` (rounded up) because each threadgroup will process `threadExecutionWidth` threads
    let threadgroupsPerGrid = MTLSize(width: (resultsCount + pipeline.threadExecutionWidth - 1) / pipeline.threadExecutionWidth, height: 1, depth: 1)

    // Here we set that each threadgroup should process `threadExecutionWidth` threads, the only important thing for performance is that this number is a multiple of `threadExecutionWidth` (here 1 times)
    let threadsPerThreadgroup = MTLSize(width: pipeline.threadExecutionWidth, height: 1, depth: 1)

    encoder.dispatchThreadgroups(threadgroupsPerGrid, threadsPerThreadgroup: threadsPerThreadgroup)
    encoder.endEncoding()

    var start, end : UInt64
    var result : DataType = 0

    start = mach_absolute_time()
    cmds.commit()
    cmds.waitUntilCompleted()
    for elem in results {
        result += elem
    }
    end = mach_absolute_time()

    print("Metal result: \(result), time: \(Double(end - start) / Double(NSEC_PER_SEC))")

    result = 0

    start = mach_absolute_time()
    data.withUnsafeBufferPointer { buffer in
        for elem in buffer {
            result += elem
        }
    }
    end = mach_absolute_time()

    print("cpu result: \(result), time: \(Double(end - start) / Double(NSEC_PER_SEC))")
The shader:
    // Data type, has to be the same as in the Swift file
    typedef int DataType;

    kernel void parsum(const device DataType* data [[ buffer(0) ]],
                       const device uint& dataLength [[ buffer(1) ]],
                       device DataType* sums [[ buffer(2) ]],
                       const device uint& elementsPerSum [[ buffer(3) ]],

                       const uint tgPos [[ threadgroup_position_in_grid ]],
                       const uint tPerTg [[ threads_per_threadgroup ]],
                       const uint tPos [[ thread_position_in_threadgroup ]]) {

        uint resultIndex = tgPos * tPerTg + tPos; // This is the index of the individual result, this var is unique to this thread
        uint dataIndex = resultIndex * elementsPerSum; // Where the summation should begin
        uint endindex = dataIndex + elementsPerSum < dataLength ? dataIndex + elementsPerSum : dataLength; // The index where summation should end

        for (; dataIndex < endindex; dataIndex++)
            sums[resultIndex] += data[dataIndex];
    }
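To make the indexing in the shader easier to follow, the sketch below reproduces the same arithmetic on the CPU: how many threadgroups are needed, and which slice of the data a given thread sums. The function names `threadgroupCount` and `dataRange` are hypothetical, and the execution width of 32 is an assumption for illustration; the real value comes from `pipeline.threadExecutionWidth` and depends on the device.

```swift
/// Number of threadgroups needed so that at least `resultsCount` threads run,
/// rounding up when `resultsCount` is not a multiple of `threadsPerGroup`.
func threadgroupCount(resultsCount: Int, threadsPerGroup: Int) -> Int {
    return (resultsCount + threadsPerGroup - 1) / threadsPerGroup
}

/// The half-open range of data indices one thread sums, mirroring the shader's
/// `resultIndex`, `dataIndex` and end-index computation.
func dataRange(tgPos: Int, tPerTg: Int, tPos: Int,
               elementsPerSum: Int, dataLength: Int) -> Range<Int> {
    let resultIndex = tgPos * tPerTg + tPos
    // Clamped so the Swift Range stays valid; in the shader the loop
    // simply does not run when the start is past the end of the data.
    let begin = min(resultIndex * elementsPerSum, dataLength)
    let end = min(begin + elementsPerSum, dataLength)
    return begin..<end
}

// 10_000_000 elements, 10_000 per sum => 1_000 results; assumed width of 32.
let groups = threadgroupCount(resultsCount: 1_000, threadsPerGroup: 32)
print(groups) // 32

// Thread 999 (group 31, position 7) is the last thread with work to do.
let lastRange = dataRange(tgPos: 31, tPerTg: 32, tPos: 7,
                          elementsPerSum: 10_000, dataLength: 10_000_000)
print(lastRange) // 9990000..<10000000

// Thread 1000 exists only because of rounding up and sums nothing.
let paddingRange = dataRange(tgPos: 31, tPerTg: 32, tPos: 8,
                             elementsPerSum: 10_000, dataLength: 10_000_000)
print(paddingRange.isEmpty) // true
```

Rounding up launches 32 × 32 = 1,024 threads for 1,000 results, which is why the shader limits the end index to `dataLength`.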
I tested it on my Mac, but it should work just fine on iOS.
Output:
    Metal result: 494936505, time: 0.024611456
    cpu result: 494936505, time: 0.163341018
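The Metal version is about 7 times faster. I'm sure you could get even more speed if you implemented something like divide-and-conquer with a cutoff, or something along those lines.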
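One possible reading of the divide-and-conquer suggestion is a multi-pass reduction: instead of adding all partial sums on the CPU after a single pass, reduce the partial sums again until only a handful remain, then finish serially. The sketch below shows that control flow on the CPU only, under that assumption; `reducePass` and `multiPassSum` are hypothetical names, and on the GPU each pass would presumably be another dispatch of a kernel like `parsum`.

```swift
/// One reduction pass: every output element is the sum of up to
/// `elementsPerSum` consecutive inputs (what a single dispatch would compute).
func reducePass(_ input: [Int], elementsPerSum: Int) -> [Int] {
    let outCount = (input.count + elementsPerSum - 1) / elementsPerSum
    return (0..<outCount).map { index in
        let begin = index * elementsPerSum
        let end = min(begin + elementsPerSum, input.count)
        return input[begin..<end].reduce(0, +)
    }
}

/// Repeats passes until at most `cutoff` values remain, then finishes serially.
func multiPassSum(_ values: [Int], elementsPerSum: Int, cutoff: Int) -> Int {
    precondition(elementsPerSum > 1, "each pass must shrink the data")
    precondition(cutoff >= 1, "cutoff must be at least 1")
    var current = values
    while current.count > cutoff {
        current = reducePass(current, elementsPerSum: elementsPerSum)
    }
    return current.reduce(0, +)
}

print(multiPassSum(Array(1...1_000), elementsPerSum: 10, cutoff: 16)) // 500500
```

Here 1,000 values shrink to 100 and then to 10 partial sums, at which point the cutoff of 16 stops further passes. Whether this beats a single pass depends on the data size and the dispatch overhead, so it would need measuring.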