Vectorizing a "pure" function with NumPy, assuming many duplicates

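Actually, you can do this in a single pass over the array, but that requires knowing the dtype of the result beforehand. Otherwise you need a second pass over the elements to determine it.

Neglecting performance (and functools.wraps) for the moment, an implementation could look like this: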

import numpy as np

def vectorize_cached(output_dtype):
    def vectorize_cached_factory(f):
        def f_vec(arr):
            flattened = arr.ravel()
            # Allocate the result up front; this is why the dtype must be known
            if output_dtype is None:
                result = np.empty_like(flattened)
            else:
                result = np.empty(arr.size, output_dtype)

            cache = {}
            for idx, item in enumerate(flattened):
                res = cache.get(item)
                if res is None:
                    # First time this value is seen: call f and remember the result
                    res = f(item)
                    cache[item] = res
                result[idx] = res
            return result.reshape(arr.shape)
        return f_vec
    return vectorize_cached_factory

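It first creates the result array, then iterates over the input array. As soon as it encounters an element that is not yet in the dictionary, it calls the function and stores the result; otherwise it simply uses the value stored in the dictionary.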

@vectorize_cached(np.float64)
def t(x):
    print(x)
    return x + 2.5

>>> t(np.array([1,1,1,2,2,2,3,3,1,1,1]))
1
2
3
array([3.5, 3.5, 3.5, 4.5, 4.5, 4.5, 5.5, 5.5, 3.5, 3.5, 3.5])

However, this is not particularly fast, because it runs a Python loop over a NumPy array.
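If the result dtype is not known up front, one option is to collect the results first and let NumPy infer the type afterwards, which amounts to the second pass mentioned at the beginning. The following is a minimal sketch, assuming hashable array elements; the name `vectorize_cached_infer` is hypothetical and not part of the original answer. It also tests membership with `in` instead of comparing against `None`, so a function that legitimately returns `None` would still be cached.

```python
import numpy as np

def vectorize_cached_infer(f):
    """Hypothetical variant: cache per unique value, infer dtype afterwards."""
    def f_vec(arr):
        cache = {}
        results = []
        # tolist() yields plain Python scalars to use as dictionary keys
        for item in arr.ravel().tolist():
            if item not in cache:
                cache[item] = f(item)
            results.append(cache[item])
        # np.array scans the collected values to choose a dtype (second pass)
        return np.array(results).reshape(arr.shape)
    return f_vec

calls = []

def add_offset(x):
    calls.append(x)
    return x + 2.5

out = vectorize_cached_infer(add_offset)(np.array([[1, 1, 2], [2, 3, 1]]))
print(out.dtype, calls)
```

The price is an intermediate Python list of all results, so this trades memory for not having to specify the dtype.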

Cython solution

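To make it faster, we can port this implementation to Cython (it currently supports only float32, float64, int32, int64, uint32, and uint64, but because it uses fused types it is almost trivial to extend):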

%%cython

cimport numpy as cnp

ctypedef fused input_type:
    cnp.float32_t
    cnp.float64_t
    cnp.uint32_t
    cnp.uint64_t
    cnp.int32_t
    cnp.int64_t

ctypedef fused result_type:
    cnp.float32_t
    cnp.float64_t
    cnp.uint32_t
    cnp.uint64_t
    cnp.int32_t
    cnp.int64_t

cpdef void vectorized_cached_impl(input_type[:] array, result_type[:] result, object func):
    cdef dict cache = {}
    cdef Py_ssize_t idx
    cdef input_type item
    for idx in range(array.size):
        item = array[idx]
        res = cache.get(item)
        if res is None:
            res = func(item)
            cache[item] = res
        result[idx] = res

With a Python decorator (the following code is not compiled with Cython):

def vectorize_cached_cython(output_dtype):
    def vectorize_cached_factory(f):
        def f_vec(arr):
            flattened = arr.ravel()
            if output_dtype is None:
                result = np.empty_like(flattened)
            else:
                result = np.empty(arr.size, output_dtype)

            vectorized_cached_impl(flattened, result, f)
            return result.reshape(arr.shape)
        return f_vec
    return vectorize_cached_factory

Again, this makes only one pass and applies the function only once per unique value:

@vectorize_cached_cython(np.float64)
def t(x):
    print(x)
    return x + 2.5

>>> t(np.array([1,1,1,2,2,2,3,3,1,1,1]))
1
2
3
array([3.5, 3.5, 3.5, 4.5, 4.5, 4.5, 5.5, 5.5, 3.5, 3.5, 3.5])

Benchmark: fast function, many duplicates

But the question is: does it make sense to use Cython here?

I ran a quick benchmark (without sleep) to get an idea of how much the performance differs (using my library simple_benchmark):

def func_to_vectorize(x):
    return x

# vectorize_pure and vectorize_with_pandas come from the original post
# and from another answer; they are not defined in this text
usual_vectorize = np.vectorize(func_to_vectorize)
pure_vectorize = vectorize_pure(func_to_vectorize)
pandas_vectorize = vectorize_with_pandas(func_to_vectorize)
cached_vectorize = vectorize_cached(None)(func_to_vectorize)
cython_vectorize = vectorize_cached_cython(None)(func_to_vectorize)

from simple_benchmark import BenchmarkBuilder

b = BenchmarkBuilder()
b.add_function(alias='usual_vectorize')(usual_vectorize)
b.add_function(alias='pure_vectorize')(pure_vectorize)
b.add_function(alias='pandas_vectorize')(pandas_vectorize)
b.add_function(alias='cached_vectorize')(cached_vectorize)
b.add_function(alias='cython_vectorize')(cython_vectorize)

@b.add_arguments('array size')
def argument_provider():
    np.random.seed(0)
    for exponent in range(6, 20):
        size = 2**exponent
        yield size, np.random.randint(0, 10, size=(size, 2))

r = b.run()
r.plot()

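According to these timings, the ranking (fastest to slowest) is:

Cython version
pandas solution (from another answer)
pure solution (original post)
NumPy's vectorize
non-Cython version using the cache

If the function call is very cheap, the plain NumPy solution is only a factor of 5-10 slower. The pandas solution also has a much larger constant factor, which makes it the slowest for very small arrays.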
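The benchmark uses `vectorize_pure` and `vectorize_with_pandas`, which come from the original post and another answer and are not reproduced here. To make the comparison easier to follow, here is a rough sketch of how a deduplicating wrapper can be built on `np.unique`. This is an assumption about the general approach, not the exact code from the original post, and the name `vectorize_unique` is made up for illustration.

```python
import numpy as np

def vectorize_unique(f):
    """Illustrative only: call f once per unique value via np.unique."""
    def f_vec(arr):
        # Flatten first so the inverse index is always 1-D
        uniques, inverse = np.unique(arr.ravel(), return_inverse=True)
        # One Python-level call per unique value
        mapped = np.array([f(x) for x in uniques])
        # Fancy indexing expands the results back to every element
        return mapped[inverse].reshape(arr.shape)
    return f_vec

calls = []

def square(x):
    calls.append(int(x))
    return x * x

arr = np.array([[3, 1, 3], [2, 1, 3]])
out = vectorize_unique(square)(arr)
print(out, calls)
```

Unlike the dictionary-based version, `np.unique` sorts the input, so the deduplication step costs O(n log n), but the per-element work stays inside NumPy rather than in a Python loop.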

Benchmark: expensive function (time.sleep(0.001)), many duplicates

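If the function call is actually expensive (as with time.sleep), the np.vectorize solution is a lot slower, while the differences between the other solutions become much smaller: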

# This shows only the difference compared to the previous benchmark
from time import sleep

def func_to_vectorize(x):
    sleep(0.001)
    return x

@b.add_arguments('array size')
def argument_provider():
    np.random.seed(0)
    for exponent in range(5, 10):
        size = 2**exponent
        yield size, np.random.randint(0, 10, size=(size, 2))
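The gap follows from how often the function gets called. A small sketch that counts calls instead of sleeping; the helpers `count_calls` and `cached_apply` are simplified stand-ins written for this illustration and are not part of the original answer.

```python
import numpy as np

def count_calls(f):
    """Wrap f so that the number of invocations can be inspected."""
    def wrapper(x):
        wrapper.calls += 1
        return f(x)
    wrapper.calls = 0
    return wrapper

def cached_apply(f, arr):
    """Simplified stand-in: one call to f per distinct value."""
    cache = {}
    out = np.empty(arr.size, dtype=np.float64)
    for idx, item in enumerate(arr.ravel().tolist()):
        if item not in cache:
            cache[item] = f(item)
        out[idx] = cache[item]
    return out.reshape(arr.shape)

np.random.seed(0)
arr = np.random.randint(0, 10, size=(32, 2))

plain = count_calls(lambda x: x + 2.5)
# Passing otypes spares np.vectorize from inferring the output type
plain_out = np.vectorize(plain, otypes=[np.float64])(arr)

cached = count_calls(lambda x: x + 2.5)
cached_out = cached_apply(cached, arr)

print(plain.calls, cached.calls)
```

With a function that takes 1 ms per call, the 64 elements here cost at least 64 ms through `np.vectorize`, while the cached version needs at most ten calls because only ten distinct values can occur.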

Benchmark: fast function, few duplicates

However, if you do not have that many duplicates, plain np.vectorize is almost as fast as the pure and pandas solutions and only slightly slower than the Cython version:

# Again, only the difference from the original benchmark is shown
@b.add_arguments('array size')
def argument_provider():
    np.random.seed(0)
    for exponent in range(6, 20):
        size = 2**exponent
        # The maximum value now depends on the size to ensure there
        # are fewer duplicates in the array
        yield size, np.random.randint(0, size // 10, size=(size, 2))
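Whether caching pays off depends on the share of distinct values, which is cheap to check before choosing an approach. A hedged sketch that applies the settings of the two argument providers to one array size; `unique_fraction` is a hypothetical helper, not something from the original answer.

```python
import numpy as np

def unique_fraction(arr):
    """Share of elements that are distinct (1.0 means no duplicates)."""
    return np.unique(arr).size / arr.size

np.random.seed(0)
size = 2**12
# Same value ranges as in the two benchmarks
many_dupes = np.random.randint(0, 10, size=(size, 2))
fewer_dupes = np.random.randint(0, size // 10, size=(size, 2))

print(unique_fraction(many_dupes), unique_fraction(fewer_dupes))
```

The lower the fraction, the more function calls a cached wrapper saves relative to calling the function once per element.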

Original URL: https://www.outofmemory.cn/zaji/5666265.html
