PostgreSQL k-nearest-neighbor (KNN) search on a multidimensional cube

I have a cube with 8 dimensions and I want to do nearest-neighbor matching. I am completely new to PostgreSQL. I read that 9.1 supports nearest-neighbor matching on multiple dimensions. I would really appreciate it if someone could give a complete example:

> How to create a table with an 8D cube?
> Sample insert
> Lookup: exact match
> Lookup: nearest-neighbor match

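Sample data:

For simplicity, we can assume that all values are in the range 0-100.

Point 1: (1,1,1,1,1,1,1,1)

Point 2: (2,2,2,2,2,2,2,2)

Lookup value: (1,2)

This should match Point 1 and not Point 2.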

References:

What's new in PostgreSQL 9.1

https://en.wikipedia.org/wiki/K-d_tree#Nearest_neighbour_search

PostgreSQL supports the distance operator <->. As far as I understand, it can be used for analyzing text (with the pg_trgm module) and for geometry data types.
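As a small illustration of the geometric case, the sketch below orders 2-D points by distance with <->. The table name knn_points, the index name, and the sample rows are hypothetical and not part of the original answer. The built-in point type covers only two dimensions, so this shows the operator rather than a solution for 8 dimensions.

```sql
-- Hypothetical table of 2-D points (illustration only)
CREATE TABLE knn_points (
    id serial PRIMARY KEY,
    p  point
);

INSERT INTO knn_points (p)
VALUES (point '(1,1)'), (point '(2,2)'), (point '(50,50)');

-- GiST index; index-assisted ORDER BY ... <-> on point is available since 9.1
CREATE INDEX knn_points_gist ON knn_points USING gist (p);

-- Two nearest neighbours of the lookup point (0,1)
SELECT id, p, p <-> point '(0,1)' AS distance
FROM knn_points
ORDER BY p <-> point '(0,1)'
LIMIT 2;
```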

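I do not know how to use it for more than one dimension. Perhaps you have to define your own distance function, or somehow convert the data into a single column of a text or geometry type. For example, if you have a table with 8 columns (an 8-dimensional cube):

c1 c2 c3 c4 c5 c6 c7 c8
1  0  1  0  1  0  1  2

you can convert it to:

c1 c2 c3 c4 c5 c6 c7 c8
a  b  a  b  a  b  a  c

and then to a table with a single column:

c1
abababac

Then you can use (after creating a GiST index):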

SELECT c1,c1 <-> 'ababab' FROM test_trgm  ORDER BY c1 <-> 'ababab';
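The query above presupposes that pg_trgm is installed and that test_trgm has a trigram GiST index. A minimal setup sketch follows; the table name test_trgm and column c1 come from the query, while the sample rows and the index name are assumptions made for illustration.

```sql
-- pg_trgm provides the <-> operator and similarity() for text
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE TABLE test_trgm (c1 text);

-- Hypothetical sample rows
INSERT INTO test_trgm (c1)
VALUES ('abababac'), ('bbbbbbbb'), ('cacacaca');

-- text has no default GiST operator class, so gist_trgm_ops is named explicitly
CREATE INDEX test_trgm_gist ON test_trgm USING gist (c1 gist_trgm_ops);

-- Rows ordered by trigram distance to the lookup string
SELECT c1, c1 <-> 'ababab' AS distance
FROM test_trgm
ORDER BY c1 <-> 'ababab'
LIMIT 3;
```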

Example

Create sample data

-- Create some temporary data
-- ! Note: the tables are created in the tmp schema (change the SQL to your schema) and are dropped if they already exist !
drop table if exists tmp.test_data;

-- Random integer matrix 100*8
create table tmp.test_data as (
   select
      trunc(random()*100)::int as input_variable_1,
      trunc(random()*100)::int as input_variable_2,
      trunc(random()*100)::int as input_variable_3,
      trunc(random()*100)::int as input_variable_4,
      trunc(random()*100)::int as input_variable_5,
      trunc(random()*100)::int as input_variable_6,
      trunc(random()*100)::int as input_variable_7,
      trunc(random()*100)::int as input_variable_8
   from
      generate_series(1,100,1)
);

Convert the input data to text

drop table if exists tmp.test_data_trans;

-- Concatenate the 8 dimensions into one ';'-separated text column
create table tmp.test_data_trans as (
   select
      input_variable_1 || ';' ||
      input_variable_2 || ';' ||
      input_variable_3 || ';' ||
      input_variable_4 || ';' ||
      input_variable_5 || ';' ||
      input_variable_6 || ';' ||
      input_variable_7 || ';' ||
      input_variable_8 as trans_variable
   from
      tmp.test_data
);

This gives you a single variable, trans_variable, in which all 8 dimensions are stored:

trans_variable
40;88;68;29;19;54;40;90
80;49;56;57;42;36;50;68
29;13;63;33;0;18;52;77
44;68;18;81;28;24;20;89
80;62;20;49;4;87;54;18
35;37;32;25;8;13;42;54
8;58;3;42;37;1;41;49
70;1;28;18;47;78;8;17
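Trigram similarity measures textual overlap rather than numeric distance, and the pg_trgm documentation states that non-alphanumeric characters (such as the ';' separator) are ignored when trigrams are extracted. Before relying on this encoding it may be worth checking how it behaves on a few strings. The sketch below uses pg_trgm's show_trgm() and similarity() functions; the literal test strings are made up for illustration.

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Inspect the trigrams extracted from one encoded row
SELECT show_trgm('40;88;68;29;19;54;40;90');

-- Compare the same numbers in a different order, and numerically close values
SELECT similarity('1;2;3;4', '4;3;2;1')   AS sim_reordered,
       similarity('10;20;30', '11;21;31') AS sim_close_values;
```

If reordered dimensions score as highly as identical ones, or numerically close values score low, the text encoding is not a faithful stand-in for distance in 8-dimensional space.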

Instead of the || operator you can also use the following syntax (shorter, but more cryptic):

select
   array_to_string(string_to_array(t.*::text,''),'') as trans_variable
from
   tmp.test_data t;

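Add an index

-- The text type has no default GiST operator class; gist_trgm_ops comes from pg_trgm
create index test_data_gist_index on tmp.test_data_trans using gist(trans_variable gist_trgm_ops);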

Test the distance
Note: I picked one row from the table (52;42;18;50;68;29;8;55) and tested the distance using slightly changed values (42;42;18;52;98;29;8;55). Of course, the values in your test data will be completely different, because it is a random matrix.

select
   *,
   trans_variable <-> '42;42;18;52;98;29;8;55' as distance,
   similarity(trans_variable,'42;42;18;52;98;29;8;55') as similarity
from
   tmp.test_data_trans
order by
   trans_variable <-> '42;42;18;52;98;29;8;55';

You can use either the distance operator <-> or the similarity function. Distance = 1 - similarity.
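For comparison, here is a sketch of a different technique from the text encoding above: the cube contrib extension, which stores multidimensional points directly. It assumes the extension is installed. The table name knn_cube, the column name dims, and the index name are hypothetical. The lookup vector (1,1,1,1,1,1,1,2) is an assumed 8-dimensional value chosen for illustration. The <-> operator on cube values, with index-assisted ordering, requires PostgreSQL 9.6 or later.

```sql
-- Assumption: the cube contrib extension is available on the server
CREATE EXTENSION IF NOT EXISTS cube;

-- Hypothetical table; each row stores one 8-D point as a zero-volume cube
CREATE TABLE knn_cube (
    id   serial PRIMARY KEY,
    dims cube NOT NULL
);

-- Sample insert: Point 1 and Point 2
INSERT INTO knn_cube (dims) VALUES
    (cube(ARRAY[1,1,1,1,1,1,1,1]::float8[])),
    (cube(ARRAY[2,2,2,2,2,2,2,2]::float8[]));

-- GiST index on the cube column
CREATE INDEX knn_cube_gist ON knn_cube USING gist (dims);

-- Lookup: exact match
SELECT id
FROM knn_cube
WHERE dims = cube(ARRAY[1,1,1,1,1,1,1,1]::float8[]);

-- Lookup: nearest neighbour by Euclidean distance (PostgreSQL 9.6+)
SELECT id,
       dims <-> cube(ARRAY[1,1,1,1,1,1,1,2]::float8[]) AS distance
FROM knn_cube
ORDER BY dims <-> cube(ARRAY[1,1,1,1,1,1,1,2]::float8[])
LIMIT 1;
```

With this lookup vector, Point 1 lies at Euclidean distance 1 and Point 2 at sqrt(7), so Point 1 is the nearer of the two. On versions before 9.6, cube_distance(dims, ...) computes the same measure, but the ordering is not index-assisted.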
