How to merge CSV files with different headers according to a condition table

Overview

I have the following dataset:
https://drive.google.com/drive/folders/1NRelNsXQJ7MTNKcm-T69N6r5ZsOyFmTS?usp=sharing

Columns whose names match across the files are merged into a single column, and the sheet (file) name is kept in a separate column. Here is the code:

import pandas as pd
import glob
import os

# file directory that contains the csv files
files = glob.glob('/Users/user/Desktop/demo/*.csv')
dfs = [pd.read_csv(fp).assign(Sheetname=os.path.basename(fp).split('.')[0]) for fp in files]
data = pd.concat(dfs, ignore_index=True)
data.columns = data.columns.str.lower()
data = data.rename(columns={'sheetname': 'Source'})
merged_data = data
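For context, pd.concat aligns frames by column name: headers shared across sheets collapse into one column, while sheet-specific headers such as price[78] and price[79] are filled with NaN in the sheets that lack them. A tiny illustration with made-up frames:

import pandas as pd

# hypothetical two-sheet example, only to show the alignment behaviour of concat
a = pd.DataFrame({'id': [1], 'price[78]': [10]})
b = pd.DataFrame({'id': [2], 'price[79]': [20]})
print(pd.concat([a, b], ignore_index=True))
#    id  price[78]  price[79]
# 0   1       10.0        NaN
# 1   2        NaN       20.0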

The data after running the code above:

merged_data

ID   user    product  price[78]  price[79]  Source
105  dummya  egg       22        28.0       sheet1
119  dummy1  Soya      67        NaN        sheet1
567  dummya  spinach   22        28.0       sheet2
897  dummy1  rose      67        99.0       sheet2
345  dummya  egg       87        98.0       sheet3
121  dummy1  potato    98        99.0       sheet3

How can I merge the files according to the condition below?
Condition:

Sheet   ID   price1_col1  price1_col2  price1                         price2_col1  price2_col2  price2                         sheetname
sheet1  yes  78                        price1_col1                    78                        price2_col1                    yes
sheet2  yes  78           79           price1_col1+price1_col2        78           79           price2_col1+price2_col2        yes
sheet3  yes  78           79           max(price1_col1,price1_col2)   79           78           min(price2_col1,price2_col2)   no

In the condition above, price1 for sheet1 refers to the column whose name contains the integer 78.
When both 78 and 79 are listed, sum those columns and name the result price1.
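The solution below assumes this condition table is already loaded as a pandas DataFrame named Condition, with the expression columns named price1_out and price2_out (the names that appear in its printout). A minimal sketch of how it could be built by hand, purely for illustration; in practice it would more likely be read from its own CSV file:

import numpy as np
import pandas as pd

# Hypothetical construction of the Condition frame used by the solution below.
Condition = pd.DataFrame({
    'Sheet': ['sheet1', 'sheet2', 'sheet3'],
    'ID': ['yes', 'yes', 'yes'],
    'price1_col1': [78, 78, 78],
    'price1_col2': [np.nan, 79, 79],
    'price1_out': ['price1_col1',
                   'price1_col1+price1_col2',
                   'max(price1_col1,price1_col2)'],
    'price2_col1': [78, 78, 79],
    'price2_col2': [np.nan, 79, 78],
    'price2_out': ['price2_col1',
                   'price2_col1+price2_col2',
                   'min(price2_col1,price2_col2)'],
    'sheetname': ['yes', 'yes', 'no'],
})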

Expected output:

ID   product  price1  price2  sheetname
105  egg      22      28      sheet1
119  Soya     67              sheet1
567  spinach  50      28      sheet2
897  rose     166     99      sheet2
345  egg      98      87
121  potato   99      98
Solution

Use:

print (merged_data)
    ID    user  product  price[78]  price[79]  Source
0  105  dummya      egg         22       28.0  sheet1
1  119  dummy1     Soya         67        NaN  sheet1
2  567  dummya  spinach         22       28.0  sheet2
3  897  dummy1     rose         67       99.0  sheet2
4  345  dummya      egg         87       98.0  sheet3
5  121  dummy1   potato         98       99.0  sheet3

print (Condition)
    Sheet   ID  price1_col1  price1_col2                    price1_out  \
0  sheet1  yes           78          NaN                   price1_col1
1  sheet2  yes           78         79.0       price1_col1+price1_col2
2  sheet3  yes           78         79.0  max(price1_col1,price1_col2)

   price2_col1  price2_col2                    price2_out sheetname
0           78          NaN                   price2_col1       yes
1           78         79.0       price2_col1+price2_col2       yes
2           79         78.0  min(price2_col1,price2_col2)        no

import numpy as np

# merge data together by left join
df = merged_data.merge(Condition.rename(columns={'Sheet': 'Source'}), on='Source', how='left')
# blank out Source and ID where the condition says they are not wanted,
# and drop the sheetname/ID condition columns
df['Source'] = np.where(df.pop('sheetname') == 'yes', df['Source'], '')
df['ID'] = np.where(df.pop('ID') == 'yes', df['ID'], '')
# move the numeric price[..] columns into a new DataFrame
df1 = df.filter(regex=r'\[\d+\]').copy()
# all remaining price* columns (the condition columns), excluding df1
df2 = df[df.filter(regex='price').columns.difference(df1.columns)].copy()
# rename df1 columns to the integers inside the brackets
df1.columns = df1.columns.str.extract(r'\[(\d+)\]', expand=False).astype(int)
# helper column for matching missing values
df1['a'] = np.nan
# split the condition columns into the _col1/_col2 inputs and the _out expressions
mask = df2.columns.str.endswith(('_col1', '_col2'))
final_cols = df2.columns[~mask]
removed_cols = df2.columns[mask]
# replace the 78/79 labels in the _col columns with the matching values from df1
for c in removed_cols:
    df2[c] = df1.lookup(df1.index, df2[c].fillna('a'))

print (df2)
   price1_col1  price1_col2                    price1_out  price2_col1  \
0           22          NaN                   price1_col1         22.0
1           67          NaN                   price1_col1         67.0
2           22         28.0       price1_col1+price1_col2         22.0
3           67         99.0       price1_col1+price1_col2         67.0
4           87         98.0  max(price1_col1,price1_col2)         98.0
5           98         99.0  max(price1_col1,price1_col2)         99.0

   price2_col2                    price2_out
0          NaN                   price2_col1
1          NaN                   price2_col1
2         28.0       price2_col1+price2_col2
3         99.0       price2_col1+price2_col2
4         87.0  min(price2_col1,price2_col2)
5         98.0  min(price2_col1,price2_col2)

# create a MultiIndex so each price group can be processed separately
df2.columns = df2.columns.str.split('_', expand=True)

def f(x):
    # remove the first level (price1/price2)
    x.columns = x.columns.droplevel(0)
    out = []
    # loop over rows
    for v in x.itertuples(index=False):
        # remove the price1_/price2_ prefix from the expression
        t = v.out.replace(x.name + '_', '')
        # replace each field name (col1, col2) with its value
        for k1, v1 in v._asdict().items():
            t = t.replace(k1, str(v1))
        # pd.eval cannot handle min/max, so they are handled separately
        if t.startswith('min'):
            out.append(min(pd.eval(t[3:])))
        elif t.startswith('max'):
            out.append(max(pd.eval(t[3:])))
        # plain arithmetic (+ - * /)
        else:
            out.append(pd.eval(t))
    return pd.Series(out)

# overwrite the original *_out columns with the evaluated values
df[final_cols] = df2.groupby(level=0, axis=1).apply(f).add_suffix('_out')
# if necessary, remove the helper columns
df = df.drop(removed_cols, axis=1)

print (df)
    ID    user  product  price[78]  price[79]  Source  price1_out  price2_out
0  105  dummya      egg         22       28.0  sheet1        22.0        22.0
1  119  dummy1     Soya         67        NaN  sheet1        67.0        67.0
2  567  dummya  spinach         22       28.0  sheet2        50.0        50.0
3  897  dummy1     rose         67       99.0  sheet2       166.0       166.0
4  345  dummya      egg         87       98.0                98.0        87.0
5  121  dummy1   potato         98       99.0                99.0        98.0
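A compatibility note: DataFrame.lookup, used above to map the 78/79 labels back to values, was deprecated in pandas 1.2 and removed in 2.0. On newer pandas the same step can be written with plain NumPy indexing; a minimal sketch under that assumption, reusing the df1/df2/removed_cols names from the code above:

import numpy as np

# replacement for the df1.lookup(...) loop above, for pandas versions without lookup
for c in removed_cols:
    keys = df2[c].fillna('a')                     # column label to pick per row (78, 79 or the NaN helper 'a')
    pos = df1.columns.get_indexer(keys)           # positional index of each label in df1's columns
    df2[c] = df1.to_numpy()[np.arange(len(df1)), pos]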
Summary

The above shows how to merge CSV files with different headers into a single DataFrame and compute the price columns according to a per-sheet condition table.

