    # -*- coding: utf-8 -*-
    import numpy as np
    import pandas as pd

    # multi-indexed dataframe
    df = pd.DataFrame(np.random.randn(8760 * 3, 3))
    df['concept'] = "some_value"
    df['datetime'] = pd.date_range(start='2016', periods=len(df), freq='60Min')
    df.set_index(['concept', 'datetime'], inplace=True)
    df.sort_index(inplace=True)
Console output:
    df.head()
    Out[23]:
                     0         1         2
    datetime
    2016      0.458802  0.413004  0.091056
    2016     -0.051840 -1.780310 -0.304122
    2016     -1.119973  0.954591  0.279049
    2016     -0.691850 -0.489335  0.554272
    2016     -1.278834 -1.292012 -0.637931

    df.head()
       ...: df.tail()
    Out[24]:
                     0         1         2
    datetime
    2018     -1.872155  0.434520 -0.526520
    2018      0.345213  0.989475 -0.892028
    2018     -0.162491  0.908121 -0.993499
    2018     -1.094727  0.307312  0.515041
    2018     -0.880608 -1.065203 -1.438645
Now I want to compute annual sums on the 'datetime' level.
My first attempt was the following, but it does not work:
    # sum along years
    years = df.index.get_level_values('datetime').year.tolist()
    df.index.set_levels([years], level=['datetime'], inplace=True)
    df = df.groupby(level=['datetime']).sum()
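This also feels heavy-handed to me for a task that should be easy to accomplish.
So here is my question: how can I get annual sums on the 'datetime' level? Is there a simple way to do this by applying a function to the values of the DateTime level?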
Solution

You can groupby on the second level of the MultiIndex and its year:

    # -*- coding: utf-8 -*-
    import numpy as np
    import pandas as pd

    # multi-indexed dataframe
    df = pd.DataFrame(np.random.randn(8760 * 3, 3))
    df['concept'] = "some_value"
    df['datetime'] = pd.date_range(start='2016', periods=len(df), freq='60Min')
    df.set_index(['concept', 'datetime'], inplace=True)
    df.sort_index(inplace=True)

    print(df.head())
                                           0         1         2
    concept    datetime
    some_value 2016-01-01 00:00:00  1.973437  0.101535 -0.693360
               2016-01-01 01:00:00  1.221657 -1.983806 -0.075609
               2016-01-01 02:00:00 -0.208122 -2.203801  1.254084
               2016-01-01 03:00:00  0.694332 -0.235864  0.538468
               2016-01-01 04:00:00 -0.928815 -1.417445  1.534218

    # sum along years
    #years = df.index.get_level_values('datetime').year.tolist()
    #df.index.set_levels([years], level=['datetime'], inplace=True)

    print(df.index.levels[1].year)
    [2016 2016 2016 ...,2018 2018 2018]

    df = df.groupby(df.index.levels[1].year).sum()
    print(df.head())
                   0           1          2
    2016  -93.901914  -32.205514 -22.460965
    2017  205.681817   67.701669 -33.960801
    2018   67.438355  150.954614 -21.381809
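Alternatively, you can use get_level_values and year. This returns one value per row, whereas levels[1] holds only the unique values of the level, so it also works when 'concept' has more than one value: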
    df = df.groupby(df.index.get_level_values('datetime').year).sum()
    print(df.head())
                   0           1          2
    2016  -93.901914  -32.205514 -22.460965
    2017  205.681817   67.701669 -33.960801
    2018   67.438355  150.954614 -21.381809
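For reference, here is a minimal self-contained sketch of the get_level_values approach, assuming Python 3 and a reasonably recent pandas. The all-ones data is a hypothetical stand-in for the random values above, chosen only so that each yearly sum equals the number of hourly rows in that year and is easy to check. The names `yearly` and `by_concept` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical deterministic data (assumption): every cell is 1.0,
# so each yearly sum equals the count of hourly rows in that year.
n = 8760 * 3
df = pd.DataFrame(np.ones((n, 3)))
df['concept'] = "some_value"
df['datetime'] = pd.date_range(start='2016', periods=n, freq='60min')
df = df.set_index(['concept', 'datetime']).sort_index()

# One year label per row, taken from the 'datetime' level.
years = df.index.get_level_values('datetime').year

# Annual sums across the whole frame.
yearly = df.groupby(years).sum()
print(yearly)

# Variant that keeps 'concept' as a grouping key alongside the year.
by_concept = df.groupby([df.index.get_level_values('concept'), years]).sum()
print(by_concept)
```

Grouping by an array of per-row labels avoids modifying the index at all, which is why the set_levels step from the first attempt is not needed here.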