
Ahoy, squirts!
here with a review of a flick that's out in Utah (read below to find out why) and is going theater to theater currently. I've heard lots of great stuff about this documentary, a film that follows the big splash of Utah Valley State College inviting Michael Moore as a speaker back in September. This had a big impact on the conservative community and entertainment ensues as the right plays tug-o-war with the left. This doc is supposed to underline this whole Red State Vs. Blue State syndrome that is raging all over the news these days. Anyway, here's the review! Enjoy!
Hello boys,
I'm a loooong time reader and occasional reviewer.
Not often do I get a chance to review advance or limited-release stuff from the black hole of Utah, but for once I do.
(Well, you know, every once in a while... I wrote an AICN review a loooong time ago for that piece of crap Star Trek Insurrection under the pseudonym "SpaceWhore".) But there is a kickass documentary playing in limited release in Salt Lake before it opens up in New York, Seattle, San Francisco and a couple of other cities in August and, according to the website, more in September before it comes out on DVD.
I actually found out about the doc here on Elston's
kickass weekly recap.
I really like the work the guys at the Disinformation company do and I
thought "Outfoxed" was great so I got really excited to see "This
Divided State" when I found out they were putting the DVD out in
September sometime (I don't remember the date and it doesn't seem to
be on the website)...
But This Divided State happened in Utah, so I guess that's why they opened it here instead of elsewhere.
It's running at the Tower Theatre at the moment (a really kickass art house in SLC; they just did Man with the Screaming Brain a couple of weeks ago, it was rad). It seems like I'm rambling, so I'll go right into the review:
Back in September of last year, Utah Valley State College decided it would be rad to invite Michael Moore to speak on campus.
What the documentary reveals (and what us Utah natives know) is that Utah Valley State College is in the most backwards-ass conservative community in the country, and people went crazy at the idea of such a liberal lightning rod coming into town.
The media closed in and so did Stephen Greenstreet, the director of the doc, who did a Q & A last night that I caught at the 7:00 showing.
The guy that comes right into the spotlight as the outspoken conservative spokesperson is a dude named Kay Anderson. He's a real-estate millionaire who lives across the street from the college and he is nuts.
He tried suing the school to prevent Michael Moore from speaking, he tried bribing, he even got up on a table outside the school and was basically street preaching against it.
He also repeatedly predicts that Michael Moore will bring on the apocalypse (it's amazing, you can see it in the trailer. I almost didn't believe it when I heard it).
There's also a Michael Moore look-alike/media whore, a kid who has no idea what he's talking about who collects a petition against the decision, the students on student council who invited Moore, and a number of other outspoken individuals and college professors.
The film is interesting and almost always entertaining. It's very short (90 minutes) and paced tightly.
It's all cut chronologically too, and although the director said he admires Michael Moore, he made a conscious effort to keep himself out of it. He doesn't appear in the doc and he doesn't narrate it.
Everyone is able to speak for themselves, and although you learn from one of the professors that appears in the film that there is no such thing as an objective viewpoint, this doc comes close.
The editing really impressed me and his depth of footage impressed me as well.
He said afterwards that he had 9 or 10 cameras going at a time, but he paid for everything out of his own pocket (he said that because of it, he and the assistant directors are all practically on welfare, even still).
The movie gets really emotional too.
There are times when you get choked up, others where you get pissed and others that just make you laugh.
In between Sean Hannity's and Michael Moore's speeches (oh yeah, to "balance" the debate, the college brought in Sean Hannity) there's this Godspeed song that just builds and builds and builds and it makes you really nervous; he almost makes you feel like Michael Moore IS the apocalypse.
And then there's a really great version of Woody Guthrie's "This Land Is Your Land" that they had a kickass local beer-band do (The Utah County Swillers) that covers a montage of the post-lawsuit, post-election-day epilogues at the end, and it gave me the chills.
I think this film is important for people everywhere to see for a number of reasons and I'm glad it's going to play wider.
The first reason is this: this is an accurate portrayal of Utah. Utah so often seems like a different country.
I go to California a lot for my work and when I come back to Utah the conservative tension in the air is palpable.
Second: I think this is a good example of the larger conflict in the country, the red vs. blue divide (not Halo red vs. blue).
Third: I think it's great to support documentaries in theaters.
How great would it be if one screen at every one of these multiplexes was dedicated to docs and revivals?
You'd probably go to the movies a lot more.
So I think people should see it just to support the genre.
Fourth: It's really, really, really good. It's WAY better than Outfoxed.
There are actual events unfolding in this doc, not talking heads.
Fifth: Michael Moore isn't really in it all that much.
Going into this, I thought the doc would be following Michael Moore around, following the controversy. It's not that at all.
Mike only appears toward the end. He comes in, gives his speech and leaves immediately.
They don't manage to prevent him from coming, but the fight is amazing to watch.
SPOILERS OVER
Don't take my word for it though, check out the trailer and you will see.
If you use this rambling piece of shit then call me Darth Swank
Time Series / Date functionality — pandas 0.22.0 documentation
pandas provides a relatively compact and self-contained set of tools for
performing common time-series and date-handling tasks.
Create a range of dates:
# 72 hours starting with midnight Jan 1st, 2011
In [1]: rng = pd.date_range('1/1/2011', periods=72, freq='H')
In [2]: rng[:5]
DatetimeIndex([' 00:00:00', ' 01:00:00',
' 02:00:00', ' 03:00:00',
' 04:00:00'],
dtype='datetime64[ns]', freq='H')
Index pandas objects with dates:
In [3]: ts = pd.Series(np.random.randn(len(rng)), index=rng)
In [4]: ts.head()
Freq: H, dtype: float64
Change frequency and fill gaps:
# to 45 minute frequency and forward fill
In [5]: converted = ts.asfreq('45Min', method='pad')
In [6]: converted.head()
Freq: 45T, dtype: float64
# Daily means
In [7]: ts.resample('D').mean()
Freq: D, dtype: float64
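Since the dates in the captured output above were stripped, here is a self-contained sketch of the same three steps with made-up dates (the specific timestamps are illustrative, not taken from the original docs):

```python
import numpy as np
import pandas as pd

# 72 hours of hourly timestamps starting at midnight Jan 1st, 2011
rng = pd.date_range("2011-01-01", periods=72, freq="h")

# A Series indexed by those timestamps
ts = pd.Series(np.random.randn(len(rng)), index=rng)

# Change to 45-minute frequency, forward-filling the gaps
converted = ts.asfreq("45min", method="pad")

# Downsample to daily means
daily = ts.resample("D").mean()
```

Because asfreq upsamples here, "pad" simply repeats the last hourly observation until the next one arrives; resample("D").mean() aggregates the 24 hourly values of each day.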
The following table shows the time-related classes pandas can handle and
how to create them.

Class          Remarks                        How to create
Timestamp      Represents a single timestamp  to_datetime, Timestamp
DatetimeIndex  Index of Timestamp             to_datetime, date_range, bdate_range, DatetimeIndex
Period         Represents a single time span  Period
PeriodIndex    Index of Period                period_range, PeriodIndex
However, in many cases it is more natural to associate things like change
variables with a time span instead. The span represented by Period can be
specified explicitly, or inferred from datetime string format.
For example:
In [11]: pd.Period(';)
Out[11]: Period(';, 'M')
In [12]: pd.Period(';, freq='D')
Out[12]: Period('', 'D')
Timestamp and Period can be the index. Lists of Timestamp and
Period are automatically coerced to DatetimeIndex and PeriodIndex
respectively.
In [13]: dates = [pd.Timestamp(''), pd.Timestamp(''), pd.Timestamp('')]
In [14]: ts = pd.Series(np.random.randn(3), dates)
In [15]: type(ts.index)
Out[15]: pandas.core.indexes.datetimes.DatetimeIndex
In [16]: ts.index
Out[16]: DatetimeIndex(['', '', ''], dtype='datetime64[ns]', freq=None)
In [17]: ts
dtype: float64
In [18]: periods = [pd.Period(';), pd.Period(';), pd.Period(';)]
In [19]: ts = pd.Series(np.random.randn(3), periods)
In [20]: type(ts.index)
Out[20]: pandas.core.indexes.period.PeriodIndex
In [21]: ts.index
Out[21]: PeriodIndex([';, ';, ';], dtype='period[M]', freq='M')
In [22]: ts
Freq: M, dtype: float64
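The coercion described above can be shown with concrete (illustrative) dates, since the captured output lost them:

```python
import numpy as np
import pandas as pd

# A list of Timestamps is automatically coerced to a DatetimeIndex
dates = [pd.Timestamp("2012-05-01"),
         pd.Timestamp("2012-05-02"),
         pd.Timestamp("2012-05-03")]
ts = pd.Series(np.random.randn(3), index=dates)

# A list of Periods is automatically coerced to a PeriodIndex
periods = [pd.Period("2012-01"), pd.Period("2012-02"), pd.Period("2012-03")]
ps = pd.Series(np.random.randn(3), index=periods)
```

Note that pd.Period("2012-01") infers a monthly frequency from the string, so the resulting PeriodIndex carries freq='M' without it being spelled out.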
pandas allows you to capture both representations and
convert between them. Under the hood, pandas represents timestamps using
instances of Timestamp and sequences of timestamps using instances of
DatetimeIndex. For regular time spans, pandas uses Period objects for
scalar values and PeriodIndex for sequences of spans. Better support for
irregular intervals with arbitrary start and end points is forthcoming in
future releases.
If you use dates which start with the day first (i.e. European style),
you can pass the dayfirst flag:
In [25]: pd.to_datetime(['04-01-'], dayfirst=True)
Out[25]: DatetimeIndex([' 10:00:00'], dtype='datetime64[ns]', freq=None)
In [26]: pd.to_datetime(['14-01-2012', '01-14-2012'], dayfirst=True)
Out[26]: DatetimeIndex(['', ''], dtype='datetime64[ns]', freq=None)
You see in the above example that dayfirst isn’t strict, so if a date
can’t be parsed with the day being first it will be parsed as if
dayfirst were False.
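A small sketch of the non-strict dayfirst behavior, with illustrative dates standing in for the ones lost from the captured output:

```python
import pandas as pd

# With dayfirst=True, '04-01-2012' parses as the 4th of January
jan4 = pd.to_datetime(["04-01-2012 10:00"], dayfirst=True)

# dayfirst is not strict: '01-14-2012' has no valid month 14,
# so parsing falls back to month-first and still succeeds
a = pd.to_datetime("14-01-2012", dayfirst=True)
b = pd.to_datetime("01-14-2012", dayfirst=True)
```

Both a and b end up as January 14th, 2012, which is exactly the silent fallback the note above warns about.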
If you pass a single string to to_datetime, it returns a single Timestamp.
Timestamp can also accept string input, but it doesn’t accept string parsing
options like dayfirst or format, so use to_datetime if these are required.
In [27]: pd.to_datetime('')
Out[27]: Timestamp(' 00:00:00')
In [28]: pd.Timestamp('')
Out[28]: Timestamp(' 00:00:00')
Providing a Format Argument
In addition to the required datetime string, a format argument can be passed to ensure specific parsing.
This could also potentially speed up the conversion considerably.
In [29]: pd.to_datetime('', format='%Y/%m/%d')
Out[29]: Timestamp(' 00:00:00')
In [30]: pd.to_datetime('12-11-', format='%d-%m-%Y %H:%M')
Out[30]: Timestamp(' 00:00:00')
For more information on how to specify the format options, see the Python strftime documentation.
You can also pass a DataFrame of integer or string columns to assemble into a Series of Timestamps.
In [31]: df = pd.DataFrame({'year': [2015, 2016],
'month': [2, 3],
'day': [4, 5],
'hour': [2, 3]})
In [32]: pd.to_datetime(df)
dtype: datetime64[ns]
You can pass only the columns that you need to assemble.
In [33]: pd.to_datetime(df[['year', 'month', 'day']])
dtype: datetime64[ns]
pd.to_datetime looks for standard designations of the datetime component in the column names, including:
required: year, month, day
optional: hour, minute, second, millisecond, microsecond, nanosecond
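The column-assembly behavior can be sketched end to end (the values are the ones from the In [31] example above):

```python
import pandas as pd

df = pd.DataFrame({"year": [2015, 2016],
                   "month": [2, 3],
                   "day": [4, 5],
                   "hour": [2, 3]})

# All recognized component columns are combined into Timestamps
stamps = pd.to_datetime(df)

# Or pass only the required subset of columns
days_only = pd.to_datetime(df[["year", "month", "day"]])
```

Any column whose name is not one of the recognized designations would raise, which is why selecting the subset explicitly is the safe pattern.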
Invalid Data
In version 0.17.0, the default for to_datetime changed to errors='raise', rather than errors='ignore'. This means
that invalid parsing will raise rather than return the original input as in previous versions.
The default behavior, errors='raise', is to raise when unparseable:
In [2]: pd.to_datetime(['', 'asd'], errors='raise')
ValueError: Unknown string format
Pass errors='ignore' to return the original input when unparseable:
In [34]: pd.to_datetime(['', 'asd'], errors='ignore')
Out[34]: array(['', 'asd'], dtype=object)
Pass errors='coerce' to convert unparseable data to NaT (not a time):
In [35]: pd.to_datetime(['', 'asd'], errors='coerce')
Out[35]: DatetimeIndex(['', 'NaT'], dtype='datetime64[ns]', freq=None)
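A runnable sketch of the two most useful error modes, using an illustrative valid date in place of the one stripped from the output above:

```python
import pandas as pd

bad = ["2009-07-31", "asd"]

# errors='coerce' converts the unparseable entry to NaT
coerced = pd.to_datetime(bad, errors="coerce")

# The default, errors='raise', raises a ValueError instead
raised = False
try:
    pd.to_datetime(bad, errors="raise")
except ValueError:
    raised = True
```

With 'coerce' the result is still a proper DatetimeIndex, so downstream datetime operations keep working and the NaT entries can be filtered with pd.isna.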
Epoch times will be rounded to the nearest nanosecond.
Conversion of float epoch times can lead to inaccurate and unexpected results.
Floats have about 15 digits of precision in decimal. Rounding during conversion
from float to a high precision Timestamp is unavoidable. The only way to
achieve exact precision is to use a fixed-width type (e.g. an int64).
In [38]: pd.to_datetime([.433, .], unit='s')
Out[38]: DatetimeIndex([' 15:16:45.;, ' 15:16:45.;], dtype='datetime64[ns]', freq=None)
In [39]: pd.to_datetime(3502912, unit='ns')
Out[39]: Timestamp(' 15:16:45.')
We convert the DatetimeIndex to an int64 array, then divide by the conversion unit.
In [42]: stamps.view('int64') // pd.Timedelta(1, unit='s')
Out[42]: array([, , , ])
Using the origin Parameter
New in version 0.20.0.
Using the origin parameter, one can specify an alternative starting point for creation
of a DatetimeIndex. For example, to use
as the starting date:
In [43]: pd.to_datetime([1, 2, 3], unit='D', origin=pd.Timestamp(''))
Out[43]: DatetimeIndex(['', '', ''], dtype='datetime64[ns]', freq=None)
The default is set at origin='unix', which defaults to 1970-01-01 00:00:00,
commonly called 'unix epoch' or POSIX time.
In [44]: pd.to_datetime([1, 2, 3], unit='D')
Out[44]: DatetimeIndex(['', '', ''], dtype='datetime64[ns]', freq=None)
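Since the dates were stripped from the outputs above, here is the same origin comparison with concrete values (1960-01-01 is just an illustrative origin):

```python
import pandas as pd

# Integer offsets interpreted as days since a custom origin
custom = pd.to_datetime([1, 2, 3], unit="D", origin=pd.Timestamp("1960-01-01"))

# The default origin='unix' counts from 1970-01-01
epoch = pd.to_datetime([1, 2, 3], unit="D")
```

An offset of 1 day from origin 1960-01-01 gives 1960-01-02; with the default unix origin the same inputs give 1970-01-02 through 1970-01-04.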
In practice this becomes very cumbersome because we often need a very long
index with a large number of timestamps. If we need timestamps on a regular
frequency, we can use the date_range and bdate_range functions
to create a DatetimeIndex. The default frequency for date_range is a
calendar day while the default for bdate_range is a business day:
In [50]: start = datetime(2011, 1, 1)
In [51]: end = datetime(2012, 1, 1)
In [52]: index = pd.date_range(start, end)
In [53]: index
DatetimeIndex(['', '', '', '',
'', '', '', '',
'', '',
'', '', '', '',
'', '', '', '',
'', ''],
dtype='datetime64[ns]', length=366, freq='D')
In [54]: index = pd.bdate_range(start, end)
In [55]: index
DatetimeIndex(['', '', '', '',
'', '', '', '',
'', '',
'', '', '', '',
'', '', '', '',
'', ''],
dtype='datetime64[ns]', length=260, freq='B')
Convenience functions like date_range and bdate_range can utilize a
variety of frequency aliases:
In [56]: pd.date_range(start, periods=1000, freq='M')
DatetimeIndex(['', '', '', '',
'', '', '', '',
'', '',
'', '', '', '',
'', '', '', '',
'', ''],
dtype='datetime64[ns]', length=1000, freq='M')
In [57]: pd.bdate_range(start, periods=250, freq='BQS')
DatetimeIndex(['', '', '', '',
'', '', '', '',
'', '',
'', '', '', '',
'', '', '', '',
'', ''],
dtype='datetime64[ns]', length=250, freq='BQS-JAN')
date_range and bdate_range make it easy to generate a range of dates
using various combinations of parameters like start, end, periods,
and freq. The start and end dates are strictly inclusive, so dates outside
of those specified will not be generated:
In [58]: pd.date_range(start, end, freq='BM')
DatetimeIndex(['', '', '', '',
'', '', '', '',
'', '', '', ''],
dtype='datetime64[ns]', freq='BM')
In [59]: pd.date_range(start, end, freq='W')
DatetimeIndex(['', '', '', '',
'', '', '', '',
'', '', '', '',
'', '', '', '',
'', '', '', '',
'', '', '', '',
'', '', '', '',
'', '', '', '',
'', '', '', '',
'', '', '', '',
'', '', '', '',
'', '', '', '',
'', '', '', '',
''],
dtype='datetime64[ns]', freq='W-SUN')
In [60]: pd.bdate_range(end=end, periods=20)
DatetimeIndex(['', '', '', '',
'', '', '', '',
'', '', '', '',
'', '', '', '',
'', '', '', ''],
dtype='datetime64[ns]', freq='B')
In [61]: pd.bdate_range(start=start, periods=20)
DatetimeIndex(['', '', '', '',
'', '', '', '',
'', '', '', '',
'', '', '', '',
'', '', '', ''],
dtype='datetime64[ns]', freq='B')
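Because the generated dates above were lost in extraction, here is a small sketch of the inclusive-endpoint behavior with illustrative dates:

```python
import pandas as pd
from datetime import datetime

start = datetime(2011, 1, 1)   # a Saturday
end = datetime(2011, 1, 10)    # the following Monday

# Calendar-day frequency: both endpoints are included
daily = pd.date_range(start, end)

# Business-day frequency skips the weekend days
bdays = pd.bdate_range(start, end)
```

The calendar-day range has all 10 dates including both endpoints, while the business-day range keeps only the 6 weekdays (Jan 3-7 and Jan 10).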
Custom Frequency Ranges
This functionality was originally exclusive to cdate_range, which is
deprecated as of version 0.21.0 in favor of bdate_range.
cdate_range only utilizes the weekmask and holidays parameters
when the custom business day, 'C', is passed as the frequency string. Support has
been expanded with bdate_range to work with any custom frequency string.
New in version 0.21.0.
bdate_range can also generate a range of custom frequency dates by using
the weekmask and holidays parameters.
These parameters will only be
used if a custom frequency string is passed.
In [62]: weekmask = 'Mon Wed Fri'
In [63]: holidays = [datetime(2011, 1, 5), datetime(2011, 3, 14)]
In [64]: pd.bdate_range(start, end, freq='C', weekmask=weekmask, holidays=holidays)
DatetimeIndex(['', '', '', '',
'', '', '', '',
'', '',
'', '', '', '',
'', '', '', '',
'', ''],
dtype='datetime64[ns]', length=154, freq='C')
In [65]: pd.bdate_range(start, end, freq='CBMS', weekmask=weekmask)
DatetimeIndex(['', '', '', '',
'', '', '', '',
'', '', '', ''],
dtype='datetime64[ns]', freq='CBMS')
One of the main uses for DatetimeIndex is as an index for pandas objects.
The DatetimeIndex class contains many time series related optimizations:
A large range of dates for various offsets are pre-computed and cached
under the hood in order to make generating subsequent date ranges very fast
(just have to grab a slice)
Fast shifting using the shift and tshift method on pandas objects
Unioning of overlapping DatetimeIndex objects with the same frequency is
very fast (important for fast data alignment)
Quick access to date fields via properties such as year, month, etc.
Regularization functions like snap and very fast asof logic
DatetimeIndex objects have all the basic functionality of regular Index
objects, and a smorgasbord of advanced time series specific methods for easy
frequency processing.
While pandas does not force you to have a sorted date index, some of these
methods may have unexpected or incorrect behavior if the dates are unsorted.
DatetimeIndex can be used like a regular index and offers all of its
intelligent functionality like selection, slicing, etc.
In [68]: rng = pd.date_range(start, end, freq='BM')
In [69]: ts = pd.Series(np.random.randn(len(rng)), index=rng)
In [70]: ts.index
DatetimeIndex(['', '', '', '',
'', '', '', '',
'', '', '', ''],
dtype='datetime64[ns]', freq='BM')
In [71]: ts[:5].index
DatetimeIndex(['', '', '', '',
''],
dtype='datetime64[ns]', freq='BM')
In [72]: ts[::2].index
DatetimeIndex(['', '', '', '',
'', ''],
dtype='datetime64[ns]', freq='2BM')
Partial String Indexing
Dates and strings that parse to timestamps can be passed as indexing parameters:
In [73]: ts['1/31/2011']
Out[73]: -1.9531
In [74]: ts[datetime(2011, 12, 25):]
Freq: BM, dtype: float64
In [75]: ts['10/31/2011':'12/31/2011']
Freq: BM, dtype: float64
To provide convenience for accessing longer time series, you can also pass in
the year or year and month as strings:
In [76]: ts['2011']
Freq: BM, dtype: float64
In [77]: ts[';]
Freq: BM, dtype: float64
This type of slicing will work on a DataFrame with a DatetimeIndex as well. Since the
partial string selection is a form of label slicing, the endpoints will be included. This
would include matching times on an included date:
In [78]: dft = pd.DataFrame(randn(100000,1),
columns=['A'],
index=pd.date_range(';,periods=100000,freq='T'))
In [79]: dft
00:02:00 -0.154951
00:04:00 -2.179861
00:05:00 -1.369849
00:06:00 -0.954208
10:33:00 -0.293083
10:34:00 -0.059881
10:38:00 -0.286539
[100000 rows x 1 columns]
In [80]: dft['2013']
00:02:00 -0.154951
00:04:00 -2.179861
00:05:00 -1.369849
00:06:00 -0.954208
10:33:00 -0.293083
10:34:00 -0.059881
10:38:00 -0.286539
[100000 rows x 1 columns]
This starts on the very first time in the month, and includes the last date & time for the month
In [81]: dft[';:';]
00:02:00 -0.154951
00:04:00 -2.179861
00:05:00 -1.369849
00:06:00 -0.954208
23:54:00 -1.303422
23:57:00 -1.624220
23:59:00 -1.087454
[84960 rows x 1 columns]
This specifies a stop time that includes all of the times on the last day
In [82]: dft[';:'']
00:02:00 -0.154951
00:04:00 -2.179861
00:05:00 -1.369849
00:06:00 -0.954208
23:54:00 -1.303422
23:57:00 -1.624220
23:59:00 -1.087454
[84960 rows x 1 columns]
This specifies an exact stop time (and is not the same as the above)
In [83]: dft[';:' 00:00:00']
00:02:00 -0.154951
00:04:00 -2.179861
00:05:00 -1.369849
00:06:00 -0.954208
23:55:00 -0.309230
23:59:00 -0.019734
[83521 rows x 1 columns]
We are stopping on the included end-point as it is part of the index
In [84]: dft['':' 12:30:00']
00:01:00 -0.605198
00:04:00 -2.228519
00:06:00 -1.188774
12:25:00 -0.737727
12:27:00 -0.774090
12:29:00 -0.631649
[751 rows x 1 columns]
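The slicing variants above can be condensed into one runnable sketch; the index here is smaller (1000 hourly rows) and illustrative, and .loc is used for the row selections:

```python
import numpy as np
import pandas as pd

dft = pd.DataFrame(np.random.randn(1000, 1),
                   columns=["A"],
                   index=pd.date_range("2013-01-01", periods=1000, freq="h"))

# A year string selects every row in that year
year = dft.loc["2013"]

# A partial-day slice includes all times on both endpoint days
both_days = dft.loc["2013-01-01":"2013-01-02"]

# An exact stop time is not the same thing: it cuts at midnight
to_midnight = dft.loc["2013-01-01":"2013-01-02 00:00:00"]
```

The partial-day slice keeps all 48 hourly rows of the two days, whereas the exact stop time keeps only the 25 rows up to and including midnight.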
New in version 0.18.0.
DatetimeIndex partial string indexing also works on a DataFrame with a MultiIndex:
In [85]: dft2 = pd.DataFrame(np.random.randn(20, 1),
columns=['A'],
index=pd.MultiIndex.from_product([pd.date_range(';,
periods=10,
freq='12H'),
['a', 'b']]))
In [86]: dft2
00:00:00 a -0.659574
12:00:00 a -0.778425
b -0.253355
00:00:00 a -2.816159
b -1.210929
12:00:00 a
00:00:00 b -1.624463
12:00:00 a
00:00:00 a -1.256173
12:00:00 a -1.067396
b -0.660996
[20 rows x 1 columns]
In [87]: dft2.loc['']
00:00:00 a -1.256173
12:00:00 a -1.067396
b -0.660996
In [88]: idx = pd.IndexSlice
In [89]: dft2 = dft2.swaplevel(0, 1).sort_index()
In [90]: dft2.loc[idx[:, ''], :]
00:00:00 -1.256173
12:00:00 -1.067396
12:00:00 -0.660996
Slice vs. Exact Match
Changed in version 0.20.0.
The same string used as an indexing parameter can be treated either as a slice or as an exact match depending on the resolution of the index. If the string is less accurate than the index, it will be treated as a slice, otherwise as an exact match.
Consider a Series object with a minute resolution index:
In [91]: series_minute = pd.Series([1, 2, 3],
pd.DatetimeIndex([' 23:59:00',
' 00:00:00',
' 00:02:00']))
In [92]: series_minute.index.resolution
Out[92]: 'minute'
A timestamp string less accurate than a minute gives a Series object.
In [93]: series_minute[' 23']
dtype: int64
A timestamp string with minute resolution (or more accurate) gives a scalar instead, i.e. it is not cast to a slice.
In [94]: series_minute[' 23:59']
Out[94]: 1
In [95]: series_minute[' 23:59:00']
Out[95]: 1
If the index resolution is second, then the minute-accurate timestamp gives a Series.
In [96]: series_second = pd.Series([1, 2, 3],
pd.DatetimeIndex([' 23:59:59',
' 00:00:00',
' 00:00:01']))
In [97]: series_second.index.resolution
Out[97]: 'second'
In [98]: series_second[' 23:59']
dtype: int64
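The slice-vs-exact-match distinction can be shown with the timestamps restored (the dates here are illustrative reconstructions, since the originals were stripped):

```python
import pandas as pd

series_minute = pd.Series([1, 2, 3],
                          index=pd.DatetimeIndex(["2011-12-31 23:59:00",
                                                  "2012-01-01 00:00:00",
                                                  "2012-01-01 00:02:00"]))

# Less accurate than the minute resolution -> treated as a slice (Series)
sliced = series_minute.loc["2011-12-31 23"]

# As accurate as the index resolution -> exact match (scalar)
exact = series_minute.loc["2011-12-31 23:59"]
```

The hour-level string returns a one-row Series; the minute-level string returns the scalar 1 directly.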
If the timestamp string is treated as a slice, it can be used to index DataFrame with [] as well.
In [99]: dft_minute = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]},
index=series_minute.index)
In [100]: dft_minute[' 23']
However, if the string is treated as an exact match, the selection in DataFrame's [] will be column-wise and not row-wise. For example, dft_minute[' 23:59'] will raise a KeyError, as ' 23:59' has the same resolution as the index and there is no column with that name:
To always have unambiguous selection, whether the row is treated as a slice or a single selection, use .loc.
In [101]: dft_minute.loc[' 23:59']
23:59:00, dtype: int64
Note also that DatetimeIndex resolution cannot be less precise than day.
In [102]: series_monthly = pd.Series([1, 2, 3],
pd.DatetimeIndex([';,
In [103]: series_monthly.index.resolution
Out[103]: 'day'
In [104]: series_monthly[';] # returns Series
dtype: int64
Exact Indexing
As discussed in previous section, indexing a DatetimeIndex with a partial string depends on the “accuracy” of the period, in other words how specific the interval is in relation to the resolution of the index. In contrast, indexing with Timestamp or datetime objects is exact, because the objects have exact meaning. These also follow the semantics of including both endpoints.
These Timestamp and datetime objects have exact hours, minutes, and seconds, even though they were not explicitly specified (they are 0).
In [105]: dft[datetime(2013, 1, 1):datetime(2013,2,28)]
00:02:00 -0.154951
00:04:00 -2.179861
00:05:00 -1.369849
00:06:00 -0.954208
23:55:00 -0.309230
23:59:00 -0.019734
[83521 rows x 1 columns]
With no defaults.
In [106]: dft[datetime(2013, 1, 1, 10, 12, 0):datetime(2013, 2, 28, 10, 12, 0)]
10:12:00 -0.246733
10:13:00 -1.429225
10:14:00 -1.265339
10:16:00 -0.818200
10:08:00 -0.490372
10:12:00 -0.945450
[83521 rows x 1 columns]
Truncating & Fancy Indexing
A truncate convenience function is provided that is similar to slicing.
Note that truncate assumes a 0 value for any unspecified date component
in a DatetimeIndex in contrast to slicing which returns any partially
matching dates:
In [107]: rng2 = pd.date_range('', '', freq='W')
In [108]: ts2 = pd.Series(np.random.randn(len(rng2)), index=rng2)
In [109]: ts2.truncate(before=';, after=';)
Freq: W-SUN, dtype: float64
In [110]: ts2[';:';]
Freq: W-SUN, dtype: float64
Even complicated fancy indexing that breaks the DatetimeIndex frequency
regularity will result in a DatetimeIndex, although frequency is lost:
In [111]: ts2[[0, 2, 6]].index
Out[111]: DatetimeIndex(['', '', ''], dtype='datetime64[ns]', freq=None)
We could have done the same thing with DateOffset:
In [114]: from pandas.tseries.offsets import *
In [115]: d + DateOffset(months=4, days=5)
Out[115]: Timestamp(' 09:00:00')
The key features of a DateOffset object are:
it can be added / subtracted to/from a datetime object to obtain a
shifted date
it can be multiplied by an integer (positive or negative) so that the
increment will be applied multiple times
it has rollforward and rollback methods for moving a date forward
or backward to the next or previous “offset date”
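The three key features above can be sketched together; the anchor date is illustrative (2008-08-18 is a Monday):

```python
import pandas as pd
from datetime import datetime
from pandas.tseries.offsets import BDay, BMonthEnd

d = datetime(2008, 8, 18, 9, 0)   # a Monday

# Adding shifts the date; multiplying by an integer repeats the increment
shifted = d + pd.DateOffset(months=4, days=5)
five_back = d - 5 * BDay()

# rollforward / rollback snap to the next / previous "offset date"
offset = BMonthEnd()
fwd = offset.rollforward(d)   # last business day of August 2008
back = offset.rollback(d)     # last business day of July 2008
```

Note that rollforward lands on Friday Aug 29 rather than Aug 31, because Aug 31, 2008 is a Sunday and BMonthEnd only produces business days.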
Subclasses of DateOffset define the apply function which dictates
custom date increment logic, such as adding business days:
class BDay(DateOffset):
    """DateOffset increments between business days"""
    def apply(self, other):
        ...
In [116]: d - 5 * BDay()
Out[116]: Timestamp(' 09:00:00')
In [117]: d + BMonthEnd()
Out[117]: Timestamp(' 09:00:00')
The rollforward and rollback methods do exactly what you would expect:
In [118]: d
Out[118]: datetime.datetime(, 9, 0)
In [119]: offset = BMonthEnd()
In [120]: offset.rollforward(d)
Out[120]: Timestamp(' 09:00:00')
In [121]: offset.rollback(d)
Out[121]: Timestamp(' 09:00:00')
It’s definitely worth exploring the pandas.tseries.offsets module and the
various docstrings for the classes.
These operations (apply, rollforward and rollback) preserve time (hour, minute, etc.) information by default. To reset time, use the normalize=True keyword when creating the offset instance. If normalize=True, the result is normalized after the function is applied.
In [122]: day = Day()
In [123]: day.apply(pd.Timestamp(' 09:00'))
Out[123]: Timestamp(' 09:00:00')
In [124]: day = Day(normalize=True)
In [125]: day.apply(pd.Timestamp(' 09:00'))
Out[125]: Timestamp(' 00:00:00')
In [126]: hour = Hour()
In [127]: hour.apply(pd.Timestamp(' 22:00'))
Out[127]: Timestamp(' 23:00:00')
In [128]: hour = Hour(normalize=True)
In [129]: hour.apply(pd.Timestamp(' 22:00'))
Out[129]: Timestamp(' 00:00:00')
In [130]: hour.apply(pd.Timestamp(' 23:00'))
Out[130]: Timestamp(' 00:00:00')
Parametric Offsets
Some of the offsets can be “parameterized” when created to result in different
behaviors. For example, the Week offset for generating weekly data accepts a
weekday parameter which results in the generated dates always lying on a
particular day of the week:
In [131]: d
Out[131]: datetime.datetime(, 9, 0)
In [132]: d + Week()
Out[132]: Timestamp(' 09:00:00')
In [133]: d + Week(weekday=4)
Out[133]: Timestamp(' 09:00:00')
In [134]: (d + Week(weekday=4)).weekday()
Out[134]: 4
In [135]: d - Week()
Out[135]: Timestamp(' 09:00:00')
The normalize option is effective for addition and subtraction.
In [136]: d + Week(normalize=True)
Out[136]: Timestamp(' 00:00:00')
In [137]: d - Week(normalize=True)
Out[137]: Timestamp(' 00:00:00')
Another example is parameterizing YearEnd with the specific ending month:
In [138]: d + YearEnd()
Out[138]: Timestamp(' 09:00:00')
In [139]: d + YearEnd(month=6)
Out[139]: Timestamp(' 09:00:00')
If the offset class maps directly to a Timedelta (Day, Hour,
Minute, Second, Micro, Milli, Nano) it can be
used exactly like a Timedelta - see the Timedelta section
for more examples.
In [146]: s - Day(2)
dtype: datetime64[ns]
In [147]: td = s - pd.Series(pd.date_range('', ''))
In [148]: td
dtype: timedelta64[ns]
In [149]: td + Minute(15)
3 days 00:15:00
3 days 00:15:00
3 days 00:15:00
dtype: timedelta64[ns]
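A self-contained sketch of tick offsets acting like Timedeltas, with illustrative dates in place of the stripped ones:

```python
import pandas as pd
from pandas.tseries.offsets import Day, Minute

s = pd.Series(pd.date_range("2012-01-01", periods=3, freq="D"))

# Tick offsets (Day, Hour, Minute, ...) subtract like Timedeltas on a datetime Series
shifted = s - Day(2)

# They also add to a timedelta Series
td = s - pd.Timestamp("2012-01-01")
bumped = td + Minute(15)
```

This is only true for tick offsets with a fixed duration; calendar-aware offsets like BQuarterEnd behave differently, as the note below explains.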
Note that some offsets (such as BQuarterEnd) do not have a
vectorized implementation. They can still be used but may
calculate significantly slower and will show a PerformanceWarning:
In [150]: rng + BQuarterEnd()
Out[150]: DatetimeIndex(['', '', ''], dtype='datetime64[ns]', freq=None)
Custom Business Days
The CDay or CustomBusinessDay class provides a parametric
BusinessDay class which can be used to create customized business day
calendars which account for local holidays and local weekend conventions.
As an interesting example, let’s look at Egypt where a Friday-Saturday weekend is observed.
In [151]: from pandas.tseries.offsets import CustomBusinessDay
In [152]: weekmask_egypt = 'Sun Mon Tue Wed Thu'
# They also observe International Workers' Day so let's
# add that for a couple of years
In [153]: holidays = ['', datetime(2013, 5, 1), np.datetime64('')]
In [154]: bday_egypt = CustomBusinessDay(holidays=holidays, weekmask=weekmask_egypt)
In [155]: dt = datetime(2013, 4, 30)
In [156]: dt + 2 * bday_egypt
Out[156]: Timestamp(' 00:00:00')
Let’s map to the weekday names
In [157]: dts = pd.date_range(dt, periods=5, freq=bday_egypt)
In [158]: pd.Series(dts.weekday, dts).map(pd.Series('Mon Tue Wed Thu Fri Sat Sun'.split()))
Freq: C, dtype: object
Holiday calendars can be used to provide the list of holidays. See the
holiday calendar section for more information.
In [159]: from pandas.tseries.holiday import USFederalHolidayCalendar
In [160]: bday_us = CustomBusinessDay(calendar=USFederalHolidayCalendar())
# Friday before MLK Day
In [161]: dt = datetime(2014, 1, 17)
# Tuesday after MLK Day (Monday is skipped because it's a holiday)
In [162]: dt + bday_us
Out[162]: Timestamp(' 00:00:00')
Monthly offsets that respect a certain holiday calendar can be defined
in the usual way.
In [163]: from pandas.tseries.offsets import CustomBusinessMonthBegin
In [164]: bmth_us = CustomBusinessMonthBegin(calendar=USFederalHolidayCalendar())
# Skip new years
In [165]: dt = datetime(2013, 12, 17)
In [166]: dt + bmth_us
Out[166]: Timestamp(' 00:00:00')
# Define date index with custom offset
In [167]: pd.DatetimeIndex(start=';,end=';,freq=bmth_us)
DatetimeIndex(['', '', '', '',
'', '', '', '',
'', '', '', '',
'', '', '', '',
'', '', '', '',
'', '', '', ''],
dtype='datetime64[ns]', freq='CBMS')
The frequency string 'C' is used to indicate that a CustomBusinessDay
DateOffset is used. It is important to note that since CustomBusinessDay is
a parameterised type, instances of CustomBusinessDay may differ, and this is
not detectable from the 'C' frequency string. The user therefore needs to
ensure that the 'C' frequency string is used consistently within the user's
application.
Business Hour
The BusinessHour class provides a business hour representation on BusinessDay,
allowing you to use specific start and end times.
By default, BusinessHour uses 9:00 - 17:00 as business hours.
Adding BusinessHour will increment a Timestamp by the hour.
If the target Timestamp is outside business hours, it first moves to the next business hour, then increments.
If the result exceeds the business hours end, the remainder is added to the next business day.
In [168]: bh = BusinessHour()
In [169]: bh
Out[169]: <BusinessHour: BH=09:00-17:00>
In [170]: pd.Timestamp(' 10:00').weekday()
Out[170]: 4
In [171]: pd.Timestamp(' 10:00') + bh
Out[171]: Timestamp(' 11:00:00')
# Below example is the same as: pd.Timestamp(' 09:00') + bh
In [172]: pd.Timestamp(' 08:00') + bh
Out[172]: Timestamp(' 10:00:00')
# If the result is on the end time, move to the next business day
In [173]: pd.Timestamp(' 16:00') + bh
Out[173]: Timestamp(' 09:00:00')
# The remainder is added to the next day
In [174]: pd.Timestamp(' 16:30') + bh
Out[174]: Timestamp(' 09:30:00')
# Adding 2 business hours
In [175]: pd.Timestamp(' 10:00') + BusinessHour(2)
Out[175]: Timestamp(' 12:00:00')
# Subtracting 3 business hours
In [176]: pd.Timestamp(' 10:00') + BusinessHour(-3)
Out[176]: Timestamp(' 15:00:00')
You can also specify start and end times by keyword.
The argument must be a str with an hour:minute representation or a datetime.time instance.
Specifying seconds, microseconds, or nanoseconds as part of the business hours results in a ValueError.
In [177]: bh = BusinessHour(start='11:00', end=time(20, 0))
In [178]: bh
Out[178]: <BusinessHour: BH=11:00-20:00>
In [179]: pd.Timestamp('2014-08-01 13:00') + bh
Out[179]: Timestamp('2014-08-01 14:00:00')
In [180]: pd.Timestamp('2014-08-01 09:00') + bh
Out[180]: Timestamp('2014-08-01 12:00:00')
In [181]: pd.Timestamp('2014-08-01 18:00') + bh
Out[181]: Timestamp('2014-08-01 19:00:00')
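The same behaviour can be checked in current pandas; the date below (2024-04-05, a Friday) is an assumed example:

```python
import datetime
import pandas as pd

# Business hours 11:00-20:00, end given as a datetime.time instance
bh = pd.offsets.BusinessHour(start="11:00", end=datetime.time(20, 0))

# 09:00 is before opening, so it first rolls to 11:00, then adds one hour
assert pd.Timestamp("2024-04-05 09:00") + bh == pd.Timestamp("2024-04-05 12:00")

# Seconds in start/end are rejected with ValueError
try:
    pd.offsets.BusinessHour(start="11:00:30")
    rejected = False
except ValueError:
    rejected = True
assert rejected
```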
Passing a start time later than the end time represents midnight-spanning business hours.
In this case, the business hours exceed midnight and overlap into the next day.
Valid business hours are distinguished by whether they started from a valid BusinessDay.
In [182]: bh = BusinessHour(start='17:00', end='09:00')
In [183]: bh
Out[183]: <BusinessHour: BH=17:00-09:00>
In [184]: pd.Timestamp('2014-08-01 17:00') + bh
Out[184]: Timestamp('2014-08-01 18:00:00')
In [185]: pd.Timestamp('2014-08-01 23:00') + bh
Out[185]: Timestamp('2014-08-02 00:00:00')
# Although 2014-08-02 is a Saturday,
# it is valid because the business hours start from 08-01 (Friday).
In [186]: pd.Timestamp('2014-08-02 04:00') + bh
Out[186]: Timestamp('2014-08-02 05:00:00')
# Although 2014-08-04 is a Monday,
# it is out of business hours because they would start from 08-03 (Sunday).
In [187]: pd.Timestamp('2014-08-04 04:00') + bh
Out[187]: Timestamp('2014-08-04 18:00:00')
Applying BusinessHour.rollforward and rollback to timestamps outside business hours
results in the next business hour start or the previous day's end, respectively.
Unlike other offsets, BusinessHour.rollforward may output a different result from apply by definition.
This is because one day's business hour end is equal to the next day's business hour start. For example,
under the default business hours (9:00 - 17:00), there is no gap (0 minutes) between
one day's 17:00 and the next business day's 09:00.
# This adjusts a Timestamp to the business hour edge
In [188]: BusinessHour().rollback(pd.Timestamp('2014-08-02 15:00'))
Out[188]: Timestamp('2014-08-01 17:00:00')
In [189]: BusinessHour().rollforward(pd.Timestamp('2014-08-02 15:00'))
Out[189]: Timestamp('2014-08-04 09:00:00')
# It is the same as BusinessHour().apply(pd.Timestamp('2014-08-01 17:00')),
# and it is the same as BusinessHour().apply(pd.Timestamp('2014-08-04 09:00'))
In [190]: BusinessHour().apply(pd.Timestamp('2014-08-02 15:00'))
Out[190]: Timestamp('2014-08-04 10:00:00')
# BusinessDay results (for reference)
In [191]: BusinessHour().rollforward(pd.Timestamp('2014-08-02'))
Out[191]: Timestamp('2014-08-04 09:00:00')
# It is the same as BusinessDay().apply(pd.Timestamp('2014-08-01')).
# The result is the same as rollforward because BusinessDay never overlaps.
In [192]: BusinessHour().apply(pd.Timestamp('2014-08-02'))
Out[192]: Timestamp('2014-08-04 10:00:00')
BusinessHour regards Saturday and Sunday as holidays. To use arbitrary holidays,
you can use the CustomBusinessHour offset, described in the next section.
Custom Business Hour
New in version 0.18.1.
The CustomBusinessHour is a mixture of BusinessHour and CustomBusinessDay which
allows you to specify arbitrary holidays. CustomBusinessHour works the same
as BusinessHour, except that it skips the specified custom holidays.
In [193]: from pandas.tseries.holiday import USFederalHolidayCalendar
In [194]: bhour_us = CustomBusinessHour(calendar=USFederalHolidayCalendar())
# Friday before MLK Day
In [195]: dt = datetime(2014, 1, 17, 15)
In [196]: dt + bhour_us
Out[196]: Timestamp('2014-01-17 16:00:00')
# Tuesday after MLK Day (Monday is skipped because it's a holiday)
In [197]: dt + bhour_us * 2
Out[197]: Timestamp('2014-01-21 09:00:00')
You can use keyword arguments supported by either BusinessHour or CustomBusinessDay.
In [198]: bhour_mon = CustomBusinessHour(start='10:00', weekmask='Tue Wed Thu Fri')
# Monday is skipped because it's a holiday, business hour starts from 10:00
In [199]: dt + bhour_mon * 2
Out[199]: Timestamp('2014-01-21 10:00:00')
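A self-contained sketch of the same idea in current pandas, with assumed dates (2024-04-05 is a Friday, and we declare the following Monday an ad-hoc holiday):

```python
import pandas as pd

# CustomBusinessHour skipping one assumed ad-hoc holiday (Monday 2024-04-08)
cbh = pd.offsets.CustomBusinessHour(holidays=["2024-04-08"])

# Friday 16:00 + 1 business hour hits 17:00 (closing), so the result rolls
# past the weekend AND the Monday holiday to Tuesday 09:00.
assert pd.Timestamp("2024-04-05 16:00") + cbh == pd.Timestamp("2024-04-09 09:00")
```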
Offset Aliases
A number of string aliases are given to useful common time series
frequencies. We will refer to these aliases as offset aliases.
B
business day frequency
C
custom business day frequency
D
calendar day frequency
W
weekly frequency
M
month end frequency
SM
semi-month end frequency (15th and end of month)
BM
business month end frequency
CBM
custom business month end frequency
MS
month start frequency
SMS
semi-month start frequency (1st and 15th)
BMS
business month start frequency
CBMS
custom business month start frequency
Q
quarter end frequency
BQ
business quarter end frequency
QS
quarter start frequency
BQS
business quarter start frequency
A
year end frequency
BA
business year end frequency
AS
year start frequency
BAS
business year start frequency
BH
business hour frequency
H
hourly frequency
T, min
minutely frequency
S
secondly frequency
L, ms
milliseconds
U, us
microseconds
N
nanoseconds
Combining Aliases
As we have seen previously, the alias and the offset instance are fungible in
most functions:
In [200]: pd.date_range(start, periods=5, freq='B')
Out[200]:
DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
               '2011-01-07'],
              dtype='datetime64[ns]', freq='B')
In [201]: pd.date_range(start, periods=5, freq=BDay())
Out[201]:
DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
               '2011-01-07'],
              dtype='datetime64[ns]', freq='B')
You can combine together day and intraday offsets:
In [202]: pd.date_range(start, periods=10, freq='2h20min')
Out[202]:
DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 02:20:00',
               '2011-01-01 04:40:00', '2011-01-01 07:00:00',
               '2011-01-01 09:20:00', '2011-01-01 11:40:00',
               '2011-01-01 14:00:00', '2011-01-01 16:20:00',
               '2011-01-01 18:40:00', '2011-01-01 21:00:00'],
              dtype='datetime64[ns]', freq='140T')
In [203]: pd.date_range(start, periods=10, freq='1D10U')
Out[203]:
DatetimeIndex([       '2011-01-01 00:00:00', '2011-01-02 00:00:00.000010',
               '2011-01-03 00:00:00.000020', '2011-01-04 00:00:00.000030',
               '2011-01-05 00:00:00.000040', '2011-01-06 00:00:00.000050',
               '2011-01-07 00:00:00.000060', '2011-01-08 00:00:00.000070',
               '2011-01-09 00:00:00.000080', '2011-01-10 00:00:00.000090'],
              dtype='datetime64[ns]', freq='86400000010U')
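Combined aliases parse into a single fixed offset; a small runnable check (dates assumed):

```python
import pandas as pd

# '2h30min' combines an hourly and a minutely alias into one 150-minute step
idx = pd.date_range("2024-01-01", periods=4, freq="2h30min")
assert list(idx.strftime("%H:%M")) == ["00:00", "02:30", "05:00", "07:30"]
```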
Anchored Offsets
For some frequencies you can specify an anchoring suffix:
W-SUN
weekly frequency (Sundays). Same as ‘W’
W-MON
weekly frequency (Mondays)
W-TUE
weekly frequency (Tuesdays)
W-WED
weekly frequency (Wednesdays)
W-THU
weekly frequency (Thursdays)
W-FRI
weekly frequency (Fridays)
W-SAT
weekly frequency (Saturdays)
(B)Q(S)-DEC
quarterly frequency, year ends in December. Same as ‘Q’
(B)Q(S)-JAN
quarterly frequency, year ends in January
(B)Q(S)-FEB
quarterly frequency, year ends in February
(B)Q(S)-MAR
quarterly frequency, year ends in March
(B)Q(S)-APR
quarterly frequency, year ends in April
(B)Q(S)-MAY
quarterly frequency, year ends in May
(B)Q(S)-JUN
quarterly frequency, year ends in June
(B)Q(S)-JUL
quarterly frequency, year ends in July
(B)Q(S)-AUG
quarterly frequency, year ends in August
(B)Q(S)-SEP
quarterly frequency, year ends in September
(B)Q(S)-OCT
quarterly frequency, year ends in October
(B)Q(S)-NOV
quarterly frequency, year ends in November
(B)A(S)-DEC
annual frequency, anchored end of December. Same as ‘A’
(B)A(S)-JAN
annual frequency, anchored end of January
(B)A(S)-FEB
annual frequency, anchored end of February
(B)A(S)-MAR
annual frequency, anchored end of March
(B)A(S)-APR
annual frequency, anchored end of April
(B)A(S)-MAY
annual frequency, anchored end of May
(B)A(S)-JUN
annual frequency, anchored end of June
(B)A(S)-JUL
annual frequency, anchored end of July
(B)A(S)-AUG
annual frequency, anchored end of August
(B)A(S)-SEP
annual frequency, anchored end of September
(B)A(S)-OCT
annual frequency, anchored end of October
(B)A(S)-NOV
annual frequency, anchored end of November
These can be used as arguments to date_range, bdate_range, constructors
for DatetimeIndex, as well as various other timeseries-related functions
in pandas.
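For instance, an anchored weekly alias passed to date_range pins every generated date to the anchor weekday (start date assumed):

```python
import pandas as pd

# 'W-WED' anchors the weekly frequency on Wednesdays;
# 2024-01-03 is the first Wednesday on or after 2024-01-01
idx = pd.date_range("2024-01-01", periods=3, freq="W-WED")
assert idx[0] == pd.Timestamp("2024-01-03")
assert all(d.weekday() == 2 for d in idx)  # 2 == Wednesday
```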
Anchored Offset Semantics
For those offsets that are anchored to the start or end of a specific
frequency (MonthEnd, MonthBegin, Week, etc.), the following
rules apply to rolling forwards and backwards.
When n is not 0, if the given date is not on an anchor point, it is snapped to the next (previous)
anchor point, and then moved |n| - 1 additional steps forwards or backwards.
In [204]: pd.Timestamp('2014-01-02') + MonthBegin(n=1)
Out[204]: Timestamp('2014-02-01 00:00:00')
In [205]: pd.Timestamp('2014-01-02') + MonthEnd(n=1)
Out[205]: Timestamp('2014-01-31 00:00:00')
In [206]: pd.Timestamp('2014-01-02') - MonthBegin(n=1)
Out[206]: Timestamp('2014-01-01 00:00:00')
In [207]: pd.Timestamp('2014-01-02') - MonthEnd(n=1)
Out[207]: Timestamp('2013-12-31 00:00:00')
In [208]: pd.Timestamp('2014-01-02') + MonthBegin(n=4)
Out[208]: Timestamp('2014-05-01 00:00:00')
In [209]: pd.Timestamp('2014-01-02') - MonthBegin(n=4)
Out[209]: Timestamp('2013-10-01 00:00:00')
If the given date is on an anchor point, it is moved |n| points forwards
or backwards.
In [210]: pd.Timestamp('2014-01-01') + MonthBegin(n=1)
Out[210]: Timestamp('2014-02-01 00:00:00')
In [211]: pd.Timestamp('2014-01-31') + MonthEnd(n=1)
Out[211]: Timestamp('2014-02-28 00:00:00')
In [212]: pd.Timestamp('2014-01-01') - MonthBegin(n=1)
Out[212]: Timestamp('2013-12-01 00:00:00')
In [213]: pd.Timestamp('2014-01-31') - MonthEnd(n=1)
Out[213]: Timestamp('2013-12-31 00:00:00')
In [214]: pd.Timestamp('2014-01-01') + MonthBegin(n=4)
Out[214]: Timestamp('2014-05-01 00:00:00')
In [215]: pd.Timestamp('2014-01-31') - MonthBegin(n=4)
Out[215]: Timestamp('2013-10-01 00:00:00')
For the case when n=0, the date is not moved if on an anchor point, otherwise
it is rolled forward to the next anchor point.
In [216]: pd.Timestamp('2014-01-02') + MonthBegin(n=0)
Out[216]: Timestamp('2014-02-01 00:00:00')
In [217]: pd.Timestamp('2014-01-02') + MonthEnd(n=0)
Out[217]: Timestamp('2014-01-31 00:00:00')
In [218]: pd.Timestamp('2014-01-01') + MonthBegin(n=0)
Out[218]: Timestamp('2014-01-01 00:00:00')
In [219]: pd.Timestamp('2014-01-31') + MonthEnd(n=0)
Out[219]: Timestamp('2014-01-31 00:00:00')
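These rules can be verified with dates of your own choosing (the ones below are assumed; 2024 is a leap year):

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin, MonthEnd

# Off-anchor: snap to the next anchor, then move n-1 further steps
assert pd.Timestamp("2024-02-15") + MonthBegin(2) == pd.Timestamp("2024-04-01")
# On-anchor: move n full steps
assert pd.Timestamp("2024-02-01") + MonthBegin(2) == pd.Timestamp("2024-04-01")
# n=0 rolls forward only when off-anchor
assert pd.Timestamp("2024-02-15") + MonthEnd(0) == pd.Timestamp("2024-02-29")
assert pd.Timestamp("2024-02-29") + MonthEnd(0) == pd.Timestamp("2024-02-29")
```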
Holidays / Holiday Calendars
Holidays and calendars provide a simple way to define holiday rules to be used
with CustomBusinessDay or in other analysis that requires a predefined
set of holidays.
The AbstractHolidayCalendar class provides all the necessary
methods to return a list of holidays; only rules need to be defined
in a specific holiday calendar class.
Furthermore, the start_date and end_date
class attributes determine the date range over which holidays are generated.
These should be overwritten on the AbstractHolidayCalendar class to have the range
apply to all calendar subclasses.
USFederalHolidayCalendar is the
only calendar that exists and primarily serves as an example for developing
other calendars.
For holidays that occur on fixed dates (e.g., US Memorial Day or July 4th) an
observance rule determines when that holiday is observed if it falls on a weekend
or some other non-observed day.
Defined observance rules are:
nearest_workday
move Saturday to Friday and Sunday to Monday
sunday_to_monday
move Sunday to following Monday
next_monday_or_tuesday
move Saturday to Monday and Sunday/Monday to Tuesday
previous_friday
move Saturday and Sunday to previous Friday
next_monday
move Saturday and Sunday to following Monday
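The observance functions are importable and can be applied directly; the dates below are real calendar facts (July 4th fell on a Sunday in 2021 and a Saturday in 2020):

```python
import pandas as pd
from pandas.tseries.holiday import nearest_workday, previous_friday

# Sunday holiday observed on the following Monday
assert nearest_workday(pd.Timestamp("2021-07-04")) == pd.Timestamp("2021-07-05")
# Saturday holiday observed on the preceding Friday
assert nearest_workday(pd.Timestamp("2020-07-04")) == pd.Timestamp("2020-07-03")
# previous_friday always moves weekend dates back to Friday
assert previous_friday(pd.Timestamp("2021-07-04")) == pd.Timestamp("2021-07-02")
```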
An example of how holidays and holiday calendars are defined:
In [220]: from pandas.tseries.holiday import Holiday, USMemorialDay,\
          AbstractHolidayCalendar, nearest_workday, MO
In [221]: class ExampleCalendar(AbstractHolidayCalendar):
              rules = [
                  USMemorialDay,
                  Holiday('July 4th', month=7, day=4, observance=nearest_workday),
                  Holiday('Columbus Day', month=10, day=1,
                          offset=DateOffset(weekday=MO(2)))]  # same as 2*Week(weekday=2)
In [222]: cal = ExampleCalendar()
In [223]: cal.holidays(datetime(2012, 1, 1), datetime(2012, 12, 31))
Out[223]: DatetimeIndex(['2012-05-28', '2012-07-04', '2012-10-08'], dtype='datetime64[ns]', freq=None)
Using this calendar, creating an index or doing offset arithmetic skips weekends
and holidays (i.e., Memorial Day/July 4th).
For example, the below defines
a custom business day offset using the ExampleCalendar.
Like any other offset,
it can be used to create a DatetimeIndex or added to datetime
or Timestamp objects.
In [224]: from pandas.tseries.offsets import CDay
In [225]: pd.DatetimeIndex(start='7/1/2012', end='7/10/2012',
                           freq=CDay(calendar=cal)).to_pydatetime()
Out[225]:
array([datetime.datetime(2012, 7, 2, 0, 0),
       datetime.datetime(2012, 7, 3, 0, 0),
       datetime.datetime(2012, 7, 5, 0, 0),
       datetime.datetime(2012, 7, 6, 0, 0),
       datetime.datetime(2012, 7, 9, 0, 0),
       datetime.datetime(2012, 7, 10, 0, 0)], dtype=object)
In [226]: offset = CustomBusinessDay(calendar=cal)
In [227]: datetime(2012, 5, 25) + offset
Out[227]: Timestamp('2012-05-29 00:00:00')
In [228]: datetime(2012, 7, 3) + offset
Out[228]: Timestamp('2012-07-05 00:00:00')
In [229]: datetime(2012, 7, 3) + 2 * offset
Out[229]: Timestamp('2012-07-06 00:00:00')
In [230]: datetime(2012, 7, 6) + offset
Out[230]: Timestamp('2012-07-09 00:00:00')
Ranges are defined by the start_date and end_date class attributes
of AbstractHolidayCalendar.
The defaults are below.
In [231]: AbstractHolidayCalendar.start_date
Out[231]: Timestamp('1970-01-01 00:00:00')
In [232]: AbstractHolidayCalendar.end_date
Out[232]: Timestamp('2030-12-31 00:00:00')
These dates can be overwritten by setting the attributes as
datetime/Timestamp/string.
In [233]: AbstractHolidayCalendar.start_date = datetime(2012, 1, 1)
In [234]: AbstractHolidayCalendar.end_date = datetime(2012, 12, 31)
In [235]: cal.holidays()
Out[235]: DatetimeIndex(['2012-05-28', '2012-07-04', '2012-10-08'], dtype='datetime64[ns]', freq=None)
Every calendar class is accessible by name using the get_calendar function
which returns a holiday class instance.
Any imported calendar class will
automatically be available by this function.
Also, HolidayCalendarFactory
provides an easy interface to create calendars that are combinations of calendars
or calendars with additional rules.
In [236]: from pandas.tseries.holiday import get_calendar, HolidayCalendarFactory,\
USLaborDay
In [237]: cal = get_calendar('ExampleCalendar')
In [238]: cal.rules
Out[238]:
[Holiday: MemorialDay (month=5, day=31, offset=<DateOffset: kwds={'weekday': MO(-1)}>),
 Holiday: July 4th (month=7, day=4, observance=<function nearest_workday at 0x...>),
 Holiday: Columbus Day (month=10, day=1, offset=<DateOffset: kwds={'weekday': MO(+2)}>)]
In [239]: new_cal = HolidayCalendarFactory('NewExampleCalendar', cal, USLaborDay)
In [240]: new_cal.rules
Out[240]:
[Holiday: Labor Day (month=9, day=1, offset=<DateOffset: kwds={'weekday': MO(+1)}>),
 Holiday: MemorialDay (month=5, day=31, offset=<DateOffset: kwds={'weekday': MO(-1)}>),
 Holiday: July 4th (month=7, day=4, observance=<function nearest_workday at 0x...>),
 Holiday: Columbus Day (month=10, day=1, offset=<DateOffset: kwds={'weekday': MO(+2)}>)]
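The built-in USFederalHolidayCalendar is reachable the same way by name; a small check against real 2023 federal holidays:

```python
import pandas as pd
from pandas.tseries.holiday import get_calendar

# Any imported calendar class is accessible by name
cal = get_calendar("USFederalHolidayCalendar")
holidays = cal.holidays(start="2023-01-01", end="2023-12-31")
assert pd.Timestamp("2023-07-04") in holidays   # Independence Day
assert pd.Timestamp("2023-12-25") in holidays   # Christmas Day
```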
The shift method accepts a freq argument which can be a
DateOffset class, another timedelta-like object, or an offset alias:
In [243]: ts.shift(5, freq=offsets.BDay())
dtype: float64
In [244]: ts.shift(5, freq='BM')
Freq: BM, dtype: float64
Rather than changing the alignment of the data and the index, DataFrame and
Series objects also have a tshift convenience method that changes
all the dates in the index by a specified number of offsets:
In [245]: ts.tshift(5, freq='D')
dtype: float64
Note that with tshift, the leading entry is no longer NaN because the data
is not being realigned.
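The same effect is available through shift with a freq argument, which moves the index rather than realigning the data (in later pandas versions this replaces the tshift convenience method); a minimal sketch with assumed data:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0],
              index=pd.date_range("2024-01-01", periods=3, freq="D"))

# With freq given, the index is shifted and the data stays attached,
# so no leading NaN is introduced.
shifted = s.shift(2, freq="D")
assert shifted.index[0] == pd.Timestamp("2024-01-03")
assert shifted.iloc[0] == 1.0
```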
Frequency Conversion
The primary function for changing frequencies is the asfreq function.
For a DatetimeIndex, this is basically just a thin, but convenient wrapper
around reindex which generates a date_range and calls reindex.
In [246]: dr = pd.date_range('1/1/2010', periods=3, freq=3 * offsets.BDay())
In [247]: ts = pd.Series(randn(3), index=dr)
In [248]: ts
Freq: 3B, dtype: float64
In [249]: ts.asfreq(BDay())
Freq: B, dtype: float64
asfreq provides a further convenience so you can specify an interpolation
method for any gaps that may appear after the frequency conversion
In [250]: ts.asfreq(BDay(), method='pad')
Freq: B, dtype: float64
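A compact illustration of both behaviours, with assumed data:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0], index=pd.to_datetime(["2024-01-01", "2024-01-04"]))

# Bare asfreq leaves the newly created gap dates as NaN...
assert np.isnan(s.asfreq("D")["2024-01-02"])
# ...while method='pad' forward-fills them from the last valid value.
assert s.asfreq("D", method="pad")["2024-01-03"] == 1.0
```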
Filling Forward / Backward
Related to asfreq and reindex is the fillna function, documented in the missing data section.
Resampling
The interface to .resample has changed in 0.18.0 to be more groupby-like and hence more flexible.
See the whatsnew docs for a comparison with prior versions.
Pandas has a simple, powerful, and efficient functionality for
performing resampling operations during frequency conversion (e.g., converting
secondly data into 5-minutely data). This is extremely common in, but not
limited to, financial applications.
.resample() is a time-based groupby, followed by a reduction method on each of its groups.
See the cookbook for some advanced strategies.
Starting in version 0.18.1, the resample() function can be used directly from
DataFrameGroupBy objects; see the groupby docs.
.resample() is similar to using a .rolling() operation with a time-based offset; see the
discussion of time-based rolling windows.
In [251]: rng = pd.date_range('1/1/2012', periods=100, freq='S')
In [252]: ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
In [253]: ts.resample('5Min').sum()
Freq: 5T, dtype: int64
The resample function is very flexible and allows you to specify many
different parameters to control the frequency conversion and resampling
operation.
Any function available via GroupBy dispatching is available as
a method of the returned object, including sum, mean, std, sem,
max, min, median, first, last, ohlc:
In [254]: ts.resample('5Min').mean()
Freq: 5T, dtype: float64
In [255]: ts.resample('5Min').ohlc()
In [256]: ts.resample('5Min').max()
Freq: 5T, dtype: int64
For downsampling, closed can be set to ‘left’ or ‘right’ to specify which
end of the interval is closed:
In [257]: ts.resample('5Min', closed='right').mean()
Out[257]:
2011-12-31 23:55:00    296.000000
2012-01-01 00:00:00    256.131313
Freq: 5T, dtype: float64
In [258]: ts.resample('5Min', closed='left').mean()
Freq: 5T, dtype: float64
Parameters like label and loffset are used to manipulate the resulting
labels. label specifies whether the result is labeled with the beginning or
the end of the interval. loffset performs a time adjustment on the output
In [259]: ts.resample('5Min').mean() # by default label='left'
Freq: 5T, dtype: float64
In [260]: ts.resample('5Min', label='left').mean()
Freq: 5T, dtype: float64
In [261]: ts.resample('5Min', label='left', loffset='1s').mean()
dtype: float64
The default values for label and closed are ‘left’ for all
frequency offsets except for ‘M’, ‘A’, ‘Q’, ‘BM’, ‘BA’, ‘BQ’, and ‘W’,
which all have a default of ‘right’.
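The effect of closed is easiest to see on a tiny series (data assumed):

```python
import pandas as pd

s = pd.Series(range(6), index=pd.date_range("2024-01-01", periods=6, freq="min"))

# closed='left': the first bin is [00:00, 00:03), holding minutes 0, 1, 2
left = s.resample("3min", closed="left", label="left").sum()
assert left.iloc[0] == 0 + 1 + 2

# closed='right': the first bin is (23:57, 00:00], holding only minute 0
right = s.resample("3min", closed="right", label="right").sum()
assert right.iloc[0] == 0
```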
In [262]: rng2 = pd.date_range('1/1/2012', end='3/31/2012', freq='D')
In [263]: ts2 = pd.Series(range(len(rng2)), index=rng2)
# default: label='right', closed='right'
In [264]: ts2.resample('M').max()
Freq: M, dtype: int64
# default: label='left', closed='left'
In [265]: ts2.resample('SM').max()
Freq: SM-15, dtype: int64
In [266]: ts2.resample('SM', label='right', closed='right').max()
Freq: SM-15, dtype: float64
The axis parameter can be set to 0 or 1 and allows you to resample the
specified axis for a DataFrame.
kind can be set to ‘timestamp’ or ‘period’ to convert the resulting index
to/from timestamp and time span representations. By default resample
retains the input representation.
convention can be set to ‘start’ or ‘end’ when resampling period data
(detail below). It specifies how low frequency periods are converted to higher
frequency periods.
Upsampling
For upsampling, you can specify a way to upsample and the limit parameter to interpolate over the gaps that are created:
# from secondly to every 250 milliseconds
In [267]: ts[:2].resample('250L').asfreq()
Out[267]:
2012-01-01 00:00:00.000
2012-01-01 00:00:00.250
2012-01-01 00:00:00.500
2012-01-01 00:00:00.750
2012-01-01 00:00:01.000
Freq: 250L, dtype: float64
In [268]: ts[:2].resample('250L').ffill()
Out[268]:
2012-01-01 00:00:00.000
2012-01-01 00:00:00.250
2012-01-01 00:00:00.500
2012-01-01 00:00:00.750
2012-01-01 00:00:01.000
Freq: 250L, dtype: int64
In [269]: ts[:2].resample('250L').ffill(limit=2)
Out[269]:
2012-01-01 00:00:00.000
2012-01-01 00:00:00.250
2012-01-01 00:00:00.500
2012-01-01 00:00:00.750
2012-01-01 00:00:01.000
Freq: 250L, dtype: float64
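A self-contained version of the limit behaviour (data assumed; the source's output values were lost in extraction):

```python
import pandas as pd

s = pd.Series([1, 2], index=pd.to_datetime(["2024-01-01 00:00:00",
                                            "2024-01-01 00:00:01"]))

# Upsampling to 250ms creates three empty slots between the two points;
# limit=2 forward-fills only the first two, leaving one NaN.
up = s.resample("250ms").ffill(limit=2)
assert len(up) == 5
assert up.isna().sum() == 1
```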
Sparse Resampling
Sparse time series are ones where you have far fewer points relative
to the span of time you are looking to resample. Naively upsampling a sparse series can potentially
generate lots of intermediate values. When you don't want to use a method to fill these values, e.g. when fill_method is None,
then the intermediate values will be filled with NaN.
Since resample is a time-based groupby, the following is a method to efficiently
resample only the groups that are not all NaN:
In [270]: rng = pd.date_range('2014-1-1', periods=100, freq='D') + pd.Timedelta('1s')
In [271]: ts = pd.Series(range(100), index=rng)
If we want to resample to the full range of the series:
In [272]: ts.resample('3T').sum()
Freq: 3T, Length: 47521, dtype: int64
We can instead only resample those groups where we have points as follows:
In [273]: from functools import partial
In [274]: from pandas.tseries.frequencies import to_offset
In [275]: def round(t, freq):
              freq = to_offset(freq)
              return pd.Timestamp((t.value // freq.delta.value) * freq.delta.value)
In [276]: ts.groupby(partial(round, freq='3T')).sum()
Length: 100, dtype: int64
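In current pandas, DatetimeIndex.floor gives the same bucketing without a helper function; an alternative sketch with assumed data:

```python
import pandas as pd

# Sparse points: one per day, offset by one second (dates assumed)
rng = pd.date_range("2024-01-01", periods=3, freq="D") + pd.Timedelta("1s")
ts = pd.Series([1, 2, 3], index=rng)

# floor() buckets each point to its bin edge without materializing empty bins
out = ts.groupby(ts.index.floor("3min")).sum()
assert len(out) == 3   # one row per populated bin only
assert out.iloc[0] == 1
```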
Aggregation
Similar to the aggregating API, groupby API, and window functions API,
a Resampler can be selectively resampled.
When resampling a DataFrame, the default is to act on all columns with the same function.
In [277]: df = pd.DataFrame(np.random.randn(1000, 3),
index=pd.date_range('1/1/2012', freq='S', periods=1000),
columns=['A', 'B', 'C'])
In [278]: r = df.resample('3T')
In [279]: r.mean()
Out[279]:
                            A         B         C
2012-01-01 00:00:00       ...       ...       ...
...
2012-01-01 00:15:00       ...       ...       ...
We can select a specific column or columns using standard getitem.
In [280]: r['A'].mean()
Out[280]:
2012-01-01 00:00:00    ...
...
Freq: 3T, Name: A, dtype: float64
In [281]: r[['A','B']].mean()
Out[281]:
                            A         B
2012-01-01 00:00:00       ...       ...
...
You can pass a list or dict of functions to do aggregation with, outputting a DataFrame:
In [282]: r['A'].agg([np.sum, np.mean, np.std])
Out[282]:
                          sum      mean       std
2012-01-01 00:00:00       ...       ...       ...
...
On a resampled DataFrame, you can pass a list of functions to apply to each
column, which produces an aggregated result with a hierarchical index:
In [283]: r.agg([np.sum, np.mean])
Out[283]:
                            A                   B                   C
                          sum      mean      sum      mean      sum      mean
2012-01-01 00:00:00       ...       ...       ...       ...       ...       ...
...
By passing a dict to aggregate you can apply a different aggregation to the
columns of a DataFrame:
In [284]: r.agg({'A' : np.sum,
                 'B' : lambda x: np.std(x, ddof=1)})
Out[284]:
                            A         B
2012-01-01 00:00:00       ...       ...
...
The function names can also be strings. In order for a string to be valid it
must be implemented on the Resampler object:
In [285]: r.agg({'A' : 'sum', 'B' : 'std'})
Out[285]:
                            A         B
2012-01-01 00:00:00       ...       ...
...
Furthermore, you can also specify multiple aggregation functions for each column separately.
In [286]: r.agg({'A' : ['sum','std'], 'B' : ['mean','std'] })
Out[286]:
                            A                   B
                          sum       std      mean       std
2012-01-01 00:00:00       ...       ...       ...       ...
...
If a DataFrame does not have a datetime-like index, but instead you want
to resample based on a datetime-like column in the frame, it can be passed to the
on keyword.
In [287]: df = pd.DataFrame({'date': pd.date_range('2015-01-01', freq='W', periods=5),
                             'a': np.arange(5)},
                            index=pd.MultiIndex.from_arrays([
                                [1, 2, 3, 4, 5],
                                pd.date_range('2015-01-01', freq='W', periods=5)],
                                names=['v', 'd']))
In [288]: df
In [289]: df.resample('M', on='date').sum()
Similarly, if you instead want to resample by a datetimelike
level of MultiIndex, its name or location can be passed to the
level keyword.
In [290]: df.resample('M', level='d').sum()
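A minimal runnable sketch of the on keyword (the column name "when" and the data are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({"when": pd.date_range("2024-01-01", periods=6, freq="D"),
                   "a": range(6)})

# Resample on the 'when' column instead of the index; it becomes the new index
out = df.resample("3D", on="when").sum()
assert list(out["a"]) == [0 + 1 + 2, 3 + 4 + 5]
```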
Adding and subtracting integers from periods shifts the period by its own
frequency. Arithmetic is not allowed between Period objects with different freq (span).
In [295]: p = pd.Period('2012', freq='A-DEC')
In [296]: p + 1
Out[296]: Period('2013', 'A-DEC')
In [297]: p - 3
Out[297]: Period('2009', 'A-DEC')
In [298]: p = pd.Period('2014-07', freq='2M')
In [299]: p + 2
Out[299]: Period('2014-11', '2M')
In [300]: p - 1
Out[300]: Period('2014-05', '2M')
In [301]: p == pd.Period('2014-07', freq='3M')
---------------------------------------------------------------------------
IncompatibleFrequency                     Traceback (most recent call last)
<ipython-input-301-4b67dc0b596c> in <module>()
----> 1 p == pd.Period('2014-07', freq='3M')
~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/_libs/period.pyx in pandas._libs.period._Period.__richcmp__()
IncompatibleFrequency: Input has different freq=3M from Period(freq=2M)
If the Period freq is daily or higher (D, H, T, S, L, U, N), offsets and timedelta-like objects can be added if the result can have the same freq. Otherwise, ValueError will be raised.
In [302]: p = pd.Period('2014-07-01 09:00', freq='H')
In [303]: p + Hour(2)
Out[303]: Period('2014-07-01 11:00', 'H')
In [304]: p + timedelta(minutes=120)
Out[304]: Period('2014-07-01 11:00', 'H')
In [305]: p + np.timedelta64(7200, 's')
Out[305]: Period('2014-07-01 11:00', 'H')
In [1]: p + Minute(5)
ValueError: Input has different freq from Period(freq=H)
If the Period has other freqs, only the same offsets can be added. Otherwise, ValueError will be raised.
In [306]: p = pd.Period('2014-07', freq='M')
In [307]: p + MonthEnd(3)
Out[307]: Period('2014-10', 'M')
In [1]: p + MonthBegin(3)
ValueError: Input has different freq from Period(freq=M)
Taking the difference of Period instances with the same frequency will
return the number of frequency units between them:
In [308]: pd.Period('2012', freq='A-DEC') - pd.Period('2002', freq='A-DEC')
Out[308]: 10
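The shift-by-own-frequency rules are easy to verify with an assumed monthly period:

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

p = pd.Period("2024-07", freq="M")
# Integers shift by the period's own frequency
assert p + 2 == pd.Period("2024-09", freq="M")
assert p - 3 == pd.Period("2024-04", freq="M")
# An offset anchored like the period's own freq is also accepted
assert p + MonthEnd(3) == pd.Period("2024-10", freq="M")
```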
PeriodIndex and period_range
Regular sequences of Period objects can be collected in a PeriodIndex,
which can be constructed using the period_range convenience function:
In [309]: prng = pd.period_range('1/1/2011', '1/1/2012', freq='M')
In [310]: prng
Out[310]:
PeriodIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06',
             '2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12',
             '2012-01'],
            dtype='period[M]', freq='M')
The PeriodIndex constructor can also be used directly:
In [311]: pd.PeriodIndex(['2011-1', '2011-2', '2011-3'], freq='M')
Out[311]: PeriodIndex(['2011-01', '2011-02', '2011-03'], dtype='period[M]', freq='M')
Passing a multiplied frequency outputs a sequence of Period objects which
have the multiplied span.
In [312]: pd.PeriodIndex(start='2014-01', freq='3M', periods=4)
Out[312]: PeriodIndex(['2014-01', '2014-04', '2014-07', '2014-10'], dtype='period[3M]', freq='3M')
If start or end are Period objects, they will be used as anchor
endpoints for a PeriodIndex with frequency matching that of the
PeriodIndex constructor.
In [313]: pd.PeriodIndex(start=pd.Period('2017Q1', freq='Q'),
                         end=pd.Period('2017Q2', freq='Q'), freq='M')
Out[313]: PeriodIndex(['2017-03', '2017-04', '2017-05', '2017-06'], dtype='period[M]', freq='M')
Just like DatetimeIndex, a PeriodIndex can also be used to index pandas objects:
In [314]: ps = pd.Series(np.random.randn(len(prng)), prng)
In [315]: ps
Freq: M, dtype: float64
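Indexing with string labels then resolves against the periods; a small sketch with assumed data:

```python
import pandas as pd

prng = pd.period_range("2024-01", "2024-04", freq="M")
s = pd.Series([10.0, 20.0, 30.0, 40.0], index=prng)

# A string matching a monthly period selects that element
assert s["2024-02"] == 20.0
assert len(prng) == 4
```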
PeriodIndex supports addition and subtraction with the same rule as Period.
In [316]: idx = pd.period_range('2014-07-01 09:00', periods=5, freq='H')
In [317]: idx
Out[317]:
PeriodIndex(['2014-07-01 09:00', '2014-07-01 10:00', '2014-07-01 11:00',
             '2014-07-01 12:00', '2014-07-01 13:00'],
            dtype='period[H]', freq='H')
In [318]: idx + Hour(2)
Out[318]:
PeriodIndex(['2014-07-01 11:00', '2014-07-01 12:00', '2014-07-01 13:00',
             '2014-07-01 14:00', '2014-07-01 15:00'],
            dtype='period[H]', freq='H')
In [319]: idx = pd.period_range('2014-07', periods=5, freq='M')
In [320]: idx
Out[320]: PeriodIndex(['2014-07', '2014-08', '2014-09', '2014-10', '2014-11'], dtype='period[M]', freq='M')
In [321]: idx + MonthEnd(3)
Out[321]: PeriodIndex(['2014-10', '2014-11', '2014-12', '2015-01', '2015-02'], dtype='period[M]', freq='M')
PeriodIndex has its own dtype named period; refer to Period Dtypes below.
Period Dtypes
New in version 0.19.0.
PeriodIndex has a custom period dtype. This is a pandas extension
dtype similar to the timezone-aware dtype (datetime64[ns, tz]).
The period dtype holds the freq attribute and is represented with
period[freq], like period[D] or period[M], using frequency strings.
In [322]: pi = pd.period_range('2016-01-01', periods=3, freq='M')
In [323]: pi
Out[323]: PeriodIndex(['2016-01', '2016-02', '2016-03'], dtype='period[M]', freq='M')
In [324]: pi.dtype
Out[324]: period[M]
The period dtype can be used in .astype(...). It allows one to change the
freq of a PeriodIndex, like .asfreq(), and to convert a
DatetimeIndex to a PeriodIndex, like to_period():
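A short runnable sketch of the dtype and the astype conversion (data assumed):

```python
import pandas as pd

pi = pd.period_range("2024-01", periods=3, freq="M")
s = pd.Series(pi)
assert str(s.dtype) == "period[M]"

# astype to another period dtype changes the span, like asfreq
q = s.astype("period[Q]")
assert q.iloc[0] == pd.Period("2024Q1", freq="Q")
```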
