pandas.Series
class pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)
One-dimensional ndarray with axis labels (including time series).
Labels need not be unique but must be any hashable type. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index. Statistical methods from ndarray have been overridden to automatically exclude missing data (currently represented as NaN).
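As a small illustration of the two claims above (repeated labels are allowed, and reductions skip NaN), here is a sketch with made-up data. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Labels may repeat; selecting a repeated label returns every matching row.
s = pd.Series([1.0, 2.0, np.nan, 4.0], index=["a", "a", "b", "c"])
dupes = s.loc["a"]            # a Series with two rows

# Reductions skip NaN by default (skipna=True).
total = s.sum()               # 1 + 2 + 4
mean = s.mean()               # averaged over the three non-missing values
strict = s.sum(skipna=False)  # NaN, because the missing value is not skipped

print(dupes)
print(total, mean, strict)
```

Passing `skipna=False` is the way to ask for the plain ndarray behaviour, where one NaN makes the whole result NaN.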
Operations between Series (+, -, /, *, **) align values based on their associated index values; they need not be the same length. The result index will be the sorted union of the two indexes.
Parameters:
data : array-like, dict, or scalar value
index : array-like or Index (1d)
dtype : numpy.dtype or None
copy : boolean, default False
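The parameters can be tried out one at a time. The following is a rough sketch with made-up values, not an exhaustive tour of the constructor.

```python
import numpy as np
import pandas as pd

# data as a scalar: the value is repeated for every label in `index`.
const = pd.Series(7, index=["a", "b", "c"])

# dtype forces the element type instead of letting pandas infer it.
as_float = pd.Series([1, 2, 3], dtype="float64")

# copy=True gives the Series its own data, so a later edit to the
# source array is not visible through the Series.
arr = np.array([1, 2, 3])
independent = pd.Series(arr, copy=True)
arr[0] = 99

print(const.tolist(), as_float.dtype, independent.iloc[0])
```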
A Series is similar to an array, but its values carry labels, also called the index.
1. Start with the simplest Series.
from pandas import Series, DataFrame
import pandas as pd

ser1 = Series([1,2,3,4])
print(ser1)
#0    1
#1    2
#2    3
#3    4
#dtype: int64
No index was given here, so the default integer index (0 to n-1) is used.
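To see what that default looks like, one can inspect the index directly. This is a small sketch with made-up values.

```python
import pandas as pd

ser = pd.Series([10, 20, 30])

# With no index argument, pandas builds a RangeIndex starting at 0.
labels = list(ser.index)
is_range = isinstance(ser.index, pd.RangeIndex)

print(labels, is_range)
```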
2. Adding an index
ser2 = Series(range(4),index=['a','b','c','d'])
print(ser2)
#a    0
#b    1
#c    2
#d    3
#dtype: int64
3. Using a dictionary as input
dict1 = { 'ohio':35000,'Texas':71000,'Oregon':1600,'Utah':500}
ser3 = Series(dict1)
print(ser3)
#Oregon     1600
#Texas     71000
#Utah        500
#ohio      35000
#dtype: int64
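The dict keys are used as the index by default. The sorted order shown above comes from an older pandas release; recent releases keep the dict's insertion order.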
states = ['California', 'Ohio', 'Oregon', 'Texas']
ser4 = Series(dict1,index = states)
print(ser4)
#California        NaN
#Ohio              NaN
#Oregon         1600.0
#Texas         71000.0
#dtype: float64
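When building from a dictionary you can still pass an explicit index. Labels that have no matching key get NaN. Matching is case-sensitive, so 'Ohio' does not pick up the value stored under 'ohio'.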
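If the mismatch between 'ohio' and 'Ohio' is unintended, one possible fix is to normalise the keys before building the Series. The sketch below reuses the numbers from the example; the title-casing step is an assumption about what the data should look like, not part of the original code.

```python
import math
import pandas as pd

raw = {"ohio": 35000, "Texas": 71000, "Oregon": 1600, "Utah": 500}
states = ["California", "Ohio", "Oregon", "Texas"]

# Hypothetical cleanup step: title-case every key so "ohio" becomes "Ohio".
cleaned = {key.title(): value for key, value in raw.items()}

# Only labels listed in `states` are kept; "Utah" is dropped and
# "California" has no source value, so it becomes NaN.
ser = pd.Series(cleaned, index=states)
print(ser)
```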
print(pd.isnull(ser4))
#California     True
#Ohio           True
#Oregon        False
#Texas         False
#dtype: bool
The isnull function tests whether each value is null.
The notnull function tests whether each value is not null.
print(pd.notnull(ser4))
#California    False
#Ohio          False
#Oregon         True
#Texas          True
#dtype: bool
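These boolean results are mostly useful as masks. The sketch below rebuilds a Series shaped like ser4 and shows a few common follow-ups, using the method forms `isnull`/`notnull` alongside `dropna` and `fillna`.

```python
import numpy as np
import pandas as pd

ser = pd.Series(
    [np.nan, np.nan, 1600.0, 71000.0],
    index=["California", "Ohio", "Oregon", "Texas"],
)

present = ser[ser.notnull()]             # keep only rows that have a value
same = ser.dropna()                      # shorthand for the line above
missing_count = int(ser.isnull().sum())  # True counts as 1 when summed
filled = ser.fillna(0)                   # replace NaN with a default value

print(present)
print(missing_count)
```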
4. Accessing elements and the index
print(ser2['a']) #0
#print(ser2['a','c'])  error: to select several labels, pass a list
print(ser2[['a','c']])
#a    0
#c    2
#dtype: int64
print(ser2.values) #[0 1 2 3]
print(ser2.index) #Index(['a', 'b', 'c', 'd'], dtype='object')
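Besides plain square brackets, pandas offers `.loc` for explicit label-based access and `.iloc` for explicit position-based access. Here is a short sketch on a Series shaped like ser2.

```python
import pandas as pd

ser = pd.Series(range(4), index=["a", "b", "c", "d"])

by_label = ser.loc["b"]         # look up by label
by_position = ser.iloc[-1]      # look up by position, here the last element
label_slice = ser.loc["a":"c"]  # label slices include the end label
pos_slice = ser.iloc[0:2]       # position slices exclude the end, like lists
has_b = "b" in ser              # membership tests check the index labels

print(by_label, by_position, has_b)
print(label_slice)
```

Spelling out `.loc` or `.iloc` avoids ambiguity when the index itself contains integers.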
5. Operations: a pandas Series keeps NumPy's array operations
import numpy as np  # needed for np.exp below

print(ser2[ser2>2])
#d    3
#dtype: int64
print(ser2*2)
#a    0
#b    2
#c    4
#d    6
#dtype: int64
print(np.exp(ser2))
#a     1.000000
#b     2.718282
#c     7.389056
#d    20.085537
#dtype: float64
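One thing worth noticing in these operations is that the index travels with the values. The sketch below combines two conditions in a mask and applies a NumPy ufunc, on a Series shaped like ser2.

```python
import numpy as np
import pandas as pd

ser = pd.Series(range(4), index=["a", "b", "c", "d"])

# Combine conditions with & and |; each condition needs its own parentheses.
middle = ser[(ser > 0) & (ser < 3)]

# Element-wise arithmetic and NumPy ufuncs return a Series with the same index.
squared = ser ** 2
roots = np.sqrt(squared)

print(middle)
print(roots)
```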
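6. Automatic alignment of Series. This is somewhat like a full join in SQL: values are matched on their index labels, and a label that is missing from either side yields NaN.
print(ser3+ser4)
#California         NaN
#Ohio               NaN
#Oregon          3200.0
#Texas         142000.0
#Utah               NaN
#ohio               NaN
#dtype: float64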
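If a label that exists on only one side should count as zero instead of producing NaN, the method form `Series.add` accepts a `fill_value`. The sketch below rebuilds ser3 and ser4 from the same dictionary.

```python
import math
import pandas as pd

dict1 = {"ohio": 35000, "Texas": 71000, "Oregon": 1600, "Utah": 500}
ser3 = pd.Series(dict1)
ser4 = pd.Series(dict1, index=["California", "Ohio", "Oregon", "Texas"])

# fill_value stands in for a value missing on ONE side only.
# Where both sides are missing, the result stays NaN.
total = ser3.add(ser4, fill_value=0)
print(total)
```

'Utah' and 'ohio' now keep their ser3 values, while 'California' and 'Ohio' are still NaN because neither Series has a number for them.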
7. Both the Series object and its index have a name attribute
ser4.index.name = 'state'
ser4.name = 'population count'
print(ser4)
#state
#California        NaN
#Ohio              NaN
#Oregon         1600.0
#Texas         71000.0
#Name: population count, dtype: float64
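The names start to matter when a Series is turned into a DataFrame, where they become column labels. Here is a sketch with a shortened, made-up version of ser4.

```python
import pandas as pd

ser = pd.Series(
    [1600.0, 71000.0],
    index=["Oregon", "Texas"],
    name="population count",
)
ser.index.name = "state"

# reset_index moves the index into a regular column named after index.name;
# the values column is named after Series.name.
frame = ser.reset_index()
columns = list(frame.columns)

print(frame)
```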
8. Previewing data
print(ser4.head(2))
#state
#California   NaN
#Ohio         NaN
#Name: population count, dtype: float64
print(ser4.tail(2))
#state
#Oregon     1600.0
#Texas     71000.0
#Name: population count, dtype: float64
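Both methods take an optional row count. This sketch on a made-up ten-element Series shows the default and the effect of a negative count.

```python
import pandas as pd

ser = pd.Series(range(10))

first_default = ser.head()   # n defaults to 5
last_three = ser.tail(3)     # the final three rows
all_but_two = ser.head(-2)   # negative n: everything except the last two rows

print(len(first_default), last_three.tolist(), len(all_but_two))
```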