Python Data Analysis Module | Data Analysis with pandas (3): Statistics-Related Functions
Counting operations
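1. pandas.Series.value_counts
Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)
Purpose: returns a Series containing each distinct value and the number of times it occurs, ordered from most to least frequent (when sort=True).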
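Parameters:
normalize : bool, default False. If True, return the relative frequencies (proportions) of the values instead of their raw counts.
sort : bool, default True. Sort the result by frequency.
ascending : bool, default False. If True, sort in ascending order (least frequent first).
bins : int, optional. Rather than count values, group them into half-open bins, a convenience for pd.cut; only works with numeric data.
dropna : bool, default True. Do not include counts of NaN.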
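A minimal sketch of these parameters in action, assuming a small hypothetical Series `s` of string labels with one missing value, plus a hypothetical numeric Series for the bins example (the data is illustrative, not from the original):

```python
import numpy as np
import pandas as pd

# Hypothetical data: "a" appears 3 times, "b" twice, "c" once, plus one NaN
s = pd.Series(["a", "b", "a", "c", "a", "b", np.nan])

counts = s.value_counts()                   # a: 3, b: 2, c: 1 (NaN dropped by default)
freqs = s.value_counts(normalize=True)      # proportions of the 6 non-NaN values
with_nan = s.value_counts(dropna=False)     # NaN is counted as its own value
least_first = s.value_counts(ascending=True)  # least frequent value first

# bins=2 splits the numeric range into two half-open intervals (via pd.cut)
nums = pd.Series([1, 2, 2, 3, 7, 8, 9, 10])
binned = nums.value_counts(bins=2, sort=False)

print(counts)
print(freqs)
print(with_nan)
print(least_first)
print(binned)
```

With normalize=True and the default dropna=True, the proportions are computed over the non-NaN values only, so they sum to 1.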
2. pandas.DataFrame.count
DataFrame.count(axis=0, level=None, numeric_only=False)
Returns a Series with the number of non-NA/null observations over the requested axis. Works with non-floating-point data as well (detects NaN and None).
Parameters:
axis : {0 or 'index', 1 or 'columns'}, default 0. With 0 or 'index', a count is produced for each column; with 1 or 'columns', a count is produced for each row.
level : int or level name, default None. If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame. This keyword was deprecated in pandas 1.3; use groupby(level=...) instead.
numeric_only : bool, default False. Include only float, int, and boolean data.
Returns: count : Series (or DataFrame if level is specified)
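A short sketch, assuming a hypothetical DataFrame that mixes float columns with an object column containing None, to show that count() treats both NaN and None as missing and what axis and numeric_only change:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "name" is an object column where None marks a missing value
df = pd.DataFrame(
    {
        "one": [1.4, 7.1, np.nan, 0.75],
        "two": [np.nan, -4.5, np.nan, -1.3],
        "name": ["x", None, "z", "w"],
    },
    index=["a", "b", "c", "d"],
)

per_column = df.count()                # axis=0: one count per column
per_row = df.count(axis=1)             # axis=1: one count per row
numeric = df.count(numeric_only=True)  # only the float columns "one" and "two"

print(per_column)
print(per_row)
print(numeric)
```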
Maximum and minimum values
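A minimal sketch, reusing the same small df as the sum() and mean() examples in this article: max() and min() skip NaN by default and return one value per column, while idxmax() and idxmin() return the index label where that extreme occurs.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=[[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
    index=["a", "b", "c", "d"],
    columns=["one", "two"],
)

col_max = df.max()         # largest value in each column, NaN skipped
col_min = df.min()         # smallest value in each column, NaN skipped
row_of_max = df.idxmax()   # index label of each column's maximum
row_of_min = df.idxmin()   # index label of each column's minimum

print(col_max)
print(col_min)
print(row_of_max)
print(row_of_min)
```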
Standard statistical functions
1. pandas.DataFrame.sum
Returns the sum of the values over the requested axis.
DataFrame.sum(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
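Parameters:
axis : {index (0), columns (1)}. 0 sums down each column; 1 sums across each row.
skipna : bool, default True. Exclude NaN values. If an entire row/column is NaN, the result is 0 in pandas 0.22 and later (NaN in earlier versions); pass min_count=1 to get NaN instead.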
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
Returns:
sum : Series or DataFrame (if level specified)
import numpy as np
import pandas as pd
df=pd.DataFrame(data=[[1.4,np.nan],[7.1,-4.5],[np.nan,np.nan],[0.75,-1.3]], index=["a","b","c","d"],
columns=["one","two"])
print("df:")
print(df)
# Calling sum() directly returns a Series of column sums; NaN values are skipped automatically
print("df.sum()")
print(df.sum())
# With axis=1, the sum is computed across each row
print("df.sum(axis=1)")
print(df.sum(axis=1))
# Pass skipna=False to stop skipping NaN values (any NaN in a row makes its sum NaN)
print("df.sum(axis=1,skipna=False):")
print(df.sum(axis=1,skipna=False))
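The all-NaN row "c" is where pandas versions differ. In pandas 0.22 and later, sum() with skipna=True returns 0.0 for an all-NaN row because the default is min_count=0; passing min_count=1 restores NaN. A sketch checking this on the same df (assumes pandas 0.22 or newer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=[[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
    index=["a", "b", "c", "d"],
    columns=["one", "two"],
)

col_sums = df.sum()                               # "one" = 1.4 + 7.1 + 0.75
row_sums = df.sum(axis=1)                         # all-NaN row "c" gives 0.0
row_sums_min1 = df.sum(axis=1, min_count=1)       # need >= 1 non-NaN value, else NaN
row_sums_strict = df.sum(axis=1, skipna=False)    # any NaN propagates

print(col_sums)
print(row_sums)
print(row_sums_min1)
print(row_sums_strict)
```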
2. pandas.DataFrame.mean
Returns the mean of the values over the requested axis.
DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
Parameters:
axis : {index (0), columns (1)}. 0 averages down each column; 1 averages across each row.
skipna : bool, default True. Exclude NaN values. If an entire row/column is NaN, the result is NaN.
level : int or level name, default None. If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
numeric_only : bool, default None. Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
Example:
import numpy as np
import pandas as pd
df=pd.DataFrame(data=[[1.4,np.nan],[7.1,-4.5],[np.nan,np.nan],[0.75,-1.3]],
index=["a","b","c","d"],
columns=["one","two"])
print("df:")
print(df)
# Calling mean() directly returns a Series of column means; NaN values are skipped automatically
print("df.mean()")
print(df.mean())
# With axis=1, the mean is computed across each row
print("df.mean(axis=1)")
print(df.mean(axis=1))
# Pass skipna=False to stop skipping NaN values (any NaN in a row makes its mean NaN)
print("df.mean(axis=1,skipna=False):")
print(df.mean(axis=1,skipna=False))
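Unlike sum(), mean() returns NaN for an all-NaN row, and skipping NaN also removes it from the denominator. A sketch on the same df, contrasting skipna with filling NaN by 0 (the fillna(0) comparison is added here for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=[[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
    index=["a", "b", "c", "d"],
    columns=["one", "two"],
)

# skipna=True (default): NaN is excluded from both the sum and the count
col_means = df.mean()                     # "one" = 9.25 / 3

# Filling NaN with 0 keeps every row in the count, which changes the result
col_means_zero_filled = df.fillna(0).mean()   # "one" = 9.25 / 4

# A row that is entirely NaN stays NaN even with skipna=True
row_means = df.mean(axis=1)

print(col_means)
print(col_means_zero_filled)
print(row_means)
```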
Sorting
1. pandas.DataFrame.sort_values
DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')
Sorts by the values along either axis.
Parameters:
by : str or list of str. Name or list of names to sort by. With axis=0 these are column labels; with axis=1 they are index labels.
axis : {0 or 'index', 1 or 'columns'}, default 0. Axis to be sorted.
ascending : bool or list of bool, default True. Sort ascending vs. descending. Specify a list for multiple sort orders; if this is a list of bools, it must match the length of by.
inplace : bool, default False. If True, perform the operation in place (and return None).
kind : {'quicksort', 'mergesort', 'heapsort'}, default 'quicksort'. Choice of sorting algorithm; see numpy.sort for more information. Of these, mergesort is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label.
na_position : {'first', 'last'}, default 'last'. 'first' puts NaNs at the beginning, 'last' puts NaNs at the end.
Returns: sorted_obj : DataFrame
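A minimal sketch of the by, ascending, and na_position parameters, again assuming the same df used in the sum() and mean() examples:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=[[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
    index=["a", "b", "c", "d"],
    columns=["one", "two"],
)

# Sort rows by column "one", ascending; NaN goes last by default
by_one = df.sort_values(by="one")

# Descending order, with NaN placed first
by_one_desc = df.sort_values(by="one", ascending=False, na_position="first")

# Multiple keys: "two" descending first, then "one" ascending to break ties
by_two_then_one = df.sort_values(by=["two", "one"], ascending=[False, True])

print(by_one)
print(by_one_desc)
print(by_two_then_one)
```

Because inplace defaults to False, each call returns a new DataFrame and df itself is left unchanged.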