Python Pandas
Pandas
DataFrame is a collection of series
Read
co2 = pd.read_csv(
'co2_mm_mlo.txt',
sep='\s+',
skiprows=lambda x : x < 58,
header=None,
)
pd.DataFrame(covid_json['data'],
columns = [c['name'] for c in covid_json['meta']['view']['columns']])
iloc, loc, []
-
loc[index, column], parapmeters can be a slice, list or single value of labels
-
iloc is a index based version of loc
-
[] is index access when paramter is a slice, otherwise column acess
-
the first parameter can also be a boolean list for filter
query and eval
simple expression for filter
mat.query('type == "fuel" and burnup not in @burnuplist')
ins_missing_score_pivot.eval('Total = `True` + `False`', inplace=True)
’@’ used for outer variable reference
groupby
some use cases
mat.groupby('burnup').size()
mat.query('type == "fuel"') \
.drop('type', axis='columns') \
.groupby(['enrichment', 'burnup']) \
.agg({
'atom_density': lambda series : series.sum()
})
mat.grouby('type').filter(lambda df : df)
mat.grouby('type').apply(lambda df : df)
agg used for each group series filter, apply used for each group dataframe
map, apply, applymap
map, apply, applymap used for columns, rows, single item manipulation
function used for sort, drop and count
drop_duplicates, value_counts, unique, sort_values,sort_indexs
miscellaneous
idmax, describe, isin
manipulate items with str
df['str']str.split(' ').str[0]
pivot_table VS melt
a example for changing a df into a pivot_table and recover it
piv = mats.pivot_table(
values='atom_density',
index=['type', 'enrichment', 'burnup'],
columns='nuclide'
)
piv.reset_index().melt(id_vars=['enrichment', 'burnup', 'type'],
var_name='nuclide',
value_name='atom_density'
).head()
mats.sort_values(['nuclide', 'enrichment', 'burnup']).head()
Change Type
pd.Series.astype("string")
Pandas Time
ins['timestamp'] = pd.to_datetime(ins['date'], errors='raise', format='%m/%d/%Y %I:%M:%S %p')
Pandas merge
scores = pd.merge(first_scores, second_scores, on = "bid") \
.rename(columns = {"score_x": "first score",
"score_y": "second score"})
Pandas rolling
manipulate about every adjacent three items
pd.rolling(3).agg(sum)
Pandas replace
table can be nested table for column specified or a table for all elements
pd.replace({})