Python Pandas

February 3, 2023 1 minute read

Pandas

DataFrame is a collection of series

Read

co2 = pd.read_csv(
    'co2_mm_mlo.txt',
    sep='\s+',
    skiprows=lambda x : x < 58,
    header=None,
)
pd.DataFrame(covid_json['data'],
            columns = [c['name'] for c in covid_json['meta']['view']['columns']])

iloc, loc, []

loc[index, column], parapmeters can be a slice, list or single value of labels
iloc is a index based version of loc
[] is index access when paramter is a slice, otherwise column acess
the first parameter can also be a boolean list for filter

query and eval

simple expression for filter

mat.query('type == "fuel" and burnup not in @burnuplist')
ins_missing_score_pivot.eval('Total = `True` + `False`', inplace=True)

’@’ used for outer variable reference

groupby

some use cases

mat.groupby('burnup').size()
mat.query('type == "fuel"') \
   .drop('type', axis='columns') \
   .groupby(['enrichment', 'burnup']) \
   .agg({
        'atom_density': lambda series : series.sum()
   })
mat.grouby('type').filter(lambda df : df)
mat.grouby('type').apply(lambda df : df)

agg used for each group series filter, apply used for each group dataframe

map, apply, applymap

map, apply, applymap used for columns, rows, single item manipulation

function used for sort, drop and count

drop_duplicates, value_counts, unique, sort_values,sort_indexs

miscellaneous

idmax, describe, isin

manipulate items with str

df['str']str.split(' ').str[0]

pivot_table VS melt

a example for changing a df into a pivot_table and recover it

piv = mats.pivot_table(
    values='atom_density',
    index=['type', 'enrichment', 'burnup'],
    columns='nuclide'
)
piv.reset_index().melt(id_vars=['enrichment', 'burnup', 'type'],
                       var_name='nuclide',
                       value_name='atom_density'
                       ).head()
mats.sort_values(['nuclide', 'enrichment', 'burnup']).head()

Change Type

pd.Series.astype("string")

Pandas Time

ins['timestamp'] = pd.to_datetime(ins['date'], errors='raise', format='%m/%d/%Y %I:%M:%S %p')

Pandas merge

scores = pd.merge(first_scores, second_scores, on = "bid") \
                        .rename(columns = {"score_x": "first score",
                                           "score_y": "second score"})

Pandas rolling

manipulate about every adjacent three items

pd.rolling(3).agg(sum)

Pandas replace

table can be nested table for column specified or a table for all elements

pd.replace({})

indesperate