aio
abc_analysis(df, primary_dimension, …[, …])
Multi-Dimensional ABC Analysis provides ABC classification for a multi-dimensional, granular input.
xyz_analysis(df, primary_dimension_keys, …)
XYZ Analysis provides an XYZ variability and frequency classification for a multi-dimensional, granular time series input dataset.
aio.
df: DataFrame holding the object to be classified, any additional secondary dimensions (if applicable), and the numeric values used for classification, e.g. df.columns = ["product", "country", "quantity"].
primary_dimension: Column name in the input DataFrame holding the object to be classified, e.g. product.
List of column names in the input DataFrame holding additional attributes of primary_dimension, used to structure the classification on a more granular level, e.g. country, region, city.
numeric_dimension: Column name in the input DataFrame holding the numeric values to be used for classification.
Threshold for classification.
Returns a DataFrame with the columns primary_dimension, secondary_dimension, numeric_dimension and class, named as originally provided: the input DataFrame grouped by the provided primary and secondary dimensions, with the respective classification and cumulative values.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> import aio
>>> # create sample data
>>> products, quantities = {}, {}
>>> np.random.seed(seed=0)
>>> for i in range(1000):
...     products[i] = "{:04d}".format(np.random.randint(15))
...     quantities[i] = np.random.randint(1000)
>>> # prepare sample data DataFrame
>>> df = pd.DataFrame()
>>> df["Product"] = list(products.values())
>>> df["Quantity"] = list(quantities.values())
>>> # classify products by their total quantity
>>> results = aio.abc_analysis(
...     df, primary_dimension="Product", numeric_dimension="Quantity"
... )
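This page does not spell out how the classes are derived. As a rough illustration only, the sketch below assumes the conventional cumulative-share approach to ABC classification: aggregate the numeric values per object, sort in descending order, and assign classes by cumulative share. The helper name `abc_sketch`, the cutoffs 0.8 and 0.95, and the output column `cumulative_share` are assumptions made for this example and are not taken from aio.

```python
import pandas as pd


def abc_sketch(df, primary_dimension, numeric_dimension, thresholds=(0.8, 0.95)):
    """Hypothetical helper (not part of aio): cumulative-share ABC classes."""
    # Aggregate the numeric values per object and rank from largest to smallest.
    totals = (
        df.groupby(primary_dimension, as_index=False)[numeric_dimension]
        .sum()
        .sort_values(numeric_dimension, ascending=False, ignore_index=True)
    )
    # Cumulative share of the grand total, in ranked order.
    totals["cumulative_share"] = (
        totals[numeric_dimension].cumsum() / totals[numeric_dimension].sum()
    )
    # Assumed cutoffs: A up to 80 %, B up to 95 %, C for the remainder.
    a_limit, b_limit = thresholds
    totals["class"] = "C"
    totals.loc[totals["cumulative_share"] <= b_limit, "class"] = "B"
    totals.loc[totals["cumulative_share"] <= a_limit, "class"] = "A"
    return totals


sample = pd.DataFrame(
    {
        "Product": ["p1", "p2", "p3", "p1", "p4"],
        "Quantity": [60, 25, 10, 10, 5],
    }
)
result = abc_sketch(sample, "Product", "Quantity")
print(result)
```

With secondary dimensions, the same idea would apply after grouping by the primary and secondary columns together; whether aio does exactly this is not confirmed here.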
df: DataFrame holding the object to be classified, any additional secondary dimensions (if applicable), and the numeric values used for classification, e.g. df.columns = ["product", "country", "quantity"].
primary_dimension_keys: Column name(s) in the input DataFrame holding the object(s) to be classified, e.g. a product number. The keys can be provided at the level of granularity at which the classification should be performed, e.g. product, country, region or product, plant, storage location.
relevant_numeric_dimension: Column name in the input DataFrame holding the numeric values to be used for classification, e.g. periods with demand for a product.
relevant_date_dimension: Column in the input DataFrame holding the dates that correspond to the relevant_numeric_dimension values.
start_date: Start date of the classification, in the format YYYY-MM or YYYY-MM-DD. It should be provided together with periods and frequency so that the function can complete the period range considered for classification, e.g. start_date = "2020-01-01", periods = 12, frequency = "M" results in a period range of 12 monthly buckets starting in January 2020: 2020-01, 2020-02, …, 2020-12.
periods: Number of periods the classification is performed for.
frequency: Frequency of the periods the classification is performed for, e.g. "D" for days, "M" for months, "Q" for quarters, "Y" for years.
Threshold values dividing the provided data into three variability classes X, Y and Z, e.g. X <= 0.5; 0.5 < Y <= 1; Z > 1.
Threshold values dividing the provided data into three frequency classes Low, Medium and High, e.g. Low <= 0.5; 0.5 < Medium <= 1; High > 1.
Returns the output DataFrame, grouped by the provided primary and secondary dimensions, with the respective classification and cumulative values.
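To make the interplay of start_date, periods and frequency more concrete, here is a minimal pandas sketch of how a period range of 12 monthly buckets could be built and how an incomplete series could be completed against it. Filling missing buckets with zero is an assumption made for this example, and the variable names are illustrative; this is not aio's internal code.

```python
import pandas as pd

# 12 monthly buckets starting in January 2020: 2020-01, 2020-02, ..., 2020-12
buckets = pd.period_range(start="2020-01-01", periods=12, freq="M")

# Incomplete observations for one object (illustrative values only)
observed = pd.DataFrame(
    {
        "Date": ["2020-01", "2020-03", "2020-04"],
        "Quantity": [1163.0, 641.0, 1642.0],
    }
)

# Index the quantities by monthly period, then align them to the full range.
series = pd.Series(
    observed["Quantity"].to_numpy(),
    index=pd.PeriodIndex(observed["Date"], freq="M"),
)
# Assumption: periods without any record count as zero demand.
completed = series.reindex(buckets, fill_value=0.0)
print(completed)
```

Completing the range first means every object is evaluated over the same number of periods, regardless of how many records it actually has.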
>>> import numpy as np
>>> import pandas as pd
>>> import aio
In [1]:
>>> # create sample data
>>> np.random.seed(seed=42)
>>> frames = []
>>> # create random time series with the aio.create_time_series function
>>> for i in range(10):
...     quantities = aio.create_time_series(
...         distribution='normal',
...         p_mean=1000,
...         p_std=300,
...         num_periods=12,
...         periodicity='M',
...         start_date='2020-01-01',
...         # key of the form material-country-region
...         actual_material_number='{:04d}-{:02d}-{:05d}'.format(
...             np.random.randint(1000), np.random.randint(20), np.random.randint(5)
...         ),
...         standard_price=1, intermittency=0.2
...     )
...     frames.append(quantities)
>>> # combine the individual time series (DataFrame.append was removed in pandas 2.0)
>>> df = pd.concat(frames)
>>> # post-process sample data
>>> df = df.reset_index()
>>> df = df.drop(columns=["Value", "index"])
>>> # shorten date format from YYYY-MM-DD to YYYY-MM
>>> df["Date"] = df["Date"].astype("str").str[:7]
>>> # split the key returned by create_time_series into three columns
>>> df[["Material", "Country", "Region"]] = df["Material"].str.split('-', expand=True)
>>> # sort columns into a more logical order
>>> df = df[['Material', 'Country', 'Region', 'Date', 'Quantity']]
>>> # delete random periods, as actual data is likely to be incomplete
>>> df = df.drop(np.random.choice(len(df), int(len(df) / 2)))
>>> df
Out[1]:
    Material Country Region     Date  Quantity
0       0102      19  00004  2020-01    1163.0
2       0102      19  00004  2020-03     641.0
3       0102      19  00004  2020-04    1642.0
4       0102      19  00004  2020-05     972.0
5       0102      19  00004  2020-06     721.0
...      ...     ...    ...      ...       ...
110     0459      18  00004  2020-03     419.0
111     0459      18  00004  2020-04     746.0
112     0459      18  00004  2020-05    1409.0
116     0459      18  00004  2020-09    1835.0
119     0459      18  00004  2020-12    1057.0
In [2]:
>>> result = aio.xyz_analysis(
...     df=df,
...     primary_dimension_keys=["Material", "Country", "Region"],
...     relevant_numeric_dimension="Quantity",
...     relevant_date_dimension="Date",
...     periods=12,
...     start_date="2020-01-01",
...     frequency="M"
... )
>>> result.head()
Out[2]:
         Mean  Standard_Deviation  Non_Zero_Count  Coefficient_of_Variation  Relative_Non_Zero_Period_Count XYZ_Class Frequency_Class Material Country Region
0  592.500000          637.290358               6                  1.075596                        0.500000         Z          Medium     0008      08  00002
1  604.833333          586.178662               7                  0.969157                        0.583333         Y          Medium     0102      19  00004
2  475.000000          619.921109               5                  1.305097                        0.416667         Z          Medium     0402      02  00002
3  561.583333          676.746959               6                  1.205070                        0.500000         Z          Medium     0459      18  00004
4  327.333333          516.059780               4                  1.576557                        0.333333         Z             Low     0498      16  00002
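The output columns above suggest that the coefficient of variation is the standard deviation divided by the mean (e.g. 637.290358 / 592.5 ≈ 1.0756) and that the relative non-zero period count is the share of periods with a non-zero value (e.g. 6 / 12 = 0.5). The sketch below recomputes these statistics for a single completed series and applies the example variability thresholds X <= 0.5, 0.5 < Y <= 1, Z > 1. The helper name `xyz_sketch` and the use of the population standard deviation (ddof=0) are assumptions; the page does not state which variant aio uses, and the frequency class is left out because its actual cutoffs are not confirmed here.

```python
import numpy as np


def xyz_sketch(values, xyz_thresholds=(0.5, 1.0)):
    """Hypothetical helper (not part of aio): variability stats for one series."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    # Assumption: population standard deviation; aio may use the sample variant.
    std = values.std(ddof=0)
    # Coefficient of variation; undefined for a zero mean, treated as infinite.
    cv = std / mean if mean != 0 else float("inf")
    non_zero = int(np.count_nonzero(values))
    x_limit, y_limit = xyz_thresholds
    if cv <= x_limit:
        xyz_class = "X"
    elif cv <= y_limit:
        xyz_class = "Y"
    else:
        xyz_class = "Z"
    return {
        "Mean": mean,
        "Standard_Deviation": std,
        "Non_Zero_Count": non_zero,
        "Coefficient_of_Variation": cv,
        "Relative_Non_Zero_Period_Count": non_zero / len(values),
        "XYZ_Class": xyz_class,
    }


steady = xyz_sketch([100, 100, 100, 100])    # no variation
moderate = xyz_sketch([40, 160, 40, 160])    # moderate variation
sporadic = xyz_sketch([0, 0, 0, 400])        # intermittent demand
print(steady["XYZ_Class"], moderate["XYZ_Class"], sporadic["XYZ_Class"])
```

The input is expected to cover the full period range, with zeros for periods without demand, so that intermittent series show both a high coefficient of variation and a low non-zero share.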