AIO Supply Chain Analytics
List of Functions
aio.abc_analysis | Multi-Dimensional ABC Analysis provides ABC classification for a multi-dimensional, granular input.
aio.xyz_analysis | The XYZ Analysis provides an XYZ variability and frequency classification for a multi-dimensional, granular time series input dataset.
aio.create_time_series | Creates a time series with a given distribution.
Definition of Functions
aio.abc_analysis(df, primary_dimension, numeric_dimension, secondary_dimensions=None, A=0.8, B=0.95, classified_only=False)
Multi-Dimensional ABC Analysis provides ABC classification for a multi-dimensional, granular input.
- Parameters
- df : pandas.DataFrame
DataFrame holding the objects to be classified, any additional secondary_dimensions, and the numeric values used for classification, e.g. df.columns = ["product", "country", "quantity"].
- primary_dimension : string
Column name in the input DataFrame holding the object to be classified, e.g. product.
- secondary_dimensions : list of strings = None
List of column names in the input DataFrame holding additional attributes of primary_dimension, used to structure the classification on a more granular level, e.g. country, region, city.
- numeric_dimension : string
Column name in the input DataFrame holding the numeric values to be used for classification.
- A, B : float = 0.8, 0.95
Thresholds on the cumulative share of numeric_dimension that separate classes A, B and C.
- classified_only : bool = False
If True, the returned DataFrame contains only the columns primary_dimension, secondary_dimensions, numeric_dimension and class, using the originally provided column names.
- Returns
- df_grouped : pandas.DataFrame
Input DataFrame grouped by the provided primary and secondary dimensions, with the respective classification and cumulative values.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> import aio
>>> # create sample data
>>> products, quantities = {}, {}
>>> np.random.seed(seed=0)
>>> for i in range(1000):
...     products[i] = "{:04d}".format(np.random.randint(15))
...     quantities[i] = np.random.randint(1000)
>>> # prepare sample data DataFrame
>>> df = pd.DataFrame()
>>> df["Product"] = list(products.values())
>>> df["Quantity"] = list(quantities.values())
>>> # classify products by their total quantity
>>> results = aio.abc_analysis(
...     df, primary_dimension="Product", numeric_dimension="Quantity"
... )
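The thresholds A and B can be read as cut-offs on a cumulative share. The following is a minimal sketch of that idea in plain pandas, not the aio implementation: the helper name `abc_classify` and the column names `cum_share` and `class` are assumptions made for illustration, and the handling of an object that lands exactly on a threshold is one possible convention.

```python
import pandas as pd


def abc_classify(df, key, value, a=0.8, b=0.95):
    """Hypothetical helper: classify `key` by cumulative share of `value`."""
    # Aggregate the numeric values per object and rank from largest to smallest.
    grouped = (
        df.groupby(key, as_index=False)[value]
        .sum()
        .sort_values(value, ascending=False)
        .reset_index(drop=True)
    )
    # Cumulative share of the grand total after each ranked object.
    grouped["cum_share"] = grouped[value].cumsum() / grouped[value].sum()
    # Class A up to threshold a, class B up to threshold b, class C otherwise.
    grouped["class"] = "C"
    grouped.loc[grouped["cum_share"] <= b, "class"] = "B"
    grouped.loc[grouped["cum_share"] <= a, "class"] = "A"
    return grouped


# Illustrative data only: p1 appears twice and is summed to 60.
sample = pd.DataFrame(
    {
        "product": ["p1", "p2", "p3", "p4", "p1"],
        "quantity": [50, 25, 12, 3, 10],
    }
)
result = abc_classify(sample, "product", "quantity")
```

Here the cumulative shares are 0.60, 0.85, 0.97 and 1.00, so p1 falls into class A, p2 into class B, and p3 and p4 into class C.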
aio.xyz_analysis(df, primary_dimension_keys, relevant_numeric_dimension, relevant_date_dimension, start_date, periods, frequency, X=0.5, Y=1, L=0.4, M=0.7)
The XYZ Analysis provides an XYZ variability and frequency classification for a multi-dimensional, granular time series input dataset.
- Parameters
- df : pandas.DataFrame
DataFrame holding the objects to be classified, the dates, and the numeric values used for classification, e.g. df.columns = ["product", "country", "date", "quantity"].
- primary_dimension_keys : string or list of strings
Column name(s) in the input DataFrame holding the object(s) to be classified, e.g. a product number. The primary_dimension_keys define the level of granularity on which the classification is performed, e.g. product, country, region or product, plant, storage location.
- relevant_numeric_dimension : string
Column name in the input DataFrame holding the numeric values to be used for classification, e.g. the demand for a product per period.
- relevant_date_dimension : string
Column name in the input DataFrame holding the dates that belong to the relevant_numeric_dimension values.
- start_date : string
Start date of the classification, in the format YYYY-MM or YYYY-MM-DD. It is combined with periods and frequency to build the complete period range considered for classification, e.g. start_date = "2020-01-01", periods = 12, frequency = "M" results in 12 monthly buckets starting in January 2020: 2020-01, 2020-02, ..., 2020-12.
- periods : int
Number of periods the classification is performed for.
- frequency : string
Frequency of the periods the classification is performed for, e.g. "D" for days, "M" for months, "Q" for quarters, "Y" for years.
- X, Y : float = 0.5, 1
Threshold values on the coefficient of variation that divide the data into three variability classes X, Y and Z, e.g. X <= 0.5; 0.5 < Y <= 1; Z > 1.
- L, M : float = 0.4, 0.7
Threshold values on the relative number of non-zero periods that divide the data into three frequency classes Low, Medium and High, e.g. Low <= 0.4; 0.4 < Medium <= 0.7; High > 0.7.
- Returns
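- df_return : pandas.DataFrame
Output DataFrame with one row per combination of primary_dimension_keys, holding the mean, standard deviation, non-zero period count, coefficient of variation, relative non-zero period count and the resulting XYZ and frequency classes.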
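The way start_date, periods and frequency complete the period range can be illustrated with pandas alone. This is a sketch under assumptions, not the aio implementation: it assumes missing buckets count as zero demand, and the sample values are illustrative.

```python
import pandas as pd

# Hypothetical sparse input: one material with demand in only three months.
demand = pd.DataFrame(
    {
        "Material": ["0102", "0102", "0102"],
        "Date": ["2020-01", "2020-03", "2020-04"],
        "Quantity": [1163.0, 641.0, 1642.0],
    }
)

# Build the full range of buckets from start_date, periods and frequency.
buckets = pd.period_range(start="2020-01-01", periods=6, freq="M")

# Index the observed values by period and fill the missing buckets with zero.
observed = demand.set_index(pd.PeriodIndex(demand["Date"].tolist(), freq="M"))
completed = observed["Quantity"].reindex(buckets, fill_value=0.0)
```

After the reindex, `completed` holds one value for each of the six monthly buckets, so statistics such as the mean are computed over the full range rather than only over periods that happen to have data.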
Examples
>>> import numpy as np
>>> import pandas as pd
>>> import aio
>>> # create sample data
>>> np.random.seed(seed=42)
>>> frames = []
>>> # create random time series with aio.create_time_series function
>>> for i in range(10):
...     quantities = aio.create_time_series(
...         distribution="normal",
...         p_mean=1000,
...         p_std=300,
...         num_periods=12,
...         periodicity="M",
...         start_date="2020-01-01",
...         actual_material_number="{:04d}-{:02d}-{:05d}".format(
...             np.random.randint(1000), np.random.randint(20), np.random.randint(5)
...         ),
...         standard_price=1,
...         intermittency=0.2,
...     )
...     frames.append(quantities)
>>> # combine the series (DataFrame.append was removed in pandas 2.0)
>>> df = pd.concat(frames)
>>> # post process sample data
>>> df = df.reset_index()
>>> df = df.drop(columns=["Value", "index"])
>>> # shorten date format from YYYY-MM-DD to YYYY-MM
>>> df["Date"] = df["Date"].astype("str").str[:5] + df["Date"].astype("str").str[-2:]
>>> # split key returned by create_time_series into three columns
>>> df[["Material", "Country", "Region"]] = df["Material"].str.split("-", expand=True)
>>> # sort columns into a more logical order
>>> df = df[["Material", "Country", "Region", "Date", "Quantity"]]
>>> # delete random periods, as actual data is likely to be incomplete
>>> df = df.drop(np.random.choice(len(df), int(len(df) / 2)))
>>> df
    Material Country Region     Date  Quantity
0       0102      19  00004  2020-01    1163.0
2       0102      19  00004  2020-03     641.0
3       0102      19  00004  2020-04    1642.0
4       0102      19  00004  2020-05     972.0
5       0102      19  00004  2020-06     721.0
..       ...     ...    ...      ...       ...
110     0459      18  00004  2020-03     419.0
111     0459      18  00004  2020-04     746.0
112     0459      18  00004  2020-05    1409.0
116     0459      18  00004  2020-09    1835.0
119     0459      18  00004  2020-12    1057.0
>>> result = aio.xyz_analysis(
...     df=df,
...     primary_dimension_keys=["Material", "Country", "Region"],
...     relevant_numeric_dimension="Quantity",
...     relevant_date_dimension="Date",
...     periods=12,
...     start_date="2020-01-01",
...     frequency="M",
... )
>>> result.head()
         Mean  Standard_Deviation  Non_Zero_Count  Coefficient_of_Variation  Relative_Non_Zero_Period_Count XYZ_Class Frequency_Class Material Country Region
0  592.500000          637.290358               6                  1.075596                        0.500000         Z          Medium     0008      08  00002
1  604.833333          586.178662               7                  0.969157                        0.583333         Y          Medium     0102      19  00004
2  475.000000          619.921109               5                  1.305097                        0.416667         Z          Medium     0402      02  00002
3  561.583333          676.746959               6                  1.205070                        0.500000         Z          Medium     0459      18  00004
4  327.333333          516.059780               4                  1.576557                        0.333333         Z             Low     0498      16  00002
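The output columns suggest how the two classes relate to the thresholds: Coefficient_of_Variation equals Standard_Deviation divided by Mean, and Relative_Non_Zero_Period_Count equals Non_Zero_Count divided by the number of periods. The sketch below applies the default thresholds to a single completed series. The helper `xyz_classify` is hypothetical, and the use of the sample standard deviation (ddof=1) is an assumption rather than a confirmed detail of aio.

```python
import numpy as np


def xyz_classify(values, x=0.5, y=1.0, low=0.4, medium=0.7):
    """Hypothetical helper: classify one completed series of period values."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    # Coefficient of variation: standard deviation relative to the mean.
    cv = values.std(ddof=1) / mean if mean else float("inf")
    # Share of periods with non-zero demand.
    share = np.count_nonzero(values) / len(values)

    xyz = "X" if cv <= x else "Y" if cv <= y else "Z"
    freq = "Low" if share <= low else "Medium" if share <= medium else "High"
    return cv, share, xyz, freq


# Steady demand in every period: low variability, high frequency.
steady = xyz_classify([10, 11, 9, 10, 10, 11, 9, 10, 10, 11, 9, 10])
# Demand in every second period only: high variability, medium frequency.
sporadic = xyz_classify([100, 0, 100, 0, 100, 0, 100, 0, 100, 0, 100, 0])
```

The steady series is classified as X and High, while the sporadic series is classified as Z and Medium, which mirrors the pattern in the sample output where rows with a relative non-zero period count of 0.5 are labelled Medium.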
aio.create_time_series(distribution='uniform', p_mean=10, p_std=1, num_periods=365, periodicity='D', start_date='2020-01-01', actual_material_number='Mat-ID-generated', standard_price=1, intermittency=0)
Creates a time series with a given distribution.
- Parameters
- distribution : str = "uniform"
Distribution to draw from, listed with the parameters it uses: const (p_mean), normal (p_mean, p_std), uniform (p_mean, p_std) or poisson (p_mean).
- num_periods : int = 365
Number of increments the time series is created for.
- start_date : str = "2020-01-01"
Reference start date in the format yyyy-mm-dd.
- actual_material_number : str = "Mat-ID-generated"
Any material identifier.
- standard_price : int = 1
Any float or integer value, used as the price of one quantity unit.
- intermittency : float = 0.0
Share of quantity data points that are zero, in the range 0 to 1, e.g. 0.4 for 40 %.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> import aio
>>> frames = []
>>> # create random time series with aio.create_time_series function
>>> for i in range(100):
...     quantities = aio.create_time_series(
...         distribution="normal",
...         p_mean=1000,
...         p_std=300,
...         num_periods=12,
...         periodicity="M",
...         start_date="2020-01-01",
...         actual_material_number="{:04d}-{:02d}-{:05d}".format(
...             np.random.randint(1000), np.random.randint(20), np.random.randint(5)
...         ),
...         standard_price=1,
...         intermittency=0.2,
...     )
...     frames.append(quantities)
>>> # combine the series (DataFrame.append was removed in pandas 2.0)
>>> df = pd.concat(frames)
>>> df.head()
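To make the interplay of p_mean, p_std and intermittency concrete, the following is a rough stand-in built on NumPy and pandas. It is not the aio implementation: the function `sketch_time_series`, its `seed` argument, the clipping of negative draws and the reading of intermittency as an exact share of zeroed periods are all assumptions, and only the normal distribution is covered.

```python
import numpy as np
import pandas as pd


def sketch_time_series(p_mean=10, p_std=1, num_periods=12, periodicity="D",
                       start_date="2020-01-01", intermittency=0.0, seed=0):
    """Hypothetical stand-in, not the aio implementation."""
    rng = np.random.default_rng(seed)
    # Draw normally distributed quantities and clip negatives to zero.
    quantity = np.clip(rng.normal(p_mean, p_std, num_periods), 0, None).round()
    # Zero out a random subset of periods; its size follows `intermittency`.
    n_zero = int(round(intermittency * num_periods))
    zero_idx = rng.choice(num_periods, size=n_zero, replace=False)
    quantity[zero_idx] = 0
    # One label per bucket, e.g. 2020-01, 2020-02, ... for monthly periods.
    dates = pd.period_range(start=start_date, periods=num_periods, freq=periodicity)
    return pd.DataFrame({"Date": dates.astype(str), "Quantity": quantity})


ts = sketch_time_series(p_mean=1000, p_std=300, num_periods=12,
                        periodicity="M", intermittency=0.25)
```

With intermittency=0.25 and 12 periods, three buckets are set to zero, which is the kind of sparse series the XYZ frequency classes are meant to distinguish.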