AIO Supply Chain Analytics

List of Functions

`abc_analysis`(df, primary_dimension, …[, …])	Multi-Dimensional ABC Analysis provides ABC classification for a multi-dimensional, granular input.
`xyz_analysis`(df, primary_dimension_keys, …)	The XYZ Analysis provides an XYZ variability and frequency classification for a multi-dimensional, granular time series input dataset.
`create_time_series`([distribution, p_mean, …])	Creates a time series with a given distribution.

Definition of Functions

aio.abc_analysis(df, primary_dimension, numeric_dimension, secondary_dimensions=None, A=0.8, B=0.95, classified_only=False)

Multi-Dimensional ABC Analysis provides ABC classification for a multi-dimensional, granular input.

Parameters

df (pandas.DataFrame)

DataFrame holding the objects to be classified, any additional secondary_dimensions, and the numeric values used for classification, e.g.

df.columns = ["product", "country", "quantity"].

primary_dimension (string)

Column name in the input DataFrame holding the objects to be classified, e.g. product.

secondary_dimensions (list of strings = None)

List of column names in the input DataFrame holding additional attributes of primary_dimension, used to structure the classification on a more granular level, e.g. country, region, city.

numeric_dimension (string)

Column name in the input DataFrame holding the numeric values to be used for classification.

A, B (float = 0.8, 0.95)

Thresholds separating the classes A, B and C, presumably applied to the cumulative share of numeric_dimension (the returned DataFrame includes cumulative values), e.g. A <= 0.8; 0.8 < B <= 0.95; C > 0.95.

classified_only (bool = False)

If True, provides a DataFrame with only the columns primary_dimension, secondary_dimensions, numeric_dimension and class, in the originally provided naming.

Returns

df_grouped (pandas.DataFrame): Input DataFrame grouped by the provided primary and secondary dimensions, with the respective classification and cumulative values.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> import aio
>>> # create sample data
>>> products, quantities = {}, {}
>>> np.random.seed(seed=0)
>>> for i in range(1000):
...     products[i] = "{:04d}".format(np.random.randint(15))
...     quantities[i] = np.random.randint(1000)
>>> # prepare sample data DataFrame
>>> df = pd.DataFrame()
>>> df["Product"] = list(products.values())
>>> df["Quantity"] = list(quantities.values())
>>>
>>> # classify products by their total quantity
>>> results = aio.abc_analysis(
...     df, primary_dimension="Product", numeric_dimension="Quantity"
... )
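
The classification idea behind an ABC analysis can be illustrated with a minimal sketch. It assumes that A and B are thresholds on the cumulative share of numeric_dimension, sorted from the largest contributor downwards; `abc_sketch` and its column names `cum_share` and `class` are hypothetical and are not taken from aio's implementation.

```python
import numpy as np
import pandas as pd


def abc_sketch(df, primary_dimension, numeric_dimension, A=0.8, B=0.95):
    """Hypothetical helper: classify by cumulative share (not aio's code)."""
    grouped = (
        df.groupby(primary_dimension, as_index=False)[numeric_dimension]
        .sum()
        .sort_values(numeric_dimension, ascending=False)
        .reset_index(drop=True)
    )
    # cumulative share of the total, from the largest contributor downwards
    grouped["cum_share"] = (
        grouped[numeric_dimension].cumsum() / grouped[numeric_dimension].sum()
    )
    # assumption: "A" while cum_share <= A, "B" while cum_share <= B, else "C"
    grouped["class"] = np.where(
        grouped["cum_share"] <= A,
        "A",
        np.where(grouped["cum_share"] <= B, "B", "C"),
    )
    return grouped


sample = pd.DataFrame(
    {"Product": ["p1", "p2", "p3", "p4", "p1"], "Quantity": [50, 25, 15, 10, 0]}
)
abc_result = abc_sketch(sample, "Product", "Quantity")
print(abc_result)
```

With this sample the cumulative shares are 0.50, 0.75, 0.90 and 1.00, so p1 and p2 fall into class A, p3 into class B and p4 into class C.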

aio.xyz_analysis(df, primary_dimension_keys, relevant_numeric_dimension, relevant_date_dimension, start_date, periods, frequency, X=0.5, Y=1, L=0.4, M=0.7)

The XYZ Analysis provides an XYZ variability and frequency classification for a multi-dimensional, granular time series input dataset.

Parameters

df (pandas.DataFrame): DataFrame holding the object(s) to be classified, the related dates, and the numeric values used for classification, e.g. df.columns = ["product", "country", "date", "quantity"].
primary_dimension_keys (string or list of strings): Column name(s) in the input DataFrame holding the object(s) to be classified, e.g. a product number. The primary_dimension_keys can be provided on the level of granularity the classification should be performed on, e.g. product, country, region or product, plant, storage location.
relevant_numeric_dimension (string): Column name in the input DataFrame holding the numeric values to be used for classification, e.g. periods with demand for a product.
relevant_date_dimension (string): Column name in the input DataFrame holding the dates that belong to the relevant_numeric_dimension values.
start_date (string): Start date of the classification, in the format YYYY-MM or YYYY-MM-DD. start_date should be provided together with periods and frequency so that the function can complete the period range to be considered for classification, e.g. start_date = "2020-01-01", periods = 12, frequency = "M" results in a period range of 12 monthly buckets starting in January 2020: 2020-01, 2020-02, …, 2020-12.
periods (int): Number of periods the classification is performed for.
frequency (string): Frequency of the periods the classification is performed for, e.g. "D" for days, "M" for months, "Q" for quarters, "Y" for years.
X, Y (float = 0.5, 1): Threshold values dividing the provided data into three variability classes X, Y and Z by coefficient of variation, e.g. X <= 0.5; 0.5 < Y <= 1; Z > 1.
L, M (float = 0.4, 0.7): Threshold values dividing the provided data into three frequency classes Low, Medium and High by the share of periods with non-zero values, e.g. Low <= 0.4; 0.4 < Medium <= 0.7; High > 0.7.
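
How start_date, periods and frequency combine into a period range can be sketched with pandas alone. This is only an illustration of the resulting buckets; whether aio builds the range with `pd.period_range` internally is an assumption, and the variable names are hypothetical.

```python
import pandas as pd

# assumption: the period range is completed roughly like this inside the function
period_range = pd.period_range(start="2020-01-01", periods=12, freq="M")

# monthly buckets are labelled YYYY-MM
labels = [str(period) for period in period_range]
print(labels[0], labels[1], "...", labels[-1])
```

Periods missing from the input data can then be treated as periods without demand, which is what makes the frequency classification meaningful.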

Returns

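df_return (pandas.DataFrame): Output DataFrame grouped by the provided primary_dimension_keys, holding per group the statistics Mean, Standard_Deviation, Non_Zero_Count, Coefficient_of_Variation and Relative_Non_Zero_Period_Count together with the resulting XYZ_Class and Frequency_Class.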

Examples

>>> import numpy as np
>>> import pandas as pd
>>> import aio
>>>
>>> # create sample data
>>> np.random.seed(seed=42)
>>> df = pd.DataFrame()
>>> # create random time series with aio.create_time_series function
>>> for i in range(10):
...     quantities = aio.create_time_series(
...         distribution="normal",
...         p_mean=1000,
...         p_std=300,
...         num_periods=12,
...         periodicity="M",
...         start_date="2020-01-01",
...         # combined key: material, country and region, separated by "-"
...         actual_material_number="{:04d}-{:02d}-{:05d}".format(
...             np.random.randint(1000), np.random.randint(20), np.random.randint(5)
...         ),
...         standard_price=1,
...         intermittency=0.2,
...     )
...     # collect every generated series (DataFrame.append was removed in pandas 2.0)
...     df = pd.concat([df, quantities])
>>> # post process sample data
>>> df = df.reset_index()
>>> df = df.drop(columns=["Value", "index"])
>>> # shorten date format from YYYY-MM-DD to YYYY-MM
>>> df["Date"] = df["Date"].astype("str").str[:7]
>>> # split key returned by create_time_series into three columns
>>> df[["Material", "Country", "Region"]] = df["Material"].str.split("-", expand=True)
>>> # sort columns into more logical order
>>> df = df[["Material", "Country", "Region", "Date", "Quantity"]]
>>> # delete random periods as actual data is likely to be incomplete
>>> df = df.drop(np.random.choice(len(df), int(len(df) / 2)))
>>>
>>> df
    Material Country Region     Date  Quantity
0       0102      19  00004  2020-01    1163.0
2       0102      19  00004  2020-03     641.0
3       0102      19  00004  2020-04    1642.0
4       0102      19  00004  2020-05     972.0
5       0102      19  00004  2020-06     721.0
..       ...     ...    ...      ...       ...
110     0459      18  00004  2020-03     419.0
111     0459      18  00004  2020-04     746.0
112     0459      18  00004  2020-05    1409.0
116     0459      18  00004  2020-09    1835.0
119     0459      18  00004  2020-12    1057.0
>>> result = aio.xyz_analysis(
...     df=df,
...     primary_dimension_keys=["Material", "Country", "Region"],
...     relevant_numeric_dimension="Quantity",
...     relevant_date_dimension="Date",
...     start_date="2020-01-01",
...     periods=12,
...     frequency="M",
... )
>>> result.head()
         Mean  Standard_Deviation  Non_Zero_Count  Coefficient_of_Variation  Relative_Non_Zero_Period_Count XYZ_Class Frequency_Class Material Country Region
0  592.500000          637.290358               6                  1.075596                        0.500000         Z          Medium     0008      08  00002
1  604.833333          586.178662               7                  0.969157                        0.583333         Y          Medium     0102      19  00004
2  475.000000          619.921109               5                  1.305097                        0.416667         Z          Medium     0402      02  00002
3  561.583333          676.746959               6                  1.205070                        0.500000         Z          Medium     0459      18  00004
4  327.333333          516.059780               4                  1.576557                        0.333333         Z             Low     0498      16  00002
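
The two classifications in the output above can be approximated for a single series with the sketch below. It assumes that the coefficient of variation is the standard deviation divided by the mean over all periods (missing periods counted as 0) and that the frequency class is based on the share of non-zero periods. `xyz_sketch` is a hypothetical helper, and the use of the sample standard deviation (ddof=1) is an assumption, not aio's documented behaviour.

```python
import numpy as np


def xyz_sketch(values, X=0.5, Y=1, L=0.4, M=0.7):
    """Hypothetical helper: classify one series of per-period quantities.

    `values` holds one entry per period, with 0 for periods without data.
    """
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    std = values.std(ddof=1)  # assumption: sample standard deviation
    # coefficient of variation; a series of only zeros is treated as maximally variable
    cov = std / mean if mean else np.inf
    share_non_zero = np.count_nonzero(values) / len(values)
    xyz_class = "X" if cov <= X else "Y" if cov <= Y else "Z"
    freq_class = (
        "Low" if share_non_zero <= L else "Medium" if share_non_zero <= M else "High"
    )
    return cov, share_non_zero, xyz_class, freq_class


steady = [100] * 12             # identical demand in every period
sporadic = [0] * 11 + [1200]    # a single demand peak in twelve periods
print(xyz_sketch(steady))
print(xyz_sketch(sporadic))
```

A constant series has no variability and demand in every period, so it lands in X and High; a single peak in twelve periods gives a coefficient of variation well above 1 and a non-zero share of 1/12, so it lands in Z and Low.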

aio.create_time_series(distribution='uniform', p_mean=10, p_std=1, num_periods=365, periodicity='D', start_date='2020-01-01', actual_material_number='Mat-ID-generated', standard_price=1, intermittency=0)

Creates a time series with a given distribution

Parameters

distribution (str = "uniform"): one of "const" (uses p_mean), "normal" (uses p_mean, p_std), "uniform" (uses p_mean, p_std) or "poisson" (uses p_mean)
num_periods (int = 365): number of increments the time series is created for
start_date (str = "2020-01-01"): reference start date, format YYYY-MM-DD
actual_material_number (str = "Mat-ID-generated"): any material identifier
standard_price (int = 1): any float or integer value as the price of one quantity unit
intermittency (float = 0.0): share of quantity data points that are equal to 0, range 0 to 1, e.g. 0.4 for 40 %

Examples

>>> import numpy as np
>>> import pandas as pd
>>> import aio
>>>
>>> df = pd.DataFrame()
>>> # create random time series with aio.create_time_series function
>>> for i in range(100):
...     quantities = aio.create_time_series(
...         distribution="normal",
...         p_mean=1000,
...         p_std=300,
...         num_periods=12,
...         periodicity="M",
...         start_date="2020-01-01",
...         # combined key: material, country and region, separated by "-"
...         actual_material_number="{:04d}-{:02d}-{:05d}".format(
...             np.random.randint(1000), np.random.randint(20), np.random.randint(5)
...         ),
...         standard_price=1,
...         intermittency=0.2,
...     )
...     # collect every generated series (DataFrame.append was removed in pandas 2.0)
...     df = pd.concat([df, quantities])
>>> df.head()
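
What the intermittency parameter means for the generated data can be illustrated without aio. The sketch below assumes that intermittency sets a fixed share of randomly chosen periods to 0; how create_time_series actually selects those periods is not documented here, and all names in the block are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
num_periods, intermittency = 12, 0.25

# normally distributed monthly quantities, comparable to distribution="normal"
quantities = rng.normal(loc=1000, scale=300, size=num_periods).round()

# assumption: intermittency zeroes a fixed share of randomly chosen periods
zero_idx = rng.choice(
    num_periods, size=int(num_periods * intermittency), replace=False
)
quantities[zero_idx] = 0

series = pd.DataFrame(
    {
        "Date": pd.date_range("2020-01-01", periods=num_periods, freq="MS"),
        "Quantity": quantities,
    }
)
print(series)
```

With 12 periods and an intermittency of 0.25, three of the twelve quantities are set to 0.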
