AIO Supply Chain Analytics

List of Functions

abc_analysis(df, primary_dimension, …[, …])

Multi-Dimensional ABC Analysis provides ABC classification for a multi-dimensional, granular input.

xyz_analysis(df, primary_dimension_keys, …)

The XYZ Analysis provides an XYZ variability and frequency classification for a multi-dimensional, granular time series input dataset.

create_time_series([distribution, p_mean, …])

Creates a time series with a given distribution

Definition of Functions

aio.abc_analysis(df, primary_dimension, numeric_dimension, secondary_dimensions=None, A=0.8, B=0.95, classified_only=False)

Multi-Dimensional ABC Analysis provides ABC classification for a multi-dimensional, granular input.

Parameters
df : pandas.DataFrame

DataFrame holding the object to be classified, any additional secondary_dimensions, and the numeric values used for classification, e.g.

df.columns = ["product", "country", "quantity"].

primary_dimension : string

Column name in the input DataFrame holding the object to be classified, e.g. product.

secondary_dimensions : list of strings = None

List of column names in the input DataFrame holding additional attributes of primary_dimension, used to structure the classification on a more granular level, e.g. country, region, city.

numeric_dimension : string

Column name in the input DataFrame holding the numeric values to be used for classification.

A, B : float = 0.8, 0.95

Thresholds for classification.

classified_only : bool = False

If True, provides a DataFrame with the columns primary_dimension, secondary_dimensions, numeric_dimension and class, in the originally provided naming.

Returns
df_grouped : pandas.DataFrame

Input DataFrame grouped by the provided primary and secondary dimensions, with the respective classification and cumulative values.
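
To make the role of the A and B thresholds concrete, here is a minimal sketch of a cumulative-share ABC classification in plain pandas. It is an illustration under stated assumptions, not the aio implementation: the helper name abc_sketch, the output columns cumulative_share and class, and the rule that an object is A while its cumulative share is at most A and B while it is at most B are all assumptions. Secondary dimensions are left out for brevity.

```python
import pandas as pd


def abc_sketch(df, primary_dimension, numeric_dimension, A=0.8, B=0.95):
    """Illustrative ABC classification (hypothetical helper, not part of aio)."""
    # Sum the numeric values per object and rank from largest to smallest.
    grouped = (
        df.groupby(primary_dimension, as_index=False)[numeric_dimension]
        .sum()
        .sort_values(numeric_dimension, ascending=False)
        .reset_index(drop=True)
    )
    # Cumulative share of the grand total after each ranked object.
    grouped["cumulative_share"] = (
        grouped[numeric_dimension].cumsum() / grouped[numeric_dimension].sum()
    )
    # Assumption: class A up to a cumulative share of A, class B up to B,
    # and class C for everything beyond that.
    grouped["class"] = "C"
    grouped.loc[grouped["cumulative_share"] <= B, "class"] = "B"
    grouped.loc[grouped["cumulative_share"] <= A, "class"] = "A"
    return grouped


sample = pd.DataFrame(
    {
        "product": ["p1", "p2", "p3", "p4", "p1"],
        "quantity": [40, 25, 8, 7, 20],
    }
)
result = abc_sketch(sample, "product", "quantity")
print(result)
```

In this sample, p1 carries 60 % of the total quantity and lands in class A. p2 and p3 bring the cumulative share to 85 % and 93 % and land in class B, and p4 falls into class C.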

Examples

>>> import numpy as np
>>> import pandas as pd
>>> import aio
>>> # create sample data
>>> products, quantities = {}, {}
>>> np.random.seed(seed=0)
>>> for i in range(1000):
...     products[i] = "{:04d}".format(np.random.randint(15))
...     quantities[i] = np.random.randint(1000)
>>> # prepare sample data DataFrame
>>> df = pd.DataFrame()
>>> df["Product"] = products.values()
>>> df["Quantity"] = quantities.values()
>>>
>>> results = aio.abc_analysis(
...     df, primary_dimension="Product", numeric_dimension="Quantity"
... )

aio.xyz_analysis(df, primary_dimension_keys, relevant_numeric_dimension, relevant_date_dimension, start_date, periods, frequency, X=0.5, Y=1, L=0.4, M=0.7)

The XYZ Analysis provides an XYZ variability and frequency classification for a multi-dimensional, granular time series input dataset.

Parameters
df : pandas.DataFrame

DataFrame holding the object to be classified, any additional dimension columns, and the numeric values used for classification, e.g. df.columns = ["product", "country", "quantity"].

primary_dimension_keys : string or list of strings

Column name(s) in the input DataFrame holding the object(s) to be classified, e.g. a product number. The primary_dimension_keys can be provided at the level of granularity the classification should be performed on, e.g. (product, country, region) or (product, plant, storage location).

relevant_numeric_dimension : string

Column name in the input DataFrame holding the numeric values to be used for classification, e.g. the demand for a product per period.

relevant_date_dimension : string

Column name in the input DataFrame holding the dates of the relevant_numeric_dimension values.

start_date : string

Start date of the classification, to be provided in the format YYYY-MM or YYYY-MM-DD. start_date should be provided together with periods and frequency so that the function can complete the period range to be considered for classification, e.g. start_date = "2020-01-01", periods = 12, frequency = "M" results in a period range of 12 monthly buckets starting in January 2020: 2020-01, 2020-02, …, 2020-12.

periods : int

Number of periods the classification is performed for.

frequency : string

Frequency of the periods the classification is performed for, e.g. "D" for days, "M" for months, "Q" for quarters, "Y" for years.

X, Y : float = 0.5, 1

Threshold values dividing the provided data into three variability classes X, Y and Z by coefficient of variation, e.g. X <= 0.5; 0.5 < Y <= 1; Z > 1.

L, M : float = 0.4, 0.7

Threshold values dividing the provided data into three frequency classes Low, Medium and High, e.g. Low <= 0.4; 0.4 < Medium <= 0.7; High > 0.7.

Returns
df_return : pandas.DataFrame

Output DataFrame grouped by the provided primary_dimension_keys, with the respective variability and frequency classification and the statistics they are based on.
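
The interplay of start_date, periods, frequency and the four thresholds can be sketched for a single time series as follows. This is a hedged illustration, not the aio implementation. The helper name xyz_sketch is hypothetical. The sketch assumes that periods missing from the input count as zero demand, that the sample standard deviation (ddof=1) is used, and that class boundaries are inclusive on the lower class. The result keys follow the column names in the example output below.

```python
import pandas as pd


def xyz_sketch(series, start_date, periods, frequency, X=0.5, Y=1, L=0.4, M=0.7):
    """Classify one time series (hypothetical helper, not part of aio).

    `series` holds quantities indexed by period label, e.g. "2020-01".
    """
    # Complete the period range; missing periods are assumed to be zero demand.
    full_range = pd.period_range(start=start_date, periods=periods, freq=frequency)
    completed = series.reindex(full_range.astype(str), fill_value=0)

    mean = completed.mean()
    std = completed.std()  # assumption: sample standard deviation (ddof=1)
    cv = std / mean if mean else float("inf")
    non_zero_share = int((completed != 0).sum()) / periods

    # Variability class from the coefficient of variation.
    xyz_class = "X" if cv <= X else "Y" if cv <= Y else "Z"
    # Frequency class from the share of periods with non-zero values.
    frequency_class = (
        "Low" if non_zero_share <= L else "Medium" if non_zero_share <= M else "High"
    )
    return {
        "Coefficient_of_Variation": cv,
        "Relative_Non_Zero_Period_Count": non_zero_share,
        "XYZ_Class": xyz_class,
        "Frequency_Class": frequency_class,
    }


# Demand in every period, no variation.
steady = xyz_sketch(
    pd.Series({"2020-01": 100, "2020-02": 100, "2020-03": 100, "2020-04": 100}),
    start_date="2020-01-01", periods=4, frequency="M",
)
# Demand in only one of four periods.
sparse = xyz_sketch(
    pd.Series({"2020-01": 400.0}),
    start_date="2020-01-01", periods=4, frequency="M",
)
print(steady)
print(sparse)
```

The steady series has a coefficient of variation of 0 and demand in every period, so it is classified X and High. The sparse series has demand in one of four periods and a coefficient of variation of 2, so it is classified Z and Low.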

Examples

>>> import numpy as np
>>> import pandas as pd
>>> import aio
>>>
>>> # create sample data
>>> np.random.seed(seed=42)
>>> df = pd.DataFrame()
>>> # create random time series with the aio.create_time_series function
>>> for i in range(10):
...     quantities = aio.create_time_series(
...         distribution="normal",
...         p_mean=1000,
...         p_std=300,
...         num_periods=12,
...         periodicity="M",
...         start_date="2020-01-01",
...         actual_material_number="{:04d}-{:02d}-{:05d}".format(
...             np.random.randint(1000), np.random.randint(20), np.random.randint(5)
...         ),
...         standard_price=1,
...         intermittency=0.2,
...     )
...     # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
...     df = pd.concat([df, quantities])
>>> # post process sample data
>>> df = df.reset_index()
>>> df = df.drop(columns=["Value", "index"])
>>> # shorten date format from YYYY-MM-DD to YYYY-MM
>>> df["Date"] = df["Date"].astype("str").str[:5] + df["Date"].astype("str").str[-2:]
>>> # split the key returned by create_time_series into three columns
>>> df[["Material", "Country", "Region"]] = df["Material"].str.split("-", expand=True)
>>> # sort columns into a more logical order
>>> df = df[["Material", "Country", "Region", "Date", "Quantity"]]
>>> # delete random periods, as actual data is likely to be incomplete
>>> df = df.drop(np.random.choice(len(df), int(len(df) / 2)))
>>> df
    Material Country    Region      Date        Quantity
0   0102        19      00004       2020-01     1163.0
2   0102        19      00004       2020-03     641.0
3   0102        19      00004       2020-04     1642.0
4   0102        19      00004       2020-05     972.0
5   0102        19      00004       2020-06     721.0
... ...         ...     ...         ...          ...
110 0459        18      00004       2020-03     419.0
111 0459        18      00004       2020-04     746.0
112 0459        18      00004       2020-05     1409.0
116 0459        18      00004       2020-09     1835.0
119 0459        18      00004       2020-12     1057.0
>>> result = aio.xyz_analysis(
...     df=df,
...     primary_dimension_keys=["Material", "Country", "Region"],
...     relevant_numeric_dimension="Quantity",
...     relevant_date_dimension="Date",
...     periods=12,
...     start_date="2020-01-01",
...     frequency="M",
... )
>>> result.head()
        Mean        Standard_Deviation  Non_Zero_Count  Coefficient_of_Variation        Relative_Non_Zero_Period_Count  XYZ_Class       Frequency_Class Material Country Region
0       592.500000      637.290358              6                   1.075596                    0.500000                   Z            Medium          0008     08       00002
1       604.833333      586.178662              7                   0.969157                    0.583333                   Y            Medium          0102     19       00004
2       475.000000      619.921109              5                   1.305097                    0.416667                   Z            Medium          0402     02       00002
3       561.583333      676.746959              6                   1.205070                    0.500000                   Z            Medium          0459     18       00004
4       327.333333      516.059780              4                   1.576557                    0.333333                   Z            Low             0498     16       00002

aio.create_time_series(distribution='uniform', p_mean=10, p_std=1, num_periods=365, periodicity='D', start_date='2020-01-01', actual_material_number='Mat-ID-generated', standard_price=1, intermittency=0)

Creates a time series with a given distribution

Parameters
distribution : str = "uniform"

Distribution to draw from, listed with the parameters each one uses: const (p_mean), normal (p_mean, p_std), uniform (p_mean, p_std) or poisson (p_mean).

num_periods : int = 365

Number of increments the time series is created for.

start_date : str = "2020-01-01"

Reference start date in the format yyyy-mm-dd.

actual_material_number : str = "Mat-ID-generated"

Any material identifier.

standard_price : int = 1

Any float or integer value as the price of one quantity unit.

intermittency : float = 0.0

Share of quantity data points equal to 0, in the range 0 to 1, e.g. 0.4 for 40 %.
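
As a rough illustration of how intermittency can interact with a drawn distribution, here is a minimal sketch of a comparable generator. It is not the aio implementation. The helper name time_series_sketch and its seed parameter are hypothetical. The sketch covers only the normal case, and it assumes that draws are clipped at zero and rounded and that exactly round(intermittency * num_periods) randomly chosen periods are set to zero.

```python
import numpy as np
import pandas as pd


def time_series_sketch(p_mean=10, p_std=1, num_periods=365, periodicity="D",
                       start_date="2020-01-01", intermittency=0.0, seed=0):
    """Generate an intermittent series (hypothetical helper, not part of aio)."""
    rng = np.random.default_rng(seed)
    # Draw normally distributed quantities; clip negatives to zero and round.
    quantities = np.clip(rng.normal(p_mean, p_std, num_periods), 0, None).round()
    # Assumption: a fixed share of randomly chosen periods is set to zero.
    num_zero = int(round(intermittency * num_periods))
    zero_positions = rng.choice(num_periods, size=num_zero, replace=False)
    quantities[zero_positions] = 0
    dates = pd.date_range(start=start_date, periods=num_periods, freq=periodicity)
    return pd.DataFrame({"Date": dates, "Quantity": quantities})


demo = time_series_sketch(p_mean=1000, p_std=100, num_periods=10, intermittency=0.4)
print(demo)
```

With num_periods=10 and intermittency=0.4, four of the ten generated periods carry a quantity of zero.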

Examples

>>> import numpy as np
>>> import pandas as pd
>>> import aio
>>>
>>> df = pd.DataFrame()
>>> # create random time series with the aio.create_time_series function
>>> for i in range(100):
...     quantities = aio.create_time_series(
...         distribution="normal",
...         p_mean=1000,
...         p_std=300,
...         num_periods=12,
...         periodicity="M",
...         start_date="2020-01-01",
...         actual_material_number="{:04d}-{:02d}-{:05d}".format(
...             np.random.randint(1000), np.random.randint(20), np.random.randint(5)
...         ),
...         standard_price=1,
...         intermittency=0.2,
...     )
...     # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
...     df = pd.concat([df, quantities])
>>> df.head()