This guide explains how to set up the rules.ini
file used by the Pseudo Data Generator to create synthetic datasets. You can define base columns, apply rules for dependencies between them, add additional columns, and reorder them before exporting to CSV.
The [rec] section controls how many records to generate and what processing steps to apply.
- num: Number of records (e.g., 150)
- mode:
  - 1: Generate only the base records
  - 2: Generate + apply post-generation column edits
  - 3: Generate + post edits + reorder columns
- cols: Total number of core columns, typically the count of [cX] sections
[rec]
num = 100
mode = 2
cols = 5
Each [cX]
block creates one column in the dataset. X
is a number starting from 1.
- name: Final column name in the CSV
- dtype: int, str, float, or decimal
- data: Data source type (see list below)
- description: Optional comment to describe the field
- options: Values to randomly pick from (e.g., A,B,C)
- weights: Matching weights for the options (e.g., 50,30,20)
- cols: Column number to reference for logic
- value, range, condition, operation, operands, faker_method: Used for complex logic
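Putting these keys together, a complete [cX] block might look like this (the column number and values are purely illustrative):
[c1]
name = Color
dtype = str
data = random
description = Product color picked at random
options = Red,Green,Blue
weights = 30,50,20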
Selects one value from a list, optionally with weights.
data = random
options = Red,Green,Blue
weights = 30,50,20
Auto-generates a unique ID (e.g., 1, 2, 3, ...
).
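No example for this type appears in the snippets above; assuming the data source keyword is id (the exact keyword may differ in your version of the generator), a block could look like this:
; data = id is an assumed keyword for the auto-increment ID type
[c2]
name = Record_ID
dtype = int
data = id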
Uses a referenced column’s value to look up a match in range
and return a corresponding value.
data = reference
cols = 2
value = 100,200,300
range = 1,2,3
Uses thresholds to map a value range to a label.
data = reference_range
cols = 3
value = Low,Medium,High
range = 3,6,9
Checks whether the referenced column's value matches the given condition and returns the first or second value accordingly.
data = reference_boolean
cols = 5
value = Yes,No
condition = 1
Returns one of multiple values if a condition matches; otherwise returns 0.
data = reference_boolean2
cols = 4
value = 1,2,3
condition = 1
Performs a calculation on other columns.
data = total
operation = *
operands = c2,c3
Applies a percentage increase or decrease.
data = discount
cols = 6
value = 10
operation = -
Increments by a step from a start value.
data = increment
start = 100
interval = 5
Uses Faker to generate realistic values.
data = faker
faker_method = name
The [aX] sections are applied after base data generation. You can replace values or create new columns.
[a1]
operation = replace
cols = 3
col_name = Category
find = 20
replace = Premium
[a2]
operation = generate
data = random
new_col = Customer_Type
options = Enterprise,SMB,Individual
weights = 40,40,20
nullable = 0.1
[a3]
operation = generate
data = faker
new_col = Company_Name
faker_method = company
nullable = 0.05
- operation: Either generate or replace
- new_col: Name of the column to add (for generate)
- data: Data type to generate (random or faker)
- nullable: Fraction (e.g., 0.1) of records that should be left empty
- cols: Column to act upon (for replace)
- col_name: Renames the column
- find / replace: Target value and what to replace it with
The [reorder] section sets the final column order for the output file.
[reorder]
order = 1,2,3,6,7,4,5
Each number refers to the original column number defined via [cX]
or added later.
- The script reads rules.ini using configparser.
- If mode >= 1, it generates base records using the [cX] rules.
- If mode >= 2, it applies any [aX] append operations.
- If mode == 3, it reorders columns based on the [reorder] block.
The result is saved as a CSV file with column names and sample data.
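As a rough illustration of that flow, a simplified reader might look like the sketch below. This is not the generator's actual code: it handles only the random data type, hard-codes the file names, and leaves the other steps as comments.
# Simplified sketch of the flow described above (not the actual generator code).
import configparser
import csv
import random

config = configparser.ConfigParser()
config.read("rules.ini")

num = config.getint("rec", "num")
mode = config.getint("rec", "mode")
cols = config.getint("rec", "cols")

# mode >= 1: build base records from the [cX] sections
records = []
for _ in range(num):
    row = {}
    for c in range(1, cols + 1):
        section = config[f"c{c}"]
        name = section.get("name", f"c{c}")
        if section.get("data") == "random":
            options = section["options"].split(",")
            weights = [int(w) for w in section["weights"].split(",")]
            row[name] = random.choices(options, weights=weights)[0]
        # reference, total, faker, etc. would be handled here as well
    records.append(row)

# mode >= 2 would apply the [aX] append operations here,
# and mode == 3 would reorder the columns using the [reorder] block.

with open("output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=records[0].keys())
    writer.writeheader()
    writer.writerows(records)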
Suppose you want to generate 100 sales records with:
- Product name and quantity
- Offer price and discounted total
- Customer type (some rows null)
- Company name via Faker
Your config would include:
- [c1] to [c10] for the base columns
- [a1] to [a3] for renaming and extra columns
- [reorder] to organize the final output
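A partial sketch of what that rules.ini could start with is shown below; the section numbers, names, and option values are illustrative, and the discount, [aX], and [reorder] blocks from the earlier examples would fill in the rest.
[rec]
num = 100
mode = 3
cols = 10

[c1]
name = Product_Name
dtype = str
data = random
options = Widget,Gadget,Gizmo
weights = 40,40,20

[c2]
name = Quantity
dtype = int
data = random
options = 1,2,3,4,5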
The result is a clean, realistic-looking dataset for analytics, testing, or demos.