This guide explains how to set up the rules.ini
file used by the Pseudo Data Generator to create synthetic datasets. You can define base columns, apply rules for dependencies between them, add additional columns, and reorder them before exporting to CSV.
The [rec] section controls how many records to generate and what processing steps to apply.
- num: Number of records (e.g., 150)
- mode:
  - 1: Generate only the base records
  - 2: Generate + apply post-generation column edits
  - 3: Generate + post edits + reorder columns
- cols: Total number of core columns, typically the count of [cX] sections
[rec]
num = 100
mode = 2
cols = 5
Each [cX]
block creates one column in the dataset. X
is a number starting from 1.
- name: Final column name in the CSV
- dtype: int, str, float, or decimal
- data: Data source type (see list below)
- description: Optional comment to describe the field
- options: Values to randomly pick from (e.g., A,B,C)
- weights: Matching weights for the options (e.g., 50,30,20)
- cols: Column number to reference for logic
- value, range, condition, operation, operands, faker_method: Used for complex logic
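Putting these keys together, a complete [cX] block might look like this (the column number and values are purely illustrative):
[c1]
name = Color
dtype = str
data = random
description = Product color picked at random
options = Red,Green,Blue
weights = 30,50,20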
Selects one value from a list, optionally with weights.
data = random
options = Red,Green,Blue
weights = 30,50,20
Auto-generates a unique ID (e.g., 1, 2, 3, ...
).
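No example for this type appears in the snippets above; assuming the data source keyword is id (the exact keyword may differ in your version of the generator), a block could look like this:
; data = id is an assumed keyword for the auto-increment ID type
[c2]
name = Record_ID
dtype = int
data = id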
Uses a referenced column’s value to look up a match in range
and return a corresponding value.
data = reference
cols = 2
value = 100,200,300
range = 1,2,3
Uses thresholds to map a value range to a label.
data = reference_range
cols = 3
value = Low,Medium,High
range = 3,6,9
Checks whether the referenced column's value matches the given condition and returns the first or second value accordingly.
data = reference_boolean
cols = 5
value = Yes,No
condition = 1
Returns one of multiple values if a condition matches; otherwise returns 0.
data = reference_boolean2
cols = 4
value = 1,2,3
condition = 1
Performs a calculation on other columns.
data = total
operation = *
operands = c2,c3
Applies a percentage increase or decrease.
data = discount
cols = 6
value = 10
operation = -
Increments by a step from a start value.
data = increment
start = 100
interval = 5
Uses Faker to generate realistic values.
data = faker
faker_method = name
The [aX] sections are applied after base data generation. You can replace values or create new columns.
[a1]
operation = replace
cols = 3
col_name = Category
find = 20
replace = Premium
[a2]
operation = generate
data = random
new_col = Customer_Type
options = Enterprise,SMB,Individual
weights = 40,40,20
nullable = 0.1
[a3]
operation = generate
data = faker
new_col = Company_Name
faker_method = company
nullable = 0.05
- operation: Either generate or replace
- new_col: Name of the column to add (for generate)
- data: Data type to generate (random or faker)
- nullable: Fraction (e.g., 0.1) of records that should be left empty
- cols: Column to act upon (for replace)
- col_name: Renames the column
- find / replace: Target value and what to replace it with
The [reorder] section sets the final column order for the output file.
[reorder]
order = 1,2,3,6,7,4,5
Each number refers to the original column number defined via [cX]
or added later.
- The script reads rules.ini using configparser.
- If mode >= 1, it generates base records using the [cX] rules.
- If mode >= 2, it applies any [aX] append operations.
- If mode == 3, it reorders columns based on the [reorder] block.
The result is saved as a CSV file with column names and sample data.
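As a rough illustration of that flow, a simplified reader might look like the sketch below. This is not the generator's actual code: it handles only the random data type, hard-codes the file names, and leaves the other steps as comments.
# Simplified sketch of the flow described above (not the actual generator code).
import configparser
import csv
import random

config = configparser.ConfigParser()
config.read("rules.ini")

num = config.getint("rec", "num")
mode = config.getint("rec", "mode")
cols = config.getint("rec", "cols")

# mode >= 1: build base records from the [cX] sections
records = []
for _ in range(num):
    row = {}
    for c in range(1, cols + 1):
        section = config[f"c{c}"]
        name = section.get("name", f"c{c}")
        if section.get("data") == "random":
            options = section["options"].split(",")
            weights = [int(w) for w in section["weights"].split(",")]
            row[name] = random.choices(options, weights=weights)[0]
        # reference, total, faker, etc. would be handled here as well
    records.append(row)

# mode >= 2 would apply the [aX] append operations here,
# and mode == 3 would reorder the columns using the [reorder] block.

with open("output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=records[0].keys())
    writer.writeheader()
    writer.writerows(records)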
Suppose you want to generate 100 sales records with:
- Product name and quantity
- Offer price and discounted total
- Customer type (some rows null)
- Company name via Faker
Your config would include:
- [c1] to [c10] for the base columns
- [a1] to [a3] for renaming and extra columns
- [reorder] to organize the final output
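A partial sketch of what that rules.ini could start with is shown below; the section numbers, names, and option values are illustrative, and the discount, [aX], and [reorder] blocks from the earlier examples would fill in the rest.
[rec]
num = 100
mode = 3
cols = 10

[c1]
name = Product_Name
dtype = str
data = random
options = Widget,Gadget,Gizmo
weights = 40,40,20

[c2]
name = Quantity
dtype = int
data = random
options = 1,2,3,4,5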
The result is a clean, realistic-looking dataset for analytics, testing, or demos.