databrickslabs
diff --git a/docs/source/generating_cdc_data.rst b/docs/source/generating_cdc_data.rst
Lines changed: 13 additions & 4 deletions
diff --git a/docs/source/index.rst b/docs/source/index.rst
Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
 Generating Change Data Capture data
 ===================================
 
-This section explores some of the features for generating CDC style data - that is exploring the abilitty to
+This section explores some of the features for generating CDC style data - that is exploring the ability to
 generate a base data set and then apply changes such as updates to existing rows and
 new rows that will be inserts to the existing data
 
@@ -123,15 +123,24 @@ We will also generate a set of updates by sampling from the existing data and ad
            .withColumn("customer_id", F.expr(f"customer_id + {start_of_new_ids}"))
                  )
 
-   df1_updates = (df1.sample(False, 0.1)
+   # read the written data - if we simply recompute, timestamps of original will be lost
+   df_original = spark.read.format("delta").load(customers1_location)
+
+   df1_updates = (df_original.sample(False, 0.1)
            .limit(50 * 1000)
            .withColumn("alias", F.lit('modified alias'))
-           .withColumn("modified_ts",F.expr('current_timestamp()'))
+           .withColumn("modified_ts",F.expr('now()'))
            .withColumn("memo", F.lit("update")))
 
-
    df_changes = df1_inserts.union(df1_updates)
 
+   # randomize ordering
+   df_changes = (df_changes.withColumn("order_rand", F.expr("rand()"))
+                 .orderBy("order_rand")
+                 .drop("order_rand")
+                 )
+
+
    display(df_changes)
 
 Merging in the changes
 
@@ -29,7 +29,7 @@ to Scala or R based Spark applications also.
    Generating repeatable data  <repeatable_data_generation>
    Using streaming data <using_streaming_data>
    Generating Change Data Capture (CDC) data<generating_cdc_data>
-   Multi table data <multi_table_data>
+   Using multiple tables <multi_table_data>
    Extending text generation  <extending_text_generation>
    Troubleshooting data generation <troubleshooting>
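
Note on the "randomize ordering" step added in generating_cdc_data.rst: the change attaches a temporary random column, sorts on it, and drops it again, so that inserts and updates are interleaved rather than arriving as two contiguous blocks. The sketch below mirrors that idea in plain Python so it can be run without a Spark session. It is an illustration only, not the code from this change: the `shuffle_rows` helper, the `seed` parameter, and the sample rows are all hypothetical.

```python
import random


def shuffle_rows(rows, seed=None):
    """Return the rows in random order by sorting on a temporary random key.

    Hypothetical helper: a plain-Python analogue of the
    withColumn("order_rand", rand()) / orderBy / drop pattern in the diff.
    """
    rng = random.Random(seed)
    # Attach a temporary random sort key to every row.
    keyed = [(rng.random(), row) for row in rows]
    # Sort on the key only, so the row contents never influence the order.
    keyed.sort(key=lambda pair: pair[0])
    # Drop the key again; only the original rows are returned.
    return [row for _, row in keyed]


# Made-up sample data standing in for df1_inserts and df1_updates.
inserts = [{"customer_id": i, "memo": "insert"} for i in range(100, 105)]
updates = [{"customer_id": i, "memo": "update"} for i in range(5)]

# Equivalent of the union followed by the randomized ordering.
changes = shuffle_rows(inserts + updates, seed=42)
```

The shuffle changes only the order of the rows: every insert and update is still present exactly once, and the temporary key does not appear in the result. Fixing the seed here is a convenience for repeatable runs of the sketch; the diff itself calls `rand()` without a seed.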