Shuffle the Training Data in TensorFlow

What is Data Shuffling?

Data shuffling is a technique that randomly reorders the data in a dataset, within an attribute or a set of attributes, while retaining the logical relationship between the columns.

 

Why do we shuffle data?

For a machine learning model, the dataset we are given is split further into training, validation, and test sets. We need to shuffle the data well before training the ML model, so that no ordering pattern from the original data carries over into the split datasets.

Data shuffling serves the purpose of variance reduction. Its goal is to keep the model general and make sure that it does not overfit.

In simple words, shuffling:

  • helps the training converge faster
  • prevents the model from learning the order of the training examples
  • can improve the quality of the ML model
  • reduces bias caused by the order of the data during training

The most common case where you need to shuffle is data sorted by target/class. We shuffle to make sure that the training, validation, and test sets are each representative of the overall distribution.
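To make the sorted-by-class problem concrete, here is a minimal sketch in plain Python. The toy label list and the `split` helper are hypothetical, made up for illustration only; they are not part of TensorFlow.

```python
import random

# Hypothetical toy dataset sorted by class: ten samples of class 0, then ten of class 1.
labels = [0] * 10 + [1] * 10

def split(data, test_size=4):
    """Hold out the last `test_size` items as the test set."""
    cut = len(data) - test_size
    return data[:cut], data[cut:]

# Without shuffling, the test set only contains the class stored last.
train_sorted, test_sorted = split(labels)
print(set(test_sorted))  # → {1}

# Shuffle a copy with a fixed seed, then split the shuffled copy.
shuffled = labels[:]
random.Random(0).shuffle(shuffled)
train_shuffled, test_shuffled = split(shuffled)
print(test_shuffled)
```

After shuffling, the held-out slice is a random sample of the whole list, so it will usually contain both classes, although a single random draw does not guarantee exact class proportions.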

 

Data Shuffling in TensorFlow

Let’s look at the piece of code below:

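import tensorflow as tf

# tf.placeholder is TensorFlow 1.x graph-mode API; under TensorFlow 2.x it
# lives in tf.compat.v1 and needs eager execution turned off.
tf.compat.v1.disable_eager_execution()

a = tf.compat.v1.placeholder(tf.float32, (None, 1, 1, 1))  # features
b = tf.compat.v1.placeholder(tf.int32, (None,))  # labels, one per feature row

# Shuffle the row indices once ...
indices = tf.range(start=0, limit=tf.shape(a)[0], dtype=tf.int32)
shuffled_indices = tf.random.shuffle(indices)

# ... and apply the same permutation to both tensors so pairs stay aligned.
shuffled_a = tf.gather(a, shuffled_indices)
shuffled_b = tf.gather(b, shuffled_indices)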

The above code builds shuffled_a and shuffled_b, shuffled copies of the features and labels, which can then be fed to our machine learning model for training and testing.

The key line is shuffled_indices = tf.random.shuffle(indices). It shuffles the row indices rather than the data itself; tf.gather then applies that same permutation to both a and b, so every feature row keeps its label.
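The same index-shuffling idea can be sketched with concrete values under TensorFlow 2.x eager execution. This is only an illustration: the toy `features` and `labels` tensors are assumptions, chosen so that label i belongs to the row whose first entry is i.

```python
import tensorflow as tf

# Hypothetical toy data: label i belongs to the feature row [i, i * 10].
features = tf.constant([[0.0, 0.0], [1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
labels = tf.constant([0, 1, 2, 3])

# Build one random permutation of the row indices.
indices = tf.range(start=0, limit=tf.shape(features)[0], dtype=tf.int32)
shuffled_indices = tf.random.shuffle(indices)

# Reorder both tensors with the same permutation.
shuffled_features = tf.gather(features, shuffled_indices)
shuffled_labels = tf.gather(labels, shuffled_indices)

print(shuffled_labels.numpy())
print(shuffled_features.numpy())
```

Whatever order comes out, the first column of each printed feature row should match the label at the same position, because both tensors were gathered with the same indices. Calling tf.random.shuffle separately on each tensor would not give that guarantee.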

The signature of the shuffle function is:

tf.random.shuffle(
    value, seed=None, name=None
)

tf.random.shuffle() randomly shuffles a tensor along its first dimension; here, the tensor holds the data of our dataset.

Output:
[[1, 9],       [[5, 5],
 [3, 7],  ==>   [1, 9],
 [5, 5],        [2, 8],
 [2, 8]]        [3, 7]]

As we can see, the tensor is shuffled along dimension 0, such that each value[x] is mapped to one and only one output[y]. Above is an example of a mapping that might occur for a 4×2 tensor.

Here,

  • value: the tensor to be shuffled.
  • seed: an optional Python integer used to create a random seed for the distribution.
  • name: an optional argument to name the operation.

 

The function returns a tensor of the same type and shape as value, shuffled along its first dimension.
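As a rough sketch of how the arguments behave, the snippet below shuffles a 4×2 tensor twice under TensorFlow 2.x eager execution. It assumes that resetting the global seed with tf.random.set_seed before each call, together with the same op-level seed, reproduces the same order; the seed values themselves are arbitrary.

```python
import tensorflow as tf

value = tf.constant([[1, 9], [3, 7], [5, 5], [2, 8]])

# Fix the global seed and the op-level seed, then shuffle.
tf.random.set_seed(0)
first = tf.random.shuffle(value, seed=1)

# Resetting the global seed restarts the random sequence,
# so the same op-level seed gives the same order again.
tf.random.set_seed(0)
second = tf.random.shuffle(value, seed=1)

print(first.numpy())
print(first.shape, first.dtype)
```

Only whole rows are moved: the output has the same shape and dtype as value, and each row such as [1, 9] stays intact. Without resetting the global seed, repeated calls with the same seed argument generally return different orders in eager mode.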
