tf.data.Dataset

=================================================================================

In TensorFlow, tf.data is used to build input pipelines that batch and shuffle the rows of a dataset.

tf.data.Dataset is TensorFlow's dataset object: it represents a (potentially large) set of elements. tf.data.Dataset allows you to:
          i) Create data pipelines (datasets) from the given data:
            i.a) In-memory dictionaries and lists of tensors
            i.b) Out-of-memory sharded data files
          ii) Preprocess data in parallel (and cache the results of costly operations), and transform the dataset with collective functions such as map:
                dataset = dataset.map(preproc_fun).cache()
          iii) Iterate over the dataset and process individual elements, and configure the way the data is fed into a model with a number of chaining methods:
                dataset = dataset.shuffle(1000).repeat(epochs).batch(batch_size, drop_remainder=True)
          iv) Data from a tf.data.Dataset can be used to refactor linear regression and then to implement stochastic gradient descent with it. In this case, the dataset is synthetic and read by the tf.data API directly from memory; the tf.data API can then be used to load the dataset when it resides on disk. The steps for this application are (see the sketch after this list):
            iv.a) Use tf.data to read data from memory.
            iv.b) Use tf.data in a training loop.
            iv.c) Use tf.data to read data from disk.
            iv.d) Write production input pipelines with feature engineering (batching, shuffling, etc.).
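
For steps iv.a and iv.b, a minimal sketch is given below. It assumes a synthetic line y = 2x + 10 held in memory; the names create_dataset and loss_mse, the learning rate, and the epoch/batch settings are illustrative choices rather than anything prescribed by the tf.data API:

        import tensorflow as tf

        N_POINTS = 10
        X = tf.constant(range(N_POINTS), dtype=tf.float32)
        Y = 2 * X + 10  # synthetic ground truth: w = 2, b = 10

        def create_dataset(X, Y, epochs, batch_size):
            # Data source: build the pipeline from in-memory tensors.
            ds = tf.data.Dataset.from_tensor_slices((X, Y))
            # Illustrative preprocessing step (a no-op cast here), cached so it runs once.
            ds = ds.map(lambda x, y: (tf.cast(x, tf.float32), tf.cast(y, tf.float32))).cache()
            # Configure feeding: repeat over the epochs and batch.
            return ds.repeat(epochs).batch(batch_size, drop_remainder=True)

        def loss_mse(X, Y, w, b):
            Y_hat = w * X + b
            return tf.reduce_mean(tf.square(Y_hat - Y))

        w, b = tf.Variable(0.0), tf.Variable(0.0)
        LEARNING_RATE = 0.02

        for x_batch, y_batch in create_dataset(X, Y, epochs=250, batch_size=2):
            with tf.GradientTape() as tape:
                loss = loss_mse(x_batch, y_batch, w, b)
            dw, db = tape.gradient(loss, [w, b])
            w.assign_sub(LEARNING_RATE * dw)  # stochastic gradient descent update
            b.assign_sub(LEARNING_RATE * db)

        print(f"w: {w.numpy():.3f}, b: {b.numpy():.3f}")  # should approach 2 and 10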

There are two distinct ways to create a dataset:
          i) A data source constructs a Dataset from data stored in memory or in one or more files.
          ii) A data transformation constructs a Dataset from one or more tf.data.Dataset objects.
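
For example, a minimal sketch in which tf.data.Dataset.range serves as the data source and map as the data transformation:

        import tensorflow as tf

        ds = tf.data.Dataset.range(5)     # data source: elements 0, 1, 2, 3, 4
        ds = ds.map(lambda x: x * 2)      # data transformation: 0, 2, 4, 6, 8

        for elem in ds:
            print(elem.numpy())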

The tf.data.Dataset API supports various file formats and Python objects, and a Dataset is accepted directly as model input for training; it is specifically designed for input pipelines:
          i) Dataset from a Python list, NumPy array, or Pandas DataFrame with the from_tensor_slices function
                    ds = tf.data.Dataset.from_tensor_slices([1, 2, 3])
                    ds = tf.data.Dataset.from_tensor_slices(numpy_array)
                    ds = tf.data.Dataset.from_tensor_slices(df.values)
                          E.g. Create a tf.data.Dataset object, which can be used to train a model:
                                     train_dataset = tf.data.Dataset.from_tensor_slices(train_images).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
                                     char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
          ii) Dataset from a text file with the TextLineDataset function
                    ds = tf.data.TextLineDataset("file.txt")
          iii) Dataset from TensorFlow’s TFRecord format with the TFRecordDataset function
                    ds = tf.data.TFRecordDataset("file.tfrecord")
          iv) Dataset from fixed-size records in binary files with the FixedLengthRecordDataset function (record_bytes is required; the file name and record size here are placeholders)
                    ds = tf.data.FixedLengthRecordDataset("file.bin", record_bytes=8)
          v) Dataset from a CSV file with the make_csv_dataset function
                    ds = tf.data.experimental.make_csv_dataset("file.csv", batch_size=5)
          vi) Dataset from the TensorFlow Datasets catalog (see the sketch below).
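
For item vi, a minimal sketch of loading from the TensorFlow Datasets catalog; the dataset name 'mnist' and the shuffle/batch settings are only illustrative:

        import tensorflow_datasets as tfds

        # tfds.load builds a tf.data.Dataset from a catalog entry;
        # as_supervised=True yields (image, label) pairs.
        ds = tfds.load("mnist", split="train", as_supervised=True)
        ds = ds.shuffle(1000).batch(32)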

============================================