Training Neural Networks  |  Machine Learning  |  Google for Developers


Backpropagation is the most common training algorithm for neural networks. It makes gradient descent feasible for multi-layer neural networks. TensorFlow handles backpropagation automatically, so you don't need a deep understanding of the algorithm. To get a sense of how it works, walk through the following: Backpropagation algorithm visual explanation. As you scroll through the preceding explanation, note the following:

  • How data flows through the graph.
  • How dynamic programming lets us avoid computing exponentially many paths through the graph. Here "dynamic programming" just means recording intermediate results on the forward and backward passes (a short code sketch follows this list).
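
A minimal sketch of that point, assuming the TensorFlow API the course uses elsewhere (the tiny model, values, and variable names here are illustrative, not from the lesson). tf.GradientTape records intermediate results on the forward pass and reuses them on the backward pass:

    import tensorflow as tf

    # Toy differentiable computation: y = w * x + b, loss = (y - target)^2.
    w = tf.Variable(3.0)
    b = tf.Variable(1.0)
    x, target = 2.0, 5.0

    with tf.GradientTape() as tape:
        y = w * x + b
        loss = (y - target) ** 2

    # The tape replays the recorded forward pass backward to get gradients,
    # without re-walking every path through the graph.
    dw, db = tape.gradient(loss, [w, b])
    print(dw.numpy(), db.numpy())  # 8.0 4.0, i.e. 2*(y - target)*x and 2*(y - target)

The same mechanism is what makes training multi-layer networks feasible: each layer's gradient is assembled from intermediate results that were already computed, rather than recomputed from scratch.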

Backprop: What You Need To Know

  • Gradients are important
    • If it's differentiable, we can probably learn on it
  • Gradients can vanish
    • Each additional layer can successively reduce signal vs. noise
    • ReLUs are useful here
  • Gradients can explode
    • Learning rates are important here
    • Batch normalization (useful knob) can help
  • ReLU layers can die
    • Keep calm and lower your learning rates (see the sketch after this list)
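
These bullets translate directly into model and optimizer choices. The sketch below assumes Keras; the layer sizes, the 20-feature input, and the 0.01 learning rate are arbitrary placeholders, not values from the lesson:

    import tensorflow as tf

    # ReLU activations help keep gradients from shrinking layer after layer,
    # batch normalization keeps activations (and hence gradients) in a stable
    # range, and a modest learning rate lowers the risk of exploding updates
    # and "dead" ReLUs.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(20,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(1),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),  # lower this if loss diverges or units die
        loss="mse",
    )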

Normalizing Feature Values

  • We'd like our features to have reasonable scales
    • Roughly zero-centered, [-1, 1] range often works well
    • Helps gradient descent converge and avoids the NaN trap
    • Avoiding outlier values can also help
  • Can use a few standard methods (sketched in the example below):
    • Linear scaling
    • Hard cap (clipping) to max, min
    • Log scaling
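
All three methods are easy to express with NumPy. This is a small sketch on made-up values; the feature array and the clipping bounds are arbitrary:

    import numpy as np

    # Hypothetical feature with one large outlier.
    values = np.array([1.0, 4.0, 10.0, 250.0, 7.0, 3.0])

    # Linear scaling to roughly [-1, 1].
    lo, hi = values.min(), values.max()
    linear = 2.0 * (values - lo) / (hi - lo) - 1.0

    # Hard cap (clipping): clamp to a chosen min/max, blunting the outlier.
    clipped = np.clip(values, 0.0, 10.0)

    # Log scaling: useful when values span several orders of magnitude.
    logged = np.log1p(values)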

Dropout Regularization

  • Dropout: Another form of regularization, useful for NNs
  • Works by randomly "dropping out" units in a network for a single gradient step
    • There's a connection to ensemble models here
  • The more you drop out, the stronger the regularization
    • 0.0 = no dropout regularization
    • 1.0 = drop everything out; the model learns nothing
    • Intermediate values are more useful (see the sketch below)
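
In Keras, dropout is a layer whose rate is exactly the knob described above. This sketch assumes a small fully connected network; the layer sizes, input shape, and the 0.3 rate are placeholders:

    import tensorflow as tf

    # rate is the fraction of units dropped at each training step:
    # 0.0 disables dropout, 1.0 would drop everything, 0.2-0.5 is a common range.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(20,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(1),
    ])

Keras dropout is only active during training; the kept activations are scaled up by 1/(1 - rate) at training time, so no adjustment is needed at inference.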

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2022-07-18 UTC.
