Backpropagation is the most common training algorithm for neural networks. It makes gradient descent feasible for multi-layer neural networks. TensorFlow handles backpropagation automatically, so you don't need a deep understanding of the algorithm. To get a sense of how it works, walk through the following: Backpropagation algorithm visual explanation. As you scroll through the preceding explanation, note the following:
- How data flows through the graph.
- How dynamic programming lets us avoid computing exponentially many paths through the graph. Here "dynamic programming" just means recording intermediate results on the forward and backward passes.
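The two points above can be sketched in code. Below is a minimal, assumed example (not from the lesson itself): a one-hidden-unit network `y = w2 * sigmoid(w1 * x)`. The forward pass records each intermediate result in a cache; the backward pass reuses those cached values, so every local gradient is computed exactly once instead of once per path.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w1, w2):
    # Record every intermediate result -- this cache is the
    # "dynamic programming" the explanation refers to.
    z = w1 * x
    h = sigmoid(z)
    y = w2 * h
    return y, {"x": x, "h": h}

def backward(dy, w1, w2, cache):
    # Reuse cached values from the forward pass; each local
    # gradient is computed once and propagated backward.
    dw2 = dy * cache["h"]
    dh = dy * w2
    dz = dh * cache["h"] * (1.0 - cache["h"])  # sigmoid'(z) = h * (1 - h)
    dw1 = dz * cache["x"]
    return dw1, dw2

y, cache = forward(x=2.0, w1=0.5, w2=-1.0)
dw1, dw2 = backward(dy=1.0, w1=0.5, w2=-1.0, cache=cache)
```

Frameworks like TensorFlow build this cache-and-reuse machinery for you over arbitrary graphs; that is what "handles backpropagation automatically" means in practice.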
Backprop: What You Need To Know
- Gradients are important
- If it's differentiable, we can probably learn on it
- Gradients can vanish
- Each additional layer can successively reduce signal vs. noise
- ReLUs are useful here
- Gradients can explode
- Learning rates are important here
- Batch normalization (useful knob) can help
- ReLU layers can die
- Keep calm and lower your learning rates
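A back-of-the-envelope sketch of why gradients vanish or explode (illustrative numbers, assumed rather than taken from the slides): the gradient reaching an early layer is roughly a product of one derivative factor per layer, so factors below 1 shrink it toward zero while factors above 1 blow it up.

```python
# Sigmoid's derivative s(z) * (1 - s(z)) peaks at 0.25, so a deep stack of
# sigmoid layers multiplies the gradient by at most 0.25 per layer.
SIGMOID_MAX_FACTOR = 0.25
# ReLU's derivative is 1 for positive inputs, so the signal can pass intact.
RELU_FACTOR = 1.0

layers = 10
sigmoid_grad = SIGMOID_MAX_FACTOR ** layers  # tiny: the gradient has vanished
relu_grad = RELU_FACTOR ** layers            # 1.0: the gradient survives

# Exploding is the mirror image: per-layer factors above 1
# (e.g. from large weights) compound just as fast in the other direction.
exploding_grad = 1.5 ** 50
```

This is why the slide pairs each failure mode with a remedy: ReLUs keep the per-layer factor from shrinking the signal, while lower learning rates and batch normalization keep large factors from running away.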
Normalizing Feature Values
- We'd like our features to have reasonable scales
- Roughly zero-centered, [-1, 1] range often works well
- Helps gradient descent converge; avoid NaN trap
- Avoiding outlier values can also help
- Can use a few standard methods:
- Linear scaling
- Hard cap (clipping) to max, min
- Log scaling
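The three standard methods listed above can be sketched as follows (the helper names are illustrative, not a TensorFlow API):

```python
import numpy as np

def linear_scale(x, lo, hi):
    # Map the known range [lo, hi] linearly onto [-1, 1].
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def hard_cap(x, lo, hi):
    # Clip outliers to the min/max bounds.
    return np.clip(x, lo, hi)

def log_scale(x):
    # Compress heavy-tailed nonnegative values; log1p handles zeros safely.
    return np.log1p(x)

values = np.array([0.0, 10.0, 100.0, 10000.0])
scaled = linear_scale(values, lo=0.0, hi=10000.0)
capped = hard_cap(values, lo=0.0, hi=100.0)
logged = log_scale(values)
```

Linear scaling suits features with a known, roughly uniform range; clipping handles a few extreme outliers; log scaling suits features spanning several orders of magnitude.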
Dropout Regularization
- Dropout: Another form of regularization, useful for NNs
- Works by randomly "dropping out" units in a network for a single gradient step
- There's a connection to ensemble models here
- The more you drop out, the stronger the regularization
- 0.0 = no dropout regularization
- 1.0 = drop everything out! learns nothing
- Intermediate values more useful
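A minimal sketch of how a dropout step might work (an assumed "inverted dropout" implementation, not the TensorFlow API): each unit is zeroed with probability `rate` for this gradient step, and survivors are scaled up so the expected activation is unchanged.

```python
import numpy as np

def dropout(activations, rate, rng):
    # rate = 0.0: no regularization -- activations pass through unchanged.
    if rate == 0.0:
        return activations
    # rate = 1.0: drop everything out -- the network learns nothing.
    if rate >= 1.0:
        return np.zeros_like(activations)
    # Intermediate rate: randomly zero units, rescale the survivors
    # by 1 / (1 - rate) to preserve the expected activation.
    keep = (rng.random(activations.shape) >= rate).astype(activations.dtype)
    return activations * keep / (1.0 - rate)

rng = np.random.default_rng(0)
h = np.ones(10000)
out = dropout(h, rate=0.5, rng=rng)
```

Because each step trains a different random sub-network, the full network behaves somewhat like an averaged ensemble of those sub-networks, which is the ensemble connection mentioned above.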
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2022-07-18 UTC.