Backpropagation is the most common training algorithm for neural networks. It makes gradient descent feasible for multi-layer neural networks. TensorFlow handles backpropagation automatically, so you don't need a deep understanding of the algorithm. To get a sense of how it works, walk through the following: Backpropagation algorithm visual explanation. As you scroll through the preceding explanation, note the following:
- How data flows through the graph.
- How dynamic programming lets us avoid computing exponentially many paths through the graph. Here "dynamic programming" just means recording intermediate results on the forward and backward passes.
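The two points above can be sketched in code. Below is a minimal, assumed example (not from the lesson itself): a one-hidden-unit network `y = w2 * sigmoid(w1 * x)`. The forward pass records each intermediate result in a cache; the backward pass reuses those cached values, so every local gradient is computed exactly once instead of once per path.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w1, w2):
    # Record every intermediate result -- this cache is the
    # "dynamic programming" the explanation refers to.
    z = w1 * x
    h = sigmoid(z)
    y = w2 * h
    return y, {"x": x, "h": h}

def backward(dy, w1, w2, cache):
    # Reuse cached values from the forward pass; each local
    # gradient is computed once and propagated backward.
    dw2 = dy * cache["h"]
    dh = dy * w2
    dz = dh * cache["h"] * (1.0 - cache["h"])  # sigmoid'(z) = h * (1 - h)
    dw1 = dz * cache["x"]
    return dw1, dw2

y, cache = forward(x=2.0, w1=0.5, w2=-1.0)
dw1, dw2 = backward(dy=1.0, w1=0.5, w2=-1.0, cache=cache)
```

Frameworks like TensorFlow build this cache-and-reuse machinery for you over arbitrary graphs; that is what "handles backpropagation automatically" means in practice.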
Backprop: What You Need To Know
- Gradients are important
- If it's differentiable, we can probably learn on it
- Gradients can vanish
- Each additional layer can successively reduce signal vs. noise
- ReLUs are useful here
- Gradients can explode
- Learning rates are important here
- Batch normalization (useful knob) can help
- ReLU layers can die
- Keep calm and lower your learning rates
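A back-of-the-envelope sketch of why gradients vanish or explode (illustrative numbers, assumed rather than taken from the slides): the gradient reaching an early layer is roughly a product of one derivative factor per layer, so factors below 1 shrink it toward zero while factors above 1 blow it up.

```python
# Sigmoid's derivative s(z) * (1 - s(z)) peaks at 0.25, so a deep stack of
# sigmoid layers multiplies the gradient by at most 0.25 per layer.
SIGMOID_MAX_FACTOR = 0.25
# ReLU's derivative is 1 for positive inputs, so the signal can pass intact.
RELU_FACTOR = 1.0

layers = 10
sigmoid_grad = SIGMOID_MAX_FACTOR ** layers  # tiny: the gradient has vanished
relu_grad = RELU_FACTOR ** layers            # 1.0: the gradient survives

# Exploding is the mirror image: per-layer factors above 1
# (e.g. from large weights) compound just as fast in the other direction.
exploding_grad = 1.5 ** 50
```

This is why the slide pairs each failure mode with a remedy: ReLUs keep the per-layer factor from shrinking the signal, while lower learning rates and batch normalization keep large factors from running away.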
Normalizing Feature Values
- We'd like our features to have reasonable scales
- Roughly zero-centered, [-1, 1] range often works well
- Helps gradient descent converge; avoid NaN trap
- Avoiding outlier values can also help
- Can use a few standard methods:
- Linear scaling
- Hard cap (clipping) to max, min
- Log scaling
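The three standard methods listed above can be sketched as follows (the helper names are illustrative, not a TensorFlow API):

```python
import numpy as np

def linear_scale(x, lo, hi):
    # Map the known range [lo, hi] linearly onto [-1, 1].
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def hard_cap(x, lo, hi):
    # Clip outliers to the min/max bounds.
    return np.clip(x, lo, hi)

def log_scale(x):
    # Compress heavy-tailed nonnegative values; log1p handles zeros safely.
    return np.log1p(x)

values = np.array([0.0, 10.0, 100.0, 10000.0])
scaled = linear_scale(values, lo=0.0, hi=10000.0)
capped = hard_cap(values, lo=0.0, hi=100.0)
logged = log_scale(values)
```

Linear scaling suits features with a known, roughly uniform range; clipping handles a few extreme outliers; log scaling suits features spanning several orders of magnitude.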
Dropout Regularization
- Dropout: Another form of regularization, useful for NNs
- Works by randomly "dropping out" units in a network for a single gradient step
- There's a connection to ensemble models here
- The more you drop out, the stronger the regularization
- 0.0 = no dropout regularization
- 1.0 = drop everything out! learns nothing
- Intermediate values more useful
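A minimal sketch of how a dropout step might work (an assumed "inverted dropout" implementation, not the TensorFlow API): each unit is zeroed with probability `rate` for this gradient step, and survivors are scaled up so the expected activation is unchanged.

```python
import numpy as np

def dropout(activations, rate, rng):
    # rate = 0.0: no regularization -- activations pass through unchanged.
    if rate == 0.0:
        return activations
    # rate = 1.0: drop everything out -- the network learns nothing.
    if rate >= 1.0:
        return np.zeros_like(activations)
    # Intermediate rate: randomly zero units, rescale the survivors
    # by 1 / (1 - rate) to preserve the expected activation.
    keep = (rng.random(activations.shape) >= rate).astype(activations.dtype)
    return activations * keep / (1.0 - rate)

rng = np.random.default_rng(0)
h = np.ones(10000)
out = dropout(h, rate=0.5, rng=rng)
```

Because each step trains a different random sub-network, the full network behaves somewhat like an averaged ensemble of those sub-networks, which is the ensemble connection mentioned above.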
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2022-07-18 UTC.