What Is Gradient Clipping?

Photo by Ahsan S. on Unsplash
Recurrent Neural Networks (RNNs) work very well with sequential data by utilizing hidden states that store information about past inputs: the values of the hidden states at time t depend on their values at time t − 1 and on the inputs at time t. This architecture, while powerful, causes two problems in training: exploding gradients and vanishing gradients. In this article, we will look into gradient clipping, which deals with the exploding gradients problem.


Intuition behind Exploding and Vanishing Gradients

Exploding gradients refer to the problem of gradients getting too large during training, making the model unstable. Similarly, vanishing gradients refer to gradients getting too small during training, which prevents the network weights from changing their values. Both problems leave the model unable to learn from the training data. The following informal discussion is not fully rigorous, but it is sufficient to give us an intuition about the source of exploding and vanishing gradients.

When we train an RNN by Backpropagation Through Time, we first unroll the RNN in time by creating a copy of the network for each time step, viewing it as a multi-layer feedforward neural network where the number of layers equals the number of time steps. Then we do backpropagation on the unrolled network, taking the weight sharing into account:

∂L/∂hₜ₋ₙ ≈ (Wᵀ)ⁿ · ∂L/∂hₜ (ignoring the activation derivatives),

where W is the recurrent weight matrix. It can be shown that the gradient of the loss function contains a product of n copies of Wᵀ, where n is the number of layers going back in time. This product of matrices is the source of exploding and vanishing gradients.

For a scalar a ≠ 1, aⁿ shrinks or grows exponentially with n. For example, consider n = 30. Then 1.1³⁰ ≈ 17.45 and 0.9³⁰ ≈ 0.042. We pick 30 because it is quite common for a sentence in a Natural Language Processing task to have 30 words, and it is also quite typical for time series analysis to process 30 days of data. The situation for products of matrices (Wᵀ)ⁿ is very similar. The easiest way to see this is to assume W is diagonalizable. Then Wᵀ = QDQ⁻¹ for some diagonal matrix D = diag(λ₁, …, λₖ), and (Wᵀ)ⁿ = QDⁿQ⁻¹ with Dⁿ = diag(λ₁ⁿ, …, λₖⁿ). The entries of Dⁿ explode or vanish exponentially depending on whether |λᵢ| is greater or less than 1.
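A few lines of NumPy make this concrete. The 2×2 diagonal matrix below is a made-up illustration (not from the article) whose eigenvalues are 1.1 and 0.9:

```python
import numpy as np

# Scalar powers over 30 steps: slightly above 1 explodes, slightly below vanishes.
print(1.1 ** 30)  # ≈ 17.45
print(0.9 ** 30)  # ≈ 0.042

# The same happens when a weight matrix is multiplied with itself repeatedly.
# Hypothetical 2x2 matrix with eigenvalues 1.1 and 0.9:
W = np.diag([1.1, 0.9])
Wn = np.linalg.matrix_power(W.T, 30)
print(np.diag(Wn))  # one direction explodes, the other vanishes
```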

We refer to [2] for a rigorous treatment of the exploding and vanishing gradients problems.

Gradient Clipping

Gradient clipping is a technique that tackles exploding gradients. The idea of gradient clipping is very simple: if the gradient gets too large, we rescale it to keep it small. More precisely, if ‖g‖ ≥ c, then

g ← c · g/‖g‖

where c is a hyperparameter, g is the gradient, and ‖g‖ is the norm of g. Since g/‖g‖ is a unit vector, after rescaling the new g has norm c. Note that if ‖g‖ < c, we leave g unchanged.
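As a minimal sketch, the rule above can be written in a few lines of NumPy (`clip_by_norm` is a name chosen here for illustration):

```python
import numpy as np

def clip_by_norm(g, c):
    """Rescale gradient g to have norm at most c (gradient clipping by norm)."""
    norm = np.linalg.norm(g)
    if norm >= c:
        return c * g / norm  # unit vector g/‖g‖ scaled up to length c
    return g                 # small gradients are left untouched

g = np.array([3.0, 4.0])        # ‖g‖ = 5
clipped = clip_by_norm(g, 1.0)  # rescaled to norm 1, direction preserved
```

In practice, deep learning frameworks ship this as a built-in; for example, PyTorch provides `torch.nn.utils.clip_grad_norm_` for the same purpose.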

Frequently Asked Questions

Q: How do we choose the hyperparameter c?

A: We can train our neural networks for some epochs and look at the statistics of the gradient norms. The average value of gradient norms is a good initial trial.
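For instance, one might log the gradient norm at each training step and use the average as a starting value for c. A sketch with made-up norm values:

```python
import numpy as np

# Hypothetical gradient norms logged over a few warm-up epochs.
observed_norms = [0.8, 1.2, 0.9, 5.0, 1.1, 0.7, 1.3]

# The average norm is a reasonable first guess for the clipping threshold c;
# the occasional large norm (5.0 here) is exactly what clipping will tame.
c = float(np.mean(observed_norms))
```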


Q: Can we use gradient clipping in training neural architectures other than RNN?

A: Yes. We can use gradient clipping for any neural architecture whenever we have exploding gradients.

Further Reading

Chapter 10.11 of [1] has a good overview of how gradient clipping works. [3] introduces a new smoothness condition to provide a theoretical explanation for the effectiveness of gradient clipping.
