Deep Learning Lecture Notes (Q&A with Ian Goodfellow)



Q1. What are some strategies for making your machine learning model work well when you don’t have much data?

One of the main deep learning techniques for small data sets is dropout: every time you run the neural net during training, you randomly turn off some of the neurons. Because it's a different random set of neurons each time, it's a bit like having more data. There is also an approach called Bayesian inference, where the idea is that instead of finding one neural net to explain the data, we think about all the possible neural nets that could explain the data in different ways. In practice you can't enumerate all the possible neural nets, but you can write down a mathematical integral describing what the vote would look like if every neural net voted, weighted by how well it explains the training set. If you can approximate that integral, you can do very well even without much training data.
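As an illustration of the first idea, here is a minimal sketch of (inverted) dropout in plain NumPy; the layer size and drop probability are arbitrary choices for the example, not anything from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, drop_prob=0.5, training=True):
    """Inverted dropout: randomly zero units during training, and rescale
    the survivors so the expected activation is unchanged at test time."""
    if not training:
        return activations
    mask = rng.random(activations.shape) >= drop_prob
    return activations * mask / (1.0 - drop_prob)

h = np.ones(10)                      # a layer of 10 activations, all 1.0
out = dropout(h, drop_prob=0.5)
# Each run zeroes a different random subset of units, so every pass
# effectively trains a different sub-network -- the "more data" effect.
```

At test time you call it with `training=False` and the activations pass through untouched, which is why the survivors are rescaled during training.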

Q2. In what scenarios should you not use neural nets?

One answer is if you have a very limited computational budget. If you are working at a high-frequency trading company and it is really important to make your trades very, very fast, you might use some kind of shallow algorithm. Another answer is if there just isn't much structure in your data compared to how much noise there is: if there is no highly complicated pattern you can reliably extract, there is no need for a complicated model that describes such patterns. For most tasks you would consider artificial intelligence tasks, you usually do want a deep neural net. If you are doing something like understanding speech, recognising objects, generating images, playing video games, or making a robot cook, those are usually tasks where you want deep learning.

Q3. Android speech recognition has to respond with low latency, but it still uses deep nets?

When I talk about latency constraints, I don't mean latency on the scale of a human mind; I mean cases like trading, where microseconds in the algorithm make the difference in how much profit you make. Humans will tolerate fractions of a second. If you asked your Android phone a question and it responded mere microseconds after you stopped speaking, it would actually be a little creepy and annoying. It's fine for speech recognition to have several hundred milliseconds of delay.

Q4. Can you see mistakes that happen at different layers of the neural net, and can you go back to correct them?

That is a much deeper question than you might realise. It's really hard for us to understand exactly what each layer of the network means, or to tell exactly what the network is doing, so it's possible that mistakes are being corrected at different layers without our being able to see it. There are some models designed explicitly to do that, but those aren't the most popular models right now. The models I describe in this lecture just move in one direction through the network. Until maybe about five years ago, a very popular research direction was to understand whether information should flow backwards as well. We know that in the human brain there are roughly ten times more backward connections than forward connections. We don't necessarily know what those do; some of them might be for the learning process and not actually used to recognise new images, but they might also be used in the recognition process itself. There are algorithms, such as the deep Boltzmann machine, where more abstract layers can go back and change more concrete layers, which could in principle fix the mistakes you are talking about. So far they have not ended up being the most popular or effective algorithms, and we don't really know why.

There is a lot of research on visualisation techniques for understanding what the intermediate layers are doing, but different analysis techniques often show very different results, and it's hard to know which ones to take more seriously. Part of the issue is that a neural net is such a complicated system that if you expect to find something, you can probably find it somewhere; you just don't know whether it's the most common mechanism or something that happens only occasionally. An example I give people from biology is gene transcription. To read a gene out of DNA, how does the body actually decide where to start reading and where to stop? The default mechanism is that there is a start codon and a stop codon, but there are other mechanisms too. For instance, as the transcription enzyme reads the DNA and copies out the RNA, the RNA can be shaped so that it swings around, hits the enzyme, and knocks it off the DNA. It's a completely crazy mechanism, but the body actually uses it somewhere. I think neural nets are a little like that: if you can think of a mechanism, it probably happens somewhere in some neural net, and it can be hard to tell whether you have found a mechanism that happens only occasionally. This is a popular research area, but at the moment I would say techniques for analysing neural nets haven't reached solid conclusions yet, and many of them find results that are almost contradictory to each other.

Q5. Do you need more than three layers in neural nets?

I think by three layers you mean input layer, hidden layer, and output layer; so the question is really whether you need more than one hidden layer. There are a few separate questions here: what functions can a neural net represent, and what functions can it learn? It turns out that with just one hidden layer you can represent essentially any function, though you might need to give it a very large number of neurons. But when you actually start to learn from the training set, it might be really hard to learn with a network of that shape. Basically, with just one hidden layer you might be able to solve most problems in principle, but you may make it very difficult for yourself: you might be able to solve the task with fewer neurons if you make the network deeper, or generalize to the test set from fewer training examples. And it varies a lot from task to task. Partly it depends on the structure of the task you are solving. Using a deeper network is saying that the task has recursive structure: objects are made of object parts, object parts are made of contours and corners, contours and corners are made of edges, and edges are made of pixels. That kind of structure tells you that you want several layers of processing. Other tasks, like deciding whether to give a patient a particular drug, might be very simple functions that don't need much depth.
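The "fewer neurons if you make it deeper" trade-off can be made concrete by counting parameters in fully connected nets. This is only an illustration; the layer sizes below are made up for the example and say nothing about which net would actually learn better on a given task:

```python
def mlp_param_count(layer_sizes):
    """Total weights + biases of a fully connected net,
    e.g. [784, 100, 10] = 784 inputs, one hidden layer of 100, 10 outputs."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# One wide hidden layer vs. several narrow ones on the same input/output:
wide = mlp_param_count([784, 2048, 10])            # 1 hidden layer of 2048
deep = mlp_param_count([784, 256, 256, 256, 10])   # 3 hidden layers of 256
# The deep net here has far fewer parameters than the wide shallow one.
```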

Q6. Can you tell how noisy your data is ahead of time? 

A lot of the time you have a pretty good idea; you just guess based on your knowledge of where the data is coming from. There are many different sources of noise in the world. One source is genuine physical randomness: heat in the physical world scatters, and you may be measuring variables related to that, so some of the physical processes you measure can have very random effects, or your understanding of the physics tells you the measurements will be fairly noisy. Another source of apparent randomness is very incomplete information. If you are trying to predict whether a user will buy a particular product but you don't know much about that user, you might know where they are located or what website they used to navigate to yours, but you have no idea whether they already own the shoes you are selling. With complete information about them, you might have a much better idea of whether it makes sense for them to click on a particular product. That's part of why a lot of ad models kept using linear models for quite a long time. You can try to measure the randomness in your system, but it's a little hard to know for sure you are measuring it correctly. Basically, if you try to fit structure and you don't gain anything from fitting it, that's a sign there may be only noise there. But you won't know whether it's structure you are failing to detect or noise because there is no structure to begin with.
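The last point, fitting structure and checking whether you gain anything, can be sketched with a straight-line fit. The two datasets below are synthetic toys invented for the example, one with real signal and one that is pure noise:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=n)

structured = 2.0 * x + rng.normal(scale=0.5, size=n)  # signal plus noise
pure_noise = rng.normal(size=n)                       # no structure at all

def r_squared(x, y):
    """Fraction of y's variance explained by a straight-line fit on x."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return 1.0 - residuals.var() / y.var()

# Fitting structure "gains" a lot on the first dataset and almost
# nothing on the second -- a hint that the second is mostly noise.
```

The caveat from the answer still applies: a low score only rules out the structure you tried to fit (here, a line), not structure of some other kind.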

Q7. How do you approach debugging?

That's almost my whole job. Debugging is probably the hardest thing; it's the reason I cannot just write a program that does machine learning for you. All kinds of bugs can come up everywhere, from the way you prepare the data, to the way you write the code for the machine learning algorithm, to the way you choose the hyperparameters. You really need a lot of experience. There is so much you could say about this that it's hard to answer the question on the spot.

Q8. Is there an area of research that’s more interesting than the amount of hype it’s receiving?

Maybe fairness in machine learning; that's an issue a lot of people are not even aware exists. When you start using machine learning algorithms to make decisions that affect people's lives, like whether to approve their mortgages, you need to think hard about how the algorithm actually works. It's a difficult technical problem: nobody designs unfair machine learning algorithms because they are cruel, cold-hearted people. It's just that the algorithms that work best are the ones we find hardest to understand, and the algorithms we do understand are usually not very effective. A lot of people are starting to dive into this area, but it has not become white hot the way reinforcement learning or supervised learning have. I am also working on another area, which is maybe appropriately hyped: machine learning security. Machine learning security is about making sure your machine learning algorithm works correctly even when someone is intentionally trying to interfere with it. What if they change the training set examples to make it learn the wrong function? What if they change the input to make it misrecognise things and recommend a bad course of action? What if they study the parameters of your trained model in order to recover training examples that contain sensitive information you don't want to publish? That area really just took off in the last year or so.

Q9. Do you want to make an end-to-end system or do you want to make a system that’s divided into several different components?

Imagine a system whose job is to read a piece of text. A system divided into components would have one component find each of the letters, another go through and recognise each letter individually, and so on; an end-to-end system would just look at the text and output the whole sentence all at once. There is always this debate about whether you can do end-to-end learning or whether you need to split things into components. There are some theoretical reasons that end-to-end learning could be hard for some problems, but it also requires a lot less engineering effort, so if you can get it to work, it's great. There might be some problems where it just doesn't work. In practice, end-to-end learning has been very successful for a lot of problems, and we often see papers that overstate how hard it is. There was a paper that said you can't train a convolutional neural net to recognise a sequence of symbols, and then my co-authors and I did basically that at Google a year later. So sometimes you think something is really difficult, and the problem just goes away if you make the network bigger or train it with more data. In other cases you can actually prove theoretically that a task can't be learned end-to-end, but the proof may only apply to very strange problems that don't resemble anything that comes up in real life.

Q10. Is there a good way for deep learning to deal with missing data?

Most deep learning models don't have any good way to deal with missing data, but there are some that can. One of the reasons people study generative models is that they give a good way to deal with missing data. That problem hasn't been very popular lately, though.

Q11. Is there a difference between the way deep learning works and the way children and adults learn?

There is not a huge amount of literature on that topic. One thing I can think of is my friend Andrew Saxe's paper, Exact Solutions to the Nonlinear Dynamics of Learning in Deep Linear Networks. That's probably an under-hyped paper rather than an under-hyped field; it has influenced my thinking a lot in the years since. One aspect of the paper is comparing the way children learn to the way deep learning algorithms learn. Both show this funny pattern: the error rate goes down really fast, then levels off for a while, and then suddenly goes down really fast again. That turns out to be related to the shape of the surface of the cost function being minimised. The learning algorithm slows down when it comes near a saddle point. A saddle point is a point on the cost function that looks like a minimum in one cross section but like a maximum in another. When you come down along the cross section that looks like a minimum, you get stuck near the bottom for a while; then you discover you are actually on top of a maximum and start moving in the other direction.
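The plateau-then-sudden-progress behaviour can be reproduced with plain gradient descent on the classic toy saddle surface x² − y². This surface is chosen purely for illustration, not taken from the paper:

```python
import numpy as np

def loss(p):
    x, y = p
    return x**2 - y**2   # saddle at the origin: minimum along x, maximum along y

def grad(p):
    x, y = p
    return np.array([2 * x, -2 * y])

p = np.array([1.0, 1e-6])   # start almost exactly on the "minimum" cross section
lr = 0.1
history = []
for _ in range(100):
    p = p - lr * grad(p)
    history.append(loss(p))
# The loss drops fast at first (sliding down the x direction), then sits
# near zero at the saddle for many steps, then drops fast again once the
# tiny y component has grown enough to carry the point off the maximum.
```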

Q12. What's interesting about moving deep learning architectures forward? Is it just adding more and more layers?

One of the things that I think about the most now is adversarial examples.


A lot of the reason is that the machine learning models we use today are very linear as a function of their input. Even though they have lots of layers, they still end up looking a lot like a linear function: they are really nonlinear functions of their parameters, but not of their input itself. I am really interested in designing new architectures that are less linear and more nonlinear, and in particular that are not fooled by these tiny little changes. That's the thing I am personally most excited about, and I spend maybe 60 to 70 percent of my time working on it.
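The "tiny little changes" here are adversarial perturbations, and the linearity argument can be sketched in a few lines. This uses a toy linear score function rather than a real classifier; the fast-gradient-sign step is the standard construction, but the weights and dimensions are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 1000

w = rng.choice([-1.0, 1.0], size=dim)   # toy linear "classifier": score = w @ x
x = rng.normal(size=dim)

eps = 0.05   # tiny per-coordinate change, invisible at e.g. pixel scale
# Fast-gradient-sign step: nudge every input coordinate slightly in the
# direction that increases the score (for a linear model, d(score)/dx = w).
x_adv = x + eps * np.sign(w)

# Each coordinate moved by at most eps, but in high dimensions the score
# shifts by eps * dim -- the linearity is what makes the attack so cheap.
shift = w @ x_adv - w @ x               # equals eps * sum(|w|) = eps * dim
```

A highly nonlinear model would not have a single gradient direction that compounds this way across all input dimensions, which is one motivation for the architectures described above.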
