Scalable Learning and Inference in Hierarchical Models of…

July 25, 2019 posted by

So, the neocortex. For a human, your neocortex is often likened to a folded sheet about the size of a dinner napkin. Now, it's planar from a topologist's point of view, but it certainly isn't flat, even if you tried to flatten it out. It has been studied probably more than any other portion of the brain, possibly because people think that it's one of the things that distinguish us from other mammals. But one of the things that was discovered early on is that it has a very regular structure throughout, both in a hierarchical sense and in terms of there being modular components. These were first identified at the anatomical level by Vernon Mountcastle and some of his colleagues back in the 1950s, and they referred to them as cortical columns; in fact they invented a whole slew of columnar-type structures.

If you look at the cortex from the side, you'll see that it's striated in a number of levels. Those levels have been identified, there are different types of cells in each of the levels, and the circuitry for those cells has been mapped out in some detail. It has a well-defined sort of structure, with a laminar sheet on the top which corresponds to a lot of connections (it's myelinated; it's the white matter that you see on the top) and another sheet at the bottom which has additional connections, and in between there are these columnar structures. I'm going to almost treat them as cartoons. There's some controversy about whether or not the columnar structures are real in any sense, but certainly they are real in the sense that an anatomist can pick them out by using various kinds of stains. So my cartoon picture of the cortex is what's shown in the lower right: a set of columns arranged on a sheet with a lot of connectivity on both bottom and top.

Various parts of the cortex, specifically those having to do with vision, are divided into different areas, and those areas actually have mappings
that map from the periphery of the body, say from the retina, back to the earliest processing areas, and then from those to other processing areas. Each one of the maps basically preserves a lot of the spatial characteristics of the original signal. So if they correspond to the retina, then cells in the retina that are close to one another are close to one another in area V1, the first visual area, and in V2, V4, inferotemporal cortex, etc.; they map through the cortex in that fashion. Don't ask me where V3 is. It's not a mistake; V3 just wasn't interesting enough to have a number, I guess.

The processing that goes on between these areas tends to have both a bottom-up component, where data propagates up through the layers, and a top-down component. I often depict it as a stack, but obviously it isn't instantiated as a stack on the cortex; it's a bunch of plates that are connected together.

What got me interested in this in the first place was that a colleague at Brown, David Mumford, and one of his students, now a faculty member at Carnegie Mellon, developed a very simple, elegant model of processing in the visual or striate cortex. The idea was that the data came in from the periphery of the body, and either the data itself or some proxy or summary of the data was propagated up to the various levels; that constituted the bottom-up component. The top-down component was in terms of expectations that were generated by the data you had seen in the first place. The data often was incomplete, and the top-down component, the expectations in terms of prior distributions, allowed you to fill in portions of the data that weren't there. The idea is that these are the kinds of things that enable you to hallucinate, or to fill in the details where perhaps the data is occluded in the case of vision or elided in the case of text. So their model was something called a hierarchical model; it's called a
generative Bayesian model, which comes from a whole school of statistical inference due to a fellow named Ulf Grenander. The particular way that they modeled the generative models was in terms of something called a Markov random field, but that's essentially a detail, and we'll talk a little bit more about various kinds of instantiations in terms of what are generally called graphical models, Markov random fields being a particular case.

So again, in the spirit of using cartoons to understand the high-level concepts, I want to start with a very simple pattern recognition problem, and I want to use that to motivate the use of generative models. Suppose that we have the task of distinguishing between different kinds of buildings, and we only have two different buildings. We have garages, which I'm going to tell you in this cartoon world usually consist of a one-story building with a shed roof, and houses, which are usually two-story buildings, though sometimes they're one-story buildings, and they tend to have gable roofs. We're going to divide the visual field up into four quadrants, and those are usually termed receptive fields. The idea of a receptive field was first introduced by Hubel and Wiesel: there are little areas on the retina that map back to cells, and those cells are tuned to various patterns. In our cartoon here, the four receptive fields are tuned to a number of different features: they can recognize the left pitch of a gable roof (which also corresponds to one portion of a shed roof), a right pitch, the left side of a two-story building, etc.

So this is a hierarchical model. It's a generative model; it also happens to be called a Bayesian network. Each of the boxes corresponds to a random variable. The random variables are indicated as building, roof, frame, and then more and more detail as you go down through the hierarchy. The random variable building can take on three possible values: house,
garage, or none, and you can see that the variables high in the hierarchy essentially consist of compositions of the features at the next level down.

This basically outlines the structure of the model. We also want to instantiate it and provide a set of parameters that allow us to make quantitative distinctions as well as qualitative ones. So, to add to the graphical structure of the model, we're going to establish a set of parameters: for each of the random variables we'll give a conditional or marginal distribution. It's a conditional distribution in the case of variables that have parents, and for variables that don't have parents it's a simple marginal. So the probability of building is our prior on whether or not we're going to see a building in the visual field, and the probability of roof given building is just a conditional probability that says, if in fact we've seen a building, what's the probability that we're going to see different kinds of roof structures.

One of the reasons that these models are used so much in statistics, computer science, and artificial intelligence nowadays is that they have a solid semantics. With the distributions on the previous page we can completely describe the corresponding joint distribution, and statisticians like that because it's the coin of the realm in statistics. So what we want to be able to do is describe a joint distribution, and that's what the variables on the previous page allow us to do. But we're usually not interested specifically in the joint; we're interested in marginalizing out some of the variables and getting probabilities with respect to particular variables. So there's a process of inference whereby you can compute the marginal given some evidence, and if the evidence is empty then you're still going to compute a marginal. This gives an indication of what you would compute for a
particular instantiation of the parameters on the previous page. In this case, what I've done essentially is run an inference algorithm. I gave it some initial parameters; for example, I said that the probability a priori of seeing a house is about a half, of seeing a garage is about a third, and of seeing nothing at all is about a fifth of the time. Running the inference algorithm, because of some subtleties in the way that things propagate, the distribution that results is not exactly that distribution. Here's what you would get in what's called the belief function for the case of the graphical model where there's no evidence. Again, it indicates that pretty much half the time you see a house, and if you look at the upper left corner it says most of the time you're going to see a left pitch, which is consistent with the idea that we're looking at a house. Okay, is anybody having trouble understanding the basics of these models? Okay.

So now we can add some evidence. Here we're going to fill in the evidence for three of the quadrants and leave the other one to float. The little boxes shown in red correspond to the instantiation of those random variables. As a consequence, the data sort of percolates up, and we see that the probability of there being a house is much higher; but the priors and the expectations also propagate down, and you see there's a much higher probability of there being a left pitch. So this is an example of both the bottom-up and the top-down inference that these graphical models are capable of, and it also gives you an example of a kind of pattern recognition or pattern completion: we're shown a portion of the pattern, and it fills in the other portions of the pattern.

There are a lot of tricks that are required to make it more useful. In particular, one of the most difficult problems in machine vision, or for that matter in machine translation, is dealing with various kinds of invariance, in particular translation
and scale invariance. There are also compositionality constraints. Suppose we were to take the generative model on the previous page and do what's often done as an example with a generative model: you fix the top variable, so you set it to be building, and then you sample from the distribution, that is, you sample at the leaves. The question is what kinds of images would appear, and if you did that for that particular model you'd get things like the ones shown here. The pieces might be right, at least statistically, but obviously they don't cohere in any interesting way, so somehow you have to add compositionality constraints.

You also might want to look at problems where there are multiple instances of the same object in the image. In particular, you'd often see a house with a garage adjoining, and you'd like to be able to recognize that as well. In order to do that you have to construct a more complicated model, and it's done primarily by adding additional dependencies and correlations between the random variables, but the basic structure essentially stays the same.

So we're talking about the cortex, not about graphs that you can draw on blackboards, and the kind of connectivity in the cortex is of a relatively benign sort. There are on the order of 10 to the 11th neurons but only 10 to the 15th connections, so this is usually referred to as a very sparse graph. It has what are called small-world properties: essentially, the distance between any two nodes, two neurons, in the graph is relatively short, because there's a blend of short-range connections and long-range connections. This kind of connectivity is rapidly becoming possible to model on cluster computers, and that's one of the things that we're going to be talking about in the remainder of the talk.

Another aspect, besides the ones I just mentioned having to do with compositionality constraints and dealing with scale and translation invariance, is that what the cortex is really good at
doing is fusing data from multiple sources and handling multiple modalities. Basically, the cortex is also a sequence machine: it's used to recognize sequences more than anything else, whether the sequences arise acoustically, visually, or otherwise. So we have large pipes coming in corresponding to data, and we would like to determine correlations between that data. We'd like to be able to recall an image based upon some sound, recall a sound or a song based upon an image, etc. Exactly how we structure this is critical, and it's one of the goals that we have in this project.

At this point I'm going to give you another cartoon. This cartoon essentially is an example of the kind of hints that engineers can get from looking at the cortex, and one of the ideas that drove the models that you'll see in just a minute. These are the primary visual pathways back from the retina to the striate or visual cortex. As you can see, the left side of the brain gets information from both the left and the right eye. It's the left side of each of those two eyes, but nevertheless it's getting information from both. It gets mapped back to the lateral geniculate. If there are any neuroscientists in the audience they'd probably object to this, but you can sort of think of the lateral geniculate as a kind of image buffer, though there's a good deal of processing that happens not just at the lateral geniculate but even earlier, actually on the retina.

From a computational point of view, the people who try to elucidate and understand the circuitry of the brain have mapped it into various components that perform operations on images, with mappings from images through various processing stages. So the retina maps back to the lateral geniculate, which maps all the way back to the back of the brain and the striate cortex, and then it branches out to several
other areas, including area V2 and the medial temporal cortex.

The notion of a visual field is most often associated with areas of the retina as they map back onto cells. On the far left, the green nodes are supposed to correspond to rods and cones; they map back to what are called bipolar cells, and those map back to retinal ganglion cells. At each point you can see that cells are taking in information from cells in the layer in front of them, and as you get further and further back from the periphery of the body, the cells essentially are agglomerating data from a much larger section of the retina. So for the visual field of a cell, as it were, you can either think of it as the set of cells in the layer closer to the periphery, or you can think of it as mapped all the way back onto the original visual field at the periphery of the body. As you move further and further along the visual pathways, the cells are capturing larger and larger portions of the overall visual field; that is, they're computing features of the images that cover a much larger visual field.

The cells, especially in the lateral geniculate, many of them are referred to as simple cells, though they are anything but simple. Those cells are said to be simple because what they do can often be likened to computing something like a difference of Gaussians over regions of the visual field. Each one of the square boxes on the left corresponds to an image of the visual field, and the isolines are meant to be the third dimension, indicating the amount of intensity or the response of the cells in that portion of the eye. So basically, for simple cells, the receptive field maps into portions that are each always performing one type of computation: either enhancing the data that they have or, just the opposite, reducing the intensity of the data.

Complex cells are different for a number of reasons, but the most fundamental one is that whereas in a simple
cell the type of phenomenon that it responds to is typically centered, with one orientation and one position in the visual field, complex cells respond to that same sort of phenomenon no matter where it appears in the visual field. For example, simple cells of the so-called center-surround type, the ones shown at the top of the previous slide, essentially are looking for a high-intensity dot surrounded by low intensity. A generalization of that is that you can respond to a bar of light or a bar of dark surrounded by sheets of the opposite intensity on either side. In a simple cell the bar may be at a particular orientation, but it's always centered in the visual field; in a complex cell the bar can be positioned anywhere in the visual field. So obviously complex cells are able to handle a much more complicated kind of inference, and in particular they are able to respond invariantly to the position in the visual field of the type of phenomenon that you're looking for. And there are a lot of them; in fact most of the cells beyond the striate cortex are of this sort.

So now we have the data moving back. Again, back to the picture that we started off with: the information that is computed in visual area 1 is essentially orientation information. It's determining the orientation of little bars or half planes in the image field, and you get information from both the left visual field and the right visual field. In this case I'm asking the question: given that there are cells that respond to these things, and some are coming from the left and some from the right, how might they be organized on the surface of area V1? You could imagine a checkerboard, you could imagine slices as shown in the middle, or you could imagine some other topology. In fact, given that it's called the striate cortex, you might guess that it's the one in the middle, and indeed that's exactly how the striate cortex is
organized, with bands corresponding to the left visual field and bands corresponding to the right visual field interspersed. Within each one of those bands, they're cut up in terms of cells that are tuned to specific orientations. The box at the bottom is what's called a hypercolumn. It's an anatomically distinct portion of the brain, and what it's actually doing, apart from the left and right portions, is computing an output in which one of those cells will basically light up and say that there's a bar in that particular portion of the visual field and its orientation is one of those increments from 0 to 180 degrees in 10-degree steps. So that's a unit in some sense: it's anatomically distinct, and you can also think of it as computationally distinct. That's essentially the level that we're going to be using as a random variable in the models that we're looking at. It's well enough studied and well enough understood that, even though it's a noisy characterization of the angle of bars that are positioned on the visual cortex, it's a pretty good abstraction to use.

So that's the cartoon. Oh, and the black visual field up in the upper left-hand corner corresponds to an image that was shown to a monkey that was otherwise paralyzed and was only allowed to see out of one side of its visual field. They then took the poor monkey, injected a dye into it, and then basically killed the monkey and took a look at the resulting portions of the striate cortex. You can see the pattern distinctly imposed on the striate cortex, even seeing the little interruptions in the lines that constitute the visual pattern.

So, back to the model of Lee and Mumford. What we want to do is take this model and assume that the x's, the outside observation and the x's corresponding to visual area 1, visual area 2, etc., correspond to random variables. For the ones lower down, we have a fairly good idea of what they might
correspond to. In particular, they allow you to detect lines; at somewhat higher levels they compose lines to get longer lines; and if, for example, you were looking at drawings or at cursive text, the higher levels would compose those lines into more complicated lines, or lines with angles, and the more cursive portions of a written text.

The model that was discussed in the Lee and Mumford paper was relatively abstract, and it was primarily used as a basis for explaining various data from probing humans when they're being operated on and there is a medical reason to probe, but also of course from monkeys, which are still the primary means of looking at cortical behavior. So the basic idea that in some sense the cortex is a generative model, that it can be described as a graphical model or a Markov random field, is fine. But if you're talking about a graphical model on the order of the size of the cortex, you're talking about millions and millions of variables and a very large number of connections. So what kind of learning algorithms would suffice for that kind of a model? There's been a lot of work on artificial neural networks in the past, and many of the ideas that you're going to see in the following have their roots in earlier work, but in some sense they've found a clearer instantiation in the form of graphical models, for the reason I mentioned before: graphical models have the gold standard of a clear and concise semantics in terms of a joint distribution over all the random variables.

So I'm going to give you a concrete example: we're going to build a generative model to recognize digits. In this case the digits come from a database comprising some 500 people, each of them writing a few hundred different instances of the digits 0 through 9, and the images were compiled by the National Institute of Standards and
Technology for the purpose of having a competition, so that they could find a vendor or a company to provide the pattern recognition software that the post office uses to recognize zip codes. These are simplified in the following sense: the digits are centered in 28 x 28 images, they all have 8-bit depth, and they are scanned in in the standard way. Oh, and they're also light and dark adjusted so that the luminance is basically the same overall. The brain does an extraordinary job of dealing with variation in luminance. Just imagine being inside your kitchen, where it's relatively dark, reading a newspaper, then walking outside and continuing the whole time to follow the story that you're reading. That's an extraordinary feat, with several orders of magnitude of change in the overall luminance. So we're finessing some problems that are obviously hard, but there are still plenty of problems left.

What we're going to do is build the same kind of generative model that we saw in our cartoon of recognizing buildings, distinguishing garages from houses. At the bottom, the model is going to have enough random variables to cover the image. The image is 28 x 28, that's 784, so at the bottom level of this cortical model we have 784 random variables. As I mentioned with the cartoon, the idea of generative models is that they're compositional, so the features at one level are composed into sort of meta-features at the next level. At the very least, unless you learn the structure, which we're not going to do right now, you have to define how the features at one level map to the features of the next level. So what we're going to do is divide up the 28 x 28 images into 4 x 4 regions, which gives a 7 x 7 grid of 4 x 4 regions. I did that relatively arbitrarily the first time, and the model that results is the one that you're seeing
here. We can also define whether or not the receptive fields at one level overlap with one another. In this case we're going to indicate that at the lowest level the receptive fields don't overlap, that they basically partition the image into these 49 regions, but at the next level they do share information. The reason we want them to share information at various levels is that we think there are correlations between portions of the image in one region and the adjoining regions, and as you go higher and higher those kinds of dependencies between adjoining visual fields allow you to capture a concept that encompasses the whole image, in this case the designation of a digit.

The software that we've developed allows us to specify what's generally called, for obvious reasons, a pyramid graph Bayes net. You specify the number of levels and the way the receptive fields at one level map onto the receptive fields at the next level. They're not always purely pyramidal; they can be truncated pyramids, because it's not the case that you always want a single feature at the top. In this case we do: the top-level feature is going to correspond to the classification of whether we're looking at a zero or a nine or a five or a four or a three. In the one on the left, you can see that the receptive fields at the bottom level overlap by one but it has no intra-level connections, whereas the one on the right is exactly the same as the one on the left except that it has intra-level connections. There are a lot of subtleties about the use of those intra-level connections and the degree to which they help in the recognition process.

Now, typically when you learn a graphical model you use an algorithm called expectation maximization, in which you essentially start with an assignment to all of the parameters of the model. Remember, when I showed you that cartoon initially I
told you that I specified the parameters. In this case we're assuming we don't know anything about the concept, and so we want the corresponding cortical model to learn about the features of the phenomenon that it's looking at, namely the writing of digits, and we want to compose those automatically. So we have to learn those parameters. In EM, essentially, you set all the parameters randomly at the start and you perform a form of gradient ascent or descent in order to adjust those parameters, which gets them closer to a local maximum rather than a global maximum. Often that's quite good enough, but when we're talking about models that have millions of variables, that turns out to be very difficult indeed.

So what we're going to do instead is learn the features from the bottom up. In fact there are a lot of developmental studies that indicate that that's exactly what humans and animals in general do: they learn those features, and those features basically follow you for the rest of your life. There are great examples of extraordinary cruelty in which they've taken cats and put them in rooms in which there are only horizontal lines, raised them for three or four weeks, and then exposed them to environments in which there are things other than horizontal lines. They cannot see the vertical lines, and they cannot learn them either. So there are certain levels of our visual processing where you have essentially one opportunity to learn them. You've heard lots of things about the plasticity of the brain, but there are also some areas in which it has great difficulty going back and filling in the details after the opportunity has been lost, or where for one reason or another you've had some sort of a lesion and you've lost that ability.

So at the bottom level, remember I said that these are 28 x 28 images with 8-bit depth. As far as I'm concerned, 256 values,
that's a continuum, so I'm going to model them as though they're continuous variables, and I use something called a mixture of Gaussians, which you really don't have to understand, since this is meant for a general audience, but I'll give you a graphical depiction of what we're learning in this case. Remember, at the bottom level each of the nodes takes in a 4 x 4 region, or 16 different pixels, each one of them with 8-bit depth, and here's an example of the kind of features that the mixture of Gaussians learns. Up in the upper left-hand corner you see the 16 pixels, and this is after learning, after we've shown the lowest portion of the cortical model thousands and thousands of digits.

In order to depict what's going on in the learned model, here is what I've done. Pretty much arbitrarily, I've said that the random variable at that level can take on 12 values. At the level below it there is a vector of 16 real values, but we know they're 0 to 255. In the same way that I mentioned that I could take a generative model, fix the value at the top, and then sample from it, I've done the exact same thing at the bottom level, as though I could probe into the cortex and get some idea of the readings that follow. So I sampled it a hundred times; I probed it a hundred times and got a hundred samples, having fixed it at one of its values. What you see on the left is the summary of those hundred samples, where I've stretched out the 16 pixels from left to right. For each of the 12 values that it can take on I've shown two little graphs: one is where I've stretched it out and shown all hundred samples, and in the little box next to it I've taken the average of all hundred samples and depicted it as the little piece of the visual
field, with the third dimension coming out of the screen corresponding to the average intensity over all those samples. So what do those look like? What's it learning? It appears to be learning to recognize bars in various orientations; it's distinguishing between those. The one in the upper left-hand corner is a vertical bar, and the same for the one below it. If you go to one such as this, you see it's at a 45-degree angle. So you're getting some idea of what the system is able to pick up by looking at these various images.

The way that learning proceeds is that it starts at the bottom of this hierarchy. Just as I showed for the cartoon at the outset, in order to quantify the entire model you have to have the conditional probability for every variable given its parents, so essentially you have to learn that. The way the algorithm works is that, having learned the bottom level, it then uses that to generate samples for the level above it, and those samples are used to set the variables and then perform EM
in a little local circumscribed area, to learn the parameters just for that portion of the image. So at the very lowest level we're learning to distinguish various horizontal, vertical, etc. bars; at the next level it composes those in terms of lines; at the level above that they correspond to little portions of cursive in the written text.

As for the models themselves, I said that for the lowest level they correspond to mixtures of Gaussians. For the ones above that there are a variety of things I've tried. The simplest and most straightforward is just to represent them as a table, which you can think of as a truth function that has a lot of parameters, but in fact the number of parameters doesn't seem to cause any problem having to do with overfitting, for those of you who know anything about machine learning.

What you would see if you watched this over time is that information would come in from the bottom and initially be fed into the nodes at the lowest level. It would learn the conditional probabilities for those lowest nodes, and then, once those parameters were learned, the next time that data comes in it would be sampled and propagated further up in the network, and it would work its way up through the network in that fashion. Let's see if this works. So here you see that the data is coming in at the bottom level, and the ones above it haven't been trained. It's looking at images, the images get fixed at the bottom, and it's learning this in a completely unsupervised manner. Then later on I use it in a supervised manner at the very top, by fixing the node at the very top, corresponding to the digit 0 through 9, for every image that it gets during the training process. Later, in testing, obviously I don't fix the top node; I simply use the most likely output from the top node as the indicator of the model's guess. So that's the way the basic learning algorithm works. However, if you
did what I said and tried to learn it node by node, you'd end up with a model that was not able to generalize, or basically to capture features that spanned reasonable portions of the image. So in order to facilitate doing exactly that, and also to create a level of granularity for parallelism that was suited to the kinds of cluster machines that we have most available, what I did was the following. For every node in a given level, I took that node's parents in the level above and the set of children of all of its parents, and I took that as a unit, which I refer to as a subnet. I constructed the subnets for every node and then combined them in cases where one subnet was completely contained in another, using an algorithm called maximal sets. For the graph shown here, that results in seven subnets. The subnets themselves constitute a graph, and all of the learning and inference is performed on the graph as it's shown here. This results in a much more robust set of features learned by the overall cortical models, and it provides the basis for both learning and inference, where the information that's passed back and forth between subnets corresponds to marginal distributions that are computed by each of the subnets locally.

Just to pop up a level and describe the way that the subnets are composed into larger structures: take this as a simple graph, shown as the red graph in the lower left-hand corner. That would correspond to one subnet. In the cartoons that follow, the little triangles correspond to subnets and the circles to nodes. Each of the subnets, then, is a unit that consists of anywhere from twenty to a hundred nodes (and remember, these are also in three dimensions), and each of them is a process running on a machine, where the output of that process corresponds to samples that it generates to send upward into the hierarchy, to use for training the subnets higher in the hierarchy,
or passing down priors or distributions in order to perform inference.

There are three different algorithms that we've been looking at, and one new one that I've been working on just from reading a few of the papers here at Google on the MapReduce framework that you all have. The easiest one is pointer chasing, and that's the obvious kind of graph algorithm that you might expect. We're also using something called MPI, the Message Passing Interface framework, which is probably the predominant method for doing parallel computation on clusters today. And we have another new algorithm that uses a framework called publish and subscribe.

If you think of the pyramid structures, with the columnar structure that we began with sort of superimposed over them, you can see it's fairly obvious how to partition the nodes or the subnets so as to allow for parallelism. In a given level, all the nodes, or all the subnets, can compute simultaneously, and the processing in the next level essentially can wait until all of the inference at that level has been completed. Actually, the inference moves up and down this hierarchy, but again, for the most part things can be done in that fashion.

So again, back to the same toy problem we've been looking at all along: you start off with the structure of the pyramid graph (this is a slice through the middle of the pyramid graph); from that you construct the structure of the subnets; and then you can embed the subnets in a plane in order to allocate the processes in such a way as to minimize communication. I'm not going to go through publish and subscribe; I'll simply show this very briefly. That same structure of basically a bulk parallel computation, where you divide up the overall computation in a level and distribute it among a bunch of processes, fits exactly into the way that MapReduce works, even though we allow for a slightly higher degree of asynchrony in the case of the MPI algorithms. The
MapReduce one is essentially the same algorithm, with a lot of mystery (of course, a good mystery from the programmer's perspective) about what goes on behind the scenes with all the interprocess communication. So the map portion allows each of the subnets to perform its computation, which essentially is to run an inference algorithm, and the reduce essentially combines the outputs of those and then maps them into the subnets at the next level. So inference can proceed through a number of different MapReduce operations, moving up the hierarchy, down the hierarchy, and then up the hierarchy one more time.

One of the keys, and one of the reasons why I think that Google is probably going to be interested ultimately in getting systems that can behave like the cortex in their ability to do pattern recognition, is the ability to handle spatio-temporal features. I'm not going to go into the details; this is a whole talk in and of itself. But cortical models, that's what they are: they're machines for doing pattern recognition on time series. They deal with sequences and streams of data; that's what they're designed for. They exploit the contiguity of the data in time series in order to perform most of their inferences; it's that continuity that essentially provides them with most of their leverage. So some of the algorithms that we've developed handle spatio-temporal receptive fields, and the model that we're working on is very closely aligned with a model that Yoram Singer helped to develop back in the late 90s called hierarchical hidden Markov models. In another talk I will go into that in more detail, hopefully when Yoram is back from being away for a couple of days. So that's it for the talk; I'd be glad to answer some questions.

Audience member: So, clustering into subnets, you said it's better. Is that from a theoretical perspective or an empirical one?

Speaker: That's purely empirical. There's no reason
why you couldn't do that. In some sense the brain realizes (sorry for the anthropomorphizing) that in a large tree there may be a great distance between two points on a given level, and so the fastest way to get from one point to another is to go up the hierarchy, which is relatively shallow, and then back down the hierarchy. Unless you have some way at the local level of agglomerating portions of the data, you can't get the kind of connectivity that you need in a tree. I mean, you have to have some branching factor. With no branching factor at all, which is what you'd get if you just used a simple node and its parents, it's branching up but it's not branching back down, and you need to branch both ways. The subnet structure, where you take a node's parents and all their children, satisfies that requirement. Larger ones are better, of course, but the larger they get, the more computationally complex they get as well.

Audience member: Okay, thank you.
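
The subnet construction described in the talk (each node together with its parents in the level above and all the children of those parents, keeping only subnets that are not wholly contained in another) might be sketched roughly as follows. This is a minimal illustration and not the speaker's implementation: the function name `build_subnets`, the dictionary representation of the graph, and the tiny example graph are all assumptions made here for the sake of the example.

```python
from typing import Dict, FrozenSet, List, Set


def build_subnets(parents: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """Build subnets from a layered graph (hypothetical helper).

    `parents` maps every node to the set of its parents in the level
    above (an empty set for top-level nodes).
    """
    # Invert the parent map so we can look up the children of each parent.
    children: Dict[str, Set[str]] = {}
    for node, node_parents in parents.items():
        for parent in node_parents:
            children.setdefault(parent, set()).add(node)

    # One candidate subnet per node: the node, its parents,
    # and every child of those parents.
    candidates: Set[FrozenSet[str]] = set()
    for node, node_parents in parents.items():
        members = {node} | node_parents
        for parent in node_parents:
            members |= children[parent]
        candidates.add(frozenset(members))

    # Keep only maximal sets: drop any candidate that is a proper
    # subset of another candidate.
    maximal = [s for s in candidates if not any(s < t for t in candidates)]
    return sorted(maximal, key=sorted)


# Assumed toy graph: four bottom nodes, three middle nodes with
# overlapping receptive fields, and one top node.
example_parents: Dict[str, Set[str]] = {
    "a": {"m1"},
    "b": {"m1", "m2"},
    "c": {"m2", "m3"},
    "d": {"m3"},
    "m1": {"t"},
    "m2": {"t"},
    "m3": {"t"},
    "t": set(),
}

subnets = build_subnets(example_parents)
```

On this toy graph the eight per-node candidates collapse to three subnets: two overlapping ones spanning the bottom and middle levels, and one joining the middle level to the top node. The overlap between neighbouring subnets is what gives the branching both up and down that the answer above refers to.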
