This thesis deals with the optimal use of existing models that predict certain phenomena of the road traffic system. Such models are used extensively in Advanced Traffic Information Systems (ATIS), Dynamic Traffic Management (DTM) or Model Predictive Control (MPC) approaches to improve the traffic system. Because road traffic results from human behavior, which changes over time and varies between countries, a multitude of models exists for each of these phenomena, and the scientific literature is generally inconclusive about which of them should be preferred. A common problem in road traffic science is therefore that, for each application, a choice has to be made from a set of available models. A second task that always needs to be performed is the calibration of the model parameters. A third and final task is the application of the chosen and calibrated model(s) to predict part of the traffic system. Each of these three steps generally requires data (measurements of the traffic system). In this thesis, all three uses of data are summarized as data assimilation, which is defined as “the use of techniques aimed at the treatment of data in coherence with models in order to construct an as accurate and consistent picture of reality as possible. It comprises the use of data for model validation and identification (choosing between models), model calibration and estimation and prediction and specifically deals with the interactions between all these tasks”. In this thesis, a Bayesian framework is used in which these interactions can be treated consistently: solving one of these steps automatically leads to the solution of the other steps. Throughout the thesis, the calibration task is always performed first, using standard optimization techniques such as regression or gradient-based algorithms. Once all available models are calibrated, a choice can be made between them.
The selected model(s) can then be used to make predictions that are as accurate as possible. One very important feature of the Bayesian framework is that it takes model complexity into account in the model comparison step. More complex models generally show a lower calibration error than simpler models, but they do not necessarily make better predictions; this is known as the problem of overfitting. The Bayesian framework deals with overfitting by penalizing models that contain more parameters and are thus more complex. The Bayesian assessment of models produces a measure called the evidence, which balances the goodness of fit to the calibration data set against the complexity of the model. The framework has further benefits. First, prior information can easily be included in each step of data assimilation. Second, error bars can be constructed on the predictions, which may benefit the performance or public acceptance of ATIS, DTM or MPC systems. Third, a committee can be constructed in which the predictions of multiple models are combined; committees generally produce more accurate predictions than individual models. The Bayesian framework for data assimilation is applied to three different phenomena: (1) car-following modeling, (2) travel time prediction and (3) traffic state estimation using a first order traffic flow model (the LWR model) and an Extended Kalman Filter (EKF). Finally, part of the research is devoted to speeding up the EKF such that it can be applied together with the LWR model in real time to large networks.

Car-following behavior

Recent research has revealed large heterogeneity in car-following behavior, such that different car-following models best describe the behavior of different drivers. The choice of a car-following model thus has to be made for each individual driver.
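The trade-off that the evidence makes between goodness of fit and model complexity can be illustrated with a minimal sketch for the simplest case: a model that is linear in its parameters, with a Gaussian prior on the weights. The sketch assumes fixed hyperparameters `alpha` (prior precision of the weights) and `beta` (precision of the measurement noise); these names, the function `log_evidence` and the synthetic data are illustrative and are not taken from the thesis. The misfit term rewards a close fit to the calibration data, while the terms `0.5 * M * ln(alpha) - 0.5 * ln|A|` act as the complexity penalty (the Occam factor).

```python
import numpy as np


def log_evidence(Phi, t, alpha, beta):
    """Log evidence ln p(t | alpha, beta) of a Bayesian linear regression model.

    Assumed model: t = Phi @ w + noise, with prior w ~ N(0, alpha^-1 I)
    and noise ~ N(0, beta^-1 I); alpha and beta are kept fixed here.
    """
    N, M = Phi.shape
    # Posterior precision of the weights and the calibrated (MAP) weights.
    A = alpha * np.eye(M) + beta * Phi.T @ Phi
    m = beta * np.linalg.solve(A, Phi.T @ t)
    # Goodness-of-fit term plus weight-decay term at the calibrated weights.
    E = 0.5 * beta * np.sum((t - Phi @ m) ** 2) + 0.5 * alpha * m @ m
    _, logdet_A = np.linalg.slogdet(A)
    # 0.5*M*ln(alpha) - 0.5*ln|A| is the complexity (Occam) penalty.
    return (0.5 * M * np.log(alpha) + 0.5 * N * np.log(beta) - E
            - 0.5 * logdet_A - 0.5 * N * np.log(2 * np.pi))


# Hypothetical example: polynomial models of increasing degree fitted to
# synthetic data generated by a straight line plus Gaussian noise.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 30)
t = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=x.size)
for degree in (1, 3, 9):
    Phi = np.vander(x, degree + 1, increasing=True)  # columns 1, x, ..., x^degree
    print(degree, log_evidence(Phi, t, alpha=1.0, beta=100.0))
```

Adding polynomial terms can only lower the unregularized least-squares calibration error, but each extra parameter also increases the penalty term, so the evidence need not grow with the degree.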
Current approaches to calibrating and comparing different models for one driver either do not take model complexity into account or can only compare a specific set of models. Using the Bayesian framework for data assimilation, the suitability of any set of models can be quantitatively assessed for each single driver. In this research, the framework is applied to two simple car-following models, the CHM model and the Helly model. Its workings are demonstrated in a real-world experiment using 229 trajectories of drivers who were in car-following mode. Aggregated over all drivers, the probability of each model relative to the total probability of all models used can be computed, which can serve as input to a heterogeneous microscopic traffic simulation. The outcomes of this experiment show that, averaged over all drivers, the CHM model has a probability of 31% and the Helly model of 69%.

Travel time prediction

In this research, two types of models are applied to the problem of travel time prediction: linear regression models and neural networks. Three experiments are performed on an 8.5 km long stretch of the A12 motorway in the Netherlands, using travel time data collected over a period of three months in early 2007. In every experiment, the Bayesian framework is applied to calibrate a set of available models, to choose between models and to predict travel times, and a committee is used. In the first experiment, two linear regression models are used and the framework is applied dynamically: at each time step, the available measured travel times and a set of historic loop detector data are used to recalibrate the models with standard regression tools. Once this regression (calibration) is finished, the evidence expresses a preference for one of the two models over the other.
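Aggregating model probabilities over drivers, as in the car-following experiment described above, amounts to normalizing the evidences of the competing models for each driver and then averaging the resulting posterior probabilities. The sketch below assumes equal prior probabilities for the models unless others are supplied; the function names and the log-evidence values in the example are hypothetical and are not results from the thesis.

```python
import numpy as np


def model_probabilities(log_evidence, log_prior=None):
    """Posterior model probabilities P(M_k | D) for one driver.

    log_evidence: ln p(D | M_k) for each candidate model k.
    log_prior:    optional ln P(M_k); equal priors are assumed when omitted.
    """
    log_post = np.asarray(log_evidence, dtype=float)
    if log_prior is not None:
        log_post = log_post + np.asarray(log_prior, dtype=float)
    # Subtract the maximum before exponentiating (log-sum-exp trick), since
    # log-evidences of long trajectories can be large negative numbers.
    post = np.exp(log_post - log_post.max())
    return post / post.sum()


def aggregate_probabilities(log_evidence_per_driver):
    """Average the per-driver model probabilities over all drivers."""
    per_driver = np.array([model_probabilities(le) for le in log_evidence_per_driver])
    return per_driver.mean(axis=0)


# Hypothetical log-evidences for three drivers; columns: CHM, Helly.
log_ev = [[-120.0, -118.0], [-95.0, -97.5], [-210.0, -205.0]]
print(aggregate_probabilities(log_ev))
```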
Two strategies are tested: (1) the prediction of the model with the highest evidence is used, and (2) the weighted average of the predictions of both models is used, with the evidence as the weight factor. The results show that both models perform similarly well and that the committees slightly improve accuracy; no clear difference between the two strategies was found. In the second experiment, feed forward neural networks with one hidden layer and different numbers of hidden nodes are used. The Bayesian framework is used to train (calibrate) 84 different neural networks, and the evidence is used to select high-potential networks. Using a separate validation data set, the evidence is tested as a predictor of the true prediction error. There is a correlation between the two, but the evidence is not a perfect predictor of a well-performing neural network, for several reasons: (1) the data sets may be too small for the validation error to equal the true error, (2) the models used may require improvement, such as weight pruning, and (3) several assumptions were made to solve the necessary equations, such as the assumption that all distributions are Gaussian. In the same experiment, a committee was tested that takes a simple average of the outputs of a selection of models ranked on the evidence. The average prediction error decreased from 8.1% for the best individual neural network to 7.8% for the committee. Finally, the construction of error bars was tested: 97.4% of the true travel times fell within the 95% prediction intervals. The discrepancy between these two percentages can be attributed to the relative simplicity of the neural networks used. In the third and final experiment, feed forward neural networks (FFNN) as well as state-space neural networks (SSNN, a specific type of recurrent or Elman neural network) were applied.
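The committee strategies and the prediction-interval check used in the travel time experiments can be sketched as follows. The sketch assumes that each model supplies a Gaussian predictive mean and variance per sample and that committee weights are the evidences normalized under equal model priors; treating the weighted committee as a Gaussian mixture to obtain its variance, the 1.96 multiplier for 95% intervals and all function names are assumptions made for illustration.

```python
import numpy as np


def evidence_weights(log_evidence):
    """Normalize evidences into committee weights (equal model priors assumed)."""
    le = np.asarray(log_evidence, dtype=float)
    w = np.exp(le - le.max())  # shift by the maximum for numerical stability
    return w / w.sum()


def committee_predict(means, variances, log_evidence, strategy="weighted"):
    """Combine per-model Gaussian predictions of shape (n_models, n_samples).

    strategy="best":     use the model with the highest evidence (strategy 1)
    strategy="weighted": evidence-weighted average of all models (strategy 2)
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    if strategy == "best":
        k = int(np.argmax(log_evidence))
        return means[k], variances[k]
    w = evidence_weights(log_evidence)[:, None]
    mu = (w * means).sum(axis=0)
    # Mixture variance: average model variance plus spread between the models.
    var = (w * (variances + means ** 2)).sum(axis=0) - mu ** 2
    return mu, np.maximum(var, 0.0)


def top_k_average(means, log_evidence, k):
    """Simple average of the k models ranked highest on the evidence."""
    order = np.argsort(log_evidence)[::-1][:k]
    return np.asarray(means, dtype=float)[order].mean(axis=0)


def interval_coverage(y_true, mu, var, z=1.96):
    """Fraction of observations inside the interval mu +/- z*sqrt(var)."""
    half_width = z * np.sqrt(var)
    return float(np.mean(np.abs(np.asarray(y_true, dtype=float) - mu) <= half_width))
```

A coverage well above 0.95 on held-out data, such as the 97.4% reported above, suggests that the intervals are wider than the nominal level requires.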
SSNNs generally contain more parameters than FFNNs but are potentially more accurate because they can take time dependencies into account: a typical case in which complexity must be balanced against the ability to fit a data set. To apply the Bayesian framework, the Jacobian and Hessian of the SSNN were derived (see Appendix A), after which the evidence could again be computed for each model. In the experiments, 70 FFNNs and 70 SSNNs were trained, and the evidence was used to form a committee of neural networks to predict the travel time on the selected motorway. The results show that the FFNNs perform better on a short prediction horizon (5 minutes ahead), while the SSNNs perform better on a longer horizon (15 minutes). The results also show that the use of a committee improves prediction accuracy. In this experiment, the calibration error was found to be a better predictor of the true error than the evidence. Nevertheless, committees ranked on the evidence and committees ranked on the calibration error performed nearly identically.

The first order model with an Extended Kalman Filter

In this research, two studies are performed on the application of a first order model (the LWR model) in combination with an Extended Kalman Filter (EKF) to create a network-wide estimate of the traffic state. The first study deals with the fact that the EKF itself contains parameters that require calibration. Using the same Bayesian framework that was applied to calibrate car-following models and travel time prediction models, a method to calibrate the EKF parameters is derived, which allows these parameters to be adapted dynamically during simulation. An experiment on a small network then shows that this dynamic Bayesian choice of parameters yields nearly the same accuracy as the optimal choice of fixed parameter values.
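One way to let the data choose filter parameters, in the spirit of the Bayesian calibration of the EKF described above, is to compare candidate noise settings by the marginal likelihood of the measurements, which a Kalman filter produces as a by-product of its innovations. The sketch below is deliberately simplified and is not the thesis's derivation: it assumes a scalar random-walk state observed directly with Gaussian noise, a linear case in which the EKF reduces to the ordinary Kalman filter, and it selects the process noise variance `q` from a hypothetical grid. Re-running the selection over a sliding window of recent measurements would give a dynamically adapted parameter.

```python
import numpy as np


def kalman_log_evidence(y, q, r, x0=0.0, p0=1.0):
    """Log marginal likelihood ln p(y_1..y_T | q, r) from a scalar Kalman filter.

    Assumed state model:       x_t = x_{t-1} + w_t,  w_t ~ N(0, q)
    Assumed measurement model: y_t = x_t + v_t,      v_t ~ N(0, r)
    Prior on the initial state: x_0 ~ N(x0, p0).
    """
    x, p, log_ev = x0, p0, 0.0
    for yt in y:
        p = p + q                      # forecast: variance grows by q
        nu = yt - x                    # innovation
        s = p + r                      # innovation variance
        log_ev += -0.5 * (np.log(2.0 * np.pi * s) + nu ** 2 / s)
        k = p / s                      # Kalman gain
        x = x + k * nu                 # correction of the state
        p = (1.0 - k) * p              # correction of the variance
    return log_ev


def select_process_noise(y, q_grid, r, x0=0.0, p0=1.0):
    """Pick the q from q_grid with the highest log evidence (hypothetical helper)."""
    scores = [kalman_log_evidence(y, q, r, x0, p0) for q in q_grid]
    return q_grid[int(np.argmax(scores))], scores


# Hypothetical detector speeds (km/h) and a hypothetical grid of q values.
speeds = np.array([100.0, 98.0, 97.0, 80.0, 65.0, 60.0, 62.0])
best_q, scores = select_process_noise(speeds, [0.1, 1.0, 10.0, 100.0], r=4.0,
                                      x0=speeds[0], p0=25.0)
print(best_q)
```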
This result is especially useful in large-scale applications, where it is impossible to test all possible fixed parameter values of the EKF. The last study overcomes a major disadvantage of the EKF: it is too slow to run in real time on large networks. To overcome this problem, the novel Localized EKF (L-EKF) is proposed, which uses the structure of the traffic network to correct only the state in the vicinity of each detector. The L-EKF does not use all available information to correct the state of the network; however, the resulting accuracy is the same provided that the radius of the local filters is taken sufficiently large. Two experiments, one on synthetic data and one on real-world data, show that the L-EKF is much faster than the traditional Global EKF (G-EKF), that it scales much better with network size and that it yields estimates as accurate as those of the G-EKF, even when detectors are spaced up to 5 kilometers apart. Unlike the G-EKF, the L-EKF is hence a highly scalable solution to the state estimation problem.
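The localization idea can be illustrated on a corridor discretized into cells, with each detector measuring the state of one cell. The sketch below is an assumed simplification of the L-EKF, not its actual formulation: it shows only the correction step for a single linear scalar measurement (the LWR forecast step is omitted), and the localized version corrects only the cells within `radius` of the detector, updating only the matching block of the covariance matrix and leaving all other entries unchanged. When the radius covers the whole corridor, the local and global corrections coincide, which mirrors the statement above that accuracy is preserved for a sufficiently large radius. The function names, parameters and example values are hypothetical.

```python
import numpy as np


def global_update(x, P, d, y, r):
    """Kalman correction of the full state for a scalar measurement of cell d.

    Measurement model (assumed): y = x[d] + v, v ~ N(0, r), i.e. H = e_d.
    """
    s = P[d, d] + r                      # innovation variance
    k = P[:, d] / s                      # Kalman gain for every cell
    x_new = x + k * (y - x[d])
    P_new = P - np.outer(k, k) * s       # P - K S K^T
    return x_new, P_new


def local_update(x, P, d, y, r, radius):
    """Correct only cells within `radius` cells of detector cell d (hypothetical).

    Cross-covariances between the local block and the rest of the network are
    left unchanged; this is the localization approximation.
    """
    n = len(x)
    idx = np.arange(max(0, d - radius), min(n, d + radius + 1))
    s = P[d, d] + r
    k = P[idx, d] / s                    # gain restricted to the neighborhood
    x_new, P_new = x.copy(), P.copy()    # copies only for readability
    x_new[idx] += k * (y - x[d])
    P_new[np.ix_(idx, idx)] -= np.outer(k, k) * s
    return x_new, P_new


# Hypothetical corridor of 10 cells with exponentially decaying correlations.
n = 10
cells = np.arange(n)
P = np.exp(-np.abs(cells[:, None] - cells[None, :]) / 3.0)
x = np.linspace(20.0, 40.0, n)          # hypothetical densities (veh/km)
x_loc, _ = local_update(x, P, d=4, y=35.0, r=0.5, radius=2)
```

Per detector, the local correction touches on the order of radius² covariance entries instead of n², which is where savings on large networks would come from; the full copies made in this sketch do not realize that saving and are there only for clarity.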