CIM Workshop on Machine Learning, October 8–9, 2015
Workshop theme
The Centre for Interdisciplinary Mathematics (CIM) connects mathematics with other sciences and application domains. In this workshop we will explore how mathematical approaches to data analysis through machine learning can help us make sense of, and make use of, the vast reservoirs of data that are being generated. The topics span many different areas of machine learning and its applications, from bioinformatics and language processing to industrial systems and cyber security. Presenters include researchers from the fields of statistics, bioinformatics, computer science, and linguistics.
Invited speakers and talk titles

Florence d'Alché-Buc, Télécom ParisTech, University Paris-Saclay, France, Learning vector autoregressive models with operator-valued kernels with application to biological network inference

Jukka Corander, Department of Mathematics and Statistics, University of Helsinki, Finland, Learning Markov networks with marginal pseudolikelihood

Petros Dellaportas, Department of Statistical Science, University College London, UK, Scalable inference for a full multivariate stochastic volatility model

Hoài An Lê Thi, Theoretical and Applied Computer Science Laboratory, University of Lorraine, France, DC programming and DCA in Machine Learning and Data Mining

Hiroshi Mamitsuka, Bioinformatics Center, Institute for Chemical Research, Kyoto University, Japan, Collaborative matrix factorization for predicting drug-target interactions

Shakir Mohamed, DeepMind, London, UK, Bayesian Reasoning and Deep Learning

Alan Said, Recorded Future, Gothenburg, Sweden, Multilingual Document Classification Using LDA

Anders Søgaard, Center for Language Technology, University of Copenhagen, Denmark, Occam’s Chainsaw: What not to learn when learning models of language
Poster session
Workshop attendees are invited to submit a poster on a topic related to machine learning. The posters will be on display all day Thursday, and the program includes a poster session in which they will be the focus of activity.
1. Ali Basirat: Greedy Transition-Based Dependency Parsing with Discrete and Continuous Supertag Features
2. Joakim Nivre: Training Deterministic Dependency Parsers with Non-Deterministic Oracles
3. Mats Dahllöf: Clustering Writing Components
4. C. A. Naesseth, F. Lindsten and T. B. Schön: High-dimensional Inference using Nested Particle Filters
5. Andreas Svensson, Arno Solin, Simo Särkkä, Thomas B. Schön: Computationally Efficient Bayesian Learning of Gaussian Process State Space Models
6. Mike Ashcroft: Efficient discovery of high value regions of the feature space using Delaunay Tessellation Field Estimation
7. Jimmy Callin: Part-of-Speech Driven Cross-Lingual Pronoun Prediction with Feed-Forward Neural Networks
8. Mojgan Seraji: Language Technology Resources and Tools for Persian
9. Josef Höök, Elisabeth Larsson, Erik Lindström, Lina von Sydow: Filtering and Parameter Estimation of Partially Observed Diffusion Processes Using Gaussian RBFs
Location
All talks will be given in room P2446, Polacksbacken Building 2, 4th floor, northwest corner. Lunches take place at Eklundshof (very close to Polacksbacken). The dinner will be in room P4308, Polacksbacken Building 4, 3rd floor. This map shows where the different buildings are located relative to each other.
Preliminary program
For abstracts of the talks, scroll down to the end of the page.
Thursday, October 8
10:15–10:40  Registration & Welcome  P2446
10:40–11:20  S1: Jukka Corander  P2446
11:20–12:00  S2: Hoài An Lê Thi  P2446
12:00–13:30  Lunch  Eklundshof
13:30–14:10  S3: Poster Session  P2446
14:10–14:50  S4: Florence d'Alché-Buc  P2446
14:50–15:20  Coffee Break  P2446
15:20–16:00  S5: Hiroshi Mamitsuka  P2446
16:00–16:40  S6: Anders Søgaard  P2446
18:00–20:00  Dinner  P4308
Friday, October 9
9:00–9:40  S7: Shakir Mohamed  P2446
9:40–10:20  S8: Petros Dellaportas  P2446
10:20–10:50  Coffee Break  P2446
10:50–11:30  S9: Panel Discussion  P2446
11:30–12:10  S10: Alan Said  P2446
12:10–13:40  Lunch and closing  Eklundshof
Registration
Registration is open from September 1 to October 1.
Even if you have not registered, you are welcome to attend any of the talks. However, lunches and dinner are reserved for registered participants. To find out whether late registration is possible, please contact the local organizers.
Scientific committee
Elisabeth Larsson, CIM, Uppsala University
Michael Ashcroft, Computing Science, Dept. of Information Technology, Uppsala University
Christian Hardmeier, Dept. of Linguistics and Philology, Uppsala University
Josef Höök, Scientific Computing, Dept. of Information Technology, Uppsala University
Cris Luengo, Centre for Image Analysis, Dept. of Information Technology, Uppsala University
Thomas Schön, Systems and Control, Dept. of Information Technology, Uppsala University
Hongli Zeng, Applied Mathematics, Dept. of Mathematics, Uppsala University
Silvelyn Zwanzig, Mathematical Statistics, Dept. of Mathematics, Uppsala University
Local organizing committee
Elisabeth Larsson, CIM, Uppsala University
Michael Ashcroft, CIM, Uppsala University
Jing Liu, CIM, Uppsala University
Talk Abstracts
Session 1
Speaker: Jukka Corander
Title: Learning Markov networks with marginal pseudolikelihood
Abstract: Markov networks, also known as undirected graphical models, are popular in a wide spectrum of application areas such as image analysis, spatial statistics and statistical mechanics. Without restrictive assumptions concerning triangulation, Markov network models are difficult to fit to data because their partition functions are intractable, which leaves the joint distribution unnormalized. Bayesian learning of the neighborhood structure is known to be a triply intractable problem, and auxiliary-variable MCMC techniques are not computationally feasible beyond small networks. LASSO-based regression has recently been popularized as a technique for learning Markov networks in high dimensions. However, the LASSO requires choosing a penalty parameter, which is computationally demanding and may yield suboptimal results, since a single penalty value is used for the whole network. We introduce an approximate inference technique that combines pseudolikelihood with local reference priors, which automatically regularize the learning problem. Our estimator is consistent and is shown to perform favorably against all popular alternatives, for both discrete-valued and Gaussian Markov networks, over a wide range of dimensionality.
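To illustrate why pseudolikelihood sidesteps the intractable partition function, the following minimal sketch computes the ordinary (not marginal) pseudo-log-likelihood of a ±1 Ising model, one common discrete Markov network. It assumes a parameterization with a symmetric coupling matrix `W` and fields `b`; it does not implement the talk's estimator (marginal pseudolikelihood with local reference priors), and the function name is hypothetical.

```python
import numpy as np


def ising_pseudo_log_likelihood(X, W, b):
    """Ordinary pseudo-log-likelihood of a +/-1 Ising model (illustrative sketch).

    X: (n_samples, p) array with entries in {-1, +1}
    W: (p, p) symmetric coupling matrix with zero diagonal
    b: (p,) vector of node fields
    Returns the sum over samples and nodes of log p(x_i | x_{-i}).
    """
    # Local field h_i = b_i + sum_j W_ij x_j (W is symmetric, so X @ W works row-wise);
    # the zero diagonal excludes x_i itself.
    fields = X @ W + b
    # Each conditional is logistic: p(x_i | rest) = 1 / (1 + exp(-2 * x_i * h_i)),
    # so no partition function is needed; logaddexp keeps this numerically stable.
    return -np.sum(np.logaddexp(0.0, -2.0 * X * fields))


# Example: 3-node chain with positive couplings on edges (0, 1) and (1, 2).
W = np.array([[0.0, 0.8, 0.0],
              [0.8, 0.0, 0.8],
              [0.0, 0.8, 0.0]])
b = np.zeros(3)
X = np.array([[1, 1, 1], [-1, -1, 1], [1, -1, -1]])
print(ising_pseudo_log_likelihood(X, W, b))
```

Every term depends only on a node and its neighbors, which is what makes node-wise estimation of the neighborhood structure tractable in high dimensions.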
Session 2
Speaker: Hoài An Lê Thi
Title: DC programming and DCA in Machine Learning and Data Mining
Abstract: One of the current challenges for scientists is the optimal exploitation of the huge quantities of data and information stored in various forms. Extracting knowledge from these data requires sophisticated techniques and high-performance algorithms based on solid theoretical and statistical foundations. Built on the powerful arsenal of convex analysis, DC (Difference of Convex functions) programming and DCA (DC Algorithms) http://www.lita.univ-lorraine.fr/~lethi/index.php/dca.html are among the few nonconvex optimization approaches that can meet this requirement. Machine Learning and Data Mining (MLDM) is a rich source of optimization problems, almost all of which are DC programs for which DC programming and DCA are the appropriate solution approaches. During the last two decades, DC programming and DCA have been successfully applied to modeling and solving many nonconvex programs in various areas of MLDM. This talk presents recent developments in DC programming and DCA for MLDM. After a brief introduction to DC programming and DCA, we review and analyze existing methods in MLDM that are based on DC programming and DCA. We also show that standard algorithms in this domain are special cases of DCA. Finally, we discuss recent advances and ongoing developments in applying DC programming and DCA to challenging topics in MLDM, including learning with sparsity and uncertainty, online learning, and big data.
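The basic DCA iteration can be stated briefly: to minimize f = g - h with g and h convex, take a (sub)gradient y of h at the current point and then minimize the convex function g(x) - <y, x>. The sketch below is a generic version of that scheme applied to a one-dimensional toy decomposition chosen purely for illustration, f(x) = x**4/4 - x**2/2 with g(x) = x**4/4 and h(x) = x**2/2. The function names are hypothetical, and this is not code from the speaker's group.

```python
import numpy as np


def dca(grad_h, argmin_g_linear, x0, tol=1e-10, max_iter=1000):
    """Generic DCA sketch for min_x g(x) - h(x) with g and h convex.

    grad_h: returns a (sub)gradient of h at x
    argmin_g_linear: given y, returns argmin_x g(x) - <y, x>
    """
    x = x0
    for _ in range(max_iter):
        y = grad_h(x)               # linearize h (the concave part -h) at x
        x_new = argmin_g_linear(y)  # solve the resulting convex subproblem
        if np.linalg.norm(np.atleast_1d(x_new - x)) < tol:
            return x_new
        x = x_new
    return x


# Toy DC decomposition (illustrative): g(x) = x**4/4, h(x) = x**2/2.
# grad_h(x) = x, and argmin_x x**4/4 - y*x solves x**3 = y, i.e. x = cbrt(y).
x_star = dca(grad_h=lambda x: x, argmin_g_linear=np.cbrt, x0=0.5)
print(x_star)  # approaches the local minimizer x = 1
```

Each DCA step does not increase f, and the convex subproblem here has a closed form. In practice, the choice of DC decomposition determines how easy that subproblem is to solve.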
Session 4
Speaker: Florence d'Alché-Buc
Title: Learning vector autoregressive models with operator-valued kernels with application to biological network inference
Abstract: Reverse-engineering high-dimensional dynamical systems from time-course data remains a challenge in knowledge discovery, especially in systems biology. For this learning task, a number of approaches, primarily based on sparse linear models or Granger causality concepts, have been proposed in the literature. However, when the system exhibits nonlinear dynamics, there is no systematic approach that takes into account the nature of the underlying system. In this work, we introduce a novel family of vector autoregressive models based on different operator-valued kernels to identify the dynamical system and retrieve the target network that characterizes the interactions of its components. Assuming a sparse underlying structure, a key challenge, also present in the linear case, is to control the model's sparsity. This is achieved through the joint learning of the structure of the kernel and the basis vectors. To solve this learning task, we propose an alternating optimization algorithm based on proximal gradient procedures that learns both the structure of the kernel and the basis vectors. Experimental results on gene regulatory network inference and climate data confirm the ability of the learning scheme to retrieve dependencies between state variables.
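To illustrate the linear case mentioned in the abstract, the following sketch fits a sparse first-order linear VAR model with proximal gradient (ISTA) and reads nonzero coefficients as network edges. This is a substitute linear baseline, not the operator-valued kernel model from the talk; the function names, the penalty `lam`, and the simulated data are illustrative assumptions.

```python
import numpy as np


def soft_threshold(M, t):
    # Proximal operator of t * ||M||_1: elementwise shrinkage toward zero
    return np.sign(M) * np.maximum(np.abs(M) - t, 0.0)


def sparse_var1_ista(X, lam, n_iter=500):
    """Fit x_{t+1} ~ A x_t with an L1 penalty on A by proximal gradient (ISTA).

    X: (T, p) multivariate time series; returns the (p, p) matrix A.
    Objective: 0.5 * ||Z A^T - Y||_F^2 + lam * ||A||_1, with Z = X[:-1], Y = X[1:].
    """
    Z, Y = X[:-1], X[1:]
    p = X.shape[1]
    A = np.zeros((p, p))
    # Step 1/L, where L (largest eigenvalue of Z^T Z) is the gradient's Lipschitz constant
    step = 1.0 / np.linalg.eigvalsh(Z.T @ Z).max()
    for _ in range(n_iter):
        grad = (Z @ A.T - Y).T @ Z  # gradient of the least-squares term w.r.t. A
        A = soft_threshold(A - step * grad, step * lam)
    return A


# Simulated example: node 0 drives node 1; node 2 evolves on its own.
rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.0, 0.0],
                   [0.4, 0.3, 0.0],
                   [0.0, 0.0, 0.6]])
X = np.zeros((2000, 3))
for t in range(len(X) - 1):
    X[t + 1] = A_true @ X[t] + rng.standard_normal(3)
A_hat = sparse_var1_ista(X, lam=10.0)
# A nonzero A_hat[i, j] is read as a directed edge j -> i in the inferred network.
```

A single global `lam` controls sparsity here. In the kernel setting, the abstract describes learning the kernel structure and the basis vectors jointly instead.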
Session 5
Speaker: Hiroshi Mamitsuka
Title: Collaborative matrix factorization for predicting drug-target interactions
Abstract: Computationally predicting drug-target interactions is useful for discovering potential new drugs. Currently, promising machine learning approaches to this problem use not only known drug-target interactions but also drug and target similarities. This idea is well founded pharmacologically, since the two types of similarity correspond to two recently advocated concepts, the so-called chemical space and genomic space. In this talk, I will first briefly review current similarity-based machine learning methods for predicting drug-target interactions and then present our recent method, a factor model named Multiple Similarities Collaborative Matrix Factorization (MSCMF). MSCMF projects drugs and targets into a common low-rank feature space (matrix), which is estimated by alternating least squares so as to be consistent with the similarities among drugs and among targets. Our setting, binary relations with similarities over instances, is general and arises in many applications, such as recommender systems. In fact, MSCMF is an extension and generalization of weighted low-rank approximation for one-class collaborative filtering.
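MSCMF extends weighted low-rank approximation fitted by alternating least squares. The sketch below shows only that generic base: a weighted ALS for R ≈ U V^T with ridge penalties, where each half-step reduces to small per-row least-squares solves. It omits the drug and target similarity terms that distinguish MSCMF, and the function name, weights, and penalty value are illustrative assumptions.

```python
import numpy as np


def weighted_als(R, W, rank, lam=0.1, n_iter=50, seed=0):
    """Weighted low-rank approximation R ~ U @ V.T by alternating least squares.

    Minimizes sum_ij W_ij (R_ij - u_i . v_j)^2 + lam * (||U||_F^2 + ||V||_F^2).
    R: (m, n) relation matrix; W: (m, n) nonnegative weights (0 = ignore entry).
    """
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = 0.1 * rng.standard_normal((m, rank))
    V = 0.1 * rng.standard_normal((n, rank))
    ridge = lam * np.eye(rank)
    for _ in range(n_iter):
        # Fix V: each row u_i solves (sum_j W_ij v_j v_j^T + lam I) u_i = sum_j W_ij R_ij v_j
        for i in range(m):
            Vw = V * W[i][:, None]
            U[i] = np.linalg.solve(V.T @ Vw + ridge, Vw.T @ R[i])
        # Fix U: the same weighted ridge regression for each row v_j
        for j in range(n):
            Uw = U * W[:, j][:, None]
            V[j] = np.linalg.solve(U.T @ Uw + ridge, Uw.T @ R[:, j])
    return U, V


# Example: a rank-2 matrix with one entry treated as unobserved.
rng = np.random.default_rng(2)
R = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 6))
W = np.ones_like(R)
W[0, 0] = 0.0  # zero weight: entry (0, 0) is ignored during fitting
U, V = weighted_als(R, W, rank=2, lam=1e-3, n_iter=500)
R_hat = U @ V.T  # R_hat[0, 0] serves as a prediction for the held-out entry
```

In a one-class setting such as known drug-target interactions, one might give observed pairs weight 1 and unobserved pairs a smaller weight. That weighting scheme is an assumption here, not a detail taken from the talk.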
Session 6
Speaker: Anders Søgaard
Title: Occam’s Chainsaw: What not to learn when learning models of language
Abstract: How do you sample randomly from language? Language is constantly subject to shift, but even if we ignore linguistic change, the accumulated linguistic experience of each of us undergoes entrenchment and abstraction conditional on our interactions with other people. In other words, we cannot escape selection biases. On top of this, linguistics is not an exact science, and we cannot escape labelling biases either. Consequently, machine learning for natural language processing (NLP) is moving toward evaluations on dozens of datasets, trying to optimize average performance across domains and language varieties. This in turn has consequences for the models we end up with, which are typically more heavily regularized than models for other applications. I present examples of recent work on regularization techniques tailored for NLP.
Session 7
Speaker: Shakir Mohamed
Title: Bayesian Reasoning and Deep Learning
Abstract: Deep learning and Bayesian machine learning are currently two of the most active areas of machine learning research. Deep learning provides a powerful class of models and an easy framework for learning that now provides state-of-the-art methods for applications ranging from image classification to speech recognition. Bayesian reasoning provides a powerful approach to information integration, inference and decision making that has established it as the key tool for data-efficient learning, uncertainty quantification and robust model composition, and it is widely used in applications ranging from information retrieval to large-scale ranking. Each of these research areas has shortcomings that can be effectively addressed by the other, pointing towards a needed convergence of the two; the complementary aspects of these two research areas are the focus of this talk. Using the tools of autoencoders and latent variable models, we shall discuss some of the ways in which our machine learning practice is enhanced by combining deep learning with Bayesian reasoning. This essential and ongoing convergence will only continue to accelerate, and it offers some of the most exciting prospects for contemporary machine learning research, some of which we shall discuss.
Session 8
Speaker: Petros Dellaportas
Title: Scalable inference for a full multivariate stochastic volatility model
Abstract: We introduce a multivariate stochastic volatility model for asset returns that imposes no restrictions on the structure of the volatility matrix and treats all its elements as functions of latent stochastic processes. When the number of assets is prohibitively large, we propose a factor multivariate stochastic volatility model in which the variances and correlations of the factors evolve stochastically over time. Inference is achieved via a carefully designed, feasible and scalable Markov chain Monte Carlo algorithm that combines two computationally important ingredients: it uses Metropolis proposal densities that are invariant to the prior for simultaneously updating all latent paths, and it has quadratic, rather than cubic, computational complexity when evaluating the required multivariate normal densities. We apply our modelling and computational methodology to 571 stocks of the Euro STOXX index, using data over a period of 10 years.
Session 10
Speaker: Alan Said
Title: Multilingual Document Classification Using LDA
Abstract: TBA