Department of Mathematics

CIM Workshop on Machine Learning, October 8-9, 2015

Workshop theme

The Centre for Interdisciplinary Mathematics (CIM) connects mathematics with other sciences and application domains. In this workshop we will explore how mathematical approaches to data analysis through machine learning can help us make sense of, and make use of, the vast reservoirs of data now being generated. The topics span many areas of machine learning and its applications, from bioinformatics and language processing to industrial systems and cyber security. Presenters include researchers from the fields of statistics, bioinformatics, computer science, and linguistics.

Left: Identification of tile position from high-dimensional image data. Right: A Gaussian process model of the ambient magnetic field for a table. Pictures courtesy of Thomas Schön.

Invited speakers and talk titles

  • Florence d'Alche-Buc, Télécom ParisTech, University Paris-Saclay, France, Learning vector autoregressive models with operator-valued kernels with application to biological network inference

  • Jukka Corander, Department of Mathematics and Statistics, University of Helsinki, Finland, Learning Markov networks with marginal pseudolikelihood

  • Petros Dellaportas, Department of Statistical Science, University College London, UK, Scalable inference for a full multivariate stochastic volatility model

  • Hoài An Lê Thi, Theoretical and Applied Computer Science Laboratory, University of Lorraine, France, DC programming and DCA in Machine Learning and Data Mining

  • Hiroshi Mamitsuka, Bioinformatics Center, Institute for Chemical Research, Kyoto University, Japan, Collaborative matrix factorization for predicting drug-target interactions

  • Shakir Mohamed, Deepmind, London, UK, Bayesian Reasoning and Deep Learning

  • Alan Said, Recorded Future, Gothenburg, Sweden, Multilingual Document Classification Using LDA

  • Anders Søgaard, Center for Language Technology, University of Copenhagen, Denmark, Occam’s Chainsaw: What not to learn when learning models of language

Poster session

Workshop attendees are invited to submit a poster on a topic related to machine learning. Posters will be on display throughout Thursday, and a dedicated poster session in the program will make them the focus of activity.

Authors and Poster Titles
1. Ali Basirat Greedy Transition-Based Dependency Parsing with Discrete and Continuous Supertag Features
2. Joakim Nivre Training Deterministic Dependency Parsers with Nondeterministic Oracles
3. Mats Dahllöf Clustering Writing Components
4. C. A. Naesseth, F. Lindsten and T. B. Schön High-dimensional Inference using Nested Particle Filters
5. Andreas Svensson, Arno Solin, Simo Särkkä, Thomas B. Schön Computationally Efficient Bayesian Learning of Gaussian Process State Space Models
6. Mike Ashcroft Efficient discovery of high value regions of the feature space using Delaunay Tessellation Field Estimation
7. Jimmy Callin Part-of-Speech Driven Cross-Lingual Pronoun Prediction with Feed-Forward Neural Networks
8. Mojgan Seraji Language Technology Resources and Tools for Persian
9. Josef Höök, Elisabeth Larsson, Erik Lindström, Lina von Sydow Filtering and Parameter Estimation of Partially Observed Diffusion Processes Using Gaussian RBFs

Location

All talks will be given in room P2446, Polacksbacken Building 2, 4th floor, northwest corner. Lunches take place at Eklundshof (very close to Polacksbacken). The dinner will be in room P4308, Polacksbacken Building 4, 3rd floor.

Preliminary program

For abstracts of the talks, scroll down to the end of the page.

THURSDAY OCTOBER 8
10:15 - 10:40 Registration & Welcome 2446
10:40 - 11:20 S1: Jukka Corander 2446
11:20 - 12:00 S2: Hoài An Lê Thi 2446
12:00 - 13:30 Lunch  Eklundshof
13:30 - 14:10 S3: Poster Session 2446
14:10 - 14:50 S4: Florence d'Alche-Buc 2446
14:50 - 15:20 Coffee Break 2446
15:20 - 16:00 S5: Hiroshi Mamitsuka 2446
16:00 - 16:40 S6: Anders Søgaard 2446
18:00 - 20:00 Dinner 4308
FRIDAY OCTOBER 9
9:00 - 9:40 S7: Shakir Mohamed 2446
9:40 - 10:20 S8: Petros Dellaportas 2446
10:20 - 10:50 Coffee break 2446
10:50 - 11:30 S9: Panel Discussion 2446
11:30 - 12:10 S10: Alan Said 2446
12:10 - 13:40 Lunch and closing Eklundshof

Registration

Registration is open from September 1st to October 1st. Go to the registration form.
Even if you have not registered, you are welcome to attend any of the talks. Lunches and dinner are, however, reserved for registered participants. To find out whether late registration is possible, please contact the local organizers.

Scientific committee

Elisabeth Larsson, CIM, Uppsala University

Michael Ashcroft, Computing Science, Dept. of Information Technology, Uppsala University

Christian Hardmeier, Dept. of Linguistics and Philology, Uppsala University

Josef Höök, Scientific Computing, Dept. of Information Technology, Uppsala University

Cris Luengo, Centre for Image Analysis, Dept. of Information Technology, Uppsala University

Thomas Schön, Systems and Control, Dept. of Information Technology, Uppsala University

Hongli Zeng, Applied Mathematics, Dept. of Mathematics, Uppsala University

Silvelyn Zwanzig, Mathematical Statistics, Dept. of Mathematics, Uppsala University

Local organizing committee

Elisabeth Larsson, CIM, Uppsala University

Michael Ashcroft, CIM, Uppsala University

Jing Liu, CIM, Uppsala University

Talk Abstracts

Session 1

Speaker: Jukka Corander

Title: Learning Markov networks with marginal pseudolikelihood

Abstract: Markov networks, a.k.a. undirected graphical models, are popular in a wide spectrum of application areas, such as image analysis, spatial statistics and statistical mechanics. Without restrictive assumptions concerning triangulation, Markov network models are difficult to fit to data due to the intractability of their partition functions, which leaves the joint distribution unnormalized. Bayesian learning of the neighborhood structure is known as a triply intractable problem, and the use of auxiliary-variable MCMC techniques is not computationally feasible beyond small networks. LASSO-based regression has recently been popularized as a technique for learning Markov networks in high dimensions. However, the use of LASSO requires the choice of a penalty parameter, which is computationally demanding and may yield suboptimal results, since a single penalty value is used for the whole network. We introduce an approximate inference technique based on combining pseudolikelihood and local reference priors, which automatically regularize the learning problem. Our estimator is consistent and is shown to perform favorably against all popular alternatives, both for discrete-valued and Gaussian Markov networks, over a wide range of dimensionalities.
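The pseudolikelihood idea at the heart of the abstract replaces the intractable joint likelihood with a product of per-variable conditionals, so the partition function never has to be computed. A minimal numpy sketch for an Ising-type network (our own illustration of plain pseudolikelihood, not the marginal-pseudolikelihood estimator of the talk):

```python
import numpy as np

def log_pseudolikelihood(J, X):
    """Log-pseudolikelihood of ±1-valued samples X (n_samples x d) of an
    Ising-type Markov network with symmetric coupling matrix J (zero diagonal).
    Each variable is scored conditionally on all the others, so the
    intractable partition function of the joint model never appears."""
    fields = X @ J                                  # local field at each node
    # P(x_i | x_-i) = sigmoid(2 * x_i * field_i) for ±1 spins
    return -np.sum(np.log1p(np.exp(-2.0 * X * fields)))
```

Maximizing this surrogate over J (for instance by gradient ascent, with a penalty or prior for regularization) yields a consistent structure estimate without ever normalizing the joint distribution.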

Session 2

Speaker: Hoài An Lê Thi

Title: DC programming and DCA in Machine Learning and Data Mining

Abstract: One of the challenges for scientists at the present time is the optimal exploitation of the huge quantities of data and information stored in various forms. Knowledge extraction from these data requires the use of sophisticated techniques and high-performance algorithms based on solid theoretical foundations in optimization and statistics. Built on the powerful arsenal of convex analysis, DC (Difference of Convex functions) programming and DCA (DC Algorithms) (http://www.lita.univ-lorraine.fr/~lethi/index.php/dca.html) are among the few nonconvex optimization approaches that can meet this requirement. Machine Learning and Data Mining (MLDM) represent a mine of optimization problems, almost all of which are DC programs whose appropriate resolution calls for DC programming and DCA. During the last two decades, DC programming and DCA have been successfully applied to model and solve many nonconvex programs in various areas of MLDM. This talk presents recent developments of DC programming and DCA in MLDM. After a brief introduction to DC programming and DCA, we give a review and analysis of existing methods based on DC programming and DCA in MLDM. We also show that standard algorithms in this domain are special cases of DCA. Finally, we discuss recent advances and ongoing developments of DC programming and DCA for challenging topics in MLDM, including learning with sparsity and uncertainty, online learning, and big data.
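The DCA scheme itself is simple to state: write the objective as f(x) = g(x) - h(x) with both g and h convex, linearize h at the current iterate, and solve the resulting convex surrogate exactly. A toy sketch (the decomposition of x⁴ - 2x² is our own illustrative choice, not an example from the talk):

```python
import numpy as np

def dca(x0, grad_h, argmin_convex, n_iter=50):
    """Generic DCA loop for minimizing f(x) = g(x) - h(x), g and h convex:
    linearize h at the current iterate (y_k in the subdifferential of h),
    then solve the convex surrogate min_x g(x) - y_k * x exactly."""
    x = x0
    for _ in range(n_iter):
        y = grad_h(x)             # (sub)gradient of the subtracted convex part
        x = argmin_convex(y)      # exact solution of the convex subproblem
    return x

# Toy DC decomposition: f(x) = x**4 - 2*x**2 with g(x) = x**4, h(x) = 2*x**2.
grad_h = lambda x: 4.0 * x
# argmin_x x**4 - y*x satisfies the optimality condition 4*x**3 = y
argmin_convex = lambda y: np.cbrt(y / 4.0)

x_star = dca(0.5, grad_h, argmin_convex)   # approaches the minimizer x = 1
```

Each iteration decreases f, and the iterates converge to a critical point of the DC program; in this toy case, to one of the two global minimizers x = ±1, depending on the starting point.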

Session 4

Speaker: Florence d'Alche-Buc

Title: Learning vector autoregressive models with operator-valued kernels with application to biological network inference.

Abstract: Reverse-engineering of high-dimensional dynamical systems from time-course data remains a challenge in knowledge discovery, and especially in systems biology. For this learning task, a number of approaches, primarily based on sparse linear models or Granger causality concepts, have been proposed in the literature. However, when the system exhibits nonlinear dynamics, there is no systematic approach that takes into account the nature of the underlying system. In this work, we introduce a novel family of vector autoregressive models based on different operator-valued kernels to identify the dynamical system and retrieve the target network that characterizes the interactions of its components. Assuming a sparse underlying structure, a key challenge, also present in the linear case, is to control the model's sparsity. This is achieved through the joint learning of the structure of the kernel and the basis vectors. To solve this learning task, we propose an alternating optimization algorithm based on proximal gradient procedures that learns both the structure of the kernel and the basis vectors. Experimental results on gene regulatory network inference and climate data confirm the ability of the learning scheme to retrieve dependencies between state variables.
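A much-simplified version of the modelling idea is a first-order nonlinear VAR, x_{t+1} ≈ f(x_t), learned with kernel ridge regression. Using the same scalar RBF kernel for every output coordinate corresponds to the identity operator-valued kernel, the simplest member of the family in the talk; the function names and parameters below are our own:

```python
import numpy as np

def kernel_var_fit(X, lam=1e-3, gamma=1.0):
    """Fit a nonlinear first-order VAR x_{t+1} ≈ f(x_t) by kernel ridge
    regression with an RBF kernel. X has shape (T, d): one state per row."""
    Z, Y = X[:-1], X[1:]                               # inputs / one-step targets
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)                            # RBF Gram matrix
    alpha = np.linalg.solve(K + lam * np.eye(len(Z)), Y)

    def predict(x):
        """One-step-ahead prediction from state x."""
        k = np.exp(-gamma * ((Z - x) ** 2).sum(-1))
        return k @ alpha
    return predict
```

The talk's models go further by learning the structure of an operator-valued kernel jointly with the expansion coefficients, and by reading a sparse interaction network out of the fitted model.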

Session 5

Speaker: Hiroshi Mamitsuka

Title: Collaborative matrix factorization for predicting drug-target interactions

Abstract: Computationally predicting drug-target interactions is useful for discovering potential new drugs. Currently, promising machine learning approaches to this problem use not only known drug-target interactions but also drug and target similarities. This idea is well accepted pharmacologically, since the two types of similarities correspond to two recently advocated concepts, the so-called chemical space and genomic space. In this talk, I will first briefly review current similarity-based machine learning methods for predicting drug-target interactions and then present our recent method, based on a factor model, named Multiple Similarities Collaborative Matrix Factorization (MSCMF). MSCMF projects drugs and targets into a common low-rank feature space (matrix), which is estimated by alternating least squares to be consistent with the similarities over drugs and those over targets. We note that our setting is general binary relations with similarities over instances, which can be found in many applications, such as recommender systems. In fact, MSCMF is an extension/generalization of weighted low-rank approximation for one-class collaborative filtering.
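The alternating-least-squares core of such a factor model can be sketched in a few lines. This is the plain ridge-regularized factorization Y ≈ A Bᵀ only; MSCMF adds similarity-consistency terms for drugs and targets on top of it, and the function name and parameters here are our own:

```python
import numpy as np

def als_factorize(Y, rank=2, lam=0.1, n_iter=100, seed=0):
    """Alternating least squares for Y ≈ A @ B.T (e.g. drugs x targets).
    Holding one factor fixed makes the other a ridge-regression problem
    with a closed-form solution; the two updates are alternated."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    A = rng.standard_normal((n, rank))
    B = rng.standard_normal((m, rank))
    reg = lam * np.eye(rank)
    for _ in range(n_iter):
        A = Y @ B @ np.linalg.inv(B.T @ B + reg)     # fix B, solve for A
        B = Y.T @ A @ np.linalg.inv(A.T @ A + reg)   # fix A, solve for B
    return A, B
```

The rows of A and B are the low-rank feature vectors of drugs and targets; new interaction scores are read off as entries of A @ B.T.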

Session 6

Speaker: Anders Søgaard

Title: Occam’s Chainsaw: What not to learn when learning models of language

Abstract: How do you sample randomly from language? Language is constantly subject to shift, but even if we ignore linguistic change, the accumulated linguistic experience of any of us undergoes entrenchment and abstraction conditional on our interactions with other people. In other words, we cannot escape selection biases. On top of this, linguistics is not an exact science, and we cannot escape labelling biases either. Consequently, machine learning for natural language processing (NLP) is moving toward evaluations with dozens of datasets, trying to optimize average performance across domains and language varieties. This in turn has consequences for the models we end up with, which are typically more heavily regularized than models for other applications. I present examples of recent work on regularization techniques tailored for NLP.

Session 7

Speaker: Shakir Mohamed

Title: Bayesian Reasoning and Deep Learning

Abstract: Deep learning and Bayesian machine learning are currently two of the most active areas of machine learning research. Deep learning provides a powerful class of models and an easy framework for learning that now delivers state-of-the-art methods for applications ranging from image classification to speech recognition. Bayesian reasoning provides a powerful approach to information integration, inference and decision making that has established it as the key tool for data-efficient learning, uncertainty quantification and robust model composition, widely used in applications ranging from information retrieval to large-scale ranking. Each of these research areas has shortcomings that can be effectively addressed by the other, pointing towards a needed convergence of these two areas of machine learning; the complementary aspects of these two research areas are the focus of this talk. Using the tools of auto-encoders and latent variable models, we shall discuss some of the ways in which our machine learning practice is enhanced by combining deep learning with Bayesian reasoning. This is an essential and ongoing convergence that will only accelerate, and it provides some of the most exciting prospects, some of which we shall discuss, for contemporary machine learning research.
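One concrete bridge between the two fields, used by the auto-encoder models the abstract mentions, is the variational auto-encoder's pairing of the reparameterization trick with a KL regularizer. A minimal numpy sketch of these two ingredients (illustrative only; the talk does not prescribe this code):

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Reparameterization trick: draw z ~ N(mu, sigma^2) as a deterministic
    function of parameter-free noise, so gradients of a Monte Carlo
    objective can flow through the sampling step."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ), the regularizer that appears in
    the variational auto-encoder objective (summed over latent dims)."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)
```

In a full model, mu and log_var would be the outputs of an encoder network, and the reparameterized sample z would be fed to a decoder; the training objective balances reconstruction quality against the KL term.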

Session 8

Speaker: Petros Dellaportas

Title: Scalable inference for a full multivariate stochastic volatility model

Abstract: We introduce a multivariate stochastic volatility model for asset returns that imposes no restrictions on the structure of the volatility matrix and treats all its elements as functions of latent stochastic processes. When the number of assets is prohibitively large, we propose a factor multivariate stochastic volatility model in which the variances and correlations of the factors evolve stochastically over time. Inference is achieved via a carefully designed, feasible and scalable Markov chain Monte Carlo algorithm that combines two computationally important ingredients: it utilizes Metropolis proposal densities that are invariant to the prior for simultaneously updating all latent paths, and it has quadratic, rather than cubic, computational complexity when evaluating the required multivariate normal densities. We apply our modelling and computational methodology to 571 stocks of the Euro STOXX index over a period of 10 years.
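As background, the univariate building block of such models, a log-variance that follows a latent AR(1) process, can be simulated in a few lines. This is the generic textbook stochastic volatility model, not the talk's full multivariate specification, and the parameter values are illustrative:

```python
import numpy as np

def simulate_sv(T, mu=-1.0, phi=0.95, sigma_eta=0.2, seed=0):
    """Simulate a basic univariate stochastic volatility model:
    log-variance h_t follows a stationary AR(1) around mu, and
    returns are r_t = exp(h_t / 2) * eps_t with eps_t ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    h = np.empty(T)
    h[0] = mu
    for t in range(1, T):
        h[t] = mu + phi * (h[t - 1] - mu) + sigma_eta * rng.standard_normal()
    r = np.exp(h / 2.0) * rng.standard_normal(T)
    return r, h
```

In the multivariate setting of the talk, every element of the volatility matrix is driven by latent processes of this kind, which is what makes carefully designed MCMC necessary for inference.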

Session 10

Speaker: Alan Said

Title: Multilingual Document Classification Using LDA

Abstract: TBA