This project is concerned with the problem of learning sequentially, adaptively and in partial information on an uncertain environment. In this setting, the learner collects sequentially and actively the data, which is not available before-hand in a batch form. The process is as follows: at each time t, the learner chooses an action and receives a data point, that depends on the performed action. The learner collects data in order to learn the system, but also to achieve a goal (characterized by an objective function) that depends on the application. In this project, we will aim at solving this problem under general objective functions, and dependency in the data collecting process – exploring variations of the so-called bandit setting which corresponds to this problem with a specific objective function.
As a motivating example, consider the problem of sequential and active attention detection through an eye tracker. A human user is looking at a screen, and the objective of an automatized monitor (learner) is to identify through an eye tracker zones of this screen where the user is not paying sufficient attention. In order to do so, the monitor is allowed at each time t to flash a small zone a t in the screen, e.g. light a pixel (action), and the eye tracker detects through the eye movement if the user has observed this flash. Ideally the monitor should focus on these difficult zones and flash more often there (i.e. choose more often specific actions corresponding to less identified zones). Therefore, sequential and adaptive learning methods are expected to improve the performances of the monitor.
The PhD candidate will focus on developing sequential learning algorithms with mathematical guarantees for learning on given non-stationary processes that are relevant in the context of recommendation systems, and on implementation of the algorithms that will be developed. S/He will collaborate with a second PhD student at the University of Potsdam on the eye tracker based application. A degree in machine learning or in mathematics with an interest in theoretical computer science will be preferred.
Vernade, C., Carpentier, A., Lattimore, T., Zappella, G., Ermis, B. and Brueckner, M. (2020). Linear Bandits with Stochastic Delayed Feedback. arXiv:1807.02089
Lefakis, L., Zadorozhnyi, O. and Blanchard, G. (2019). Efficient Regularized Piecewise-Linear Regression Trees. arXiv: 1907.00275
Zadorozhnyi, O., Blanchard, G. and Carpentier, A. (2019). Restless dependent bandits with fading memory. arXiv: 1906.10454
Blanchard, G. and Zadorozhnyi, O. (2019). Concentration of weakly dependent Banach-valued sums and applications to statistical learning methods. Bernoulli, 25(4B), 3421-3458. doi:10.3150/18-BEJ1095 (arXiv: 1712.01934)
Achdou, J., Lam, J., Carpentier, A. and Blanchard, G. (2019). A minimax near-optimal algorithm for adaptive rejection sampling. Accepted for ALT'2019. Arxiv 1810.09390
Locatelli, A., Carpentier, A., and Valko, M. (2019). Active multiple matrix completion with adaptive confidence sets. Proceedings of Machine Learning Research, PMLR, 89, 1783-1791. Open Access
Seznec, J, Locatelli, A., Carpentier, A., Lazaric, A., and Valko, M. (2019). Rotting bandits are no harder than stochastic ones. Proceedings of Machine Learning Research, in PMLR 89:2564-2572. Open Access
Blanchard, G., Carpentier, A. and Gutzeit, M. (2018). Minimax Euclidean Separation Rates for Testing Convex Hypotheses in Rd. Electron. J. Statist. 12 (2): 3713-3735. doi:10.1214/18-EJS1472
Locatelli, A., Carpentier, A., and Kpotufe, S. (2018). An Adaptive Strategy for Active Learning with Smooth Decision Boundary. Proceedings of Machine Learning Research (ALT), 83, 547-571. Open Access