1. Ethical Machine Learning in Health
Irene Y. Chen, Emma Pierson, Sherri Rose, Shalmali Joshi, Kadija Ferryman, Marzyeh Ghassemi
The use of machine learning (ML) in health care raises numerous ethical concerns, especially as models can amplify existing health inequities. Here, we outline ethical considerations for equitable ML in the advancement of health care. Specifically, we frame ethics of ML in health care through the lens of social justice. We describe ongoing efforts and outline challenges in a proposed pipeline of ethical ML in health, ranging from problem selection to post-deployment considerations. We close by summarizing recommendations to address these challenges.
Dream come true to coauthor a review article on ethical machine learning in health care with this team of brilliant women: @irenetrampoline, @2plus2make5, Shalmali Joshi, @KadijaFerryman & @MarzyehGhassemi. https://t.co/OSeqzKaPMf pic.twitter.com/1zw8mToaZR
— Sherri Rose (@sherrirose) September 23, 2020
2. SSMBA: Self-Supervised Manifold Based Data Augmentation for Improving Out-of-Domain Robustness
Nathan Ng, Kyunghyun Cho, Marzyeh Ghassemi
Models that perform well on a training domain often fail to generalize to out-of-domain (OOD) examples. Data augmentation is a common method used to prevent overfitting and improve OOD generalization. However, in natural language, it is difficult to generate new examples that stay on the underlying data manifold. We introduce SSMBA, a data augmentation method for generating synthetic training examples by using a pair of corruption and reconstruction functions to move randomly on a data manifold. We investigate the use of SSMBA in the natural language domain, leveraging the manifold assumption to reconstruct corrupted text with masked language models. In experiments on robustness benchmarks across 3 tasks and 9 datasets, SSMBA consistently outperforms existing data augmentation methods and baseline models on both in-domain and OOD data, achieving gains of 0.8% accuracy on OOD Amazon reviews, 1.8% accuracy on OOD MNLI, and 1.4 BLEU on in-domain IWSLT14 German-English.
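The corrupt-then-reconstruct idea can be sketched in a few lines. This is a toy stand-in, not the authors' implementation: real SSMBA reconstructs masked tokens with a pretrained masked language model (e.g. BERT), whereas the `VOCAB` list and random fill below are illustrative assumptions.

```python
import random

# Toy vocabulary used by the stand-in "reconstruction" step.
# A real SSMBA setup would sample fills from a pretrained masked LM instead.
VOCAB = ["the", "movie", "was", "great", "plot", "acting", "good"]

def corrupt(tokens, mask_prob=0.3, rng=None):
    """Corruption function: replace each token with <mask> with prob mask_prob."""
    rng = rng or random
    return [tok if rng.random() > mask_prob else "<mask>" for tok in tokens]

def reconstruct(tokens, rng=None):
    """Stand-in reconstruction: fill each <mask> with a random vocabulary word.
    SSMBA would instead sample from an MLM's predicted token distribution."""
    rng = rng or random
    return [rng.choice(VOCAB) if tok == "<mask>" else tok for tok in tokens]

def ssmba_augment(tokens, n_samples=3, rng=None):
    """Generate synthetic examples by moving along the data manifold:
    corrupt (move off-manifold), then reconstruct (project back on)."""
    rng = rng or random
    return [reconstruct(corrupt(tokens, rng=rng), rng=rng) for _ in range(n_samples)]

rng = random.Random(0)
for aug in ssmba_augment("the movie was great".split(), rng=rng):
    print(" ".join(aug))
```

Each augmented sentence keeps the original length and has all masks filled; the quality of the augmentation hinges entirely on how well the reconstruction model approximates the data manifold.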
in https://t.co/bfF4KPMb1J, @learn_ng draws a connection between VRM (chapelle & @jaseweston et al. https://t.co/3Ro0MHIiXY; in particular §4.2) and denoising autoencoder, and shows the effectiveness of using BERT-like MLM for data augmentation in 3 tasks with 9 datasets in NLP. pic.twitter.com/89rIdSq5FF
— Kyunghyun Cho (@kchonyc) September 23, 2020
New work with @kchonyc and @MarzyehGhassemi!
— Nathan Ng (@learn_ng) September 23, 2020
arxiv: https://t.co/ApaaP3xNkf
code: https://t.co/th47UxaxLT
We propose a novel data augmentation scheme based on using a pair of corruption and reconstruction functions to generate new examples along an underlying data manifold. 1/ pic.twitter.com/oIFJS0VNVu
3. A survey on Kornia: an Open Source Differentiable Computer Vision Library for PyTorch
E. Riba, D. Mishkin, J. Shi, D. Ponsa, F. Moreno-Noguer, G. Bradski
This work presents Kornia, an open source computer vision library built upon a set of differentiable routines and modules that aims to solve generic computer vision problems. The package uses PyTorch as its main backend, not only for efficiency but also to take advantage of the reverse auto-differentiation engine to define and compute the gradient of complex functions. Inspired by OpenCV, Kornia is composed of a set of modules containing operators that can be integrated into neural networks to train models to perform a wide range of operations, including image transformations, camera calibration, epipolar geometry, and low-level image processing techniques such as filtering and edge detection, which operate directly on high-dimensional tensor representations on graphics processing units, yielding faster systems. Examples of classical vision problems implemented using our framework are provided, including a benchmark comparing to existing vision libraries.
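To make "operators that operate directly on tensor representations" concrete, here is a plain-Python sketch of a Sobel edge filter, one of the low-level operators the abstract mentions. Kornia itself implements this as a differentiable PyTorch module running on batched GPU tensors with autograd support; the bare-list version below is only an illustration of the underlying computation.

```python
# Sobel kernels for horizontal and vertical intensity gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def conv2d_valid(img, kernel):
    """2-D 'valid' convolution (strictly, cross-correlation) with a 3x3 kernel."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h - 2):
        row = []
        for j in range(w - 2):
            row.append(sum(img[i + di][j + dj] * kernel[di][dj]
                           for di in range(3) for dj in range(3)))
        out.append(row)
    return out

def sobel_magnitude(img):
    """Gradient magnitude sqrt(gx^2 + gy^2) at each valid pixel."""
    gx, gy = conv2d_valid(img, SOBEL_X), conv2d_valid(img, SOBEL_Y)
    return [[(gx[i][j] ** 2 + gy[i][j] ** 2) ** 0.5 for j in range(len(gx[0]))]
            for i in range(len(gx))]

# A vertical step edge: gradient magnitude is large along the boundary column.
img = [[0, 0, 1, 1]] * 4
print(sobel_magnitude(img))  # → [[4.0, 4.0], [4.0, 4.0]]
```

Because every step here is a smooth arithmetic operation, the same computation expressed as PyTorch tensor ops is differentiable end-to-end, which is what lets Kornia place such operators inside trainable networks.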
E. Riba, D. Mishkin, J. Shi, D. Ponsa, F. Moreno-Noguer, G. Bradski, A survey on Kornia: an Open Source Differentiable #ComputerVision Library for PyTorch, arXiv, 2020
— Kosta Derpanis (@CSProfKGD) September 23, 2020
Paper: https://t.co/8S2ZSpNKhI @kornia_foss pic.twitter.com/eEHNHuJABQ
[new paper] now available in #arxiv
— Kornia (@kornia_foss) September 23, 2020
A survey on Kornia: an Open Source Differentiable Computer Vision Library for @PyTorch https://t.co/YDrAK8obBU @edgarriba @ducha_aiki @js_shijian @PonsaDaniel @fmorenoguer @grbradsk https://t.co/OlT1KbL3C8
4. A narrowing of AI research?
Joel Klinger, Juan Mateos-Garcia, Konstantinos Stathoulopoulos
Artificial Intelligence (AI) is being hailed as the latest example of a General Purpose Technology that could transform productivity and help tackle important societal challenges. This outcome is however not guaranteed: a myopic focus on short-term benefits could lock AI into technologies that turn out to be sub-optimal in the longer run. Recent controversies about the dominance of deep learning methods and private labs in AI research suggest that the field may be getting narrower, but the evidence base is lacking. We seek to address this gap with an analysis of the thematic diversity of AI research in arXiv, a widely used preprint site. Having identified 110,000 AI papers in this corpus, we use hierarchical topic modelling to estimate the thematic composition of AI research, and this composition to calculate various metrics of research diversity. Our analysis suggests that diversity in AI research has stagnated in recent years, and that AI research involving private sector organisations tends to be less diverse than research in academia. This appears to be driven by a small number of prolific and narrowly focused technology companies. Diversity in academia is bolstered by smaller institutions and research groups that may have fewer incentives to 'race' and lower levels of collaboration with the private sector. We also find that private sector AI researchers tend to specialise in data- and computationally intensive deep learning methods at the expense of research involving other (symbolic and statistical) AI methods, and of research that considers the societal and ethical implications of AI or applies it in domains like health. Our results suggest that there may be a rationale for policy action to prevent a premature narrowing of AI research that could reduce its societal benefits, but we note the incentive, information and scale hurdles standing in the way of such interventions.
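To see how "metrics of research diversity" can be computed from a topic composition, here is one standard choice, Shannon entropy over a topic-share vector. This is an illustrative stand-in: the paper computes several diversity metrics, and its exact definitions may differ from the one below.

```python
import math

def shannon_diversity(topic_shares):
    """Shannon entropy of a topic distribution: higher = more thematically diverse.
    One of several standard diversity measures; assumes shares sum to 1."""
    assert abs(sum(topic_shares) - 1.0) < 1e-9
    return -sum(p * math.log(p) for p in topic_shares if p > 0)

# A field concentrated in one topic vs. one spread evenly across four topics:
narrow = [0.85, 0.05, 0.05, 0.05]
broad = [0.25, 0.25, 0.25, 0.25]
print(shannon_diversity(narrow) < shannon_diversity(broad))  # → True
```

The uniform distribution attains the maximum entropy log(K) for K topics, so a research field whose papers cluster into a few dominant themes scores strictly lower, which is the sense in which the paper can measure a "narrowing" over time.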
A narrowing of AI research?
— Juan Mateos Garcia (@JMateosGarcia) September 23, 2020
Our new paper about the evolution of thematic diversity in AI research is live in arXiv now. I will post a thread about it sometime soon. Comments welcome! https://t.co/tabm03OBTi pic.twitter.com/1Um63oZkBI
5. On the Theory of Modern Quantum Algorithms
Jacob Biamonte
This dissertation unites variational computation with results and techniques appearing in the theory of ground state computation. It should be readable by graduate students. The topics covered include: Ising model reductions, stochastic versus quantum processes on graphs, quantum gates and circuits as tensor networks, variational quantum algorithms and Hamiltonian gadgets.
Second doctorate: "On the Theory of Modern Quantum Algorithms"
— Jacob D Biamonte (@JacobBiamonte) September 23, 2020
The dissertation focuses on variational computation and ground state computation.
It should be readable by graduate students. https://t.co/2KLXhmow2l
6. Proposal of a Novel Bug Bounty Implementation Using Gamification
Jamie O'Hare, Lynsay A. Shepherd
Despite significant popularity, the bug bounty process has remained broadly unchanged since its inception, with limited implementation of gamification aspects. Existing literature recognises that current methods generate intensive resource demands and can encounter issues impacting program effectiveness. This paper proposes a novel bug bounty process aiming to alleviate resource demands and mitigate inherent issues. Through the additional crowdsourcing of report verification, where fellow hackers perform vulnerability verification and reproduction, the client organisation can reduce overheads at the cost of rewarding more participants. The incorporation of gamification elements provides a substitute for monetary rewards, as well as presenting possible mitigation of bug bounty program effectiveness issues. Collectively, traits of the proposed process appear appropriate for resource- and budget-constrained organisations, such as Higher Education institutions.
@Lynsay and I's #BugBounty paper is now live on arXiv.
— Jamie O'Hare (@TheHairyJ) September 23, 2020
We propose a possible bug bounty solution appropriate for resource and economically limited organisations, such as Universities.
Link: https://t.co/OqAtMo55lV
Find out more information in the thread below. pic.twitter.com/k6S8a4mGex
7. ALICE: Active Learning with Contrastive Natural Language Explanations
Weixin Liang, James Zou, Zhou Yu
Training a supervised neural network classifier typically requires many annotated training samples. Collecting and annotating a large number of data points is costly and sometimes even infeasible. The traditional annotation process uses a low-bandwidth human-machine communication interface: classification labels, each of which only provides several bits of information. We propose Active Learning with Contrastive Explanations (ALICE), an expert-in-the-loop training framework that utilizes contrastive natural language explanations to improve data efficiency in learning. ALICE first uses active learning to select the most informative pairs of label classes and elicits contrastive natural language explanations for them from experts. Then it extracts knowledge from these explanations using a semantic parser. Finally, it incorporates the extracted knowledge by dynamically changing the learning model's structure. We applied ALICE to two visual recognition tasks, bird species classification and social relationship classification. We found that, by incorporating contrastive explanations, our models outperform baseline models that are trained with 40-100% more training data. We found that adding 1 explanation leads to a similar performance gain as adding 13-30 labeled training data points.
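The "select the most informative pairs of label classes" step can be illustrated with a simple heuristic: ask the expert about the pair of classes the current model confuses most. This is a simplification, not ALICE's actual selection criterion, and the confusion-matrix framing, class names, and counts below are all hypothetical.

```python
# Illustrative sketch: pick the class pair with the highest cross-confusion,
# then an expert would supply a contrastive explanation for that pair
# (e.g. "the difference between a sparrow and a finch is ...").

def most_confused_pair(confusion, classes):
    """confusion[i][j] = number of examples of class i predicted as class j.
    Return the unordered class pair with the highest total cross-confusion."""
    best, best_score = None, -1
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            score = confusion[i][j] + confusion[j][i]  # mistakes in both directions
            if score > best_score:
                best, best_score = (classes[i], classes[j]), score
    return best

classes = ["sparrow", "finch", "warbler"]
confusion = [
    [50, 12, 3],   # true sparrow
    [15, 40, 5],   # true finch
    [2, 4, 60],    # true warbler
]
print(most_confused_pair(confusion, classes))  # → ('sparrow', 'finch')
```

The intuition matches the abstract: a targeted natural language explanation of the hardest class boundary carries far more information per expert interaction than a handful of additional labels.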
Our new #emnlp paper shows how to teach ML via natural language explanations of contrasts between concepts (e.g. "the difference between COVID and flu is ...").
— James Zou (@james_y_zou) September 23, 2020
It's much more efficient than using labeled examples. Excited for more human-like learning! https://t.co/q0dznybPNe pic.twitter.com/eeV58VkdJl