Time; Hall 1; Hall 2
09:30 - 10:00; Registration. Welcome coffee
10:00 - 10:45; From bag of texts to bag of clusters
Terpil Ievgen / Pavel Khudan (Data Scientists / NLP Engineer at YouScan)
In this talk we discuss some properties of generalized preferential attachment models. A general approach to preferential attachment was introduced in 2012: a wide class of models (the PA-class) was defined in terms of constraints that are sufficient for studying the degree distribution and the clustering coefficient. These models are used in many tasks, such as spam detection, learning to rank, etc.; Kappa Architecture: How to implement a real-time streaming data analytics engine
Juantomás García (Data Solutions Manager at OpenSistemas), Madrid, Spain
We will introduce the kappa architecture and compare it with the lambda architecture. We will see why the kappa architecture is a good solution for (almost) real-time analysis of streaming data. We will show a real use case: how the architecture is designed, how the pipelines are organized and how data scientists use it. We will review the most widely used technologies for implementing it, from Apache Kafka + Spark with Scala to newer tools such as Apache Beam / Google Dataflow.
10:50 - 11:35; Low-latency model serving and large-scale prediction pipelines on top of Spark
Stepan Pushkarev (GM (Kazan) at Provectus / CTO at Hydrosphere.io)
Real-time single-row serving pipelines power use cases such as fraud detection, online content recommendation, image and text recognition, and others.
Apache Spark is an offline engine for training MLlib, H2O, XGBoost, TensorFlow and other models. However, end-to-end solutions require plugging analytics into the enterprise architecture to make user-facing products smarter.
In this talk we’ll show how to streamline and simplify the deployment and low-latency serving of all these prediction pipelines in production.
Training a model in Apache Spark and having it automatically available for real-time serving is an essential feature of end-to-end solutions. However, single-row serving is not a focus of Apache Spark. The Spark community has partially completed the separation of ml-local and is having ongoing discussions about the priority of that project. Databricks has also released proprietary tools for exporting models and serving them outside of Apache Spark.
Another option is to export the model to PMML/PFA and then import it into a separate scoring engine. The idea of interoperability is great, but it poses multiple challenges, such as code duplication, limited extensibility, inconsistency, and extra moving parts.
In this talk we propose a solution with the following advantages:
- No custom formats or new standards.
- No exports/imports for algorithms.
- No vendor lock-in: all the components are open source.
- Shared Spark API.
- Convergence with the Apache Spark roadmap.; Geometrical correction of optical satellite images
Oleksiy Kravchenko (Senior Data Scientist at Zoral Labs)
We will talk about the variety of available satellite data and motivating applications in agriculture, land-cover mapping and forestry. Then we will focus on the geometrical correction of images, which is the first stage of the satellite data processing pipeline. This includes data geolocation, image registration, subpixel key-point detection, and band-to-band alignment. On top of that, we will touch upon several interesting and unexpected ways of estimating satellite attitude/jitter and detecting clouds.
11:35 - 12:00; Networking Break
12:00 - 12:45; Patient similarity: duplicate cleaning and predicting missing diagnoses
Victor Sarapin (CEO at V.I.Tech)
How to efficiently handle possible duplicates in a population of ~10^7, and how to detect possibly missing diagnoses and care items.; High-performance computing capabilities for data analysis systems
Mikhail Fedoseev (Architect of Infrastructure Solutions, Lantec)
In this talk, we will discuss the hardware side of data analysis systems when building private clouds or local high-performance computing clusters. We will consider which technologies and integrated solutions from Hewlett Packard Enterprise can accelerate data analysis. These include not only HPE Apollo servers, best in class in their segment, and HPE high-speed network switches, but also additional supporting elements of the solution, such as powerful NVIDIA graphics cards and Xeon Phi host processors. We will also review the HPE Core HPC Software Stack, which allows administrators to control the use of system resources.
12:50 - 13:35; Survey of Face Detection Approaches
Yurii Pashchenko (Research Engineer, Ring Labs)
In this presentation we provide an overview of new and the most popular face detection approaches, such as Viola-Jones, Faster R-CNN, MTCNN, etc. We will discuss modern benchmarks for evaluating algorithm quality, including FDDB, WIDER, IJB-A, etc.; BioVec: Word2Vec-like technology for bioinformatics that analyzes genomic data.
Dmitry Nowicki (Researcher at IMMSP NASU)
This presentation is dedicated to BioVec: Word2Vec-like technology for bioinformatics problems.
An extension of word vectors to biological sequences (e.g. DNA, RNA, and proteins) for bioinformatics applications has been proposed by Asgari and Mofrad. They use the term bio-vectors (BioVec) for biological sequences in general, with protein-vectors (ProtVec) for proteins (amino-acid sequences) and gene-vectors (GeneVec) for gene sequences. This representation can be widely used in machine learning applications in proteomics and genomics. The results suggest that BioVectors can characterize biological sequences in terms of biochemical and biophysical interpretations of the underlying patterns.
13:35 - 15:00; Lunch
15:00 - 15:45; Optimizing ML Hyper-parameters with Bayesian Optimization
Maksym Bevza (Research Engineer at Grammarly)
All ML algorithms require tuning. We use grid search, randomized search or our own intuition to select the hyper-parameters. Bayesian optimization helps us guide randomized search so that the same overall results can be achieved in fewer iterations.; Data Science and Big Data in Telecom
Oleksandr Saienko (Software Engineer at SoftServe/CISCO)
Oleksandr will present several interesting Big Data and Data Science case studies in telecom: network optimization, improving user experience, mobile location predictive models, customer churn prevention, fraud detection and others. He will give an overview of modern approaches based on machine learning algorithms.
15:50 - 16:35; Fashion trend monitoring using Deep Learning and TensorFlow
Olha Romaniuk (Data Scientist at Eleks)
Over the last 8 months, we at Eleks have been working on a fashion trend tracking system. The system is based on a residual neural network with identity mapping. The network has been trained with online data augmentation and data parallelism across 2 GPU cards. We have set up everything from scratch using TensorFlow. In the presentation, I will focus on the practical side of this project: the implementation tricks and pitfalls we discovered along the way.; How to know (almost) everything about customers?
Darina Peremot (ML Engineer at SynergyOne)
We are going to answer the question "What does the customer want?" We will share the results of our research on customer transactions and guess whether you have a pet. We will show how machine learning helps us get to know you better.
16:35 - 17:00; Networking Break
17:00 - 17:45; Identifying and Annotating Speakers in Phone Conversations
Yuriy Guts (Machine Learning Engineer, DataRobot)
Speaker diarization is a challenging multimedia search and indexing problem whose goal is to answer the question "Who spoke when?" without any a priori information about the speakers present in the audio/video recording. In this talk, we'll discuss approaches to speaker diarization in phone conversations.; Reasoning with probabilistic graphical models in project-based business
Olga Tataryntseva (Data Scientist at Eleks)
How often do you need to make decisions based on the knowledge you have about a particular domain? Are your decisions good enough every time? Now imagine that you have gathered the views of the best experts in the domain. With their help, your decisions should be much better considered, shouldn't they?
We will talk about ProjectHealth, a reasoning system built on the experience of the best experts in project-based business at Eleks. It now saves a lot of time and effort for Eleks's top management, as it tracks the business in every detail on a daily basis, just as a real expert would.
17:55 - 19:00; Lightning Talks and Discussions:
Enterprise out of the box
Sergey Shelpuk (Head of Data Science Office at Eleks)
Enterprise IT architecture for data analytics with open source components
Recent deep learning approaches for speech generation
Dmytro Bielievtsov (Techlead at IBDI)
Over the last half year we've seen a lot of progress in applying deep learning to sample-level speech generation. These models generate speech waveforms directly and therefore overcome many of the limitations associated with spectral vocoders. In this talk I'll give a brief overview of some of the seminal models in this field, such as WaveNet and SampleRNN.;
Distributed calculations: use of BOINC in Data Science
Vitalii Koshura (Software Developer at Lohika)
BOINC is open-source software for distributed computing. It currently supports different platforms, devices and technologies: 4 desktop OSs (Windows, macOS, Linux, FreeBSD), mobile phones (Android-based), ARM-based devices (Raspberry Pi), GPUs from 3 vendors (NVIDIA, AMD Radeon, Intel) and VirtualBox virtualization technology. The talk highlights the use of BOINC in different fields of science that involve processing huge amounts of data, using current active research projects as examples.
The use of machine learning in the development of an HR product
Mamed Khalilov (CEO & Founder at Morbax HR)
A description of the product's development stages with the help of machine learning, using linear mathematics, a multilayer neural network and much more.
19:00 - 19:15; Conference Close
19:15 - ...; Afterparty