Active Learning on MNIST – saving on labeling

Active Learning is a semi-supervised technique that lets you label less data by selecting the samples that matter most from the learning process (loss) standpoint. This can have a huge impact on project cost when the amount of data is large and the labeling rate is high, for example in object detection and NLP NER problems.
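As a rough illustration of the selection step, here is a minimal sketch of least-confidence sampling, one common way to pick samples for labeling. It is not taken from the linked code: the function name `select_for_labeling` and the probability matrix `probs` are hypothetical, and low prediction confidence is used as a stand-in for "important from the loss standpoint".

```r
# Least-confidence sampling: pick the k unlabeled samples whose highest
# predicted class probability is the lowest.
# `probs` is a hypothetical matrix: one row per sample, one column per class.
select_for_labeling <- function(probs, k) {
  confidence <- apply(probs, 1, max)
  order(confidence)[seq_len(min(k, nrow(probs)))]
}

probs <- rbind(
  c(0.98, 0.01, 0.01),  # confident prediction, not worth labeling
  c(0.40, 0.35, 0.25),
  c(0.70, 0.20, 0.10),
  c(0.34, 0.33, 0.33)   # almost uniform, the most uncertain sample
)
picked <- select_for_labeling(probs, 2)
print(picked)
```

The returned indices point at the rows to send to the labelers first; the rest of the pool stays unlabeled until the next round.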

The article is based on the code: Active Learning on MNIST

Data Science Conference in Lviv

The Data Science and Engineering conference in Lviv has become a regular event. It is a good time to meet your colleagues, students and, of course, teachers. In general, some Data Science presentations belonged in the Data Engineering flow and vice versa. Here is what was really interesting at this event from my point of view.

Volodymyr Getmansky showed a model that tries to colorize a satellite image and recognize its scale. Surprisingly, even 20K input images with labeled distances (labeled, by the way, using some computer vision techniques) were enough to train a regression model that recognized map scale with nearly 20-30% accuracy. Really global problems can be solved with moderate-size datasets.

Forecasting is not a regression problem, even if it looks similar – that is what Taras Firman showed in his presentation about time series analysis. More analysis of seasonality in the data plus feature engineering, and less complexity from heavy RNNs – that is a simple recipe for efficiency in this field.

How to add semantics to an image – a brilliant idea described by Oles Petriv as a result of his experience in computer vision and natural language processing. Just prepare a map of all words in a 300-dimensional space, measure the distances between them and connect this information to a CNN that does object detection. As a result, it is possible to find “speed” or even “spring” in a picture – very impressive, and the only presentation with a live demonstration.

The panel discussion “From buzzwords to reality” tried to evaluate the hype around “Optimization” in “Applied Mathematics”, which has since become “Machine (Deep) Learning” in “Data Science”. Everybody agreed that this DS bubble is similar to the dot-com rush of the 2000s; however, we believe it will become as mature and predictable as software development.

An interesting presentation in the Data Engineering flow was given by Neuromation representative Denis Popov. The company prepares pictures of goods to feed to neural networks. The business idea lies in freeing merchandisers from inspecting store shelves, which is understandable. But generating artificial data from 3D models to train an artificial NN sounds like a very temporary enterprise, and it poses another challenge for the data science community: to move from supervised learning to real intelligence.

Andy Bosyi,
Information Technology & Data Science

Lviv AI & Big Data Day collected Data Science people from Ukraine

A ticket to AI & Big Data Day was worth its cost. Data Science only touches the grand IT business in Lviv, and most of the conference participants were true amateurs of this magic, in the good sense of the word.

NLP and Deep Learning

The first speaker was Volodymyr Getmanskyi, covering NLP and NLU in detail. What was interesting is that in this first line of Data Science technologies, which were able to show business value quickly, they still build models from scratch; using monsters like IBM Watson does not add much value when solving really specific problems.

Continuing with the speakers from the Eleks company: for me it was very interesting to hear Sergey Shelpuk and Olha Romaniuk talk about practical modeling of a system that recognizes what people wear. They built and trained a Convolutional Neural Network based on 154 layers of a Con-Con-MPool design (actually an adjusted architecture from the winner of the ImageNet challenge). Interestingly, the main training session lasted three days using TensorFlow on two Nvidia 1080 GPUs and one CPU.

Data Science in Business and Legal

I visited the business section of the conference to listen to Mamed Khalilov talk about problems in AI startups. His main message was: hey, my high-tech friends who are starting a business in AI, forget about ML (or any other buzzword) when stating goals for yourself and the team, or when marketing services or a product to your clients. Focus on the business value and use ML as a tool to achieve it. And of course, knowledge of the business field and data understanding expertise come before the latest deep learning trends and other AI techniques.

Ivan Horodyskyy gave us a good heads-up on near-future legal issues with AI and robotics in the EU. Interestingly, the definition of a robot was drawn from the Golem, Frankenstein, Čapek’s robots and Asimov’s Laws. The main question was liability: as it turns out, a robot cannot be liable, and everything falls on the user / trainer / owner / manufacturer / designer chain. Fortunately, the EU institution clearly states that the object of the regulations must have a physical body, so a software product is not a concern as long as it has not been placed on some hardware.

Computer Vision

My favorite talk at the conference was by Dmytro Peleshko, who works on computer vision problems. His ideas and results in background detection, classification, object tracking and recognition inspired new steps in my own work on object movement tracking. We had a very good discussion on the problem of intersecting and identifying multiple objects in a distributed camera system, during which I shared my solution to the shadow suppression problem.

I liked the conference a lot; the only thing to adjust is the frequency of such events. One year is a long time if we measure it by Moore’s Law.

Andy Bosyi,
Information Technology & Data Science

Object Detection and Counting

Object detection, and especially recognition, can be done using different techniques, such as a combination of OpenCV functions. For me, it was more interesting to build a quick model in R than to spend weeks writing long C++ or .NET code for it. I started with a people counter as a practical application of object detection and took some footage of people passing by the office.

The first step is to extract images from the video using FFmpeg. Then choose a background image and create a matrix of differences between an image with an object in it and the background one. As you can find on my blog, I have created an R library for raster image processing and vectorization, fasteraster, which can be used for object detection. The idea was thus to vectorize the matrix of differences into gradient-detected zones.

This picture shows variant #1, where the image matrix was represented by plain RGB values. Comparing them to the background gave me a strong detection of the person’s shadow (see variant #1 in the source code):
[Image: object detection - shadows]

Thus the next idea (variant #2 in the code) was to calculate ratios between colors (Red / Green, Green / Blue) and then compare them to the background. This removed the shadow detection but introduced another issue: many detections in dark areas, probably caused by the poor color detection capabilities of the CMOS camera:
[Image: object detection - dark areas]

Then I decided to subtract the colors (Red – Green, Green – Blue) and it worked just fine. I also added filtering of the detected zones by weight and showed the zones on the video. As you can see, there is another problem: when a black object moves across a black background, it is split into two or three parts:
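To see why the color differences are less sensitive to shadows than raw RGB values, here is a minimal sketch on synthetic data. It assumes a crude shadow model in which all three channels get darker by the same amount; the helper names `chroma` and `diff_from_background` are hypothetical, while the squared-difference formula follows the source code at the end of the article.

```r
# Color-difference features (R - G, G - B): an equal brightness offset
# on all three channels cancels out.
chroma <- function(img) list(img[, , 1] - img[, , 2], img[, , 2] - img[, , 3])

# Squared distance between a frame and the background in (R - G, G - B) space
diff_from_background <- function(frame, back) {
  f <- chroma(frame)
  b <- chroma(back)
  (f[[1]] - b[[1]])^2 + (f[[2]] - b[[2]])^2
}

back <- array(0.5, dim = c(4, 4, 3))            # uniform gray background
frame <- back
frame[1:2, 1:2, ] <- frame[1:2, 1:2, ] - 0.2    # shadow: all channels darker
frame[4, 4, ] <- c(0.9, 0.2, 0.1)               # object: the color itself changes

d <- diff_from_background(frame, back)
print(round(d, 3))
```

In this toy frame the shadowed corner produces no difference at all, while the single colored pixel stands out. Real shadows are not perfectly neutral, so some residue should be expected in practice.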

In this case, I just added code to join the areas and calculate the new center of the joined object. I also added a track line and two green margins to detect that an object passed both of them in the same direction:
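The joining step boils down to a weighted average of the zone centers, as in the source code at the end of the article. A minimal sketch, assuming each zone is described by a weight and a center (the column names `weight`, `x` and `y` and the function `join_zones` are hypothetical):

```r
# Join several detected zones into one object: total weight and
# the weight-averaged center.
join_zones <- function(zones) {
  w <- zones[, "weight"]
  c(weight = sum(w),
    x = sum(w * zones[, "x"]) / sum(w),
    y = sum(w * zones[, "y"]) / sum(w))
}

# two fragments of one person split by a dark background
zones <- rbind(c(weight = 30, x = 40, y = 20),
               c(weight = 10, x = 44, y = 28))
joined <- join_zones(zones)
print(joined)
```

The heavier fragment pulls the joined center towards itself, which keeps the track stable when a small part of the object splits off.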

As one can see, the model itself took a page of code, and most of it is for the visualization. However, the following items were not included in the model:

  • background image – it has to adjust to the weather, time of day and other conditions (like somebody leaving a bag in the middle of the observation area).
  • object counting – simply check whether the track vectors crossed the green margins.
  • multiple object detection – needs an identification algorithm based on path approximation.
  • joined object recognition – needs clustering of shape medians to split the joined area into smaller ones by average weight and path approximation.
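The counting item could be sketched as follows. This is only an assumption about how such a check might look: the function `count_passage` is hypothetical, the margins at x = 30 and x = 70 are taken from the green lines in the source code below, and the track is reduced to the x coordinates of the object center.

```r
# Decide whether a track passed both margins, and in which direction.
# Returns 1 (left to right), -1 (right to left) or 0 (did not pass both).
count_passage <- function(x, left = 30, right = 70) {
  x <- x[!is.na(x)]                     # frames without a detection
  if (length(x) < 2) return(0L)
  first <- x[1]
  last <- x[length(x)]
  if (first < left && last > right) return(1L)
  if (first > right && last < left) return(-1L)
  0L
}

print(count_passage(c(10, 25, 50, 80)))   # walked through, left to right
print(count_passage(c(90, 60, 20)))       # walked through, right to left
print(count_passage(c(10, 40, 20)))       # turned back before the second margin
```

Comparing only the first and last positions is the simplest possible rule; it would miscount a person who wanders back and forth, which is one reason the identification items above are still needed.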
Object detection R source code

library(png);         # readPNG
library(raster);      # raster, aggregate
library(fasteraster); # raster2vector, rasterZoneAnalyzer

# frame size after downscaling and the range of frames to process
X <- 48 * 2;
Y <- 27 * 2;
from <- 140;
to <- 200;

# read a frame, downscale it by a factor of 5 and return the color matrices
matrixFromFrame <- function (idx) {
  v <- readPNG(sprintf("in/%03d.png", idx));
  rgb <- lapply(1:3, function(x) as.matrix(aggregate(raster(v[ , , x]), fact = 5)));
  rgb <- lapply(rgb, function(x) t(x)[1:X, Y:1]);
  #1 return(rgb);
  #2 return(list(rgb[[1]] / rgb[[2]], rgb[[2]] / rgb[[3]]));
  return(list(rgb[[1]] - rgb[[2]], rgb[[2]] - rgb[[3]], (rgb[[1]] + rgb[[2]] + rgb[[3]]) / 3));
}

# detect zones that differ from the background, draw them and update the track
processFrame <- function(idx, back) {
#  png(file = sprintf("out/final%03d.png", idx), width = 640, height = 480);
  rggb <- matrixFromFrame(idx);
  # squared distance to the background in (R - G, G - B) space
  diff <- (rggb[[1]] - back[[1]]) ^ 2 + (rggb[[2]] - back[[2]]) ^ 2;
  pol <- raster2vector(diff, 0.001, 100, 100);
  plot(0, type = "l", xlim = c(1, X), ylim = c(1, Y));
  rasterImage(readPNG(sprintf("in/%03d.png", idx), native = TRUE), 1, 1, X, Y);
  # the two green margins
  abline(v = 30, col = 'green');
  abline(v = 70, col = 'green');
  lapply(pol, function(x) lines(rbind(x, x[1,]), col = 'blue'));
  zone <- rasterZoneAnalyzer(diff, 0.001, 100, 100);
  # keep only the zones whose weight (column 2) is above 10
  zone <- zone[zone[ , 2] > 10, , drop = FALSE];
  #text(zone[ , 3], zone[ , 4], labels = zone[ , 2], col = 'red');
  # weighted center of the remaining zones becomes the next track point
  track[[idx - from + 1, 1]] <<- sum(zone[, 2] * zone[, 3]) / sum(zone[, 2]);
  track[[idx - from + 1, 2]] <<- sum(zone[, 2] * zone[, 4]) / sum(zone[, 2]);
  lines(track, col = 'red');
  points(track, col = 'red', pch = 20);
}

track <- matrix(nrow = to - from + 1, ncol = 2);
back <- matrixFromFrame(100);
lapply(from:to, function(x) processFrame(x, back));

Andy Bosyi,
Information Technology & Data Science

The Natural Ear for Digital Sound Processing – as an alternative to the Fourier Transformation

This is a primitive prototype of the natural ear. Why I came to it and how it can be better than the Fast Fourier Transformation (FFT) in Digital Sound Processing (DSP) is what this article is about.

Some of the software development projects I was involved in used Fourier Transformation for waveform analysis. The projects included sound tone recognition for gun targets and DTMF signals. But before that, I was keen to get a “picture” of human speech and music harmony. Recently I started an app that plays an instrument while I am playing lead guitar. The problem was to teach the computer to listen to my tempo and keep the musical rhythm in order. To accomplish this, I applied Fourier Transformation to the first seconds of the Pink Floyd composition “Marooned”. Then I compared the “picture” to the same composition performed by me, and the results were poor until I raised the FFT block size to as much as 8192 to recognize notes at least up to the 6th octave.

This showed the first problem with Fourier Transformation: for really good analysis you need to increase the block size (and with it the number of frequency bins) and, as a result, performance goes down, especially for real-time processing.
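The trade-off can be put in numbers. The sketch below assumes a 44100 Hz sample rate (the value used in the R code at the end of the article) and compares the width of one FFT bin with the distance between neighboring semitones; the 82.41 Hz reference is the low E string of a guitar.

```r
# FFT resolution versus block size at an assumed 44100 Hz sample rate
fs <- 44100
n <- c(1024, 2048, 4096, 8192)
bin_width <- fs / n            # Hz covered by one frequency bin
block_ms <- 1000 * n / fs      # audio needed to fill one block, in ms

# distance in Hz from a note of frequency f to the next semitone
semitone_gap <- function(f) f * (2^(1 / 12) - 1)

res <- data.frame(n, bin_width = round(bin_width, 2), block_ms = round(block_ms, 1))
print(res)
print(round(semitone_gap(c(82.41, 440)), 2))
```

Low notes sit only a few Hz apart, so separating them requires bins of about that width, and every doubling of the block size also doubles the amount of audio that must arrive before a block can be analyzed.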

The second problem of Fourier Transformation analysis for music: the same instrument, depending on its timbre, can generate a different set of overtones. These overtone frequencies, analyzed by FFT, created peaks that were irrelevant to what we actually hear. To generalize the result I summed the frequency bins into twelve semitones. The picture was better, but now the very first note was recognized as C, while it was in fact B:
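Summing bins into twelve semitones can be sketched like this. The original implementation is not shown in the article, so everything here is an assumption: equal temperament with A4 = 440 Hz, the helper names `pitch_class` and `fold_to_semitones`, and the sample frequencies and magnitudes.

```r
note_names <- c("C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B")

# Pitch class (1..12, 1 = C) of a frequency, assuming A4 = 440 Hz
pitch_class <- function(freq) {
  midi <- round(69 + 12 * log2(freq / 440))
  (midi %% 12) + 1
}

# Sum bin magnitudes over all octaves into twelve semitone totals
fold_to_semitones <- function(freq, magnitude) {
  keep <- freq > 0                       # skip the DC bin
  sums <- tapply(magnitude[keep],
                 factor(pitch_class(freq[keep]), levels = 1:12), sum)
  sums[is.na(sums)] <- 0                 # semitones with no bins
  setNames(as.numeric(sums), note_names)
}

# hypothetical peaks: B3, its octave B4, C4 and A4
freq <- c(246.94, 493.88, 261.63, 440)
magnitude <- c(1.0, 0.5, 0.3, 0.2)
chroma <- fold_to_semitones(freq, magnitude)
print(names(which.max(chroma)))
```

With wide bins at low frequencies, one bin can straddle two neighboring semitones such as B and C, which is one possible source of the wrong note mentioned above.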

This forced me to read more about the nature of sound, hearing and the human ear. I thought that maybe the cause was a third problem with Fourier Transformation: it is sensitive to the signal phase. The human ear does not recognize the phase of individual harmonics, only frequencies.

I created a model in the R language (you can find the code at the end of the article) that generates input signals for a set of frequencies:

Then I used some formulas I had put together fifteen years ago (back then the same experiment failed due to poor PC performance) to create a model of a pendulum. The object can receive an incoming signal and oscillate if the signal contains a frequency equal to its own:



The fading coefficient that does not depend on the auto-oscillation frequency of the pendulum:

The position of the pendulum:

Velocity and energy:
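Written out from the R listing at the end of the article, the model amounts to the following recurrences. The notation is an assumption made here for readability: $w$ is the pendulum frequency in Hz, 44100 is the sample rate used in the code, $s_n$ is the input sample, and $x_n$, $v_n$, $E_n$ are position, velocity and energy after sample $n$.

```latex
\begin{aligned}
T &= \frac{44100}{w}, \qquad
K = \left(\frac{2\pi}{T}\right)^{2}, \qquad
\Phi = \frac{2}{\pi}\arctan T \\
x_n &= x_{n-1} + \left(v_{n-1} + s_n - s_{n-1} - K\,x_{n-1}\right)\Phi \\
v_n &= x_n - x_{n-1} \\
E_n &= \frac{v_n^{2}}{2} + \frac{K\,x_n^{2}}{2}
\end{aligned}
```

Here $K$ plays the role of the coefficient of elasticity and $\Phi$ the fading coefficient, matching the comments in the code.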

This is the reaction of the pendulum to a signal of the same frequency:

green – input signal
blue – pendulum oscillation
red – pendulum energy

For the input signal that slightly differs from the frequency of the pendulum the amplitude and energy are significantly smaller than in the previous result:
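The comparison can be reproduced with a small base-R sketch. It uses the same update rule as the listing at the end of the article, but wraps it in a closure instead of a reference class; the names `make_pendulum` and `peak_energy`, the 1000-sample signal length and the test frequencies are choices made for this illustration.

```r
# A pendulum as a closure: returns a tick function that takes one input
# sample and returns the current energy.
make_pendulum <- function(w, rate = 44100) {
  period <- rate / w
  K <- (2 * pi / period)^2           # coefficient of elasticity
  Phi <- 2 * atan(period) / pi       # fading coefficient
  x <- 0
  v <- 0
  last_s <- 0
  function(s) {
    last_x <- x
    x <<- x + (v + s - last_s - K * x) * Phi
    v <<- x - last_x
    last_s <<- s
    (v * v) / 2 + (K * x * x) / 2
  }
}

# Highest energy reached while listening to a sine of the given frequency
peak_energy <- function(signal_freq, pendulum_freq = 700, n = 1000, rate = 44100) {
  tick <- make_pendulum(pendulum_freq, rate)
  s <- sin(2 * pi * signal_freq * (0:(n - 1)) / rate)
  max(vapply(s, tick, numeric(1)))
}

energies <- sapply(c(650, 700, 750), peak_energy)
print(round(energies, 2))
```

The middle value, where the signal matches the 700 Hz pendulum, should come out clearly larger than its two neighbors.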

Combined plot for nine different signals – the central one has been recognized:

After that, I built a set of pendulums for different frequencies to cover five octaves of twelve notes each. This is the resulting energy for 60 pendulums listening to the first chords of “Marooned”:
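A bank like this needs one tuned frequency per note. A minimal sketch, assuming equal temperament and a starting note of C2 at 65.41 Hz (the article does not say which five octaves were used, so the base frequency and the function name `note_bank` are hypothetical):

```r
# Frequencies for octaves * 12 pendulums, one per semitone,
# each a factor of 2^(1/12) above the previous one.
note_bank <- function(base = 65.41, octaves = 5) {
  k <- 0:(octaves * 12 - 1)
  base * 2^(k / 12)
}

bank <- note_bank()
print(length(bank))
print(round(range(bank), 2))
```

Each frequency in `bank` would then initialize its own pendulum, and the energies of all of them are read after every input sample.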

As a result, the main tone was detected correctly. I think that the ability of the human ear to omit the phase information of the input signal is crucial for music recognition. I used this model to create a C++ library named Cochlea to listen to, detect and synchronize music in real time. That will be described in the next article.

R code

library(plyr);  # aaply
library(tuneR); # sine

#define a class that imitates a pendulum and has two methods - init and tick
pendulum <- setRefClass("pendulum",
  fields = list( v = "numeric",
                 x = "numeric",
                 K = "numeric",
                 T = "numeric",
                 Phi = "numeric",
                 E = "numeric",
                 lastS = "numeric"),
  methods = list(
    #define the initial state and calculate coefficients
    init = function(w) {
      T <<- 44100 / w;
      #coefficient of elasticity
      K <<- (2 * pi / T) * (2 * pi / T);
      #fading coefficient
      Phi <<- 2 * atan(T) / pi;
      #initial state
      v <<- 0;
      x <<- 0;
      lastS <<- 0;
    },
    #pass the position of the stimulating lever
    tick = function(s) {
      lastX <- x;
      x <<- x + (v + s - lastS - K * x) * Phi;
      v <<- x - lastX;
      E <<- (v * v) / 2 + (K * x * x) / 2;
      lastS <<- s;
      return(c(x, E));
    }
  )
);

#create one pendulum and init with 700 as frequency of auto-oscillation
p <- pendulum();
p$init(700);

#init a vector of waveforms with frequencies from 500 to 900
m <- aaply(seq(500, 900, 50), 1, function(x) sine(x, 1500)@left);

# clear end of the waveform
m[, 1001:1500] <- 0;

#apply the pendulum tick to the vector of waveforms
m <- t(m);
r <- aaply(m, c(1, 2), p$tick, .progress = "time");

#index of the waveform to plot
i <- 5;

#show results
plot(m[, i] * 100, type = "l", col = "dark green");
lines(r[ , i, 1], type = "l", col = "blue");
lines(r[ , i, 2], type = "l", col = "red");

Andy Bosyi,
Information Technology & Data Science

Vectorization of raster to polygons

I would never have started writing any vectorization code if a free library had been available. However, recently I was involved as a tech lead in an interesting project related to geometry. We needed to calculate complex projections of numerous shapes. Having started with a “clean” mathematical solution, we quickly ended up with a huge number of calculations related to polygon triangulation – O(log n!). We spent a week on it and I found that we were in real trouble: for real scenarios the process took minutes. Then I decided to turn to discretization (as had been planned at the beginning); we did the job and got the result in the form of a matrix. It was Friday, and on the following Monday we had to present the results. But in vector form.

A search for raster vectorization in R packages turned up this package, and the function rasterToPolygons looked good, but it was producing too many points for the polygons. If the R package I need does not exist, I have to create my own. Well, with half a weekend still left, I did some C++ coding and created this function that does the job in one pass.

The initial bitmap with an enclave, with the option set to disallow exclaves, and the result:

[Images: initial image; vectorization result #1]

and an example from the volcano dataset:

 inp = volcano;
 res = raster2vector(volcano, 120, 200, 20);
 image(inp, col = rev(grey.colors(100)), useRaster = TRUE)
 plot(0, type = "l", xlim = c(0, nrow(inp)), ylim = c(0, ncol(inp)))
 a = lapply(res, function(x) lines(rbind(x, x[1,])))


[Images: volcano image; vectorization result #2]

You can find source package: fasteraster_1.0.4.tar

and Linux 64 binary here: fasteraster_1.0.4_R_x86_64-pc-linux-gnu.tar

or get a fresh version right from CRAN:

Andy Bosyi,
Information Technology & Data Science