Tuesday, January 26, 2016

Data science and software testing, it's all about the question


When I started my career in software testing I was a biologist without business experience, but I knew how to crunch data with statistics, Python and machine learning.
For the last 11 years software testing has been my main profession, and it still is.
But more and more companies are getting into Big Data (as a part of data science), and as a biologist trained in crunching lots of data (genetics, bioinformatics), I got curious.
Is there a way to combine my knowledge of statistics and big-data crunching with software testing in today's business?
Sure there is: a lot of the methods (statistics, data mining, web scraping) and programming languages (R, Python) used in data science can also be used in software testing.
Both software testing and data science are empirical studies trying to answer a specific question. The answer to this question can be derived by using tools or methods.
Mind you, don't let the tool or method determine how you go about answering the question; let the question decide which tool or method you need.
Be open-minded! Remember, a fool with a tool is just a fool.

Data science and software testing

Data science is not just statistics; it is an interdisciplinary field, like bioinformatics, combining mathematics, statistics, computer science, information science and more.
Just like Big Data, it's a buzzword, but a data scientist, according to Coursera, has one goal:

Ask the right questions, manipulate data sets, and create visualizations to communicate results.

Well, that's the same in software testing.
Without the correct question, dataset and visualization (report), a software tester can't inform the stakeholders about the quality of the object under test.

Now I know testers have tools like Jira, Microsoft Excel and Selenium to help them.
Why should we know about data science then?
Well, as I said before, a fool with a tool is just a fool.
You may know how to use many test tools, but the most important thing a tester does is ask the right questions. These questions prompt the other stakeholders to come up with answers, and that is how possible issues are found.
Data science is all about asking the right questions. It can help the tester formulate the question and derive the test set, even when the test set has missing data. It also teaches the tester how to visualize the findings.
Test tools can also do these things but, in my opinion, a tester should be able to do them himself or herself.
Knowing data science can help the tester to stay critical.
There are a lot of data science courses online, for example on Coursera or Udacity.
Try a course. It won't be easy, but that's part of the learning.
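
To make the point about missing data and visualization a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the records in RESULTS, the component names and the functions summarize, pass_rate and report are all hypothetical and do not come from any real test tool. The idea is to start from a question ("what is the pass rate per component, and how much do we not know?"), keep the missing outcomes visible instead of silently dropping them, and show the answer as a simple text chart.

```python
from collections import defaultdict

# Hypothetical test-run records: (component, outcome).
# outcome is "pass", "fail", or None when the result was never recorded.
RESULTS = [
    ("login", "pass"),
    ("login", "fail"),
    ("login", None),
    ("search", "pass"),
    ("search", "pass"),
    ("checkout", None),
]


def summarize(results):
    """Count passes, failures and missing outcomes per component."""
    summary = defaultdict(lambda: {"pass": 0, "fail": 0, "missing": 0})
    for component, outcome in results:
        key = outcome if outcome in ("pass", "fail") else "missing"
        summary[component][key] += 1
    return dict(summary)


def pass_rate(counts):
    """Pass rate over the known outcomes only; None if nothing is known."""
    known = counts["pass"] + counts["fail"]
    return counts["pass"] / known if known else None


def report(results, width=20):
    """Build one text line per component: a bar, the rate and the missing count."""
    lines = []
    for component, counts in sorted(summarize(results).items()):
        rate = pass_rate(counts)
        if rate is None:
            # No known outcome at all: say so instead of showing 0%.
            bar, label = "?" * width, "no data"
        else:
            filled = round(rate * width)
            bar = "#" * filled + "." * (width - filled)
            label = f"{rate:.0%}"
        lines.append(
            f"{component:<10} {bar} {label:>7} (missing: {counts['missing']})"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(report(RESULTS)))
```

Note the design choice: a component without any recorded outcome is reported as "no data" rather than as 0% or 100%. Which of those would be the right reading depends on the question you are asking, and that is exactly the kind of decision a tester should make consciously.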


Software testers can learn from data science to help them in their daily work: asking open-minded, critical questions, developing and processing test data, selecting test tools and visualizing the quality of the object under test.

For me, data science increased my ability to ask the right questions and diminished my fear of going too deep into the data.
A software tester should never be afraid to ask the right questions of different (!) people, go deep if necessary and report his/her findings.
You have a job to do: visualize the quality of the object under test, as critically as possible!