Monday, November 7, 2016

Big data, what's in a name?

Last week I announced that I am going to do a blog series about big data terms and explain them in a straightforward way.
Well, naturally I have to start with big data itself, because everybody talks about it, but nobody can say exactly what it is.

You can find many definitions of big data online.

Gartner explains Big Data as

 "Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation."

This is a very business-driven definition.

Technology vendors like Microsoft define it as:

“Big data is the term increasingly used to describe the process of applying serious computing power—the latest in machine learning and artificial intelligence—to seriously massive and often highly complex sets of information.”

Lots of tech-talk in a definition about data.

A more straightforward definition is given by big data expert Bernard Marr:

 "Big data refers to our ability to collect and analyze the vast amounts of data we are now generating in the world."

In other words, all the definitions above state that it's not about the data itself, but about the way we utilize the data that is now generated in huge amounts.

Mind you, today's big data will be small data, or just data, a few years from now.

Big data is just data; we are only now learning how to use it.

Friday, October 28, 2016

Big Data, deep learning, neural networks. Blimey, now I'm confused!

Yesterday I went to the BI-Podium Event 'De achterkant van Big Data' in Amersfoort.
A great event with fascinating presentations about big data, technology and even ethics.

Many Dutch companies in data science like Big Data Lab, Xomnia and Many2More were present.
There was also a great closing keynote by Donna Burbank, which was a real boost for starting data scientists.
It was awesome to see that there are a lot of Dutch big data enthusiasts, and there was enough time for networking.
Thank you Visser & Van baars Recruitment for this opportunity.

As a data scientist with a background in bioinformatics and software testing (which also involves lots of analysis), I was able to follow the presentations.
A lot of terms were not new to me, but is that also true for my fellow SaaS enthusiasts in my TestingSaaS community?
I had already wondered why big data terms like deep learning, neural networks and machine learning are mostly explained from either a marketing (too easy) or a development (too technical) viewpoint.
Luckily, this was not the case at this BI-Podium event.

So I got the idea for a blog series explaining these big data terms in a straightforward way, without the marketing and technical phrases.
This way I want to help new big data enthusiasts not to get scared by all these terms, but to give them a starting point for exploring data science, this sexy new discipline.

That's why I founded TestingSaaS, to explain the world of SaaS in a straightforward way.

Stay tuned for my blog series on Big Data as-it-is!

Tuesday, October 18, 2016

DataOps: Combining data analytics and DevOps

Blogging and writing articles for Fixate is fun!
Besides learning about Fixate customers like Sumo Logic and PagerDuty, it is also exciting to combine IT disciplines in one article.
As you already know I am a trained software tester interested in data science.
Fixate is DevOps oriented, so I investigated if data science is related to DevOps.
Well, it is: DataOps.
DataOps is the extension of DevOps values and practices into the data analytics world. The DevOps philosophy emphasizes seamless collaboration between developers, quality assurance teams and IT Ops admins. DataOps does the same for the admins and engineers who store data, analyze data, archive data and deliver data.
In other words, DataOps is all about streamlining the processes involved in storing, interpreting and deriving value from big data. It aims to break down the siloes that have traditionally separated different teams from one another in the data storage and analytics fields.

Great, a story about DataOps, that's old school TestingSaaS.
Well, now it's time for something new. As you already saw in an earlier blog post, TestingSaaS is resurrected, and I wanted to change my way of blog reporting.
I always wanted to develop an infographic and why not now? So for my DataOps article I devised an infographic on DataOps, which can be found here on the Fixate Sweetcode.
It shows the infographic and an accompanying story.
In my opinion, this dual way of reporting attracts two kinds of readers: the visual (infographic) and the text readers.
And to make an infographic you really need to know your theme; otherwise you can't capture it all in a simple yet elegant infographic.
So, a lot of advantages.

Have fun reading and if you have any questions or feedback do not hesitate to contact me.

Thursday, September 1, 2016

New challenge: content marketing with Fixate

What's going on?

This summer I was approached by Chris Riley from Fixate IO, the content and influencer marketing company for techies. Chris needed help.
He is a big fan of the TestingSaaS social network and wanted me to become a member of the Fixate Influencer Community aimed at DevOps.
This crack community (with the best QA and dev professionals) is the DevOps content marketing engine for Fixate's clients Rollbar, Sumo Logic and a lot of others.

A huge honour

This is a huge honour for me.
I can improve my content marketing skills in my free time, meet great people and still have a daily job.
OK, I know about software testing, security and forensics, but Fixate IO wants to put time and effort into me and trusts my work as a tech blogger and social media community leader.
That's just awesome!!

So, it's time to kick ass.
Prepare yourself for some informative articles about DevOps, data science, security and software testing.

Thank you Fixate IO. Glad to be of help.

TestingSaaS, he's back!

Sunday, April 24, 2016

Time for WhatsApp forensics with R and SQLite

It's the end of April 2016.
Spring has arrived in the Netherlands (still some wet snow, but who cares).

Time for some adventures in computer forensics.
This time I want to combine R and mobile forensics.

And what's the best app for that? WhatsApp!

Hmm, but how do you get data from WhatsApp into R?
Well, WhatsApp uses SQLite to store its data.
SQLite is an open-source, embedded relational database, and via R you can examine the data in the SQLite database that WhatsApp uses on your mobile phone.
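To give you a feel for what that exploration looks like, here is a minimal sketch. I'm using Python's built-in sqlite3 module and an in-memory stand-in database here; in a real investigation you would connect to the (decrypted) message database pulled from the phone, and the file, table and column names below are assumptions for illustration only, since WhatsApp's actual schema differs per version.

```python
import sqlite3

# In-memory stand-in for a WhatsApp database; in a real case you would
# use sqlite3.connect("msgstore.db") on the file pulled from the phone.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Stand-in messages table (these column names are assumptions).
cur.execute(
    "CREATE TABLE messages (key_remote_jid TEXT, data TEXT, timestamp INTEGER)"
)
cur.execute(
    "INSERT INTO messages VALUES ('31600000000@s.whatsapp.net', 'Hi!', 1461499200000)"
)

# Step 1: list all tables, so you can explore the schema first.
cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
tables = [row[0] for row in cur.fetchall()]
print(tables)

# Step 2: once you know the schema, pull out the messages.
cur.execute("SELECT key_remote_jid, data, timestamp FROM messages")
rows = cur.fetchall()
print(rows)

conn.close()
```

The nice thing is that this two-step pattern (list the tables, then query them) works on any SQLite file, so you can use it to explore a database whose schema you don't know yet.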

So, follow the #RWhatsApp hashtag on the TestingSaaS social network this week for my deep dive into WhatsApp forensics with R and SQLite.
Feedback is always welcome.

Trust me, it's going to be a fun week!

Sunday, March 20, 2016

When curiosity gets noticed: an interview about my journey in big data

Last year I started my deep dive into big data.
Blogging, Tweeting and following Coursera data science modules gave me a good start.

Well, that was noticed by my Twitter followers.
One of them, Matt Ritter, saw my enthusiasm and wanted an interview.
We talked about my curiosity for big data, my journey and the problems I face when dealing with big data.

Matt, thank you for interviewing me and sharing my journey.
A great guy to follow.

Also join my social network for sharing adventures in big data, software testing, SaaS, and computer security and forensics.

Enough adventures for a lifetime!

Tuesday, January 26, 2016

Data science and software testing, it's all about the question


When I started my career in software testing I was a biologist without business experience, but I knew how to crunch data with statistics, Python and machine learning.
For the last 11 years software testing has been my main profession, and it still is.
But, more and more companies are into Big Data (as a part of data science) and as a biologist, trained in crunching lots of data (genetics, bioinformatics), I got curious.
Is there a way to combine my knowledge of statistics and crunching big data and software testing in today's business?
Sure there is: a lot of the methods (statistics, data mining, web scraping) and programming languages (R, Python) used in data science can also be used in software testing.
Both software testing and data science are empirical studies trying to answer a specific question. The answer to this question can be derived by using tools or methods.
Mind you, don't let the tool or method determine how the answering process proceeds; let the question be the determinant.
Be open-minded! Remember: a fool with a tool is just a fool.
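To make this concrete, here is a minimal sketch of one such data-science method applied to testing: using a simple statistic to flag test cases whose runtime looks suspicious. The runtimes and the 3x-median threshold are made-up assumptions for the example, not a prescription.

```python
import statistics

# Made-up runtimes (in seconds) for a small test suite.
runtimes = {
    "test_login": 1.2,
    "test_search": 1.4,
    "test_checkout": 1.3,
    "test_report_export": 9.8,  # suspiciously slow
}

# The median is robust against the very outliers we want to find.
med = statistics.median(runtimes.values())

# Assumed rule of thumb: anything over three times the median is suspect.
outliers = [name for name, t in runtimes.items() if t > 3 * med]
print(outliers)  # ['test_report_export']
```

The point is not the threshold itself, but that the question ("which tests behave oddly?") came first and the method was picked to answer it.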

Data science and software testing

Data science is not just statistics; it is an interdisciplinary field like bioinformatics, combining mathematics, statistics, computer science, information science and so on.
Just like big data, it's a buzzword, but a data scientist, according to Coursera, has one goal:

Ask the right questions, manipulate data sets, and create visualizations to communicate results.

Well, that's the same in software testing.
Without the correct question, data set and visualization (report), a software tester can't inform the stakeholders about the state of quality of the object under test.

Now I know testers have tools like Jira, Microsoft Excel and Selenium to help them.
Why should we know about data science then?
Well, as I said before, a fool with a tool is just a fool.
You may know how to use many test tools, but the most important thing a tester does is ask the right questions. This triggers the other stakeholders to answer them, and this is how possible issues are found.
Data science is all about asking the right questions. It can help the tester with formulating the question and deriving the test set, even when the test set has missing data. It also teaches the tester how to visualize his or her findings.
Test tools can also do these things, but, in my opinion, a tester should be able to do them himself.
Knowing data science can help the tester stay critical.
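As a small illustration of the missing-data point above, here is a minimal sketch of one standard technique, mean imputation. The test set and the choice of imputation method are my own assumptions for the example; `None` marks a measurement that was never recorded.

```python
import statistics

# A made-up test set of response times (ms); None means the
# measurement is missing.
response_times = [120, None, 135, 128, None, 142]

# Compute the mean over the values we do have.
known = [t for t in response_times if t is not None]
mean = statistics.mean(known)  # 131.25

# Mean imputation: replace each missing value with that mean.
filled = [t if t is not None else mean for t in response_times]
print(filled)  # [120, 131.25, 135, 128, 131.25, 142]
```

Mean imputation is the simplest option and can distort the spread of the data, so knowing when it is (and isn't) appropriate is exactly the kind of critical judgment data science trains.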
There are a lot of data science courses online like Coursera or Udacity.
Try a course, it won't be easy, but that's part of the learning.


Software testers can learn from data science in their daily work: asking open-minded, critical questions, developing and processing test data, selecting test tools, and visualizing the quality of the object under test.

For me, data science increased my ability to ask the right questions and diminished my fear of going too deep into the data.
A software tester should never be afraid to ask the right questions to different (!) people, go deep if necessary, and report his or her findings.
You have a job to do: visualize the quality of the object under test, as critically as possible!