Leaders Shaping the Digital Landscape
July 25, 2024

The Art of Flaky Tests: Turning Uncertainty into Precision

In software development, bugs often hide when you look for them. Many have experienced a three-hour bug hunt that leads nowhere and makes them doubt their skills. This conversation between Wade Erickson and Filip Hric, DevRel at Replay.io, covered some of the most annoying bugs and test flakes encountered as an SDET, presenting a process to make debugging faster and even enjoyable. Tips were shared on how to narrow down a bug or test flake to identify the root cause.

In this live video interview, Wade Erickson and Filip Hric, DevRel at Replay.io, delve into the frustrating world of debugging. They share their experiences with some of the most annoying bugs and test flakes encountered as SDETs, presenting a process to make debugging faster and more enjoyable. Learn tips on narrowing down a bug or test flake to identify the root cause.

Key Takeaways:

  • Techniques to streamline the debugging process
  • Strategies for identifying the root cause of persistent bugs
  • Tips to make debugging a less frustrating and more enjoyable task

Transcript

Wade Erickson (00:14):

Welcome all to another episode of Tech Leaders Unplugged. I'm Wade Erickson, I'll be your host today. And we have Filip Hric... yes, I think I got it. And he is a consultant, speaker, and tester. Worked with Replay.io of recent. And our topic today is the Art of Flaky Tests. And so thanks so much, Filip, for getting, you know, unplugged with us today in our community and talking testing. And yeah, introduce yourself and then we'll jump into the topic. And I wanna hear what flaky tests are. I, I've been in testing a while, and that's a new term for me. So let's jump into that, get a little definitions going, and then we'll just talk a little bit about testing.

Filip Hric (01:02):

Thank you. Thank you for having me. Thank you for the lovely introduction. As it was said, my name is Filip. I've been in testing for almost a decade, and my main focus was always either like QA or test automation. And as years passed by, I started to dabble a little bit more into Cypress and started teaching it and doing blog posts and stuff like that. So that's, that's where, if anyone in the audience knows me, that's probably where they know me from, from the area of test automation in Cypress. 'Cause I've been a big enthusiast of this tool and, and sharing my knowledge with the, with the community. So yeah. The, the title is very, very recent as a consultant and speaker, although I've been always doing that. If you were asking me two weeks ago on what my job title would be, that would be DevRel at Replay.io, which is a company I used to work with.

Filip Hric (02:18):

As of very recently right now we actually had a, had a layoff. So I, I was let go of the company among some of my friends and colleagues. Unfortunately it's startup life. It's very hard right now for startups. So I'm assuming like, there, there was a very hard decision to make. I believe that the decision was very hard to make, and I know it was hard to make. So no hard feelings there. But yeah, currently I'm, I'm on a, on a job search. So if you're out there, if you have something interesting and you would make use of someone like me, let's talk, I'd be, I'd be happy to to talk to you. Great.

Wade Erickson (03:04):

Let's get into what's a flaky test? So let's talk about the topic and, and I know that is tied to Replay.io, 'cause I went to the website and people in their comments were talking about this, so it seems to be a concept for sure over there. But why don't we talk a little bit about that and then let's jump into a little bit about your work with open source testing, Cypress and where things are kind of going in not only that open source tool, but maybe some of the, the, the new frontier with AI and stuff like that. Yeah. And how that's affecting open source and open source tools and hosting of open source tools and all that. If you have any kind of insights to those,

Filip Hric (03:46):

That's a, that's a wide range of topics, so I'll better take it one at a time. <Laugh>, that's great.

Wade Erickson (03:52):

That's fine.

Filip Hric (03:52):

So when it comes to flaky tests, right, there is a, there, there are many things that people, that comes to mind of people when we say flaky tests, but in general, I think like the best, the closest definition to something being flaky or in other words, unstable, is when you have a test, whether it's an end-to-end test, unit test, API test, or whatever kind of test you have, and you run it and you run it a couple of times, and without any known change, it is not giving you the same result, right? So sometimes it passes, sometimes it fails. Some define flaky test as a situation where a test fails just once and it was previously passing, right? I think that's also a very good definition because in ideal world, if there is no known change to the test, if you run it a thousand times, it should give you the same result a thousand times.
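
That definition can be turned into a small check. The following is a minimal TypeScript sketch, not something shown in the interview: `classifyByRepetition` and the `TestFn` type are hypothetical names, and the test is assumed to be a synchronous function for brevity (real end-to-end tests are asynchronous). It runs the same test repeatedly with no change in between and calls it flaky if the outcomes disagree.

```typescript
// Hypothetical shape: a "test" is any function that returns true on pass and
// false on fail. A thrown error is treated as a failed run.
type TestFn = () => boolean;

type Verdict = "stable-pass" | "stable-fail" | "flaky";

interface RepeatReport {
  runs: number;
  passes: number;
  failures: number;
  verdict: Verdict;
}

// Run the same test `runs` times without changing anything and compare outcomes.
function classifyByRepetition(testFn: TestFn, runs: number): RepeatReport {
  if (runs < 1) throw new RangeError("runs must be at least 1");
  let passes = 0;
  for (let i = 0; i < runs; i++) {
    let passed = false;
    try {
      passed = testFn();
    } catch {
      passed = false; // an exception counts as a failure
    }
    if (passed) passes++;
  }
  const failures = runs - passes;
  const verdict: Verdict =
    failures === 0 ? "stable-pass" : passes === 0 ? "stable-fail" : "flaky";
  return { runs, passes, failures, verdict };
}

// Example: a fake test that fails on every 7th call, standing in for a
// timing-dependent check.
let calls = 0;
const sometimesFails: TestFn = () => ++calls % 7 !== 0;

const report = classifyByRepetition(sometimesFails, 1000);
console.log(report.verdict, report.failures); // flaky 142
```

Under the stricter definition mentioned above, a single disagreeing run out of a thousand is already enough to get the "flaky" verdict.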

Filip Hric (05:01):

If for the thousand and first time it gives you another, some other result, it fails or something, then you could probably flag that as, as flaky or unstable. So in my previous job at Replay, this was a hot topic for us because we were working on a, on a browser that could record everything that's inside it as well as the runtime of the test and of the application that's, that's being under test. So if you had a situation where your test would become flaky, you could actually go back to that because you could have that recording available and take a look line by line into what, what is going on, right? And we are <inaudible> know, right? This is a, this is a situation we don't like to face, right? The thing when you write your test locally, then you run it on CI and for some reason it started, started failing. Ha. Has that ever happened to you, Wade?

Wade Erickson (06:06):

Yeah, of course. The test, every, everything can happen and no, nothing's predictable. A hundred percent. Yeah.

Filip Hric (06:11):

Yeah. And, and you know, like oftentimes what happens on CI stays on CI, right? <Laugh>. So nowadays we have moved to the world where we have all kinds of different tools that can help us, you know, take the information from CI and examine it, which is a step in good direction in my opinion, especially when we, when we talk about end-to-end tests, right? If you think about end-to-end tests, it usually, you usually have some sort of user journey, right? If you have like, you, you need to define what are the ends of end-to-end tests. But if we're talking about UI tests such as written with Selenium, Cypress, Playwright, you usually try to mimic some user behavior, right? Like if you have a to-do app, then the test would open the to-do app, add a new item, hit enter, and then see that to-do item appear, right?

Filip Hric (07:15):

That could be an example of a, of a simple, simple test, right? So when you have that user journey and you have a failure of your test, then that failure sometimes doesn't give you enough information, right? For example, element was not found. Well, which was the element that was not found? Why wasn't it found? Was it covered by other element or did it not appear at all? Was it in the DOM, but it actually had opacity set to zero? Or did something happen way before, right? Did, did the page not, not load, right? Or was it covered by a loading animation or something else? Right? So whenever we are writing end-to-end tests, we need more context. And as I mentioned, tools nowadays provide us like various amount of information about that. So Cypress has recently come up with their Test Replay functionality with, with their Cypress Cloud, where you can go back and examine what the test was actually doing as it was running on CI.
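
The questions behind an "element was not found" failure can be written down as a small decision procedure. This is only a sketch in TypeScript under stated assumptions: `ElementSnapshot` is a hypothetical record of what a tool might capture at the moment of failure, not the data format of Cypress, Playwright, or Replay.

```typescript
// Hypothetical snapshot of the page and target element at failure time.
interface ElementSnapshot {
  pageLoaded: boolean;      // did the page finish loading at all?
  inDom: boolean;           // was the element present in the DOM?
  opacity: number;          // computed opacity, from 0 to 1
  coveredBy: string | null; // selector of whatever sits on top, if anything
}

// Walk through the possible causes, starting with the earliest one.
function explainNotFound(s: ElementSnapshot): string {
  if (!s.pageLoaded) return "page did not load";
  if (!s.inDom) return "element never appeared in the DOM";
  if (s.opacity === 0) return "element is in the DOM but has opacity 0";
  if (s.coveredBy !== null) return `element is covered by ${s.coveredBy}`;
  return "no obvious cause in the snapshot; look earlier in the test";
}

// Example: the to-do item rendered, but a loading animation was still on top.
console.log(
  explainNotFound({
    pageLoaded: true,
    inDom: true,
    opacity: 1,
    coveredBy: ".loading-overlay", // hypothetical selector
  })
); // element is covered by .loading-overlay
```

The point of the sketch is that a bare "not found" message maps to several different root causes, and only extra context from the run tells them apart.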

Filip Hric (08:32):

Playwright has had their Trace Viewer, which is again, like very robust, very useful tool that can give you the information on what the test was doing and how your application looked as, as it was being tested. So all these tools are like really helpful in order to, to make our tests more stable. And when I say make our tests more stable, the, the really nice appeal of, of Replay was that you would not only concern yourself with what the test is doing, but also with what the application is doing. And I think, yeah, that was, that was really exciting. But yeah, unfortunately the, the company decided to go some, some other direction. But still, I'm still optimistic about the, the future of debugging our tests because it seems like most of the tools that we have now, like Cypress and Playwright, are moving in the, in the right, right direction when it comes to your ability as a tester to, to take a look into it. And I realize this is probably a way longer <laugh> answer than, than you wanted. No, no,

Wade Erickson (09:57):

That's

Filip Hric (09:57):

To the question of what is flaky tests, but I also sort of painted picture of like where we are today and and what's the day-to-day of of a, of debugging a flaky test. Right, right. So testing right now,

Wade Erickson (10:13):

So the way I'm reading this is that, you know, in general it just says that you know, it's a test that is not failing in a binary way that either Yeah. You know, where it's at. So it's a black and white. This is the gray area where your tests fail, and your tool sets are not giving you proper feedback to help you trace to get to the, the, the root of the cause, which may or may not even be the platform you're testing. It could be the test itself had, you know some changes or something. So

Filip Hric (10:44):

Yeah. And also if, if you think about it, if you have a test that you need to debug or you need to spend time maintaining, suddenly your test automation as a tool you are trying to use to help your company deliver new features, it is failing, right? Because why we, why do we write test automation? To reduce human time. Exactly. On, on the right. Yeah. So if your test automation is randomly failing and you need to spend a lot of time debugging it, your test automation as a concept is failing. So there are like lots of discussions on like, which is the best tool that you should use? And you have all these benchmarks. You can, you can compare and is, is Playwright faster than Cypress? Is Cypress faster than Selenium? Is WebdriverIO beating them all, et cetera, right?

Filip Hric (11:53):

It literally doesn't really matter, right? If your, if your tests are stable and are passing and, and only failing when there's a genuine problem, then it, it doesn't really matter that much if they run in 30 minutes versus 50 minutes, right? That's a very small difference. But if your tests are super fast, let's say they're, they're, the whole test run takes five minutes and you have thousands of tests, right? If you have thousands of tests and 400 of them randomly fail, now you're going to spend lots of your upcoming weeks trying to figure out what the hell is going on. So now test automation is not saving you time, it is making more work for you. Exactly.
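
The trade-off can be put into rough numbers. In this TypeScript sketch the run times and the 400 random failures come from the example above, while the 15 minutes of investigation per failure is purely an assumption for illustration; `SuiteCost` and `minutesPerRun` are hypothetical names.

```typescript
// Rough cost model: wall-clock run time plus human time spent on false failures.
interface SuiteCost {
  runMinutes: number;              // how long the suite takes to run
  falseFailures: number;           // failures that are not genuine bugs
  triageMinutesPerFailure: number; // assumed human time to investigate one
}

function minutesPerRun(c: SuiteCost): number {
  return c.runMinutes + c.falseFailures * c.triageMinutesPerFailure;
}

// 15 minutes per failure is an assumption, not a figure from the interview.
const slowButStable: SuiteCost = { runMinutes: 50, falseFailures: 0, triageMinutesPerFailure: 15 };
const fastButFlaky: SuiteCost = { runMinutes: 5, falseFailures: 400, triageMinutesPerFailure: 15 };

console.log(minutesPerRun(slowButStable)); // 50
console.log(minutesPerRun(fastButFlaky));  // 6005
```

With these assumed numbers, the 45 minutes saved by the faster suite are dwarfed by roughly 100 hours of triage.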

Wade Erickson (12:44):

Yeah. So, so the quality of your testing itself could be an impediment to the efficiencies and productivity that you're trying to resolve and improve with the use of automated testing. Yeah.

Filip Hric (13:00):

Right, right. Exactly. So,

Wade Erickson (13:01):

So let's talk a little bit about the open source community. 'Cause it seems, you know, you're bringing up Playwright, Cypress, Selenium, all the open source suites. And of course there's a lot of commercial tools out there and you know, I'm closely tied to some of the big ones, and they obviously are going after AI in a big way, spending a lot of money, adding AI to test case design using against, you know, user stories all the way to low-code, no-code tools like we have that are, you know, using reusable test libraries to then maybe build that mid 80, 80% of that test script based on, you know, how it observes the UI and suggests, you know, that's still a few years out. The open source community obviously does not have a, you know, the same kind of financial support in a lot of cases that those commercial tools have. So since you're close to Cypress, what, what are you seeing in the open source community as they are trying to stay, you know, in parallel to the feature sets that the commercial tools have?

Filip Hric (14:20):

Yeah, that's, that's an interesting question. I, I think there is a different, slightly different value of what the open source stream sort of provides to the, to the community versus what the commercial tools are trying to provide. Like, there are definitely areas where they want to be in parity with, with each other. And I think that, that AI might equalize some of the areas and might make the gap bigger in, in some, right? I think one of the interesting things that, that's coming from, from this whole AI wave is that some of the tools get integrated into other tools and some of the tools are going to fall behind. So let me, let me give you an example, right? So I've, I'm friends with Jonathan Canales. We were doing a livestream couple of weeks ago. He's from Checkly. Checkly is an interesting tool. It's, it's providing synthetic monitoring and, and you can use your tests as sort of a monitoring check on your website. I think that's really interesting concept. What they did is that they integrated Playwright into, into their tooling, right? So you can run your Playwright test as sort of a check on your production website to make sure that it is live, right? Sort of like, like similar tools have

Wade Erickson (16:16):

That kind of stuff. Yeah,

Filip Hric (16:17):

Yeah, yeah. So like, there are some similar tools that are out there that can do monitoring for you. Usually they do that by sort of some sort of API ping, right? Mm-Hmm. <affirmative>. But, but Checkly does that with, with a test as well, and they provide more, more features as well. But the thing I wanted to point out is that they have integrated Playwright into their product because there is a way of how you can, how you can use it in a, how you can sort of integrate it. You can easily integrate Playwright into whatever code, whatever thing you are building, right? Maybe you heard about Rabbit AI, like the small orange box. It was a product that you could sell and it would like, you can order an Uber or do whatever with it, and it, and it would answer your questions.

Filip Hric (17:16):

And, and of course it was almost a scam, right? Right. But, but what, what it was doing was using Playwright to do like some automated tasks for you. So if you try to, try to order Uber, what would actually happen in the background? What was that? It would use Playwright to like click through real Uber page, which of course didn't work because as soon as Uber updated their page, that functionality was bust. The thing I'm trying to say here is, is that when you have a tool that easily integrates into all of the other tools, you can also integrate it to the new tool we have here, which is AI, right? If I were to compare Cypress and Playwright in this regard, I think that Cypress has a sort of harder time to get integrated into, into something.

Filip Hric (18:15):

And that's because like the architectural differences between these two tools, right? Playwright uses Chrome DevTools Protocol to integrate with the browser sort of from outside. And Cypress does that pretty much like the other way: it opens the browser, injects the Cypress code inside it, and then the test automation happens inside there, which is also like an interesting concept and gives you some, some advantages. So again, I, I sort of went off the rails with the, with the, with the AI question. The way I, I see AI right now, I think there's a lot of noise out there, it's really hard to find the good resources. But one of the things that I have found recently was this interesting article about sort of a new generation of AI developers, like in, in couple of years, there will be a lot of AI developers because you don't need to be like a data science person anymore to work with AI or know a lot about machine learning and all kinds of very complicated models.

Filip Hric (19:36):

Because now with, with OpenAI and, and other like companies coming in, you pretty much have an API endpoint, which you can call, you can give it context, right? And get information back from all of these large language models. So in a way, I, I see AI as being sort of an accelerator of what we do, right? At the end of the day, we're either shipping products that people are going to use, that are going to make their lives better or not. And if AI can sort of speed that up and help you either deliver the software or make a useful service for, for your customers, then I think that is going to be an overall improvement. But as of right now, there's, there's a lot of, there's a lot of noise.

Wade Erickson (20:32):

Yeah, I agree. I agree with that for sure. These product companies though, are obviously very challenged by the customer base. You know, because the customers are saying, how are you applying AI, knowing it's such a key piece? And if you don't have, have a good answer for that <laugh> and the, the, the conversation might end pretty quickly. So you know, I think for the product folks, it, it is definitely a, it's a unique time where normally you don't have these kind of broad changes to the IT, you know, spectrum. Maybe like, oh, Microsoft releases some new .NET platform, or, you know, or, you know, Java now becomes a, a priced JVM from Oracle. So <laugh>, there's a movement away from it. And, and you have to now change your tools to use the open JVM.

Wade Erickson (21:27):

You know, those kind of things are bumps on the road. This was a, a, a wave that everybody's been caught in and you really have no way to, no choice but to address it within your product lines, for sure. Alright, so you know, we got about five minutes left here. I just wanted to kind of pivot a little bit. So, you know, you're a speaker and a consultant. Sounds like you've been doing that in parallel to your job even when, you know, you were working at Replay.io. So tell me a little bit about the, you know, kind of this gig economy model where you do that. And if you could give some insights to some people who have really strong, you know, capabilities and talents in a certain niche. 'Cause that's when speakers, you know, the riches is in the niches, right? That's what they say. Mm-Hmm. <affirmative> and, you know, yours is in Cypress and testing and stuff. What advice would you give to some people that are really looking to kind of expand a little bit their kind of, their, their brand in a certain area? You know, is it broadcasting, is it blogging still? Or, you know, what would you give them for advice to kind of get into this to build themselves as a, you know, thought leader in the space? And then, then we'll kind of wrap up.

Filip Hric (22:50):

Yeah, it's a, it's a good question. Like, sometimes I try to reflect back and try to see and understand what was my journey and how it started, and like how it, how it progressed. I was lucky enough to discover or like learn about the concept of learning in public pretty soon. And I found that idea to be really appealing. I, I like the, I, I pretty much like the idea of learning in public. So I'm learning something and I'm writing about the thing that I, that I've been learning and sharing that with the internet. As I go through that journey, I have realized that,

Wade Erickson (23:44):

By the way, that is kind of a unique aspect to the internet, because prior to that, you had to be an expert to write the textbook.

Filip Hric (23:53):

Oh, yeah. You

Wade Erickson (23:53):

Can be a learner and write as what you're learning because other learners still can benefit from your journey in that area and say, Hey, these are some things I found out because it's, it's a similar process and, and they can kind of, you know, tag along on your journey and, and yeah. And learn along the way, which wasn't the case before, you know?

Filip Hric (24:16):

Yeah, exactly. Exactly. And

Wade Erickson (24:18):

You had to be an expert to get a publishing contract, and then you had to write a book and you, you know, it was critically reviewed and now there's so much content out there, podcasts and blogging and stuff that you can share while you're on the journey.

Filip Hric (24:33):

Yeah, exactly. And you can be like, more granular, because you don't need to write a whole book. You can write a simple blog post. Right. And I think that's really, really great, really powerful. One thing that I, I, I saw, and I still remember that tweet like, I don't remember the exact details, but it was about the fact that there's a video somewhere on YouTube about how to unzip a zip file, right? And it has like millions of views, <laugh>, and you may laugh at that, laugh at that, but there are people that need to have that problem solved because they have never done that, right? Right, right. And as I go through my learning journey and I learn new stuff this week, I have learned something I didn't know last week, right? Mm-Hmm. <Affirmative>. So sharing what I learned this week would be very useful for me last week.

Filip Hric (25:40):

So anything you learn, anything you find out about is, is going to have value for someone. Like there are so many people out there searching for stuff, right? And, and it's also going to be valuable for you as person who is on that learning journey, right? Because in order to explain something, you need to have slightly better knowledge about that. So right now I have, I have an upcoming workshop about Cypress. As I was creating that Cypress workshop, I learned so much about like, the little details I didn't know. Plus, when I had people on the workshop asking me about different stuff, that's where I found out even more things and not only about the topic of Cypress, but also about how to actually give you the information in a way that you will understand it. So it's, you learn a lot by, by teaching.

Filip Hric (26:46):

And also you learn a lot about teaching when you teach. So I think that's, that's, that's really great. And like, it doesn't, like when you write your first blog post, it doesn't look like much, right? But as you progress, it adds up. And writing those blog posts that I did, and sharing the videos and, and, and social media updates and stuff like that, that was the thing that helped me get into conferences, that, that got, got me to network with interesting people. So I personally found this journey to be very, very fulfilling. And I know that there are people out there who feel like, maybe this is not their way, and maybe it isn't, but I'm, I think you will be sure when you try it out. So my encouragement to anyone who's listening and they're sort of thinking on whether they should be doing this kind of stuff, I very much encourage you to, to try it out and to see if that works for you. And it, and if it's, and if it feels good. And if you have any trouble starting, I think there was my LinkedIn somewhere at the bottom of the page, reach out to me and talk to me. I'd be happy to help you out with the, with the first steps.

Wade Erickson (28:17):

Great, great way to end the show.

Filip Hric (28:19):

<Laugh>, you know,

Wade Erickson (28:20):

Providing value is, I think, very fulfilling. It's why we do this show, you know, and

Filip Hric (28:28):

Yeah, kudos to you,

Wade Erickson (28:29):

You know, and to, to, you know, have guests like you to, to share with the community. You just never know who's gonna stumble across the, the recordings and you know, find them you know, interesting to, to jump out on their own and, and, and start, you know, podcasting or sharing their knowledge. And you know, it's, I I think it's, you know, when you, when you help others, it, it starts to bring different kind of purpose to your life, you know? Yeah. And, and, and like you said, teaching the best way to learn something is to teach it, because those students are gonna ask questions that you didn't ask yourself.

Filip Hric (29:10):

Yeah, exactly. Yeah.

Wade Erickson (29:13):

Alright, well let's wrap up on the next week. I'm gonna give next week's guest here is Dmitri Ru I'm gonna butcher the name, last name Nic, founder of reportportal.io, head of test products and director at EPAM Systems. Topic's gonna be AI and software testing, practical insights from test analysis to AI agents. And that'll be next week. And go to the events page on LinkedIn or, or your favorite social media that we share this on, YouTube and others, Twitter and, or X, I guess it's called now <laugh>. So, but anyway, thank you so much for your time and sharing your testing knowledge with our community. And yeah, have a great rest of your week and everybody else, we'll see you next week.

Filip Hric (30:00):

Thank you. Bye everyone.

Wade Erickson (30:02):

Bye-Bye.

 

Filip Hric

DevRel at Replay.io

Filip Hric is a DevRel at Replay.io. He teaches testers about web development and developers about testing.

Filip is a Cypress.io ambassador, leads a “Learn Cypress.io” community on Discord and he has a blog at filiphric.com where he publishes Cypress.io tips.

He’s an international keynote speaker and leading expert on test automation in Cypress.io. As author and instructor of live Cypress workshops, he has taught hundreds of testers and developers about good practices and advanced concepts for testing in Cypress.

He enjoys running, playing guitar and spending time with his wife and four children.