K. El Haji

Please Note

This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.

2 records found

Empirical Study on Test Generation Using GitHub Copilot

Master thesis (2023) - K. El Haji, C.E. Brandt, A.E. Zaidman

Writing unit tests is a crucial task in the software development lifecycle, ensuring the correctness of the software developed. Due to its time-consuming and laborious nature, it is, however, often neglected by software engineers. Numerous automatic test generation tools have been devised to ease unit testing efforts, but these tools produce tests that are typically difficult to understand. Recently, Large Language Models (LLMs) have shown promising results in generating unit tests and in supporting other software engineering tasks. LLMs are capable of producing natural-looking (human-like) source code and text. In this thesis, we investigate the usability of tests generated by GitHub Copilot, a proprietary closed-source code generation tool that uses an LLM for its generations and integrates into well-known IDEs. We evaluate GitHub Copilot’s test generation abilities both within and without an existing test suite. Furthermore, we evaluate the impact of different code commenting strategies on test generations, both within and without an existing test suite. We devise aspects of usability to investigate GitHub Copilot’s test generations. In total, we investigate the usability of 290 tests generated by GitHub Copilot. Our findings reveal that within an existing test suite, approximately 45.28% of the tests generated by Copilot are passing tests. The majority (54.72%) of generated tests in an existing test suite are failing, broken, or empty tests. Furthermore, tests generated by Copilot without an existing test suite are less usable than those generated within an existing test suite. The vast majority (92.45%) of these test generations are failing, broken, or empty tests. Only 7.55% of tests generated without an existing test suite were passing, and most of them provided less branch coverage than human-written tests.
Finally, we find that tests using a code usage example comment resulted in the most usable generations within an existing test suite. In contrast, when there is no existing test suite, a comment combining instructive natural language with a code usage example yielded the most usable test generations. ...

Synthetic Waste Generator for Classification Training

Bachelor thesis (2020) - Khalid El Haji, Noah Posner, Hakan Ilbaş, Sergen Karpuz, Victor Wernet, Lydia Chen

As the population increases, so does the waste that is generated. Manually recycling waste is expensive and slow. Computer Vision (CV) solutions aim to make this less expensive and faster. Large amounts of data on this waste (thousands of images) are needed to train these CV solutions. This project, called Synthetic Waste Generator (SWaG), can create synthetic waste data through the use of Blender and Python. Moreover, this project contributes to the current state of research by providing an automated synthetic data generation pipeline. This synthetic data can be used to train CV solutions to enable automated recycling procedures. With the help of adjustable parameters, the synthetic data can be customized, such that different unique images of waste can be created deterministically based on a seed. Furthermore, SWaG is fully portable, as it has been containerized using Docker, which makes it easy to obtain faster results by running SWaG on an NVIDIA GPU-enabled system as a single local container, in the cloud as a farm, or by incorporating it into a container-orchestration system such as Kubernetes. SWaG also crushes 3D models using soft body dynamics to mimic real waste. The pipeline has also been adapted to automatically generate COCO datasets by using masking and image segmentation techniques. SWaG can also add textures and different colors to the waste objects in the synthetically created image. Furthermore, with SWaG, different conveyor belt setups at recycling plants can be simulated with the help of variable camera heights, conveyor belts, backgrounds, and lighting conditions. SWaG is currently deployable and is being used and built upon by our client. Empirical experiments with SWaG show that its performance scales linearly with the number of objects in a given scene. In fact, with between roughly 40 and 80 objects, SWaG performs sub-linearly.
This is an important performance criterion, as images of trash on the conveyor belt often contain large numbers of objects piled up on top of one another. ...