
Organizations

@c-labpl

msznajder/README.md

Hi there. I'm Michał. 👋

I'm a Lead Data Scientist, Deep Learning Engineer and NLP Engineer.

As a Lead Data Scientist, I apply my expertise in deep learning, natural language processing (NLP) and large language models (LLMs) to build and deploy state-of-the-art solutions. I have over 10 years of experience developing and delivering machine learning and deep learning products with global reach that use structured and unstructured data at petabyte scale. I am a published author in Nature and Springer journals, the holder of multiple patents, and a conference speaker.

I am passionate about working with the latest advancements in NLP and LLMs, such as transformer networks, pre-trained models like Llama 2 or Mistral, fine-tuning techniques, RAG and more. My model development toolkit is Python, PyTorch, TensorFlow, HuggingFace Transformers, PEFT and TRL, and my cloud platforms are GCP Vertex AI and BigQuery.
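As a rough illustration of the PEFT idea behind LoRA, the sketch below freezes a weight matrix W and adds a trainable low-rank update, so the layer computes h = W x + (alpha / r) B A x. It is plain Python rather than PyTorch or the HuggingFace PEFT library, and the helper names (`matvec`, `lora_forward`, `trainable_params`) and toy sizes are hypothetical, chosen only for illustration.

```python
import random

def matvec(m, v):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, alpha, r):
    """h = W x + (alpha / r) * B (A x); W is frozen, only A and B would be trained."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

def trainable_params(d, k, r):
    """Trainable parameters: full fine-tuning of a d x k matrix vs. a rank-r adapter."""
    return d * k, r * (d + k)

d, k, r = 4, 6, 2  # toy sizes: output dim, input dim, LoRA rank
random.seed(0)
W = [[random.gauss(0, 1) for _ in range(k)] for _ in range(d)]     # frozen weight, d x k
A = [[random.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]  # trainable, r x k
B = [[0.0] * r for _ in range(d)]                                  # trainable, d x r, zero-initialized
x = [1.0] * k

# Because B starts at zero, the adapted layer initially reproduces the frozen layer.
print(lora_forward(W, A, B, x, alpha=16, r=r) == matvec(W, x))  # True
print(trainable_params(4096, 4096, 8))  # (16777216, 65536)
```

Zero-initializing B is what lets fine-tuning start exactly from the pre-trained behavior, and the parameter count shows why a rank-8 adapter on a 4096 x 4096 projection is so much cheaper to train than the full matrix.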

I enjoy conducting end-to-end research projects, solving open-ended problems with a scientific approach, and communicating the results. I am also a product-oriented technical problem solver who always keeps business value in mind. I am experienced in project management and capable of handling multiple projects, priorities or products simultaneously. I am a technical leader who helps others deliver and learn, a team player who cares for others, and a believer that only good teams can do great things.

🚀 Personal Projects

  • Mistral 7B parameter-efficient fine-tuning for dialogue summarization with LoRA - I implemented parameter-efficient fine-tuning (PEFT) of the Mistral-7B-Instruct-v0.2 model for dialogue summarization on the samsum dataset using the LoRA technique. The goal of this experiment was to find out how well a decoder-only model can learn a seq2seq-style task such as summarization. The fine-tuned model turned out to perform really well.
  • Mathematical Transformer - building and pre-training a mathematical LLM from scratch - My goal in this project is to build a mathematical transformer architecture from scratch and train it to learn basic integer mathematics such as addition, subtraction and multiplication. I implemented the original "Attention Is All You Need" architecture in PyTorch and am expanding from there. Can tokens be used for math calculations and inference? How well can embeddings represent numerical values and their relations? I want to find out how these can be extended to create more symbolic math representations, with a flavor of Wolfram Mathematica or Theano.
  • RAG-based text documents Q&A chat using Mistral 7B, ChatGPT and LangChain - I built a simple RAG demo in which an LLM answers questions about a set of external text files. RAG stands for retrieval-augmented generation: relevant external documents are retrieved and passed to the LLM together with the query. The text files served as input data, and two LLMs were compared: the commercial ChatGPT API and the open-source Mistral 7B. Finally, LangChain was used to connect everything into a RAG application.
  • Classics with LLMs - fine-tuning BERT for the Iris classification task - Ever wondered how the new generation of LLMs handles classical ML tasks? The goal of this series of experiments is to verify how LLMs perform on classical datasets. This time we'll see how a fine-tuned BERT handles the Iris classification dataset.
  • An open-source ECG signal QRS complex pattern recognition Python module - I am the creator and maintainer of an open-source Python module that recognizes the QRS complex (the ECG marker of heart contraction) in ECG signals. I implemented the algorithm, adapted it to work with both real-time data streams and offline datasets, prepared the usage documentation and helped users solve technical issues. It is used by dozens of research teams globally for various scientific projects.
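To make the Mathematical Transformer idea a little more concrete, here is a minimal sketch of scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, the core operation of the "Attention Is All You Need" architecture. It is not the project's code: it uses plain Python instead of PyTorch, skips the learned W_Q, W_K and W_V projections (Q, K and V are just the token embeddings), and the embeddings for the tokens "2", "+", "3" are made-up toy values.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, with Q, K, V given as lists of row vectors."""
    d_k = len(K[0])
    output, all_weights = [], []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        all_weights.append(weights)
        # Output is the attention-weighted average of the value vectors.
        output.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return output, all_weights

# Toy 2-dimensional embeddings for the tokens "2", "+", "3" (hypothetical values).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = scaled_dot_product_attention(X, X, X)
for row in attn:
    print([round(w, 3) for w in row])
```

Each printed row is a probability distribution over the three tokens; in a trained model, whether such weights can capture numerical relations is exactly the open question the project asks.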

⚡ Skills And Technologies

🤖 Machine learning and deep learning

  • Model research, development and production deployment
  • Model architectures, data preprocessing, feature engineering, hyperparameter tuning, loss functions and performance metrics

โœ๏ธ Natural Language Processing (NLP) and Large Language Models (LLMs)

  • Architectures: RNN, LSTM, seq2seq, Transformers, etc.
  • Pre-trained models: Llama 2, Mistral, GPT, BERT, T5, etc.
  • Techniques: fine-tuning, PEFT (LoRA, QLoRA, Prompt Tuning), RAG, prompt engineering
  • Tasks: sequence and token classification, translation, summarization, question answering, language modeling, dialogue, etc.
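The retrieval step of RAG listed above can be sketched in a few lines: score each document against the query, keep the top matches, and prepend them to the prompt sent to the LLM. This is a minimal sketch under stated assumptions: word-count cosine similarity stands in for a real embedding model and vector store, LangChain is not used, and `retrieve`, `build_prompt` and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lower-cased word counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = bag_of_words(query)
    return sorted(docs, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)[:k]

def build_prompt(query, docs, k=2):
    """Augment the question with retrieved context before it is sent to an LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "The Eiffel Tower is located in Paris.",
    "Mistral 7B is an open-weight language model.",
    "LoRA adds trainable low-rank matrices to frozen weights.",
]
print(retrieve("Which model is open-weight?", docs, k=1))
# ['Mistral 7B is an open-weight language model.']
print(build_prompt("Which model is open-weight?", docs, k=1))
```

Swapping `bag_of_words` for dense embeddings and the list for a vector index changes retrieval quality, but the augment-then-generate shape of the pipeline stays the same.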

๐Ÿ—๏ธ Models development toolkit

  • Python, NumPy, pandas, matplotlib and scikit-learn
  • PyTorch, TensorFlow
  • HuggingFace Transformers, PEFT, TRL, LangChain

โ˜๏ธ Cloud models development and data acquisition

  • GCP Vertex AI model development and production deployment
  • GCP BigQuery and SQL used with petabyte-scale datasets
  • MLOps model deployment technologies and pipelines

🧠 Problem solving and research

  • Problem-solving with a creative, innovative and logical approach
  • Conducting scientific research individually and as part of a team
  • Written and verbal communication and reporting skills, with the ability to explain complex technical concepts to non-experts

📒 Team and project management

  • Technical leadership of ML and DL projects
  • Project management and organizational skills for handling multiple projects
  • Team player with natural team-building skills

Pinned

  1. mistral-7b-samsum-dialogue-summary-finetune Public

    Fine-tuning of Mistral-7b for dialogue summarization

    Jupyter Notebook

  2. mathematical-transformer-from-scratch Public

    Building a mathematical transformer from first principles

    Jupyter Notebook

  3. rag-mistral-chatgpt-langchain-text-docs-chat Public

    RAG-based text documents Q&A chat with Mistral 7B, ChatGPT and LangChain

    Jupyter Notebook

  4. classics-llms-finetuning-iris-classification Public

    Fine-tuning a BERT LLM on the traditional Iris classification dataset, just for fun (and science)

    Jupyter Notebook

  5. c-labpl/qrs_detector Public archive

    Python Online and Offline ECG QRS Detector based on the Pan-Tompkins algorithm

    Jupyter Notebook, 175 stars, 89 forks

  6. Estimote/iOS-Indoor-SDK Public archive

    Estimote Indoor SDK for iOS

    Objective-C, 484 stars, 136 forks
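For readers curious about the Pan-Tompkins approach behind the qrs_detector project, the sketch below walks through its main signal-processing stages: differentiation, squaring, moving-window integration and peak picking. It is a simplified plain-Python illustration, not the module's API: it omits the band-pass filter, uses a fixed threshold instead of the algorithm's adaptive thresholds, does not compensate for the integration delay, and all function names and the synthetic signal are hypothetical.

```python
def derivative(x):
    """Five-point derivative -x[n-2] - 2x[n-1] + 2x[n+1] + x[n+2] (scale factor omitted)."""
    inner = [-x[n - 2] - 2 * x[n - 1] + 2 * x[n + 1] + x[n + 2] for n in range(2, len(x) - 2)]
    return [0.0, 0.0] + inner + [0.0, 0.0]

def moving_window_integration(x, width):
    """Causal moving average over the last `width` samples."""
    out, acc = [], 0.0
    for n, v in enumerate(x):
        acc += v
        if n >= width:
            acc -= x[n - width]
        out.append(acc / width)
    return out

def detect_peaks(x, threshold, refractory):
    """Local maxima above `threshold`, at least `refractory` samples apart."""
    peaks = []
    for n in range(1, len(x) - 1):
        if x[n] > threshold and x[n] >= x[n - 1] and x[n] > x[n + 1]:
            if not peaks or n - peaks[-1] >= refractory:
                peaks.append(n)
    return peaks

def qrs_sketch(ecg, fs, threshold):
    """Derivative -> squaring -> ~150 ms integration -> peaks with a 200 ms refractory period."""
    squared = [d * d for d in derivative(ecg)]
    integrated = moving_window_integration(squared, width=int(0.15 * fs))
    return detect_peaks(integrated, threshold, refractory=int(0.2 * fs))

# Synthetic 4 s "ECG" at 250 Hz: a flat baseline with four narrow spikes standing in for R waves.
fs = 250
ecg = [0.0] * (4 * fs)
for c in (100, 350, 600, 850):
    for offset, amp in zip(range(-2, 3), (0.2, 0.6, 1.0, 0.6, 0.2)):
        ecg[c + offset] = amp

# Detections lag each spike by the integration window, which the full algorithm corrects for.
print(qrs_sketch(ecg, fs, threshold=0.3))  # [132, 382, 632, 882]
```

Squaring makes every slope positive and emphasizes the steep QRS edges, while the integration window merges them into one pulse per beat; the refractory period then prevents a single beat from being counted twice.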