7 Open Source Projects Every Data Scientist/Analyst Needs to Bookmark 🚀

by Azize Sultan Palali

January 24th, 2025

Too Long; Didn't Read

Data analysts rely on tools and platforms that are crucial to our daily workflows. I’ve put together a list of open-source projects that can speed up your workflows, simplify your data processes, and maybe even spark new ideas. You’ll find both popular tools and some lesser-known gems with big potential.

While reading @madzadev’s article "9 Open Source Projects Every Developer Needs to Bookmark for Their Workflow", I realized how useful it would be to create a similar list specifically for data scientists and analysts. Inspired by that idea, I’ve put together a list of open-source projects designed to make our workflows faster and our data processes smoother, and maybe even spark some fresh ideas. The list includes both well-known tools and a few hidden gems with big potential.


I hope you find something helpful and inspiring. Let’s dive in! 🚀🚀

1. Streamlit - Interactive Dashboards 🐍

Streamlit is an open-source Python library for building interactive, web-based data applications quickly and easily. Its active community also shares templates and is a good place to ask questions.


👨‍💻 GitHub Repository: Streamlit Repository

🌐 Website: https://streamlit.io/


2. Superset - Data Visualization and Exploration 📈

Apache Superset is an open-source, highly customizable BI tool that suits analysts well: it offers SQL-based exploration and integrates with many databases, though it requires more technical expertise. In contrast, Tableau, Power BI, and Data Studio (now Looker Studio) provide user-friendly interfaces, advanced analytics, and ready-to-use features for non-technical users, though they come with licensing costs (except Data Studio, which is free 💰).


👨‍💻 GitHub Repository: Superset Repository

🌐 Website: https://superset.apache.org/



3. DVC - Versioning for ML Projects 👩🏻‍💻

DVC brings Git-like functionality to datasets and machine learning models, making projects more reproducible and manageable. The project also has thorough technical documentation and an active community.
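
To make the "Git-like" idea a bit more concrete, here is a rough sketch in plain Python. It is not DVC's code or API: it only illustrates the general pattern of hashing a data file and keeping a small pointer record that could be committed to Git in place of the data itself. The helper names (`write_pointer`, `has_changed`) and the `.ptr.json` suffix are hypothetical, and DVC's real metafiles and cache differ in detail.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_pointer(data_path: Path) -> Path:
    """Write a small JSON pointer (hash, size, name) next to the data file.

    Hypothetical helper for illustration only; not part of DVC.
    """
    pointer = {
        "sha256": file_hash(data_path),
        "size": data_path.stat().st_size,
        "path": data_path.name,
    }
    pointer_path = data_path.with_name(data_path.name + ".ptr.json")
    pointer_path.write_text(json.dumps(pointer, indent=2))
    return pointer_path


def has_changed(data_path: Path, pointer_path: Path) -> bool:
    """Return True if the data no longer matches the recorded hash."""
    recorded = json.loads(pointer_path.read_text())
    return recorded["sha256"] != file_hash(data_path)


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "train.csv"
    data.write_text("id,label\n1,cat\n2,dog\n")
    pointer = write_pointer(data)
    unchanged = has_changed(data, pointer)  # data still matches the pointer
    data.write_text("id,label\n1,cat\n2,bird\n")
    changed = has_changed(data, pointer)    # content was edited afterwards
    print(unchanged, changed)
```

The pointer file is tiny and text-based, so it versions well in Git, while the heavy data file can live in separate storage.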


👨‍💻 GitHub Repository: DVC Repository

🌐 Website: https://dvc.org/


4. Great Expectations - Data Validation and Documentation 📃


Great Expectations is your go-to tool for making sure your data is clean, reliable, and ready to use. It automates data validation with customizable tests, so you can catch issues before they become problems. If you care about data quality and trust in your analysis, this tool is a game changer.
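
As a rough illustration of what "customizable tests" on data can look like, here is a small sketch in plain Python. It deliberately does not use the Great Expectations API; the functions `expect_not_null`, `expect_between`, and `validate` are hypothetical stand-ins that only mimic the general idea of declaring expectations and getting a report of the rows that break them.

```python
from typing import Any, Callable

Row = dict[str, Any]
Expectation = tuple[str, Callable[[Row], bool]]


def expect_not_null(column: str) -> Expectation:
    """Expectation: the column must be present and not None."""
    return (f"{column} is not null", lambda row: row.get(column) is not None)


def expect_between(column: str, low: float, high: float) -> Expectation:
    """Expectation: the column value must lie in [low, high]."""
    def check(row: Row) -> bool:
        value = row.get(column)
        return value is not None and low <= value <= high

    return (f"{column} between {low} and {high}", check)


def validate(rows: list[Row], expectations: list[Expectation]) -> dict[str, list[int]]:
    """Map each expectation name to the indexes of the rows that fail it."""
    return {
        name: [i for i, row in enumerate(rows) if not check(row)]
        for name, check in expectations
    }


# Made-up sample data for the sketch
rows = [
    {"order_id": 1, "amount": 25.0},
    {"order_id": 2, "amount": -5.0},
    {"order_id": None, "amount": 40.0},
]
report = validate(
    rows,
    [expect_not_null("order_id"), expect_between("amount", 0, 1000)],
)
for name, failures in report.items():
    status = "OK" if not failures else f"failed rows {failures}"
    print(f"{name}: {status}")
```

The real library adds much more on top of this pattern, but the core habit is the same: state what the data should look like, then check it before the analysis runs.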


👨‍💻 GitHub Repository: Great Expectations Repository

🌐 Website: https://greatexpectations.io/


5. Dask - Parallel Computing With Python 🐍

Dask is like pandas and NumPy on steroids, built for handling massive datasets that don’t fit in memory. It’s perfect for scaling your data tasks, whether you’re working on your laptop or a big cluster. If you need speed and power without learning a whole new tool, Dask has got you covered.
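
The core pattern behind working with data that does not fit in memory is to split it into chunks, compute a partial result per chunk, and combine the partials. The sketch below shows only that pattern using the standard library; it is not Dask code, it runs sequentially rather than in parallel, and the helper names and sample data are made up for illustration.

```python
import csv
import io
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator


def read_in_chunks(rows: Iterable[dict], chunk_size: int) -> Iterator[list[dict]]:
    """Yield lists of at most chunk_size rows without loading everything."""
    iterator = iter(rows)
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk


def partial_totals(chunk: list[dict]) -> Counter:
    """Per-chunk result: total sales per region."""
    totals = Counter()
    for row in chunk:
        totals[row["region"]] += float(row["sales"])
    return totals


# Stand-in for a CSV file that would be too large to load at once
raw = io.StringIO("region,sales\nnorth,10\nsouth,5\nnorth,7\neast,3\nsouth,1\n")
reader = csv.DictReader(raw)

totals = Counter()
for chunk in read_in_chunks(reader, chunk_size=2):
    totals.update(partial_totals(chunk))  # combine the partial results

print(dict(totals))
```

A library like Dask takes care of this splitting and combining for you, and can additionally schedule the chunks across cores or machines.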


👨‍💻 GitHub Repository: Dask Repository

🌐 Website: https://www.dask.org/


6. Haystack - Build Search Systems With NLP 🔍

Haystack is your go-to tool for creating intelligent search and question-answering systems. It lets you connect LLMs and other NLP models to your own data, making it perfect for building domain-specific applications. Whether it’s semantic search or document retrieval, Haystack gives you the tools to get it done efficiently.


👨‍💻 GitHub Repository: Haystack Repository

🌐 Website: https://haystack.deepset.ai/



7. Logseq - Open Source Knowledge Management 📚

Logseq is an open-source tool that feels like a digital brain for organizing your notes, tasks, and ideas. It’s built around a clean outliner and bi-directional linking, making it perfect for connecting thoughts and tracking your workflows. If you love structure and flexibility in your knowledge management, Logseq is a must-try.
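
If bi-directional linking is new to you, the idea can be sketched in a few lines of Python. This is not how Logseq is implemented; it only assumes the `[[Page Name]]` link syntax and shows how a backlink index ("which pages point to this one?") can be derived from forward links. The page titles and the `build_backlinks` helper are made up for the example.

```python
import re
from collections import defaultdict

# Matches [[Page Name]] style links
LINK_PATTERN = re.compile(r"\[\[([^\[\]]+)\]\]")


def build_backlinks(pages: dict[str, str]) -> dict[str, set[str]]:
    """For every linked page, collect the titles of the pages that link to it."""
    backlinks: defaultdict[str, set[str]] = defaultdict(set)
    for title, text in pages.items():
        for target in LINK_PATTERN.findall(text):
            backlinks[target].add(title)
    return dict(backlinks)


# Hypothetical notes
pages = {
    "Churn analysis": "Uses [[Customer table]] and feeds the [[Q1 report]].",
    "Q1 report": "Summarizes [[Churn analysis]].",
    "Customer table": "Source data, refreshed weekly.",
}
backlinks = build_backlinks(pages)
print(sorted(backlinks["Customer table"]))
```

Because every link is indexed in both directions, opening a page about a table can also surface every analysis that depends on it.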


👨‍💻 GitHub Repository: Logseq Repository

🌐 Website: https://logseq.com/


Thank you for your time; sharing is caring! 🌐

About Author

Azize Sultan Palali (@azizepalali)
Passionate about data storytelling and enhancing customer experiences through data science and analytics.
