TF-IDF-Model

Retrieve information from text documents with a TF-IDF model, with dimension reduction by Latent Semantic Indexing (LSI).
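As a rough illustration of the TF-IDF part, here is a minimal standard-library sketch that weights terms and ranks documents against a query by cosine similarity. It is not the notebook's code: the function names (`tfidf_vectors`, `cosine`, `rank`), the smoothed idf formula, and the three toy documents are all assumptions made for this example, and the LSI step is not covered.

```python
import math
from collections import Counter


def tfidf_vectors(token_lists):
    """Return one {term: weight} dict per document, plus the idf table."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each term once per document
    # Smoothed idf (an assumption): terms found in every document keep a weight > 0.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({t: c * idf[t] for t, c in counts.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_tokens, vectors, idf):
    """Return (document index, score) pairs, best match first."""
    counts = Counter(t for t in query_tokens if t in idf)  # drop unseen terms
    query_vec = {t: c * idf[t] for t, c in counts.items()}
    scores = [(i, cosine(query_vec, v)) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy documents, invented for illustration only (not taken from the dataset).
docs = [
    "library catalogue indexing methods".split(),
    "information retrieval in library systems".split(),
    "chemical abstracts and patent searching".split(),
]
vectors, idf = tfidf_vectors(docs)
ranking = rank("information retrieval".split(), vectors, idf)
print(ranking)
```

Only the second toy document shares terms with the query, so it is ranked first and the other two score zero.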

Tech 🛠️ Languages and Tools :

Python : Popular language for machine learning and text processing
Jupyter Notebook : Tool for running Python code cell by cell
Google Colab : Hosted environment for running Jupyter notebooks
Requests : Simple HTTP library for accessing APIs and websites
Polars : Fast DataFrame library for efficient data processing
NumPy : Library for working with arrays in Python
Matplotlib : Library for plotting charts in Python
Scikit-learn : Essential ML toolkit for training and evaluating models

Run the Notebook on Google Colab

You can run this notebook on Google Colab by clicking this badge

Dataset

The dataset is named LISA. I reorganized it into three files to make it easier to work with :

Documents.txt (Documents are stored here)
Queries.txt (Queries are stored here)
Result.txt (Real related results are stored here)

Download Dataset

You can download the modified dataset by clicking these badges :

Documents :

Queries :

Result :

Or download the raw dataset :

Raw Dataset :

Frames of Dataset

Here is part of the Documents raw text :

Here is part of the Queries raw text :

Here is part of the Real Result raw text :

Here is part of the Documents frame :

Here is part of the Queries frame :

Here is part of the Real Result frame :

PreProcess

Remove garbage characters and digits
Lowercase all alphabetic characters
Tokenization
Word counting
Show Zipf's law
Calculate stop words and stemming words
Remove stop words and stemming words
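The steps above can be sketched with the Python standard library as follows. This is an illustrative sketch, not the notebook's code: the regular expression, the fixed `STOP_WORDS` set, the toy `simple_stem` suffix stripper, and the sample sentence are all assumptions made for this example. The notebook may derive its stop words from the word counts instead.

```python
import re
from collections import Counter


def preprocess(text):
    """Drop non-letter characters and digits, lowercase, split on whitespace."""
    cleaned = re.sub(r"[^A-Za-z\s]", " ", text).lower()
    return cleaned.split()


def zipf_table(counts):
    """Rows of (rank, word, frequency, rank * frequency).

    Under Zipf's law, rank * frequency stays roughly constant on a large corpus.
    """
    return [
        (rank, word, freq, rank * freq)
        for rank, (word, freq) in enumerate(counts.most_common(), start=1)
    ]


# Hypothetical stop-word list, chosen by hand for this example.
STOP_WORDS = {"the", "of", "and", "in", "a"}


def simple_stem(token):
    """Toy suffix stripper (an assumption, far cruder than a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


# Sample sentence invented for illustration.
text = "The indexing of 120 documents, and the ranking of documents in 1985!"
tokens = preprocess(text)
counts = Counter(tokens)
table = zipf_table(counts)
filtered = [simple_stem(t) for t in tokens if t not in STOP_WORDS]
print(filtered)
```

On a real corpus, plotting frequency against rank on log-log axes (for example with Matplotlib) gives the usual Zipf curve.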

Result

License

This project is licensed under the MIT License.
