Retrieve information from text documents with a TF-IDF model and dimensionality reduction with Latent Semantic Indexing (LSI).

AsadiAhmad/TF-IDF-Model

TF-IDF-Model

This project retrieves information from text documents using a TF-IDF model, then reduces the dimensionality of the term space with Latent Semantic Indexing (LSI).
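
To make the TF-IDF retrieval idea concrete, here is a minimal sketch in plain Python. It assumes relative term frequency, `idf = log(N / df)`, and cosine similarity for ranking; the notebook may use a different weighting scheme. The function names and the three toy documents are hypothetical, and the LSI step is not shown.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return (idf, vectors) for a list of tokenized documents."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Weight = relative term frequency * inverse document frequency.
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in tf.items()})
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_tokens, docs):
    """Return document indices ordered from most to least similar."""
    idf, vectors = tf_idf_vectors(docs)
    tf = Counter(query_tokens)
    # Query terms that never occur in the collection are ignored.
    query = {t: (c / len(query_tokens)) * idf[t] for t, c in tf.items() if t in idf}
    scores = [(cosine(query, vec), i) for i, vec in enumerate(vectors)]
    # Sort by descending score; ties fall back to the document index.
    return [i for _, i in sorted(scores, key=lambda pair: (-pair[0], pair[1]))]


# Toy collection (made up for illustration, not taken from LISA).
docs = [
    "library catalogue indexing methods".split(),
    "information retrieval with term weighting".split(),
    "public library opening hours".split(),
]
ranking = rank("term weighting retrieval".split(), docs)
print(ranking)
```

Only the second document shares terms with the query, so it is ranked first; the other two score zero and keep their original order.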

Tech 🛠️ Languages and Tools :

  • Python : Popular language for data processing and machine learning
  • Jupyter Notebook : Tool for running Python code cell by cell
  • Google Colab : Hosted environment for running Jupyter notebooks
  • Requests : Simple HTTP library for accessing APIs and websites
  • Polars : Fast DataFrame library for efficient data processing
  • NumPy : Library for working with arrays in Python
  • Matplotlib : Library for plotting charts in Python
  • Scikit-learn : ML toolkit for training and evaluating models

Run the Notebook on Google Colab

You can run this notebook on Google Colab by clicking this badge: Open In Colab

Dataset

The dataset is named LISA. I split it into three files that are easier to work with:

  • Documents.txt (Documents are stored here)
  • Queries.txt (Queries are stored here)
  • Result.txt (The real relevant results for each query are stored here)
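
One common way to use a file of real results is to compare the documents a model retrieves for a query with the documents listed as relevant. The sketch below assumes the relevant IDs have already been parsed into a set (the file layout is not described here); the function name and the example IDs are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Compare retrieved document IDs with the known relevant IDs."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against empty inputs to avoid division by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical IDs: four retrieved documents, three truly relevant ones.
retrieved = [12, 7, 40, 3]
relevant = {7, 3, 99}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved.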

Download Dataset

You can download the modified dataset by clicking these badges:

Documents : Static Badge

Queries : Static Badge

Result : Static Badge

Or download the raw dataset:

Raw Dataset : Static Badge

Frames of Dataset

Here is part of the raw Documents text:

Here is part of the raw Queries text:

Here is part of the raw Result text:

Here is part of the Documents frame:

Here is part of the Queries frame:

Here is part of the Result frame:

Preprocessing

  • Remove garbage characters and digits
  • Convert all alphabetic characters to lowercase
  • Tokenization
  • Word counting
  • Show Zipf's law
  • Calculate stop and stemming words
  • Remove stop and stemming words
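
The steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: "garbage characters" is taken to mean anything that is not an ASCII letter or whitespace, stop words are taken to be the `n_stop` most frequent tokens, and stemming is left out. The function names and the sample sentence are hypothetical.

```python
import re
from collections import Counter


def preprocess(text):
    """Clean, lowercase, and tokenize a raw text string."""
    # Replace digits, punctuation, and other non-letters with spaces.
    cleaned = re.sub(r"[^A-Za-z\s]", " ", text)
    return cleaned.lower().split()


def rank_frequency(tokens):
    """Return (rank, word, frequency) rows, most frequent word first.

    Under Zipf's law, frequency falls roughly in proportion to 1 / rank.
    """
    counts = Counter(tokens)
    return [
        (rank, word, freq)
        for rank, (word, freq) in enumerate(counts.most_common(), start=1)
    ]


def remove_stop_words(tokens, n_stop):
    """Drop the n_stop most frequent tokens (an assumed stop-word rule)."""
    counts = Counter(tokens)
    stop_words = {word for word, _ in counts.most_common(n_stop)}
    return [t for t in tokens if t not in stop_words], stop_words


text = "The 3 libraries index the books; the index lists 42 books."
tokens = preprocess(text)
table = rank_frequency(tokens)
filtered, stop_words = remove_stop_words(tokens, n_stop=1)
print(table[:3])
print(filtered)
```

Plotting the frequency against the rank from `rank_frequency` on log-log axes is one way to visualize Zipf's law on a real corpus.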

Result

License

This project is licensed under the MIT License.
