Category: Blog

  • test-project


    About Laravel

    Laravel is a web application framework with expressive, elegant syntax. We believe development must be an enjoyable and creative experience to be truly fulfilling. Laravel takes the pain out of development by easing common tasks used in the majority of web projects.

    Laravel is accessible, yet powerful, providing tools needed for large, robust applications.

    Learning Laravel

    Laravel has the most extensive and thorough documentation and video tutorial library of any modern web application framework, making it a breeze to get started learning the framework.

    If you’re not in the mood to read, Laracasts contains over 1100 video tutorials on a range of topics including Laravel, modern PHP, unit testing, JavaScript, and more. Boost the skill level of yourself and your entire team by digging into our comprehensive video library.

    Laravel Sponsors

    We would like to extend our thanks to the following sponsors for helping fund ongoing Laravel development. If you are interested in becoming a sponsor, please visit the Laravel Patreon page.

    Contributing

    Thank you for considering contributing to the Laravel framework! The contribution guide can be found in the Laravel documentation.

    Security Vulnerabilities

    If you discover a security vulnerability within Laravel, please send an e-mail to Taylor Otwell via taylor@laravel.com. All security vulnerabilities will be promptly addressed.

    License

    The Laravel framework is open-sourced software licensed under the MIT license.

    Visit original content creator repository https://github.com/KKOA/test-project
  • japanese_text_classification

    Text Classification with Various DNN Methods

    While trying various DNN text-classification methods on a Japanese corpus, the livedoor corpus, I aim to gain some knowledge and experience of DNN for NLP.

    Tokenizer

    The two tokenizers below were evaluated.

    [1] MeCab + mecab-ipadic-NEologd
    Since the program is implemented in Python, mecab-python3 is also required to run it.

    [2] SentencePiece
    SentencePiece is trained on the Japanese Wikipedia data dumps.
    To train it, I referred to the webpage titled “Wikipediaから日本語コーパスを利用してSentencePieceでトークナイズ(分かち書き)” (“Tokenizing Japanese Wikipedia text with SentencePiece”).
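
    Training such a model is a single invocation of SentencePiece's command-line trainer; a sketch, where the corpus path and model prefix are placeholders and the flags follow SentencePiece's documented options (a high character coverage such as 0.9995 is commonly used for Japanese):

```
spm_train --input=jawiki.txt --model_prefix=jawiki_sp --vocab_size=8000 --character_coverage=0.9995
```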

    Word Embedding

    Japanese BERT

    I used bert-japanese implemented by “yoheikikuta”.
    Instead of using his trained SentencePiece model and pretrained BERT model, I trained them from scratch; the few changes I made are listed below.

    • I trained SentencePiece with 8,000 words instead of 32,000.
    • I used newer Japanese Wikipedia dataset than the one he used.
    • I pretrained BERT model up to 1,300,000 steps instead of 1,400,000.
      The pretraining results are shown below.

    ***** Eval results *****
    global_step = 1300000
    loss = 1.3378378
    masked_lm_accuracy = 0.71464056
    masked_lm_loss = 1.2572908
    next_sentence_accuracy = 0.97375
    next_sentence_loss = 0.08065516
    

    Results

    [1] MeCab + mecab-ipadic-NEologd + fastText

    • MLP(Multi-layer Perceptron)

                    precision    recall  f1-score   support           
    
    dokujo-tsushin      0.721     0.886     0.795       175
      it-life-hack      0.789     0.825     0.806       154
     kaden-channel      0.856     0.856     0.856       167
    livedoor-homme      0.820     0.439     0.571       114
       movie-enter      0.848     0.931     0.888       174
            peachy      0.801     0.701     0.748       184
              smax      0.930     0.930     0.930       186
      sports-watch      0.980     0.890     0.932       163
        topic-news      0.832     0.975     0.897       157
    
         micro avg      0.839     0.839     0.839      1474
         macro avg      0.842     0.826     0.825      1474
      weighted avg      0.843     0.839     0.834      1474
    
    • CNN

                      precision    recall  f1-score   support
    
    dokujo-tsushin      0.935     0.909     0.922       175
      it-life-hack      0.906     1.000     0.951       154
     kaden-channel      1.000     0.970     0.985       167
    livedoor-homme      0.968     0.798     0.875       114
       movie-enter      0.939     0.977     0.958       174
            peachy      0.900     0.929     0.914       184
              smax      0.984     0.978     0.981       186
      sports-watch      0.994     1.000     0.997       163
        topic-news      0.981     0.987     0.984       157
    
         micro avg      0.955     0.955     0.955      1474
         macro avg      0.956     0.950     0.952      1474
      weighted avg      0.956     0.955     0.954      1474
    
    • BiLSTM

                    precision    recall  f1-score   support
    
    dokujo-tsushin      0.850     0.874     0.862       175
      it-life-hack      0.851     0.929     0.888       154
     kaden-channel      0.957     0.928     0.942       167
    livedoor-homme      0.786     0.675     0.726       114
       movie-enter      0.860     0.989     0.920       174
            peachy      0.886     0.761     0.819       184
              smax      0.957     0.957     0.957       186
      sports-watch      0.969     0.957     0.963       163
        topic-news      0.950     0.975     0.962       157
    
         micro avg      0.900     0.900     0.900      1474
         macro avg      0.896     0.894     0.893      1474
      weighted avg      0.900     0.900     0.899      1474
    

    [2] SentencePiece

    • MLP

                    precision    recall  f1-score   support
    
    dokujo-tsushin      0.862     0.926     0.893       175
      it-life-hack      0.911     0.929     0.920       154
     kaden-channel      0.931     0.970     0.950       167
    livedoor-homme      0.886     0.684     0.772       114
       movie-enter      0.945     0.983     0.963       174
            peachy      0.893     0.864     0.878       184
              smax      0.974     0.995     0.984       186
      sports-watch      0.974     0.914     0.943       163
        topic-news      0.909     0.955     0.932       157
    
         micro avg      0.922     0.922     0.922      1474
         macro avg      0.921     0.913     0.915      1474
      weighted avg      0.922     0.922     0.920      1474
    
    • CNN

                    precision    recall  f1-score   support
    
    dokujo-tsushin      0.965     0.937     0.951       175
      it-life-hack      0.962     0.994     0.978       154
     kaden-channel      1.000     0.982     0.991       167
    livedoor-homme      0.903     0.816     0.857       114
       movie-enter      0.956     0.994     0.975       174
            peachy      0.937     0.962     0.949       184
              smax      0.989     1.000     0.995       186
      sports-watch      0.975     0.975     0.975       163
        topic-news      0.981     0.981     0.981       157
    
         micro avg      0.965     0.965     0.965      1474
         macro avg      0.963     0.960     0.961      1474
      weighted avg      0.965     0.965     0.965      1474
    
    • BiLSTM

                    precision    recall  f1-score   support
    
    dokujo-tsushin      0.927     0.943     0.935       175
      it-life-hack      0.936     0.955     0.945       154
     kaden-channel      0.970     0.964     0.967       167
    livedoor-homme      0.930     0.702     0.800       114
       movie-enter      0.919     0.983     0.950       174
            peachy      0.891     0.935     0.912       184
              smax      0.969     0.995     0.981       186
      sports-watch      0.969     0.957     0.963       163
        topic-news      0.955     0.949     0.952       157
    
         micro avg      0.940     0.940     0.940      1474
         macro avg      0.941     0.931     0.934      1474
      weighted avg      0.941     0.940     0.939      1474
    
    • BERT

                    precision    recall  f1-score   support
    
    dokujo-tsushin      0.958     0.920     0.939       175
      it-life-hack      0.933     0.987     0.959       154
     kaden-channel      0.976     0.964     0.970       167
    livedoor-homme      0.922     0.825     0.870       114
       movie-enter      0.944     0.977     0.960       174
            peachy      0.922     0.967     0.944       184
              smax      0.989     0.973     0.981       186
      sports-watch      1.000     0.982     0.991       163
        topic-news      0.969     0.987     0.978       157
    
          accuracy                          0.958      1474
         macro avg      0.957     0.954     0.955      1474
      weighted avg      0.958     0.958     0.958      1474
    

    Conclusion

    The best of the seven models above is the CNN with SentencePiece.
    For each DNN architecture tested with both tokenizers (MLP, CNN, and BiLSTM), the model using SentencePiece outperformed the one using fastText + MeCab + mecab-ipadic-NEologd.
    Results may differ for more complicated classification tasks.
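
    For reference, the micro and macro averages in the reports above follow directly from per-class counts; a minimal plain-Python sketch (the toy counts here are illustrative, not taken from the tables):

```python
# Per-class (true positives, false positives, false negatives) — toy values.
counts = {
    "class_a": (90, 10, 20),
    "class_b": (40, 20, 10),
}

def prf(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Macro average: compute the metric per class, then average the metrics.
per_class = [prf(*c) for c in counts.values()]
macro_f1 = sum(f1 for _, _, f1 in per_class) / len(per_class)

# Micro average: pool the counts across classes, then compute the metric once.
tp = sum(c[0] for c in counts.values())
fp = sum(c[1] for c in counts.values())
fn = sum(c[2] for c in counts.values())
micro_p, micro_r, micro_f1 = prf(tp, fp, fn)
```

    When every example has exactly one label, micro precision, recall, and F1 all coincide with accuracy, which is why the BERT report shows a single accuracy row where the other reports show a micro avg row.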

    Visit original content creator repository
    https://github.com/Masao-Taketani/japanese_text_classification

  • letmeask

    letmeask

    NLW 06

    Project

    Letmeask is a web application for creating interactive Q&A rooms to help streamers and content creators. The project was developed during Next Level Week #06 Together (ReactJS), an event presented by Rocketseat.

    To complement the project, I developed toast notifications, a logout flow, a room-reopen flow, listing of the user's rooms, permission rules for accessing links, interaction rules for rooms, and other fixes.


    Technologies

    This project was developed using ReactJS and Firebase.

    Layout

    You can view the project layout through this link. You must have a Figma account to access it.

    How to execute

    • First, you need to install Node.js and Yarn.
    • Clone the repository: git clone https://github.com/rafaelthz/letmeask-nlw6.git
    • Enter the folder: cd letmeask-nlw6
    • Install the dependencies with yarn
    • Create a .env.local file and add your Firebase SDK configs – see more in the docs
    • Start the server with yarn start

    Now you can access localhost:3000 in your browser. Note that you will need to create a Firebase account and a project with a Realtime Database available.
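
    For reference, a .env.local for a Create React App + Firebase project typically looks like the sketch below. The variable names are assumptions based on that common setup (CRA only exposes variables prefixed with REACT_APP_), so check the repository's source for the exact keys it reads:

```
REACT_APP_API_KEY="your-firebase-api-key"
REACT_APP_AUTH_DOMAIN="your-app.firebaseapp.com"
REACT_APP_DATABASE_URL="https://your-app.firebaseio.com"
REACT_APP_PROJECT_ID="your-app"
```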


    Inspired by Rocketseat Education.

    Visit original content creator repository https://github.com/rafaelthz/letmeask
  • COPA

    COPA: Certifying Robust Policies for Offline Reinforcement Learning against Poisoning Attacks

    We propose COPA, the first unified framework for certifying robust policies for general offline RL against poisoning attacks, based on certification criteria including per-state action stability and the lower bound of cumulative reward. Specifically, we propose new partition and aggregation protocols (PARL, TPARL, DPARL) to obtain robust policies and provide certification methods for them. More details can be found in our paper:

    Fan Wu*, Linyi Li*, Chejian Xu, Huan Zhang, Bhavya Kailkhura, Krishnaram Kenthapadi, Ding Zhao, and Bo Li, “COPA: Certifying Robust Policies for Offline Reinforcement Learning against Poisoning Attacks”, ICLR 2022 (*Equal contribution)

    All experimental results are available at the website https://copa-leaderboard.github.io/.

    Code

    In our paper, we conduct experiments on Atari games Freeway and Breakout, as well as an autonomous driving environment Highway. For each RL environment, we evaluate three RL algorithms (DQN, QR-DQN, and C51), three aggregation protocols and certification methods (PARL, TPARL, and DPARL), up to three partition numbers, and multiple horizon lengths.

    Reference implementation for experiments on Atari games can be found at https://github.com/AI-secure/COPA_Atari.

    Reference implementation for experiments on Highway can be found at https://github.com/AI-secure/COPA_Highway.

    Reference

    @inproceedings{wu2022copa,
      title={COPA: Certifying Robust Policies for Offline Reinforcement Learning against Poisoning Attacks},
      author={Wu, Fan and Li, Linyi and Xu, Chejian and Zhang, Huan and Kailkhura, Bhavya and Kenthapadi, Krishnaram and Zhao, Ding and Li, Bo},
      booktitle={International Conference on Learning Representations},
      year={2022}
    }

    Visit original content creator repository
    https://github.com/AI-secure/COPA

  • bptree

    bptree


    An in-memory B+ tree implementation for Go with byte-slice keys and values.

    Installation

    To install, run:

    go get github.com/krasun/bptree
    

    Quickstart

    Feel free to play:

    package main
    
    import (
    	"fmt"
    	"os"
    
    	"github.com/krasun/bptree"
    )
    
    func main() {
    	tree, err := bptree.New()
    	if err != nil {
    		fmt.Fprintf(os.Stderr, "error: %v\n", err)
    		os.Exit(1)
    	}
    
    	tree.Put([]byte("apple"), []byte("sweet"))
    	tree.Put([]byte("banana"), []byte("honey"))
    	tree.Put([]byte("cinnamon"), []byte("savoury"))
    
    	banana, ok := tree.Get([]byte("banana"))
    	if ok {
    		fmt.Printf("banana = %s\n", string(banana))
    	} else {
    		fmt.Println("value for banana not found")
    	}
    
    	tree.ForEach(func(key, value []byte) {
    		fmt.Printf("key = %s, value = %s\n", string(key), string(value))
    	})
    
    	// Output: 
    	// banana = honey
    	// key = apple, value = sweet
    	// key = banana, value = honey
    	// key = cinnamon, value = savoury
    }

    You can use an iterator:

    package main
    
    import (
    	"fmt"
    	"os"
    
    	"github.com/krasun/bptree"
    )
    
    func main() {
    	tree, err := bptree.New(bptree.Order(3))
    	if err != nil {
    		fmt.Fprintf(os.Stderr, "error: %v\n", err)
    		os.Exit(1)
    	}
    
    	tree.Put([]byte("apple"), []byte("sweet"))
    	tree.Put([]byte("banana"), []byte("honey"))
    	tree.Put([]byte("cinnamon"), []byte("savoury"))
    
    	banana, ok := tree.Get([]byte("banana"))
    	if ok {
    		fmt.Printf("banana = %s\n", string(banana))
    	} else {
    		fmt.Println("value for banana not found")
    	}
    
    	for it := tree.Iterator(); it.HasNext(); {
    		key, value := it.Next()
    		fmt.Printf("key = %s, value = %s\n", string(key), string(value))
    	}
    
    	// Output: 
    	// banana = honey
    	// key = apple, value = sweet
    	// key = banana, value = honey
    	// key = cinnamon, value = savoury
    }

    An iterator is stateful. You can have multiple iterators without any impact on each other, but make sure to synchronize access to them and the tree in a concurrent environment.

    Caution! Next panics if there is no next element. Make sure to test for the next element with HasNext before.

    Use cases

    1. When you want to use []byte as a map key.
    2. When you want to iterate over map keys in sorted order.

    Limitations

    Caution! To guarantee that the B+ tree properties are not violated, keys are copied.

    You should clearly understand what a []byte slice is and why it is dangerous to use as a key. The Go language authors prohibit using a byte slice ([]byte) as a map key for a reason: you could change the contents of the key and thus violate the map's invariants:

    // if this compiled (it does not: []byte is not a valid map key type)
    b := []byte{1}
    m := make(map[[]byte]int)
    m[b] = 1
    
    b[0] = 2 // mutating the key would violate the map's invariants
    m[[]byte{1}] // what do you expect this lookup to return?

    So to make sure that this situation does not occur in the tree, the key is copied byte by byte.
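
    The idiomatic workaround in plain Go is the same idea: convert the slice to a string map key, which copies the bytes, so later mutation of the slice cannot corrupt the map. A stdlib-only sketch:

```go
package main

import "fmt"

// putAndMutate stores a copied key, mutates the original slice,
// and reports whether the original and mutated keys are present.
func putAndMutate() (bool, bool) {
	m := make(map[string]int)
	b := []byte{1}
	m[string(b)] = 1 // string(b) copies the bytes into an immutable key
	b[0] = 2         // mutating b does not affect the stored key
	_, original := m[string([]byte{1})]
	_, mutated := m[string([]byte{2})]
	return original, mutated
}

func main() {
	original, mutated := putAndMutate()
	fmt.Println(original, mutated) // prints: true false
}
```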

    Benchmark

    A regular Go map is roughly twice as fast as the B+ tree for puts and gets. But if you need to iterate over keys in sorted order, the picture is slightly different:

    $ go test -benchmem -bench .
    goos: darwin
    goarch: amd64
    pkg: github.com/krasun/bptree
    BenchmarkTreePut-8                     	     187	   6423171 ns/op	 2825134 B/op	   99844 allocs/op
    BenchmarkMapPut-8                      	     525	   2736062 ns/op	 1732158 B/op	   20150 allocs/op
    BenchmarkTreePutRandomized-8           	     177	   6745088 ns/op	 1622519 B/op	   69431 allocs/op
    BenchmarkMapPutRandomized-8            	     612	   1944303 ns/op	  981396 B/op	   20111 allocs/op
    BenchmarkMapGet-8                      	    1484	    704045 ns/op	   38880 B/op	    9900 allocs/op
    BenchmarkTreeGet-8                     	     505	   2184212 ns/op	   38880 B/op	    9900 allocs/op
    BenchmarkTreePutAndForEach-8           	     181	   6958273 ns/op	 2825133 B/op	   99844 allocs/op
    BenchmarkMapPutAndIterateAfterSort-8   	     205	   5473439 ns/op	 2558078 B/op	   20172 allocs/op
    PASS
    ok  	github.com/krasun/bptree	15.460s
    

    Tests

    Run tests with:

    $ go test -cover .
    ok  	github.com/krasun/bptree	0.468s	coverage: 100.0% of statements
    

    License

    bptree is released under the MIT license.

    Visit original content creator repository https://github.com/krasun/bptree
  • ipyvolume

    ipyvolume

    Join the chat at https://gitter.im/maartenbreddels/ipyvolume Documentation Version Anaconda-Server Badge Coverage Status Build Status

    Try out in mybinder: Binder

    3D plotting for Python in the Jupyter notebook, based on IPython widgets, using WebGL.

    Ipyvolume currently can

    • Do (multi) volume rendering.
    • Create scatter plots (up to ~1 million glyphs).
    • Create quiver plots (like scatter, but with an arrow pointing in a particular direction).
    • Render isosurfaces.
    • Do lasso mouse selections.
    • Render in the Jupyter notebook, or create a standalone html page (or snippet to embed in your page).
    • Render in stereo, for virtual reality with Google Cardboard.
    • Animate in d3 style, for instance when the x coordinates or color of a scatter plot change.
    • Animations / sequences, all scatter/quiver plot properties can be a list of arrays, which can represent time snapshots.
    • Stylable (although still basic)
    • Integrates with

    Ipyvolume will probably support, but does not yet:

    • Render labels in LaTeX.
    • Show a custom popup when hovering over a glyph.

    Documentation

    Documentation is generated at Read the Docs.

    Screencast demos

    Animation

    screencast

    (see more at the documentation)

    Volume rendering

    screencast

    Glyphs (quiver plots)

    screencast quiver

    Installation

    If you want to use Jupyter Lab, please use version 3.0.

    Using pip

    Advice: make sure you use conda or virtualenv. If you are not a root user and use the --user argument for pip, you expose the installation to all Python environments, which is bad practice; make sure you know what you are doing.

    $ pip install ipyvolume
    

    Conda/Anaconda

    $ conda install -c conda-forge ipyvolume
    

    Pre-notebook 5.3

    If you are still using an old notebook version, ipyvolume and its dependent extension (widgetsnbextension) need to be enabled manually. If unsure, check which extensions are enabled:

    $ jupyter nbextension list
    

    If not enabled, enable them:

    $ jupyter nbextension enable --py --sys-prefix ipyvolume
    $ jupyter nbextension enable --py --sys-prefix widgetsnbextension
    

    Pip as user (but really, do not do this)

    You have been warned: do this only if you know what you are doing. This might haunt you in the future, and now is a good time to consider learning virtualenv or conda.

    $ pip install ipyvolume --user
    $ jupyter nbextension enable --py --user ipyvolume
    $ jupyter nbextension enable --py --user widgetsnbextension
    

    Developer installation

    $ git clone https://github.com/maartenbreddels/ipyvolume.git
    $ cd ipyvolume
    $ pip install -e . notebook jupyterlab
    $ (cd js; npm run build)
    $ jupyter nbextension install --py --overwrite --symlink --sys-prefix ipyvolume
    $ jupyter nbextension enable --py --sys-prefix ipyvolume
    # for jupyterlab (>=3.0), symlink share/jupyter/labextensions/bqplot-image-gl
    $ jupyter labextension develop . --overwrite
    

    Developer workflow

    Jupyter notebook (classical)

    Note: There is never a need to restart the notebook server, nbextensions are picked up after a page reload.

    Start this command:

    $ (cd js; npm run watch)
    

    It will

    • Watch for changes in the source code and run the TypeScript compiler to transpile the src dir to the lib dir.
    • Watch the lib dir, and webpack will build (among other things), ROOT/ipyvolume/static/index.js.

    Refresh the page.

    Visit original content creator repository https://github.com/widgetti/ipyvolume
  • unemployment

    Unemployment: Course Portal

    This repository is the portal for the course “Unemployment” taught by Pascal Michaillat at UC Santa Cruz. The course ID is ECON 182. The course portal contains the syllabus, provides a discussion forum, and hosts other course resources.

    Course webpage

    The course materials are available at https://pascalmichaillat.org/v/.

    Portal content

    • Syllabus for Winter 2025
    • Presentation schedule for Winter 2025
    • Lecture handouts – The folder contains handouts distributed in lecture. The handouts are designed to help you develop your research ideas and collect questions about the lecture videos.
    • Discussion forum – This collaborative discussion forum is designed to get you help quickly and efficiently. You can ask and answer questions, share updates, have open-ended conversations, and follow along course announcements.
    • Reading material – The folder contains book chapters and articles that may be hard to find online.
    • Lecture material – The folder contains discussions from lecture.
    • Section material – The folder contains material from section.
    • Presentations – The folder contains all the student presentations given during the quarter, and some presentation templates and examples.
    • LaTeX code for presentation slides – Complete code to produce a presentation with LaTeX. Just upload the files to Overleaf and start writing your slides!
    • LaTeX code for research paper – Complete code to produce a research paper with LaTeX. Just upload the files to Overleaf and start writing your paper!

    License

    This repository is licensed under the Creative Commons Attribution 4.0 International License.

    Visit original content creator repository
    https://github.com/pmichaillat/unemployment

  • TextProcessing

    Icon

    Project Overview

    TL;DR: Text extraction, transcription, punctuation restoration, translation, summarization, and text-to-speech.

    The goal of this project is to extend the functionalities of Fabric. I’m particularly interested in building pipelines using utilities like yt as a source and chaining them with the | operator on the command line.

    However, a major limitation exists: all operations are constrained by the LLM context. For extracting information from books, lengthy documents, or long video transcripts, content may get truncated.

    To address this, I started adding a summarization step before applying a Fabric template, based on the document length. Additionally, I explored capabilities like transcribing, translating, and listening to the pipeline result or saving it as an audio file for later consumption.

    Examples

    Listen to the condensed summary of a long YouTube video

    yt --transcript url | tp --cb | tts

    Read a web page summary

    tp --ebullets https://en.wikipedia.org/wiki/Text_processing

    Listen to the condensed French summary of a long English YouTube video

    yt --transcript --lang en url | tp --cb --tr fr | tts

    Save a book’s wisdom as an audio file

    tp my_book.txt --eb | fabric --p extract_wisdom | tts --o my_book_wisdom.mp3

    Say “hello world!” in Chinese

    echo "Hello world!" | tp --tr zh | tts

    Translate a document to Spanish

    tp doc_fr.txt --tr es > doc_es.txt

    Generate a transcript in any language from an mp4 file, e.g. from English to French

    tp en.mp4 --tr fr

    Listen to a French audio file in Spanish

    tp fr.mp3 --tr es | tts

    Convert a Spanish audio book to a French audio book… and make an English transcript

    tp es.mp3 --tr fr | tts --o fr.mp3 | tp fr.mp3 --tr en --o tr_en.txt

    Extract ideas from an audio file, save them in a French text file

    tp en.mp3 | fabric --p extract_ideas | tp --tr fr --o idées.txt

    Perform OCR

    tp image.png

    Extract text from a Word file

    tp document.docx

    Text Processing (tp)

    Input (text or audio file)

    tp receives input from stdin or as its first command-line argument. It accepts:

    • Text.
    • File path. Supported formats are: .aiff, .bmp, .cs, .csv, .doc, .docx, .eml, .epub, .flac, .gif, .htm, .html, .jpeg, .jpg, .json, .log, .md, .mkv, .mobi, .mp3, .mp4, .msg, .odt, .ogg, .pdf, .png, .pptx, .ps, .psv, .py, .rtf, .sql, .tff, .tif, .tiff, .tsv, .txt, .wav, .xls, .xlsx

    tp accepts unformatted content, such as automatically generated YouTube transcripts. If the text lacks punctuation, it restores it before further processing, which is necessary for chunking and text-to-speech operations.

    Transcription

    Converts audio and video files to text using Whisper.

    Summarization

    The primary aim is to summarize books, large documents, or long video transcripts using an LLM with an 8K context size. Various summarization levels are available:

    Extended Bullet Summary (--ebullets, --eb)

    • Splits text into chunks.
    • Summarizes all chunks as bullet points.
    • Concatenates all bullet summaries.

    The goal is to retain as much information as possible.

    Condensed Bullet Summary (--cbullets, --cb)

    Executes as many extended-bullet summary passes as needed to end up with a bullet summary smaller than the LLM context size.
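
    The condensation loop can be sketched in a few lines of Python; summarize_chunk is a stand-in for the real LLM call, and the word-based budget here is an illustrative simplification of token counting:

```python
def chunk(words, size):
    """Split a word list into consecutive chunks of at most `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]

def condensed_summary(text, budget, summarize_chunk):
    """Repeat extended-bullet passes until the result fits within `budget` words."""
    words = text.split()
    while len(words) > budget:
        # Summarize each context-sized chunk, then concatenate the summaries.
        bullets = [summarize_chunk(" ".join(c)) for c in chunk(words, budget)]
        words = " ".join(bullets).split()
    return " ".join(words)

# Toy "summarizer": keep the first 3 words of each chunk.
shorten = lambda s: " ".join(s.split()[:3])
result = condensed_summary("one two three four five six seven eight", 4, shorten)
```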

    Textual Summary (--text, --t)

    A simple summarization that does not rely on bullet points.

    Translation (--translate, --tr)

    Translates the output text to the desired language. Use a two-letter code such as en or fr.

    Usage

    usage: tp [-h] [--ebullets] [--cbullets] [--text] [--lang LANG] [--translate TRANSLATE] [--output_text_file_path OUTPUT_TEXT_FILE_PATH] [text_or_path]
    
    tp (text processing) provides transcription, punctuation restoration, translation and summarization from stdin, text, url, or file path. Supported file formats are: .aiff, .bmp, .cs, .csv, .doc, .docx, .eml, .epub, .flac, .gif, .htm, .html, .jpeg, .jpg, .json, .log, .md, .mkv, .mobi, .mp3, .mp4, .msg, .odt, .ogg, .pdf, .png, .pptx, .ps, .psv, .py, .rtf, .sql, .tff, .tif, .tiff, .tsv, .txt, .wav, .xls, .xlsx
    
    positional arguments:
      text_or_path          plain text; file path; file url
    
    options:
      -h, --help            show this help message and exit
      --ebullets, --eb      Output an extended bullet summary
      --cbullets, --cb      Output a condensed bullet summary
      --text, --t           Output a textual summary
      --lang LANG, --l LANG
                            Forced processing language. Disables the automatic detection.
      --translate TRANSLATE, --tr TRANSLATE
                            Language to translate to
      --output_text_file_path OUTPUT_TEXT_FILE_PATH, --o OUTPUT_TEXT_FILE_PATH
                            output text file path
    

    Text To Speech (tts)

    Listen to the pipeline result, or save it as an audio file to listen to later.

    tts can also read text files, automatically detecting their language.

    usage: tts.py [-h] [--output_file_path OUTPUT_FILE_PATH] [--lang LANG] [input_text_or_path]
    
    tts (text to speech) reads text aloud or to mp3 file
    
    positional arguments:
      input_text_or_path    Text to read or path of the text file to read.
    
    options:
      -h, --help            show this help message and exit
      --output_file_path OUTPUT_FILE_PATH, --o OUTPUT_FILE_PATH
                            Output file path. If none, read aloud.
      --lang LANG, --l LANG
                            Forced language. Uses language detection if not provided.
    

    Environment setup

    .env file

    GROQ_API_KEY=gsk_
    LITE_LLM_URI='http://localhost:4000/'
    SMALL_CONTEXT_MODEL_NAME="groq/llama3-8b-8192"
    SMALL_CONTEXT_MAX_TOKENS=8192
    

    Script shorthand

    • Make the script executable: chmod +x tts.py

    • Create a symlink: link the script into a directory that’s in your PATH with sudo ln -s tts.py /usr/local/bin/tts

    Visit original content creator repository https://github.com/Gauff/TextProcessing
  • tiny-timer

    tiny-timer


    Small countdown timer and stopwatch module.

    Installation

    npm:

    $ npm install tiny-timer

    Yarn:

    $ yarn add tiny-timer

    Example

    const Timer = require('tiny-timer')
    
    const timer = new Timer()
    
    timer.on('tick', (ms) => console.log('tick', ms))
    timer.on('done', () => console.log('done!'))
    timer.on('statusChanged', (status) => console.log('status:', status))
    
    timer.start(5000) // run for 5 seconds

    Usage

    timer = new Timer({ interval: 1000, stopwatch: false })

    Optionally set the refresh interval in ms, or enable stopwatch mode instead of countdown.

    timer.start(duration [, interval])

    Starts the timer running for a duration specified in ms. Optionally overrides the default refresh interval in ms.

    timer.stop()

    Stops the timer.

    timer.pause()

    Pauses the timer.

    timer.resume()

    Resumes the timer.

    Events

    timer.on('tick', (ms) => {})

    Event emitted every interval with the current time in ms.

    timer.on('done', () => {})

    Event emitted when the timer reaches the duration set by calling timer.start().

    timer.on('statusChanged', (status) => {})

    Event emitted when the timer status changes.

    Properties

    timer.time

    Gets the current time in ms.

    timer.duration

    Gets the total duration the timer is running for in ms.

    timer.status

    Gets the current status of the timer as a string: running, paused or stopped.

    Visit original content creator repository https://github.com/mathiasvr/tiny-timer