My current workflow automatically generates a markdown file that includes not only all the contents of the Jupyter notebook but also tags that the fine-tuned BERT model inferred from the text.

Nonetheless, I knew that more could be done. The BERT fine-tuning approach came with a number of drawbacks. For instance, the model was trained on only the eight most frequently occurring labels. This was in large part due to my naïve design of the model and the unavoidable limitations of multi-label classification: the more labels there are, the worse the model tends to perform. The fact that the dataset had been manually labeled by me, who tagged articles back then without much thought, certainly did not help.

The supervised learning approach I took with fine-tuning also meant that the model could not learn to classify new labels it had not seen before. After all, the classification head of the model was fixed, so unless a new classifier was trained from scratch using new data, the model would never learn to predict new labels. Retraining and fine-tuning the model again would be a costly, resource-intensive operation.

While there might be many ways to go about this problem, I've come to two realistic, engineerable solutions: zero-shot classification and keyword extraction as a means of new label suggestion. In today's post, I hope to explore the latter in more detail by introducing an easy way of extracting keywords from a block of text using transformers and contextual embeddings. The method introduced in this post borrows heavily from the methodology introduced in this Medium article by Maarten Grootendorst, author of KeyBERT. I highly recommend that you check out both his post and the library on GitHub.

Without further ado, let's jump right in!

Introduction

Before we get down into the engineering details, here's a bird's eye view of what we want to achieve. Given a block of text, we want to have a function or model that is able to extract important keywords.
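To make the goal concrete, here is a minimal sketch of the kind of function I have in mind: it takes a block of text, builds a list of candidate words, embeds both the candidates and the whole text, and ranks candidates by cosine similarity to the text. This is only an illustration of the general embedding-similarity idea, not KeyBERT's implementation. The names `extract_keywords`, `toy_embed`, and `STOP_WORDS` are hypothetical, and `toy_embed` is a hashed bag-of-words stand-in that I swapped in so the snippet runs without downloading a model; a real setup would pass in a transformer encoder that produces contextual embeddings.

```python
import math
import re
import zlib
from typing import Callable, List, Sequence

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "the", "of", "and", "to", "in", "is", "for",
              "on", "with", "that", "this", "we", "it", "as", "are", "be"}


def toy_embed(text: str, dim: int = 64) -> List[float]:
    """Hypothetical stand-in for a transformer encoder: hashed bag of words."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z]+", text.lower()):
        # crc32 is deterministic across runs, unlike the built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def extract_keywords(
    text: str,
    embed: Callable[[str], Sequence[float]] = toy_embed,
    top_n: int = 5,
) -> List[str]:
    """Rank candidate words by similarity between their embedding and the text's."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Candidates: unique words that are not stop words and longer than 2 letters.
    candidates = sorted({t for t in tokens if t not in STOP_WORDS and len(t) > 2})
    doc_vec = embed(text)
    scored = [(cosine(embed(c), doc_vec), c) for c in candidates]
    # Highest similarity first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:top_n]]


if __name__ == "__main__":
    sample = ("Transformers produce contextual embeddings. Keywords are words "
              "whose embeddings sit close to the embedding of the document.")
    print(extract_keywords(sample, top_n=3))
```

Passing the encoder in as the `embed` argument keeps the ranking logic separate from the model, so the toy embedding can later be replaced by a real sentence encoder without touching the rest of the function.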