

Automating Opinion Extraction from Semi-Structured Webpages: Leveraging Language Models and Instruction Finetuning on Synthetic Data


[ 1 ] Wydział Inżynierii Zarządzania, Politechnika Poznańska | [ 2 ] Instytut Informatyki, Wydział Informatyki i Telekomunikacji, Politechnika Poznańska | [ S ] student | [ SzD ] doctoral school student | [ P ] employee

Scientific discipline (Law 2.0)

[2.3] Information and communication technology
[6.6] Management and quality studies

Year of publication


Chapter type

chapter in monograph / paper

Publication language


Keywords

  • Language Models
  • Information Extraction
  • Opinion Mining

EN To address the challenge of extracting opinions from semi-structured webpages such as blog posts and product rankings, we employ encoder-decoder transformer models. We enhance the models’ performance by generating synthetic data with large language models such as GPT-3.5 and GPT-4, diversified through prompts featuring various text styles, personas, and product characteristics. We experiment with different fine-tuning strategies, training both with and without domain-adapted instructions as well as on synthetic customer reviews, targeting tasks such as extracting product names, pros, cons, and opinion sentences. Our evaluation shows a significant improvement in the models’ performance in both product characteristic and opinion extraction tasks, validating the effectiveness of using synthetic data for fine-tuning and signaling the potential of pretrained language models to automate web scraping across diverse web sources.
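The abstract names two steps that a small example may make more concrete: diversifying generation prompts across text styles, personas, and product characteristics, and pairing the generated text with instruction-style extraction targets. The following is a minimal sketch under stated assumptions, not the authors' pipeline. The attribute pools, the prompt wording, the instruction template, and the function names `build_generation_prompts` and `to_instruction_example` are all hypothetical, and the call to a large language model that would turn each prompt into text is left out.

```python
import itertools
import random

# Hypothetical attribute pools: the abstract names these axes but not their values.
TEXT_STYLES = ["blog post", "product ranking", "customer review"]
PERSONAS = ["budget-conscious student", "professional photographer"]
PRODUCTS = [
    {"name": "Acme X100 headphones", "pros": ["long battery life"], "cons": ["bulky case"]},
    {"name": "Zeta Brew coffee maker", "pros": ["fast heating"], "cons": ["noisy pump"]},
]


def build_generation_prompts(n, seed=0):
    """Sample up to n distinct (style, persona, product) combinations and render prompts."""
    rng = random.Random(seed)
    combos = list(itertools.product(TEXT_STYLES, PERSONAS, PRODUCTS))
    chosen = rng.sample(combos, k=min(n, len(combos)))
    prompts = []
    for style, persona, product in chosen:
        prompt = (
            f"Write a {style} in the voice of a {persona} about {product['name']}. "
            f"Mention these pros: {', '.join(product['pros'])}. "
            f"Mention these cons: {', '.join(product['cons'])}."
        )
        # Keep the product record so the extraction labels are known by construction.
        prompts.append({"prompt": prompt, "product": product})
    return prompts


def to_instruction_example(generated_text, product, task):
    """Pair an instruction plus generated text with the known target for one task."""
    targets = {
        "product_name": product["name"],
        "pros": "; ".join(product["pros"]),
        "cons": "; ".join(product["cons"]),
    }
    # Assumed instruction template; the abstract does not give the actual wording.
    instruction = f"Extract the {task.replace('_', ' ')} from the following text."
    return {"input": f"{instruction}\n\n{generated_text}", "target": targets[task]}
```

Because each prompt is built from a known product record, the targets for the extraction tasks come for free once the text is generated. The resulting `input`/`target` pairs have the shape an encoder-decoder model would typically be fine-tuned on. Omitting the instruction line from `input` would give a rough analogue of the "without instructions" setting mentioned in the abstract.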

Pages (from - to)

681 - 688



Proceedings of the 16th International Conference on Agents and Artificial Intelligence - (Volume 3)

Presented on

16th International Conference on Agents and Artificial Intelligence, 24-26.02.2024, Rome, Italy

License type

CC BY-NC-ND (attribution - noncommercial - no derivatives)

Open Access Mode

publisher's website

Open Access Text Version

final published version

Date of Open Access to the publication

at the time of publication

