Sentiment analysis

Sentiment analysis is the detection of attitudes. The basic task is to automatically decide whether a piece of text (e.g. a review, a tweet, a blog post, or a general document) is positive or negative. More advanced tasks also detect the attitude’s polarity together with its target, its source, or more complex attitude types.

In our research, we focus on sentiment analysis in the Czech web environment, with special attention to social media. In our pilot paper, we created a large annotated corpus from the top 10 Czech Facebook brands and achieved a recognition accuracy of about 70% [Habernal et al., 2013]. The corpus is freely available for further research at liks.fav.zcu.cz/sentiment. Since NLP in Czech suffers from the language's large vocabulary and very rich inflection, we further improved our methods by incorporating semi-supervised features based on statistical distributional semantics [Habernal and Brychcín, 2013].

Our experiments in both the Czech and English movie review domains achieved state-of-the-art performance on a widely used dataset for the sentiment analysis task (about 92% accuracy). For details, please refer to our paper [Brychcín and Habernal, 2013].

Article Resources

Tomáš Hercig and Tomáš Brychcín and Lukáš Svoboda and Michal Konkol and Josef Steinberger
Unsupervised Methods to Improve Aspect-Based Sentiment Analysis in Czech
Computación y Sistemas (2016)
BibTex | PDF | Data
Sentiment analysis

Corpora

Restaurant Reviews CZ ABSA — 2.15k reviews with their related target and category: CzechABSA-v2.zip (~0.3 MB)

Josef Steinberger and Tomáš Brychcín and Michal Konkol
Aspect-Level Sentiment Analysis in Czech
Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (2014)
BibTex | PDF
Sentiment analysis

Corpora

Restaurant Reviews CZ ABSA — 1.2k reviews with their related target and category: CzechABSA-v1.zip (~0.15 MB) Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Tomáš Brychcín and Ivan Habernal
Unsupervised Improving of Sentiment Analysis Using Global Target Context
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013 (2013)
BibTex | PDF
Sentiment analysis

Corpora

CSFD CZ — 90k reviews with their related target (movie): csfd-90k-reviews-ranlp2013.tar.bz2 (11 MB) Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Ivan Habernal and Tomáš Hercig and Josef Steinberger
Sentiment analysis in Czech social media using supervised machine learning
Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (2013)
BibTex | PDF
Sentiment analysis

Corpora

CSFD CZ — Corpus contains 91,381 movie reviews (30,897 positive, 30,768 neutral, and 29,716 negative reviews) from the Czech Movie Database.
Corpus: csfd.zip (~13 MB) Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Facebook CZ — Corpus consists of 10,000 Facebook posts (2,587 positive, 5,174 neutral, 1,991 negative and 248 bipolar posts).
Corpus: facebook.zip (~1.5 MB) Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
The archive contains data and statistics in an Excel file (FBData.xlsx) and gold data in two text files with posts (gold-posts.txt) and labels (gold-labels.txt) on corresponding lines.
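The line-aligned layout described above (one post per line in one file, its label on the same line of the other file) can be read with a few lines of Python. This is only a minimal sketch: the function names are hypothetical, UTF-8 encoding is an assumption, and nothing is assumed about which label strings the files actually contain.

```python
from collections import Counter


def read_lines(path, encoding="utf-8"):
    """Read a text file into a list of lines without trailing newlines."""
    # Assumption: the files are UTF-8 encoded; pass another encoding if not.
    with open(path, encoding=encoding) as handle:
        return [line.rstrip("\n") for line in handle]


def load_parallel_corpus(posts_path, labels_path, encoding="utf-8"):
    """Pair each post with the label found on the same line number."""
    posts = read_lines(posts_path, encoding)
    labels = read_lines(labels_path, encoding)
    if len(posts) != len(labels):
        # Misaligned files would silently attach wrong labels, so fail loudly.
        raise ValueError(
            f"line count mismatch: {len(posts)} posts vs {len(labels)} labels"
        )
    return [(post, label.strip()) for post, label in zip(posts, labels)]


def label_distribution(pairs):
    """Count how many posts carry each label."""
    return Counter(label for _, label in pairs)
```

Calling load_parallel_corpus with the paths of the extracted posts and labels files returns a list of (post, label) pairs, and label_distribution can then be compared against the class counts stated in the corpus description.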

Mall CZ — Corpus consists of 145,307 user product reviews (102,977 positive, 31,943 neutral, and 10,387 negative) crawled from a large Czech e-shop, Mall.cz.
Corpus: mallcz.zip (~7.4 MB) Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
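
The class counts given for the three corpora above differ considerably in balance, which matters when reading accuracy figures. As a rough illustration, the following sketch computes the accuracy a trivial classifier would reach by always predicting the most frequent class. The counts are copied from the descriptions above; the dictionary layout and function name are hypothetical, and this is not a method taken from the cited papers.

```python
# Class counts copied from the corpus descriptions on this page.
CORPORA = {
    "CSFD CZ": {"positive": 30897, "neutral": 30768, "negative": 29716},
    "Facebook CZ": {
        "positive": 2587,
        "neutral": 5174,
        "negative": 1991,
        "bipolar": 248,
    },
    "Mall CZ": {"positive": 102977, "neutral": 31943, "negative": 10387},
}


def majority_baseline(counts):
    """Return the most frequent label and the share of the corpus it covers."""
    total = sum(counts.values())
    label = max(counts, key=counts.get)
    return label, counts[label] / total


for name, counts in CORPORA.items():
    label, share = majority_baseline(counts)
    print(f"{name}: always predicting '{label}' gives {share:.1%} accuracy")
```

With these counts the baseline is roughly 33.8% for CSFD CZ, 51.7% for Facebook CZ, and 70.9% for Mall CZ, so an accuracy figure is easier to interpret when it is reported alongside the class distribution of the corpus it was measured on.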

Stance Detection

Peter Krejzl and Josef Steinberger
Stance detection in online discussions
WIKT & DaZ 2016 (2016)
BibTex | PDF

Czech Stance Detection v1.1 — Corpus consists of 1,460 comments from a Czech news server related to two topics: the Czech president "Miloš Zeman" (181 In favor, 165 Against, and 301 None) and "Smoking ban in restaurants" (168 In favor, 252 Against, and 393 None).
Corpus: CzechStanceDetection-v1.1.rar (~0.1 MB) Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Unknown publication.

Czech Stance Detection v2.0 — Corpus consists of comments from a Czech news server related to two topics: the Czech president "Miloš Zeman", with 2,638 comments (691 In favor, 1,263 Against, and 684 Neither), and "Smoking ban in restaurants", with two subsets: ALL, with 2,785 comments (744 In favor, 1,280 Against, and 761 Neither), and GOLD, with 1,388 comments (272 In favor, 485 Against, and 631 Neither).
Corpus: CzechStanceDetection-v2.0.zip (~0.5 MB) Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Licence

The corpora are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Citation

Please cite the appropriate article if you use any of the available resources.

Publications

Tomáš Hercig and Tomáš Brychcín and Lukáš Svoboda and Michal Konkol
UWB at SemEval-2016 Task 5: Aspect Based Sentiment Analysis
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016) (2016)
BibTex | PDF
Sentiment analysis
Josef Steinberger and Tomáš Brychcín and Michal Konkol
Aspect-Level Sentiment Analysis in Czech
Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (2014)
BibTex | PDF
Sentiment analysis
Tomáš Hercig and Ivan Habernal
Sarcasm Detection on Czech and English Twitter
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers (2014)
BibTex | PDF
Sentiment analysis
Tomáš Brychcín and Ivan Habernal
Unsupervised Improving of Sentiment Analysis Using Global Target Context
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013 (2013)
BibTex | PDF
Sentiment analysis
Ivan Habernal and Tomáš Hercig and Josef Steinberger
Sentiment analysis in Czech social media using supervised machine learning
Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (2013)
BibTex | PDF
Sentiment analysis