Sunday, March 24, 2019
We have developed NLP analysis tools and their corpora and released them on our GitHub site. All of the tools have been built with deep learning techniques as well as statistical ones. Please visit our project's GitHub site: https://github.com/sgnlplabeling/nlp_labeling
Wednesday, May 16, 2018
My accepted paper in COLING 2018
My paper on WSD was accepted at COLING 2018 as a full paper. Its title and abstract are as follows:
Word Sense Disambiguation Based on Word Similarity Calculation Using Word Vector Representation from a Knowledge-based Graph
Word sense disambiguation (WSD) is the task of determining the sense of an ambiguous word according to its context. Many existing WSD studies have used an external knowledge-based unsupervised approach because it has fewer word-set constraints than supervised approaches, which require training data. In this paper, we propose a new WSD method that generates the context of an ambiguous word by using similarities between the ambiguous word and words in the input document. In addition, to leverage our WSD method, we further propose a new word similarity calculation method based on the semantic network structure of BabelNet. We evaluate the proposed methods on the SemEval-2013 and SemEval-2015 English WSD datasets. Experimental results demonstrate that the proposed WSD method significantly improves the baseline WSD method. Furthermore, our WSD system outperforms the state-of-the-art WSD systems on the SemEval-2013 dataset. Finally, it achieves higher average performance across both datasets than the state-of-the-art unsupervised knowledge-based WSD system.
COLING 2018 will be held in Santa Fe, New Mexico, USA, August 20-26, 2018.
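To make the idea concrete, here is a minimal sketch of knowledge-based WSD by word-vector similarity, assuming pre-computed vectors for words and for each candidate sense (e.g., derived from the BabelNet graph). The data structures and names are only illustrative and are not the paper's actual implementation.

import numpy as np

def cosine(u, v):
    # Cosine similarity with a small epsilon to avoid division by zero.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def build_context(target, document_words, word_vectors, top_k=10):
    # Keep the document words most similar to the ambiguous target word.
    scored = [(w, cosine(word_vectors[target], word_vectors[w]))
              for w in document_words if w != target and w in word_vectors]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in scored[:top_k]]

def disambiguate(target, document_words, word_vectors, sense_vectors):
    # Pick the candidate sense whose vector best matches the generated context.
    context = build_context(target, document_words, word_vectors)
    best_sense, best_score = None, float("-inf")
    for sense, sense_vec in sense_vectors[target].items():
        score = sum(cosine(sense_vec, word_vectors[w]) for w in context)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

Here word_vectors maps words to vectors and sense_vectors maps an ambiguous word to a dictionary of candidate senses and their vectors; how those vectors are built from the knowledge-based graph is the core contribution of the paper and is not reproduced here.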
Saturday, August 5, 2017
My accepted paper in CIKM 2017
My paper on NER was accepted at CIKM 2017. Its abstract is as follows:
Korean named-entity recognition (NER) systems have been developed mainly at the morphological level, and they are commonly based on a pipeline framework that identifies named entities (NEs) after morphological analysis. However, this framework can degrade NER performance because errors from the morphological analysis propagate into the NER system. This paper proposes a novel syllable-level NER system, which does not require a morphological analysis and achieves similar or better performance than morphological-level NER systems. In addition, because the proposed system does not require a morphological analysis step, its processing speed is about 1.9 times faster than that of previous morphological-level NER systems.
CIKM 2017 will be held in Singapore, November 6-10, 2017.
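As a toy illustration of what syllable-level input looks like (not the paper's actual feature set or model), a Korean sentence can be fed to the tagger as a sequence of syllables with BIO labels, with no morphological analysis step in between:

def to_syllables(sentence):
    # Split the raw sentence into syllable tokens; spaces become an explicit boundary symbol.
    return [ch if ch != " " else "<SP>" for ch in sentence]

sentence = "홍길동은 서울에 산다"   # "Hong Gil-dong lives in Seoul"
print(to_syllables(sentence))
# A syllable-level BIO annotation (person, location) might look like:
# 홍/B-PER 길/I-PER 동/I-PER 은/O <SP>/O 서/B-LOC 울/I-LOC 에/O <SP>/O 산/O 다/O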
Friday, July 28, 2017
My published paper in Information Processing and Management (IPM 2017)
My paper related to text classification was published in Information Processing and Management (SSCI & SCIE). The title is "How to Use Negative Class Information for Naive Bayes Classification" and its abstract is as follows:
The Naive Bayes (NB) classifier is popular for text classification problems due to its simple, flexible framework and its reasonable performance. In this paper, we present how to effectively utilize negative class information to improve NB classification. As opposed to information retrieval, supervised-learning-based text classification already obtains class information, a negative class as well as a positive class, from a labeled training dataset. Since the negative class can also provide significant information to improve the NB classifier, the negative class information is applied to the NB classifier through two phases: indexing and class prediction. As a result, the new classifier using the negative class information consistently performs better than the traditional multinomial NB classifier.
You can freely download the PDF version of this paper from https://authors.elsevier.com/a/1VSJt15hYdYMhA until September 15, 2017.
This is my fourth paper about text classification using negative class information: SIGIR 2012, Pattern Recognition Letters 2015, JASIST 2015, and IPM 2017. I'm still interested in this topic, so I hope to do more studies on it.
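To illustrate the general idea, here is a minimal multinomial NB sketch that also uses the complement (negative) class when scoring a document, contrasting P(w|c) against P(w|not c) with a log-likelihood ratio. This is only one simple way to use negative class information and is not the exact formulation in the paper; class priors are also omitted for brevity.

import math
from collections import Counter

def train(docs, labels):
    # docs: list of token lists; labels: parallel list of class labels.
    classes = set(labels)
    pos_counts = {c: Counter() for c in classes}   # counts from documents of class c
    neg_counts = {c: Counter() for c in classes}   # counts from all other documents
    vocab = set()
    for tokens, y in zip(docs, labels):
        vocab.update(tokens)
        for c in classes:
            (pos_counts[c] if c == y else neg_counts[c]).update(tokens)
    return classes, pos_counts, neg_counts, vocab

def predict(tokens, classes, pos_counts, neg_counts, vocab):
    V = len(vocab)
    best_class, best_score = None, float("-inf")
    for c in classes:
        pos_total = sum(pos_counts[c].values())
        neg_total = sum(neg_counts[c].values())
        score = 0.0
        for w in tokens:
            p_pos = (pos_counts[c][w] + 1) / (pos_total + V)   # Laplace smoothing
            p_neg = (neg_counts[c][w] + 1) / (neg_total + V)
            score += math.log(p_pos) - math.log(p_neg)         # log-likelihood ratio
        if score > best_score:
            best_class, best_score = c, score
    return best_class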
Thursday, July 6, 2017
Text Classification and Summarization (Using Natural Language Processing and Machine Learning Techniques)
I gave an invited talk at KISTI. The title is "Text Classification and Summarization (Using Natural Language Processing and Machine Learning Techniques)."
http://web.donga.ac.kr/yjko/talks/TC&TS(Youngjoong%20Ko).pdf
Friday, June 2, 2017
How to Develop NLP Tools with DNN Techniques
I gave an invited talk at the IT 21 Global Conference on June 2, 2017. The title is "How to develop NLP tools with DNN techniques."
http://web.donga.ac.kr/yjko/talks/NLP_Tools_with_DNN(Youngjoong%20Ko).pdf
Friday, March 4, 2016
The Basic Concept of TensorFlow
I am preparing to teach TensorFlow in my graduate course. TensorFlow is Google's open-source software library for machine learning. The first class is about the basic concept of TensorFlow.
The next topic will be "Practice of NNet with the MNIST data."
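For reference, the basic concept boils down to an example like this in the TensorFlow 1.x API of that time: you first build a computation graph and then execute it in a session (recent TensorFlow versions run eagerly instead, so this session-based style applies to the 1.x API).

import tensorflow as tf

a = tf.constant(3.0, name="a")    # nodes are added to a default graph
b = tf.constant(4.0, name="b")
c = tf.add(a, b, name="c")        # nothing is computed yet

with tf.Session() as sess:        # a session executes the graph
    print(sess.run(c))            # -> 7.0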