A Study on Social Perceptions of Public Libraries Using Sentiment Analysis
Abstract
This study seeks to understand how society as a whole perceives public libraries by analyzing texts related to public libraries with semantic connection network analysis and sentiment analysis. To this end, posts containing the keywords ‘Library’ and ‘Lifelong Learning Center’ were collected from the blogs and cafés of major domestic portal sites for approximately five years, from January 1, 2016 through November 30, 2020. Text mining, keyword centrality, network structure, structural equivalence, and sentiment analyses were conducted on the collected data. The results are as follows. First, ‘reading’ and ‘book’ were identified as the representative keywords that form the social perception of public libraries. Second, keywords related to the use of the library and to contact-free (‘untact’) services prompted by the recent spread of COVID-19 were found. Third, the keywords with positive meanings suggest that, in planning the development of public libraries, it is necessary to keep creating services that form a new image of the library, break away from its existing fixed role and image, and make it more convenient to use. Fourth, facilities for library services were perceived from a neutral point of view. Fifth, the spread of infectious disease, social distancing, and the temporary closure or shutdown of libraries were negatively associated with public libraries, and keywords related to librarians were also identified as negative.
Keywords:
Public Library, Text Mining, Semantic Connection, Network Analysis, CONCOR Analysis, Sentiment Analysis
1. Introduction
Public libraries have had a significant impact on creating and maintaining the culture of the nation and its communities. In particular, as educational and cultural institutions, they have played a major role in guaranteeing the basic rights of the members of society, such as access to information, reading culture, and education.
According to the Third Comprehensive Library Development Plan, Korean society faces several changes: the transition to an aging society, population decline due to the low fertility rate, a growing number of multicultural families, more leisure time following the implementation of the 52-hour work week, and entry into a society shaped by the fourth industrial revolution, based on super-intelligence and hyper-connectivity. Library officials describe these social changes as both a crisis and an opportunity. The service environment of public libraries is changing rapidly: the reading population has declined since the advent of various information media, users have changed, and their needs have diversified. In particular, the pandemic caused by a large-scale infectious disease has greatly affected the operation of public libraries and the provision of their services. For public libraries to keep developing and fulfilling their role as educational and cultural institutions, they need to respond to the demands that arise from these social changes. From this point of view, identifying society's perception of public libraries can be seen as a preemptive step toward effective library operation, because it supports the creation of services that follow demand.
Accordingly, this study sought to identify the perception of public libraries formed across society by applying text mining to online texts related to public libraries and analyzing their major keywords and sentiment. The detailed research contents are as follows. First, texts related to public libraries posted on portal site blogs and cafes were collected, and the TF and TF-IDF (Term Frequency-Inverse Document Frequency) values of the keywords were analyzed. Second, the meaning of the derived keywords was examined through centrality analysis and CONCOR (CONvergence of iterated CORrelations) analysis. Third, sentiment analysis of the collected data was used to identify the sentiment toward public libraries and, on that basis, the main keywords associated with positive and negative perceptions. Through these analyses, this study sought to provide basic data that can inform the future direction of public library operation by presenting the implications of the social perception of public libraries.
2. Previous Studies
2.1 Previous studies on Sentiment Analysis for Libraries
Few studies on emotional analysis for public libraries could be found, and the several that exist analyzed the emotions associated with library spaces.
Noh (2015) attempted to provide a foundation for designing and evaluating future library spaces by deriving an emotional vocabulary for spaces in public libraries. He drew 12 main emotional words regarding the spaces in university libraries through 5 steps of extraction and refinement, and finally suggested the following emotional vocabulary: ‘various, satisfactory, necessary, full, clean, stable, proper, harmonic, opening, warm, natural and excellent’.
Ham and Oh (2016) conducted an emotional analysis of the effect of color images on spaces in university libraries. The findings showed that different emotions arise in different library spaces and that these emotions are closely related to color images. They thus argued that color images are closely related to the emotions evoked by spaces.
2.2 Previous studies on Social Role and Perception of Public Libraries
Studies on perceptions of public libraries have been conducted with various themes and scopes, including studies on specific library services such as programs and policies. This section reviews research on the roles that public libraries should play as social institutions and on how those roles are perceived.
Pyo and Cha (2018) aimed to provide basic data for establishing the developmental direction of libraries and determining the direction of their services by comparing librarians' and users' perceptions of public libraries. To this end, questionnaires about the environment of current libraries, the roles and status of future libraries, and library policies were administered to librarians and users of the public libraries under the Seoul Metropolitan Office of Education. The findings indicate that both librarians and users regard reading and education as important to the role and future status of public libraries, suggesting that the two groups share common expectations. Librarians, however, proposed the improvement of outworn equipment and retraining for librarians as policy plans for future libraries, while users asked for larger book collections and expanded use of reading rooms. It is therefore necessary to develop public library policies that take this difference in perception between the two groups into account.
Noh and Kim (2019) examined library users' perceptions of the complex cultural space in libraries, and their preferences for and perceptions of the programs operated in it, by administering questionnaires to users of libraries that promote and operate such spaces. The findings suggest that it is necessary to manage the programs by providing users with spaces for information and education while also developing spaces for exhibition and performance, to publicize the complex cultural spaces in libraries as places not only for information and education but also for culture, healing, experience, communication, and so on, and to improve users' perceptions of them.
Kim and Kwon (2020) inspected the service systems of public libraries located in Mapo-gu, Seoul, and analyzed users' perceptions of and needs for public libraries in order to establish service systems that can meet the needs of various users. The findings indicate residents' strong agreement that libraries are necessary for communities and are essential institutions for children. In addition, the quantitative expansion of book collections and the verification of contents were the most strongly requested improvements. Regarding the improvement of library facilities and the expansion of libraries, establishing public libraries at the center of residents' living space, where users have easy access, was perceived to be important for public libraries to faithfully play their role as central spaces of their communities.
Chang (2020) examined the current status of public library services in Busan and analyzed users' needs by investigating the perceptions of public library users in Busan. The findings showed that Busan citizens' awareness of and needs for public libraries were very high, while inadequate and outworn facilities in Busan could not meet those needs. Easily accessible public libraries oriented toward residents' living space, comfortable and convenient spaces, quick provision of book collections, expansion of digital services, and dynamic promotion policies were therefore found to be necessary for public libraries to faithfully play their role as community cultural spaces.
3. Theoretical Background
This study sought to identify the keywords and sentiment associated with public libraries by using text data. The analytical methods used to this end are as follows.
3.1 TF-IDF (Term Frequency-Inverse Document Frequency)
The Term Frequency (TF) technique rests on the hypothesis that “if the frequency of emergence of a specific word in a document is high, the relevance will be as high as the frequency of emergence.” However, when meaningless words such as articles and postpositions emerge frequently, raw frequency says little about how a word relates to the content of the text. The TF-IDF technique is used to limit the weight of such unrelated words. TF-IDF is a numerical value indicating how strongly each word is associated with a document (Lee, Lee, & Kim, 2019); it statistically expresses the importance of a specific word within the whole collection of texts, so that the key words of a document can be derived from it. The TF-IDF calculation formula used in this study is as follows.
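The formula itself is not reproduced in this text. As a rough illustration only, the sketch below uses one common formulation, in which the weight of a term in a post is its raw count multiplied by ln(N / df), and the keyword score is the sum of those weights over all posts. The function name `corpus_tf_idf`, the toy posts, and the choice of weighting are assumptions made for this example; the weighting applied by the analysis tool used in the study may differ.

```python
import math
from collections import Counter

def corpus_tf_idf(docs):
    """Return {term: TF-IDF summed over all documents}.

    docs is a list of token lists (one list per post).
    Assumed weighting: tf = raw count in the post, idf = ln(N / df).
    """
    n_docs = len(docs)
    df = Counter()  # number of posts that contain each term
    for tokens in docs:
        df.update(set(tokens))
    scores = Counter()
    for tokens in docs:
        for term, count in Counter(tokens).items():
            scores[term] += count * math.log(n_docs / df[term])
    return dict(scores)

# Hypothetical toy corpus of three already-tokenized posts.
posts = [
    ["library", "reading", "book"],
    ["library", "reading", "children"],
    ["library", "cafe"],
]
scores = corpus_tf_idf(posts)
```

In this toy corpus ‘library’ occurs in every post, so its weight is zero, whereas ‘book’, which occurs in a single post, outranks the more frequent ‘reading’. This is the behavior that lets TF-IDF discount words that appear everywhere.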
3.2 Centrality analysis
Centrality is a measure of the relative importance of a node. The central word in a network serves as an index of influence on the network as a whole, for example by making the group stronger or by influencing decision making. Centrality indices are classified, according to the calculation method, into degree centrality, closeness centrality, eigenvector centrality, betweenness centrality, and others (Choi, 2016).
“Degree centrality” measures how many connections each node of a network has; the central node is identified by measuring the extent to which a specific node is connected to the other nodes. It is obtained by counting the links between a node and its neighboring nodes. Because it is based on the number of other nodes directly connected to a single node, it does not include nodes that are reached through 2 or more steps (Choi, 2016). “Closeness centrality” is a centrality index based on the connection distance from a node to every other node. By measuring the distances to all nodes that are directly or indirectly connected, it identifies the node with the most general influence. A node with high closeness centrality has the shortest total connection distance to all of the other nodes within the network, and the higher the closeness centrality, the closer the node lies to the center of the network (Lee, 2012).
“Betweenness centrality” measures the extent to which a node acts as an intermediary between other nodes in a network (Kim et al., 2014). The higher a node's betweenness centrality, the more likely it is to affect the flow of meaning within the semantic connection network. The more important a node's mediating role is for the nodes that form the network, the greater its control over the entire network. Even a node with relatively low degree centrality and closeness centrality is considered a central node if its betweenness centrality is high (Choi, 2016).
“Eigenvector centrality” measures the influence of each node and makes it possible to find the most influential node among many. It combines the influence captured by a node's degree centrality with the influence of the other nodes connected to it (Choi, 2016). Whereas betweenness centrality measures a node's intermediary role between other nodes, eigenvector centrality measures the influence or importance of a node in the entire network by reflecting the importance of the nodes it is connected to (Lee, 2012).
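The four indices described above can be made concrete with a small, self-contained sketch for an undirected, unweighted, connected network. The five-keyword graph, the function names, and the normalizations (degree divided by n - 1, closeness as (n - 1) / total distance, unnormalized betweenness, eigenvector scores scaled so the maximum is 1) are assumptions chosen for illustration; they are not necessarily the conventions of the software used in this study.

```python
from collections import deque

def degree_centrality(adj):
    n = len(adj)
    return {v: len(adj[v]) / (n - 1) for v in adj}

def closeness_centrality(adj):
    # Assumes a connected graph: every node is reachable from every other.
    n, result = len(adj), {}
    for s in adj:
        dist, queue = {s: 0}, deque([s])
        while queue:  # breadth-first search gives shortest path lengths
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        result[s] = (n - 1) / sum(dist.values())
    return result

def betweenness_centrality(adj):
    # Brandes' algorithm: accumulate shortest-path dependencies per source.
    bc = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        sigma[s] = 1
        dist, preds, order = {s: 0}, {v: [] for v in adj}, []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice

def eigenvector_centrality(adj, iterations=100):
    # Power iteration on A + I (the shift avoids oscillation on bipartite graphs).
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        new = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        top = max(new.values())
        x = {v: val / top for v, val in new.items()}
    return x

# Hypothetical keyword network (adjacency lists).
graph = {
    "reading": ["book", "story"],
    "book": ["reading", "story"],
    "story": ["reading", "book", "cafe"],
    "cafe": ["story", "online"],
    "online": ["cafe"],
}
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
betweenness = betweenness_centrality(graph)
eigen = eigenvector_centrality(graph)
```

In this toy network ‘story’ ranks first on all four indices, while ‘cafe’ has only average degree but the second-highest betweenness because it is the sole bridge to ‘online’.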
3.3 Cluster analysis
Cluster analysis measures the similarity between objects and groups highly similar objects together, so that objects within a cluster are similar to one another and differ from objects in other clusters. Objects are either classified into multiple mutually exclusive groups according to their characteristics or clustered based on the distances in the data, without assumptions about the number or structure of the clusters. Dividing the objects into several groups makes it possible to understand the target group efficiently, and the method is also used to understand and target customer groups. Methods include k-means cluster analysis and hierarchical cluster analysis, which group observations based on a similarity measure such as distance, and cluster analysis using latent structure (Yu, 2017). As a statistical technique, it is also used to classify objects with various characteristics into groups when classification criteria are absent or unclear.
3.4 Structural equivalence
Equivalence analysis finds classes of nodes with similar structural characteristics. If two nodes are structurally equivalent within a network, there is a high probability that they will show similar attitudes or behavior. Structural equivalence, in particular, refers to a state in which two nodes are structurally fully interchangeable, showing similar behavior because they are placed in a similar social environment.
By measuring the extent of similarity or difference in equivalence between nodes, a matrix can be created and block modeling can be performed on it. Through block modeling, nodes with similar equivalence are grouped into a single block, and the relationships within and between blocks are shown so that the position of a node within its block can be grasped at a glance. One block modeling technique is CONCOR (CONvergence of iterated CORrelations), which computes a matrix of correlation coefficients between the nodes' connection patterns and groups equivalent nodes together. In this way, the blocks of nodes and the relationships between the blocks can be identified (Yu, 2017).
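The core CONCOR step can be sketched as follows: correlate the columns of the relation matrix, then repeatedly correlate the columns of the resulting correlation matrix until every entry is close to +1 or -1, and split the nodes by sign. The 4 x 4 co-occurrence matrix, the keyword labels, and the function names below are hypothetical, and the sketch performs a single split only; a full analysis would repeat the split inside each block to obtain more clusters.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    # Raises ZeroDivisionError if a column is constant (not handled here).
    return cov / math.sqrt(vx * vy)

def column_correlations(matrix):
    cols = list(zip(*matrix))
    return [[pearson(ci, cj) for cj in cols] for ci in cols]

def concor_split(matrix, labels, max_iter=50, tol=1e-6):
    """Split nodes into two blocks by iterating column correlations."""
    corr = column_correlations(matrix)
    for _ in range(max_iter):
        if all(abs(abs(r) - 1.0) < tol for row in corr for r in row):
            break  # every entry has converged to +1 or -1
        corr = column_correlations(corr)
    # Nodes positively correlated with the first node form one block.
    first = [lab for lab, r in zip(labels, corr[0]) if r > 0]
    second = [lab for lab, r in zip(labels, corr[0]) if r <= 0]
    return first, second

# Hypothetical symmetric co-occurrence matrix (diagonal = own frequency).
labels = ["reading", "book", "program", "lecture"]
matrix = [
    [3, 2, 0, 0],
    [2, 3, 0, 1],
    [0, 0, 3, 2],
    [0, 1, 2, 3],
]
blocks = concor_split(matrix, labels)
```

Here the first correlation matrix already separates the two pairs by sign, and the iteration only sharpens the values toward +1 and -1, giving the blocks {‘reading’, ‘book’} and {‘program’, ‘lecture’}.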
3.5 Sentiment analysis
Sentiment analysis derives the subjective feelings of the person who wrote a text on a specific topic, and can classify the author's attitude toward the topic as positive or negative (Yim, 2015). Because it grasps the tone and sentiment of a text through sentiment keywords linked to the context, comments or posts about specific events can be classified as positive, neutral, or negative, so that opinions can be grasped objectively and accurately. In the public sector, services may be improved by identifying the causes of civil complaints, and companies may use it to identify customer reactions to specific products and, in turn, to infer their preferences (Yu, 2017).
4. Research Contents & Methodology
In this study, the keywords and sentiment regarding public libraries were examined by using text data related to public libraries. The research process is as follows.
4.1 Research contents
This study sought to identify the major keywords and sentiments regarding public libraries through semantic connection network analysis and sentiment analysis of unstructured big data. The details of the study are as follows.
First, the main keywords related to public libraries were identified. Public library related texts posted on the blogs and cafes of the portal sites were collected, and the main keywords were derived by analyzing the frequency of emergence and the TF-IDF values.
Second, the meaning of the main keywords derived in the previous step was analyzed. The centrality and network structure of the keywords were analyzed, and their structural equivalence and cluster characteristics were verified.
Third, sentiment analysis was performed on the collected texts to determine which keywords indicate positive and negative perceptions of public libraries.
To address these research contents, a research method using semantic connection network analysis and sentiment analysis was designed and carried out. Textom was used as the data collection and analysis tool, and the statistical analysis and visualization of the derived keywords were performed with Ucinet6 and NetDraw, respectively.
4.2 Data collection
The blogs and cafes of the main Korean portal sites, Naver and Daum, were used as the main channels for collecting the data for this study. Other channels through which data can be collected, such as newspaper articles and SNS (Facebook, Twitter, Instagram, etc.), were excluded because they were judged unsuitable for this study.
Newspaper articles are written to communicate objective facts, so their use of adjectives expressing emotion is limited. On SNS, text is uploaded together with other content such as images and clips, so text does not account for a large portion of each post, and it is difficult to understand users' perceptions of and emotions toward public libraries from such short texts. Blogs and cafes, by contrast, contain a higher proportion of text and many posts with a narrative structure, so they were judged suitable for collecting the data for this study.
The data collection period ran from January 1, 2016 until November 30, 2020, covering approximately the last 5 years. ‘Library’ and ‘Lifelong Learning Center’ were selected as the search terms. Public libraries in Korea are operated by local governments and by Offices of Education, and the libraries operated by Offices of Education across the country primarily use the name ‘Lifelong Learning Center.’ Hence, in this study, the data were collected with ‘Library’ and ‘Lifelong Learning Center’ as the search terms. However, because this could also collect data on lifelong learning and lifelong education unrelated to public libraries, the collected original texts were checked and sorted before the analysis so that only data related to public libraries were retained.
In the data collection process, a total of 8,258 online posts were collected from the blogs and cafes operated by Naver and Daum. After unnecessary data were filtered out, a total of 7,063 texts were selected for analysis. By collection channel, the analyzed data comprised 3,944 posts from Naver (1,951 from blogs and 1,993 from cafes) and 3,119 posts from Daum (1,407 from blogs and 1,712 from cafes) (refer to Table 1).
4.3 Data pre-processing
The collected data were first pre-processed for analysis. The original text of the collected data was checked, and data unrelated to ‘public library’ were filtered out. Noise such as meaningless characters and special characters was removed, as were stopwords such as postpositions and suffixes, which emerge frequently but cannot be used for semantic analysis. The remaining data were normalized by filtering out words and phrases unnecessary for the analysis and by unifying identical or similar words that were expressed differently. For instance, the abbreviated form ‘Library Policy Committee’ was unified into ‘Library Information Policy Committee,’ ‘Comprehensive Plan’ into ‘Comprehensive Library Development Plan,’ and ‘Library Association’ into ‘Korea Library Association’. Furthermore, words that are semantically related, even if not identical, were unified to facilitate interpretation: ‘Books’, ‘book’, and ‘Book’ were all unified into ‘book’, and ‘child’, ‘kid’, and ‘children’ were all unified into ‘children’. Stopwords not required for the analysis were also removed. Words denoting sequence, numbers, and personal names emerge repeatedly and frequently but carry little meaning and have limited relationships with other words, so they were removed from the analysis even when they were extracted as nouns. Words such as ‘library’, which emerged across all documents and yield little meaningful information in the analysis, were also treated as stopwords. However, terms referring to a specific type of library, such as ‘public library’, ‘small library’, or ‘specialized library’, were included in the analysis (refer to Table 2).
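The normalization and stopword rules described above can be sketched as a simple lookup over already-extracted terms. The mapping and stopword set below contain only the examples given in the text, the function name `normalize` is hypothetical, and the sketch assumes that multi-word terms such as ‘public library’ arrive as single tokens from the morphological analysis.

```python
# Variant forms mapped to the unified form (examples taken from the text).
CANONICAL = {
    "books": "book",
    "child": "children",
    "kid": "children",
    "library policy committee": "library information policy committee",
    "comprehensive plan": "comprehensive library development plan",
    "library association": "korea library association",
}

# Terms present in nearly every post carry no information for the analysis.
STOPWORDS = {"library"}

def normalize(tokens):
    """Lowercase, unify variants, and drop stopwords and empty tokens."""
    cleaned = []
    for token in tokens:
        token = token.strip().lower()
        token = CANONICAL.get(token, token)
        if not token or token in STOPWORDS:
            continue
        cleaned.append(token)
    return cleaned

normalized = normalize(["Books", "kid", "library", "public library", "Book"])
# normalized == ["book", "children", "public library", "book"]
```

Because matching is done on whole tokens, the bare word ‘library’ is dropped while ‘public library’ survives, which mirrors the exception made for specific library types.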
4.4 Data analysis
After the data pre-processing, words in the form of nouns and adjectives were extracted, and the keyword and sentiment analyses were performed. For the noun keywords, TF and TF-IDF analyses were performed, and the top 50 keywords for public libraries were derived. A one-mode symmetric matrix was created for these top 50 keywords, and the networks and clusters were then analyzed with Ucinet6 and NetDraw. The relationships between keywords were quantified by analyzing centrality indices, and the inter-keyword network was visualized with NetDraw. Furthermore, CONCOR (CONvergence of iterated CORrelations) analysis was performed to verify the clusters of keywords.
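A one-mode symmetric keyword matrix of the kind described above can be built by counting, for each pair of keywords, the number of posts in which both appear. This is a minimal sketch under that assumption; the function name, the toy posts, and the decision to leave the diagonal at zero are illustrative choices, not a description of how the tools used in the study construct the matrix.

```python
from itertools import combinations

def cooccurrence_matrix(docs, keywords):
    """Return a symmetric keyword-by-keyword matrix of post-level co-occurrence."""
    index = {keyword: i for i, keyword in enumerate(keywords)}
    n = len(keywords)
    matrix = [[0] * n for _ in range(n)]
    for tokens in docs:
        # Each keyword counts once per post, however often it is repeated.
        present = sorted({index[t] for t in tokens if t in index})
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    return matrix

# Hypothetical tokenized posts and keyword list.
keywords = ["reading", "book", "cafe"]
docs = [
    ["reading", "book", "cafe"],
    ["reading", "book", "weekend"],
    ["book", "cafe", "book"],
]
matrix = cooccurrence_matrix(docs, keywords)
# matrix == [[0, 2, 1], [2, 0, 2], [1, 2, 0]]
```

A matrix in this form can serve directly as the weighted adjacency matrix for centrality and CONCOR analyses.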
The sentiment analysis was conducted on the basis of learning data, with which the data were categorized as positive, neutral, or negative. The learning data were created by the researcher, who directly labeled each post of the original data as positive, neutral, or negative. Based on the learning data, sentiment analysis was conducted on the adjective forms of words to analyze the frequency and strength of sentiment by keyword and the detailed sentiments of the positive and negative keywords.
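The idea of learning polarity from manually labeled posts can be illustrated with a deliberately simple sketch: each adjective is associated with the label of the posts it most often appears in, and a new post takes the label that most of its known adjectives vote for. The labeled examples, function names, and voting rule are assumptions for illustration only and do not reproduce the classifier used in this study.

```python
from collections import Counter, defaultdict

def train(labeled_posts):
    """labeled_posts: list of (adjective tokens, label) pairs.

    Returns a mapping word -> Counter of labels of the posts containing it.
    """
    counts = defaultdict(Counter)
    for tokens, label in labeled_posts:
        for token in set(tokens):
            counts[token][label] += 1
    return counts

def classify(tokens, counts):
    """Label a post by majority vote of its known words; default to neutral."""
    votes = Counter()
    for token in tokens:
        if token in counts:
            label, _ = counts[token].most_common(1)[0]
            votes[label] += 1
    if not votes:
        return "neutral"
    return votes.most_common(1)[0][0]

# Hypothetical learning data: adjectives from posts labeled by hand.
learning_data = [
    (["comfortable", "clean"], "positive"),
    (["comfortable", "new"], "positive"),
    (["noisy", "uncomfortable"], "negative"),
    (["located", "open"], "neutral"),
]
model = train(learning_data)
label = classify(["clean", "comfortable", "noisy"], model)  # "positive"
```

A post with no known adjectives falls back to ‘neutral’, which is a design choice of this sketch rather than a rule stated in the study.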
5. Research Results
5.1 Results of the keyword frequency analysis related to public library
The top 25 keywords with a high frequency of emergence were derived by analyzing the public library related posts made on the social media and portal sites. The keyword with the highest frequency of emergence was ‘reading’ (6,668 counts). Other major keywords with a high frequency of emergence were ‘book’ (4,855 counts), ‘use or usage’ (3,915 counts), ‘story’ (2,933 counts), ‘online’ (2,595 counts), ‘daily life’ (2,580 counts), ‘cafe’ (2,497 counts), ‘people’ (2,333 counts), ‘writing’ (2,299 counts), ‘children’ (2,099 counts), ‘thought’ (1,995 counts), ‘empathy’ (1,473 counts), ‘time’ (1,343 counts), ‘support’ (1,244 counts), ‘COVID-19’ (1,135 counts), ‘writer’ (1,030 counts), ‘program’ (976 counts), ‘mind’ (896 counts), ‘study’ (847 counts), and ‘space’ (802 counts), etc.
As a result of deriving the TF-IDF values for the keywords, the keyword with the highest value was ‘reading’ (6245.05), and ‘book’ (6157.06) had an almost identical value. These were followed by ‘story’ (4178.26), ‘people’ (3902.24), ‘children’ (3569.74), ‘daily life’ (3455.80), ‘use or usage’ (3248.42), ‘thought’ (3200.48), ‘writing’ (3007.06), ‘online’ (2824.05), ‘cafe’ (2645.63), ‘empathy’ (2323.39), ‘writer’ (2259.58), ‘time’ (2154.51), ‘program’ (2131.05), ‘COVID-19’ (2123.68), ‘space’ (06.67), ‘small library’ (1928.24), ‘study’ (1898.57), and ‘friend’ (1856.46). Comparing the keywords by frequency of emergence and TF-IDF value, ‘reading’, ‘book’, ‘use or usage’, ‘story’, ‘online’, ‘daily life’, ‘people’, ‘writing’, ‘children’, ‘cafe’, ‘thought’, and ‘empathy’ ranked high on both measures. For the other keywords, the rankings by frequency of emergence and by TF-IDF value differed, yet the keywords themselves did not. It may be inferred that the derived keywords have a very important influence on the social perception of public libraries (refer to Table 3).
Fig. 1 illustrates the keyword network visualization based on the frequency analysis. The size of each node reflects the frequency of the keyword, and the thickness of each line reflects the strength of the connection between keywords, that is, how frequently the two keywords emerge together.
5.2 Results of the centrality analysis of keywords related to public library
The centrality values of the keywords related to public libraries are shown in Table 4.
Degree centrality examines how many relationships a specific keyword forms with the other keywords, and a keyword with high degree centrality is likely to be a core keyword. The keywords with the highest degree centrality, and thus the greatest influence, were ‘reading’ (0.058), ‘book’ (0.043), ‘story’ (0.028), ‘people’ (0.023), and ‘daily life’ (0.023).
Eigenvector centrality identifies the importance of the other keywords linked to a specific keyword within the network; even a keyword with low degree centrality has high influence within the network if its eigenvector centrality is high. The eigenvector centrality was high for ‘reading’ (66.598), ‘book’ (60.047), ‘story’ (34.247), ‘people’ (29.221), and ‘daily life’ (28.225).
Betweenness centrality measures the extent to which a specific keyword lies between other keywords within the network; a keyword with high betweenness centrality plays the role of a broker between the other keywords. In the analysis, ‘reading’, ‘book’, ‘use or usage’, ‘story’, ‘online’, ‘daily life’, ‘children’, ‘people’, ‘writing’, ‘writer’, ‘life’, ‘mind’, ‘health’, ‘method’, ‘nature’, ‘organization’, ‘space’, ‘registration’, ‘memory’, ‘travel’, ‘free’, ‘culture’, and ‘movie’, etc., all had the same betweenness centrality value of 0.025.
Closeness centrality gauges the distance between keywords by taking indirect connections into account: whereas degree centrality considers only directly connected keywords, closeness centrality is derived from all connected keywords, including indirectly connected ones. Closeness centrality was high for ‘reading’, ‘book’, ‘use or usage’, ‘story’, ‘online’, ‘daily life’, ‘people’, ‘writing’, ‘children’, ‘writer’, ‘mind’, ‘space’, ‘method’, ‘travel’, ‘organization’, ‘memory’, ‘life’, ‘registration’, ‘health’, and ‘nature’, yet no significant difference between the keywords was found.
The analysis of the centrality values confirmed that ‘reading’, ‘book’, ‘story’, ‘people’, ‘daily life’, and ‘online’, etc., play the role of core keywords related to public libraries. These keywords are very similar to the result of the TF-IDF analysis, and hence it may be inferred that they are the concepts that carry the largest weight in forming the social perception of public libraries.
5.3 Results of the CONCOR analysis for keywords related to public library
The CONCOR (CONvergence of iterated CORrelations) analysis was performed to verify the structural equivalence of the keywords related to public libraries. The setting values for the CONCOR analysis are shown in Fig. 2.
The CONCOR analysis of the keywords yielded 7 clusters and 1 independent group (refer to Table 5). Among the clusters, ‘Cluster 2’ groups keywords related to library services, and ‘Cluster 3’ groups keywords related to the use of data. In ‘Cluster 4’, many keywords form the cluster, a large number of which are related to ‘reading’, such as ‘reading’, ‘book’, ‘book recommendation’, ‘picture book’, and ‘audio book’, while ‘Cluster 5’ contains many keywords about daily life related to the library. Furthermore, ‘Cluster 6’ contains keywords related to the library’s online services, and ‘Cluster 7’ contains keywords related to library programs. The keyword visualization produced by the CONCOR analysis is shown in Fig. 3.
5.4 Results of the sentiment keyword analysis related to public library
Table 6 shows the results of the sentiment analysis of the keywords, based on the original text of the public library related posts. The main keywords that showed only positive sentiment were ‘reading’, ‘neighborhood’, ‘drive through’, ‘remodeling’, ‘free’, ‘shelf’, ‘facility’, ‘start’, ‘chair’, ‘exhibition’, ‘weekend’, ‘region’, ‘coffee’, ‘table’, and ‘convenience’, etc. Neutral keywords included ‘building’, ‘construction’, ‘sharing’, ‘digital’, ‘appearance’, ‘media’, ‘establishment’, ‘system’, ‘English’, ‘online’, ‘location’, ‘information’, ‘provision’, and ‘computer’, etc., and the main keywords with negative sentiment were ‘distance’, ‘search’, ‘upgrade’, ‘graffiti’, ‘reading room’, ‘prevention of epidemics’, ‘librarian’, ‘noise’, ‘application’, ‘postpone’, ‘error’, ‘usage guide’, ‘temporarily closed’, ‘small library’, ‘parking lot’, ‘closed’, and ‘diffusion’, etc.
In the course of pre-processing the original text of the public library related text, keywords (nouns) and morphemes (adjectives) were extracted to analyze the frequency of emergence and intensity of sentiment related keywords and the detailed sentiment keywords. Table 7 illustrates the results of analyzing the frequency of emergence and sentimental intensity of the sentiment related keywords. Among the top 50 sentimental keywords with a high frequency of emergence, 36 positive keywords and 14 negative keywords emerged. More positive keywords were verified, and as a result of deriving the top 50 keywords with a high frequency of emergence, ‘good’ turned out to have the highest frequency. Other than which, ‘modern’, ‘cool’, ‘comfortable’, ‘disappointed’, ‘cry’, ‘want’, ‘recommend’, ‘difficult’, and ‘new’, etc., were verified to have the highest frequency in their order, respectively.
When the keywords were ranked by sentiment intensity, the strongest positive keywords were ‘convenient’, ‘comfortable’, ‘modern’, ‘fun’, ‘wonderful’, ‘cool’, and ‘beautiful’, etc., and the strongest negative keywords were ‘scary’, ‘cry’, ‘uncomfortable’, ‘strange’, and ‘severe’, etc.
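The distinction between ranking by frequency and ranking by sentiment intensity can be illustrated with a minimal Python sketch. It assumes a lexicon-based approach. The lexicon entries, the intensity scores, and the token list below are invented for illustration and are not the lexicon or data used in this study.

```python
from collections import Counter

# Hypothetical lexicon: word -> (polarity, intensity score). Values are invented.
LEXICON = {
    "good": ("positive", 3.0),
    "convenient": ("positive", 5.0),
    "comfortable": ("positive", 4.5),
    "scary": ("negative", 5.0),
    "difficult": ("negative", 3.5),
}

def summarize(tokens, lexicon=LEXICON):
    """Count sentiment words and rank them by frequency and by intensity."""
    counts = Counter(t for t in tokens if t in lexicon)
    # One row per sentiment word: (word, polarity, frequency, intensity).
    rows = [(w, lexicon[w][0], n, lexicon[w][1]) for w, n in counts.items()]
    by_freq = sorted(rows, key=lambda r: (-r[2], r[0]))
    by_intensity = sorted(rows, key=lambda r: (-r[3], r[0]))
    # Number of distinct positive and negative sentiment words observed.
    polarity_counts = Counter(r[1] for r in rows)
    return by_freq, by_intensity, polarity_counts

# Hypothetical adjectives extracted from posts; "library" is not in the lexicon.
tokens = ["good", "library", "good", "convenient", "difficult",
          "good", "scary", "comfortable", "difficult"]
by_freq, by_intensity, polarity_counts = summarize(tokens)
print(by_freq[0][0], by_intensity[0][0], dict(polarity_counts))
```

In this toy example ‘good’ is the most frequent sentiment word while ‘convenient’ ranks first by intensity, which mirrors how the two rankings reported above can differ.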
After the sentiment keywords were classified as positive or negative, the detailed sentiments were examined based on the frequency and sentiment intensity of each keyword. The positive keywords were sub-classified into the 3 sentiments of ‘like’, ‘interest’, and ‘joy’.2) The keywords with the most influence on ‘like’ were ‘good’, ‘modern’, ‘comfortable’, ‘cool’, ‘recommend’, ‘pretty’, ‘convenient’, ‘clean’, ‘warm’, and ‘grow’, etc. The keywords with much influence on ‘interest’ were ‘want’, ‘new’, ‘wonderful’, ‘interesting’, ‘fun’, ‘original’, ‘impressive’, ‘innovative’, ‘special’, and ‘unique’, etc. Lastly, the keywords that influenced ‘joy’ were ‘grateful’, ‘joyful’, ‘pleasant’, ‘smile’, ‘nice to meet’, ‘best’, ‘funny’, ‘impressed’, ‘happy’, and ‘great work’, etc. (refer to Table 8).
The negative keywords were sub-classified into the detailed sentiments of ‘disgust’, ‘sadness’, and ‘fear’, etc. Among the meaningful results, the words that significantly influenced ‘disgust’ were ‘discomfort’, ‘difficult’, ‘severe’, ‘not attractive’, ‘strange’, ‘burden’, ‘not enough’, ‘hard to understand’, ‘dirty’, and ‘complicated’, etc. The keywords with a great influence on ‘sadness’ were ‘cry’, ‘disappointed’, ‘difficult’, ‘hurt’, ‘sad’, ‘pitiful’, ‘regret’, ‘blame self’, ‘apologize’, and ‘somber’, etc., and for ‘fear’ the influence was high in the order of ‘scary’, ‘worry’, ‘anxiety’, ‘dizzy’, ‘unstable’, and ‘caution’, etc. (refer to Table 9).
6. Discussions
This study used ‘Library’ and ‘Lifelong Learning Center’ as search terms to identify the core keywords and sentiments associated with public libraries, and collected texts related to public libraries posted on the blogs and cafes of the portal sites Naver and Daum over approximately 5 years, from January 1, 2016 to November 30, 2020. After data pre-processing, keywords were extracted from the collected data and their frequency and TF-IDF values were examined. Based on the analytical results, the following matters may be discussed.
First, the keyword frequency and TF-IDF analysis showed that ‘reading’ and ‘book’ had higher frequency and TF-IDF values than the other keywords, and that the connection between the two keywords was also very strong. These results suggest that the social perception of public libraries is formed primarily around the keywords ‘reading’ and ‘book’.
Second, apart from ‘reading’ and ‘book’ mentioned above, the keywords with high frequency and TF-IDF values were ‘use or usage’, ‘online’, ‘story’, ‘daily life’, ‘people’, ‘writing’, ‘children’, ‘cafe’, ‘thought’, and ‘empathy’, etc. Among these, the keyword ‘use or usage’ is interpreted as reflecting content about the general use of libraries. It is also presumed to reflect interest in library use under the operational restrictions imposed since the outbreak of infectious diseases such as ‘COVID-19’,3) together with interest in ‘online’ and contactless services.
Third, the sentiment analysis classified the keywords as positive, neutral, or negative, and the positive keywords were ‘reading’, ‘neighborhood’, ‘drive through’, ‘remodeling’, ‘free’, ‘shelf’, ‘facility’, ‘start’, ‘chair’, ‘exhibition’, ‘weekend’, ‘region’, ‘coffee’, ‘table’, and ‘convenience’, etc. A comparison of the derived keywords with the original text showed that most of those who posted about public libraries wrote about their impressions after visiting and using a library in their leisure time, or after experiencing the library’s contactless services during the pandemic. The public library thus appears to be perceived as a positive factor with respect to its facilities, its services, and the use of individuals’ leisure time. In seeking ways to develop public libraries based on the positive keywords, it seems necessary to create a new image of the library that goes beyond its existing role and image, and to create continuous services that enhance the convenience of use.
Fourth, the neutral keywords were ‘building’, ‘construction’, ‘sharing’, ‘digital’, ‘appearance’, ‘media’, ‘establishment’, ‘system’, ‘English’, ‘online’, ‘location’, ‘information’, ‘provision’, and ‘computer’, etc. It is considered that users showed neutral sentiments because their posts expressed no particular sentiment or thoughts about the online and offline facilities and the equipment for library services. Digital media devices and similar library facilities appear to be perceived by users as universal facilities, so low satisfaction with them could cause negative sentiments. It is therefore also necessary to promote the deployment in public libraries of new devices that use technologies such as the Internet of Things, AR/VR, and artificial intelligence, while proactively developing new services connected to them.
Fifth, the keywords with negative sentiments towards public libraries were ‘distance’, ‘search’, ‘upgrade’, ‘graffiti’, ‘reading room’, ‘prevention’, ‘librarian’, ‘noise’, ‘application’, ‘postpone’, ‘error’, ‘usage guide’, ‘temporarily closed’, ‘small library’, ‘parking lot’, ‘permanently closed’, ‘diffusion’, and ‘closed’, etc. The spread of infectious disease, social distancing, and the temporary and permanent closure of libraries, etc., appear to have contributed to the negative sentiments. Other negative keywords, such as graffiti, usage guide, noise, errors, and search, reflect the physical and human inconveniences that may be experienced while using the library. In particular, the appearance of ‘librarian’ among the negative keywords calls for more in-depth discussion in order to enhance user satisfaction.
Sixth, the analysis of the sentiment-related keywords for public libraries showed that the positive keywords towards the library were ‘convenient’, ‘fun’, ‘comfortable’, ‘wonderful’, and ‘modern’, etc. Accordingly, it would be necessary to consider ways to satisfy users’ desire for convenience and pleasure in order to strengthen positive perceptions of public libraries.
7. Conclusion & Recommendations
In a rapidly evolving society, the public library faces various demands as a cultural hub institution. In this study, semantic connection network analysis and sentiment analysis were performed on texts related to public libraries. On this basis, the study attempted to provide basic data by identifying the core keywords and sentiments associated with public libraries, and thereby to propose future directions for public libraries.
For the study, Naver and Daum, which are major portal sites in Korea, were used as the channels of data collection. Data covering approximately 5 years, from January 1, 2016 to November 30, 2020, were collected using the keywords ‘Library’ and ‘Lifelong Learning Center.’ A total of 8,258 items were collected, of which 7,063 were used for the analysis. The collected data were pre-processed, and the frequency and TF-IDF values were analyzed to derive the main keywords. The centrality and network structure of the keywords were then analyzed, and the structural equivalence and cluster characteristics of the keywords were examined. Furthermore, sentiment analysis was performed on the collected texts to identify which keywords indicate positive and negative perceptions of public libraries.
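To make the frequency and TF-IDF step concrete, the following minimal Python sketch computes both values for a small set of tokenized posts. It assumes the common weighting tf × log(N / df), summed over documents. The exact TF-IDF variant used in this study is not stated in this section, and the toy posts and the function name are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists.
    Returns (total term frequency, summed tf * log(N / df)) per term."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    freq = Counter()
    tfidf = Counter()
    for tokens in docs:
        for term, tf in Counter(tokens).items():
            freq[term] += tf
            tfidf[term] += tf * math.log(n_docs / df[term])
    return freq, tfidf

# Hypothetical posts already reduced to nouns (not the study's data).
posts = [
    ["library", "reading", "book", "children"],
    ["library", "book", "cafe"],
    ["library", "online", "reading", "reading"],
]
freq, tfidf = tf_idf(posts)
for term, score in sorted(tfidf.items(), key=lambda kv: -kv[1]):
    print(term, freq[term], round(score, 3))
```

Under this weighting, a term that occurs in every post (here ‘library’) receives a score of zero. This is one reason TF-IDF values are examined alongside raw frequency when deriving main keywords.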
The analysis of keyword frequency and TF-IDF values showed that ‘reading’, ‘book’, ‘use or usage’, ‘story’, ‘online’, ‘daily life’, ‘people’, ‘writing’, ‘children’, ‘cafe’, ‘thought’, and ‘empathy’, etc., had high frequency and high TF-IDF values, which suggests that these keywords have a very important influence on the social perception of public libraries. The centrality analysis of the keywords related to public libraries showed that ‘reading’, ‘book’, ‘story’, ‘people’, ‘daily life’, and ‘online’, etc., act as core keywords in the network structure across degree centrality, eigenvector centrality, betweenness centrality, and closeness centrality, etc. In particular, ‘reading’, ‘book’, ‘story’, ‘people’, and ‘daily life’ were derived as important keywords in both the TF-IDF analysis and the centrality analysis, and were found to be concepts that carry significant weight in forming the social perception of public libraries. The CONCOR analysis of the keywords yielded 7 clusters and 1 independent group. The clusters could be classified as library access and use, services, data use, reading, daily life, online, and programs, etc., respectively.
While pre-processing the original text of the posts related to public libraries, keywords (nouns) and morphemes (adjectives) were extracted in order to analyze the frequency and sentiment intensity of the sentiment-related keywords, as well as the detailed sentiment keywords. As a result, 36 positive keywords and 14 negative keywords emerged, so positive keywords predominated. By frequency, the keywords appeared in the order of ‘good’, ‘modern’, ‘cool’, ‘comfortable’, ‘sorry’, ‘cry’, ‘want’, ‘recommend’, ‘difficult’, and ‘new’, etc. By sentiment intensity, ‘convenient’ was the highest. The positive keywords ranked by sentiment intensity were, in order, ‘convenient’, ‘comfortable’, ‘modern’, ‘fun’, ‘wonderful’, ‘cool’, and ‘beautiful’, etc., and the negative keywords were ‘cry’, ‘discomfort’, ‘strange’, and ‘severe’, etc.
In addition, when the positive keywords were sub-classified into the 3 sentiments of ‘like’, ‘interest’, and ‘joy’, the keywords with the most influence on ‘like’ were ‘good’, ‘modern’, ‘comfort’, ‘cool’, ‘recommend’, ‘pretty’, ‘convenient’, ‘clean’, ‘warm’, and ‘grow’, etc. The keywords with a great influence on ‘interest’ were, in order, ‘want’, ‘new’, ‘wonderful’, ‘interesting’, ‘fun’, ‘original’, ‘impressive’, ‘innovative’, ‘special’, and ‘unique’, etc. Lastly, the keywords that influence ‘joy’ were, in order, ‘grateful’, ‘joyful’, ‘pleasant’, ‘smile’, ‘nice to meet’, ‘best’, ‘funny’, ‘impressive’, ‘happy’, and ‘great job’, etc. The negative keywords were classified into ‘disgust’, ‘sadness’, and ‘fear’, and the detailed sentiments were then examined. Among the significant results, the keywords with a large influence on ‘disgust’ were, in order, ‘inconvenient’, ‘difficult’, ‘severe’, ‘not attractive’, ‘strange’, ‘burden’, ‘not enough’, ‘difficult’, ‘dirty’, and ‘complicated’, etc. The keywords with a great influence on ‘sadness’ were ‘cry’, ‘disappointed’, ‘difficult’, ‘hurt’, ‘sad’, ‘pitiful’, ‘regret’, ‘blame self’, ‘apologize’, and ‘somber’, etc., and for ‘fear’ the influence was high in the order of ‘scary’, ‘worry’, ‘anxiety’, ‘dizzy’, ‘unstable’, and ‘caution’, etc.
Based on these analytical results, the social perception of public libraries can be summarized as follows.
First, ‘reading’ and ‘book’ had higher frequency and TF-IDF values than the other keywords, which suggests that the social perception of public libraries is largely formed around the keywords ‘reading’ and ‘book’.
Second, among the other keywords with high frequency and TF-IDF values, many were presumed to reflect interest in the use of libraries and in contactless services.
Third, the sentiment analysis, which classified the keywords as positive, neutral, or negative, indicated that the public library was a positive factor in terms of its facilities, its services, and the use of individuals’ leisure time.
Fourth, users expressed no particular feelings or thoughts about the online and offline facilities and the equipment for library services, which therefore registered as neutral sentiment. Digital media devices and similar library facilities are considered to be perceived by users as universal facilities.
Fifth, the keywords with negative sentiments towards public libraries were ‘distance’, ‘search’, ‘upgrade’, ‘graffiti’, ‘reading room’, ‘prevention’, ‘librarian’, ‘noise’, ‘application’, ‘postpone’, ‘error’, ‘usage guide’, ‘temporarily closed’, ‘small library’, ‘parking lot’, ‘permanently closed’, ‘diffusion’, and ‘temporarily closed’, etc. The spread of infectious disease, social distancing, and the temporary and permanent closure of libraries, etc., appear to have contributed to the negative sentiments, and the appearance of ‘librarian’ as a negative keyword calls for more in-depth discussion in order to enhance user satisfaction.
On the basis of the results above, the following suggestions can be considered to discuss the orientation of public libraries.
First, social perceptions of public libraries were found to be strongly formed around keywords such as ‘reading’ and ‘book’. This suggests that basic elements such as ‘reading’ and ‘book’ need to be maintained in managing libraries in the future.
Second, development plans for public libraries can draw on the keywords with higher frequency, such as ‘use’, ‘online’, ‘story’, ‘life’, ‘person’, ‘writing’, ‘child’, ‘cafe’, ‘thought’, ‘empathy’, etc., as well as on the positive keywords, such as ‘reading’, ‘town’, ‘drive through’, ‘remodeling’, ‘free’, ‘bookshelf’, ‘facility’, ‘start’, ‘chair’, ‘exhibition’, ‘weekend’, ‘locality’, ‘coffee’, ‘table’, ‘convenience’, etc. On that basis, it seems necessary to move away from the fixed roles and images of existing libraries and to create sustainable services that establish a new image of the library and enhance convenience for users.
A comparison of the derived keywords with the original texts showed that those who wrote about public libraries often posted accounts written after visiting or using nearby public libraries in their free time, or after experiencing contactless library services during the pandemic. Public libraries are thus thought to be a positive factor with respect to their facilities, their services, and the use of individuals’ free time.
Third, users have neutral emotions about the online/offline facilities and equipment for library services, without particular thoughts or feelings. This suggests that users perceive digital media devices in library facilities as standard, and that low satisfaction with them may cause negative emotions. It is thus necessary to actively place new devices that use IoT, AR/VR, A.I., etc. in public libraries and to plan new services connected to them.
Fourth, physical and personal complaints that users can encounter in using libraries, such as graffiti, usage guidance, noise, error, search, parking lot, etc., were derived as negative keywords. In particular, the appearance of ‘librarian’ among the negative keywords may need to be discussed in more depth in order to enhance users' satisfaction.
Fifth, the positive keywords for libraries were ‘convenient’, ‘enjoy’, ‘comfortable’, ‘wonderful’, ‘modernistic’, etc. It is thus necessary to consider how to satisfy users' needs for convenience and enjoyment in order to increase their positive perceptions of public libraries.
This study attempted to identify keywords and emotions about public libraries by using semantic network analysis and sentiment analysis. It could not show how the sentiment keywords changed over time, so future research should build on these results and trace such changes by analyzing social perceptions of public libraries by period. In addition, it is necessary to examine differences in perception among interest groups by analyzing data that capture library stakeholders' opinions, and to provide a new vision for managing public libraries by analyzing each group's preferences for public libraries through sentiment analysis.
References
- Choi, S. (2016). Network analysis for communication research. Seoul: CommunicationBooks.
- Chang, D. (2020). Perceived Needs of Users toward Public Library Services in Busan. Journal of the Korean Society for Library and Information Science, 54(1), 51-70. [https://doi.org/10.4275/KSLIS.2020.54.1.051]
- Ham, Y. J., & Oh, Y. K. (2016). A Study on Color Images and Emotional Evaluation of Them in University Library: Focusing on the Survey of the Situation of the H University Library. Korean Institute Of Interior Design, 24(5), 42-50. [https://doi.org/10.14774/JKIID.2015.24.5.042]
- Kim, S. A., & Kwon, N. (2020). Citizens’ Needs and Perceptions of their Municipal Public Library Services. Journal of the Korean Society for Library and Information Science, 54(2), 29-52. [https://doi.org/10.4275/KSLIS.2020.54.2.029]
- Lee, S. S. (2012). Network analysis methods. Seoul: Nonhyungbook.
- Lee, J. H., Lee, M., & Kim, J. W. (2019). A study on Korean language processing using TF-IDF. Journal of information systems, 28(3), 105-121. [https://doi.org/10.5859/KAIS.2019.28.3.105]
- Noh, D. J. (2015). A Study on the Emotional Vocabulary Based on Space Assessment of the Academic Library. Journal of the Korean Biblia Society for Library and Information Science, 26(4), 83-104. [https://doi.org/10.14699/kbiblia.2015.26.4.083]
- Noh, Y., & Kim, Y. (2019). A Study on the User Recognition of Library Complex Culture Space. Journal of the Korean Society for Library and Information Science, 53(4), 23-50. [https://doi.org/10.4275/KSLIS.2019.53.4.023]
- Pyo, S. H., & Cha, M. (2018). A Comparative Study of the Perceptions on Public Libraries between Librarians and Users: A Survey of the Seoul Metropolitan Office of Education Public Libraries. Journal of the Korean Society for Library and Information Science, 52(2), 221-244. [https://doi.org/10.4275/KSLIS.2018.52.2.221]
- Yim, D. (2015). Big data analysis using R. Paju: Jayuacademy.
- Yu, Y. (2017). Analysis of media coverage on 2015 revised curriculum policy using big data analysis. (Unpublished doctoral dissertation, Department of Education), Graduate School of Seoul National University, Seoul.
Younghee Noh has an MA and PhD in Library and Information Science from Yonsei University, Seoul. She has published more than 50 books, including 3 books awarded as Outstanding Academic Books by Ministry of Culture, Sports and Tourism (Government) and more than 120 papers, including one selected as a Featured Article by the Informed Librarian Online in February 2012. She was listed in the Marquis Who’s Who in the World in 2012-2016 and Who’s Who in Science and Engineering in 2016-2017. She received research excellence awards from both Konkuk University (2009) and Konkuk University Alumni (2013) as well as recognition by “the award for Teaching Excellence” from Konkuk University in 2014. She received research excellence awards from ‘Korean Library and Information Science Society’ in 2014. One of the books she published in 2014 was selected as ‘Outstanding Academic Books’ by Ministry of Culture, Sports and Tourism in 2015. She received the Awards for Professional Excellence as Asia Library Leaders from Satija Research Foundation in Library and Information Science (India) in 2014. She was a Chief Editor of World Research Journal of Library and Information Science from March 2013 to February 2016. Since 2004, she has been a Professor in the Department of Library and Information Science at Konkuk University, where she teaches courses in Metadata, Digital Libraries, Processing of Internet Information Resources, and Digital Contents.
Dongseok Kim has an MA and PhD in Library and Information Science from Konkuk University, in Korea. He has been a Professor in the Department of Library Media Information Science at Daelim University, where he teaches courses in Library Management, Content & Literary Property, Database, and Information Service.